An updated version of my PyCon Proposal
1. Introduction (1 min)
2. YouTube and Google Search (3 min Timed)
c) YouTube and Google
3. PyVideo (3 min Timed)
4. PythonLinks.info (10 min)
a) Categories b) Ranking Videos c) Wilson Score d) Votes / View e) GDPR
5. Python Links Architecture (3 min)
6. Climate Change (5 min)
a) The Problem
b) YouTube Search Results
c) Suppression of Information
d) Reddit.com Suppression
7. Seeking Volunteers (1 min Timed)
8. Questions (4 min)
Slides are at PythonLinks.info/presentations/bestvideos.pdf
There is so much excellent knowledge about Python on the internet, but it is hard to find the best relevant content. It is particularly hard to find good videos. With text documents, one can scan them to determine relevance and quality. Certainly it takes a lot of time, but it is doable. With videos, you can’t scan them; you have to watch them to see if they are good. That is just too slow. Help is needed in selecting the best videos.
2. YouTube and Google
YouTube has quite the monopoly on hosting Python videos. Many conferences, meetups and individuals publish channels or playlists. DjangoCon Europe is the only conference I know of which allows you to watch the videos elsewhere. They also host on the [Chaos Computer Club website](https://media.ccc.de/c/djangocon2018). Good for them. But other than that, all of the Python videos are on YouTube. So the YouTube search engine would seem to be the obvious starting point.
One could search for “Python” on YouTube. Hard to report on the results, because every time I do it I get different results. The first time you do it, you will certainly get something on Monty Python. Often I get ones that are 4 years old. Fine for many topics, but not for Python and certainly not for data science or machine learning. Those subjects are just evolving too fast.
Recently, I did the search again. The first two, (slide: Two Best YouTube Python Videos) at first glance were quite reasonable search results. Let us take a look at the next 4 results. The 3rd one was on Succinct (very space efficient) Data Structures. For the mass market, not a very important topic. Only 111 views. The 4th one was in Polish. I live in Poland so I can understand that. For many topics, you want to watch local videos, but for technical topics geography should not be considered by the recommendation engines. The 5th was on Tkinter, 732K views, but 4 years old. Not only has Python 3 changed since then, but now Machine Learning and Data Science are much more important than native app development. The 6th was in Hindi, with 252 K views. Okay there are a lot of developers in India, so I understand that as well. But my Hindi is limited to a few words.
There must be a better way to discover Python videos.
Instead of YouTube we could search on “Python” with Google.
For me, Google only shows 4 pages (100 results / page) of results. That is less than 400 videos. When I last tried that, 5 of the top 10 results were SEO optimized paid courses. That also makes sense. But I do not want paid courses. I want the good free stuff.
2c) YouTube and Google
The larger problem is that we do not know the algorithm that YouTube and Google use to prioritize videos. We do know that YouTube tracks up votes, down votes, view count and watch time. (Slide: “Watch Time”). Presumably they also track how likely you are to then watch another video. And we believe that they use machine learning to serve the videos that you personally will watch the longest, and thereby click on the most ads. For a profit-maximizing corporation driven by ad revenue this is a plausible business model. More importantly, they want you to continue watching more videos after the current one ends. Say there is a video that recommends the world’s most useful software, and all the viewers quit watching YouTube and go off and download that piece of software. Well, that is not the kind of video that is best for YouTube’s profits.
You could also search on [PyVideo.org](https://pyvideo.org). They currently index 13,245 videos. PyVideo started in 2008 when Python videos were scattered all over the web. They were a central searchable index for the small number of Python videos. The number of videos grew rapidly until it overwhelmed the authors, and so they open sourced the site. Here are [the events they index](https://pyvideo.org/events.html). By 2016 PyVideo was indexing 63 annual Python conferences and innumerable other meetups. But they are slowing down. In 2017 they indexed 60 annual conferences. And in 2018 just 53.
There are two parts to PyVideo. The data and the search engine. The data is licensed under creative commons, the search engine is GPL’d.
The data is the most useful part, and PythonLinks.info frequently imports it. The great thing about the data is the rich JSON model. In particular they often link to the slides for a presentation. YouTube and conference websites usually do not have that information.
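As a rough illustration of why a rich JSON model is useful, here is a minimal sketch that pulls slide links out of one record. The record itself is made up, and the field names (`related_urls`, `label`, `url` and the rest) are assumptions about what such a record might look like, not a verified description of the PyVideo schema.

```python
import json

# Hypothetical record; the field names below are assumptions, not a verified schema.
RECORD_TEXT = """
{
  "title": "An Example Talk",
  "speakers": ["A. Speaker"],
  "recorded": "2018-05-24",
  "videos": [{"type": "youtube", "url": "https://www.youtube.com/watch?v=EXAMPLE"}],
  "related_urls": [
    {"label": "Conference schedule", "url": "https://example.org/schedule"},
    {"label": "Talk slides", "url": "https://example.org/slides.pdf"}
  ]
}
"""


def slide_links(record):
    """Return the URLs of related links whose label mentions slides."""
    return [
        link["url"]
        for link in record.get("related_urls", [])
        if "slide" in link.get("label", "").lower()
    ]


record = json.loads(RECORD_TEXT)
print(record["title"], slide_links(record))
```

Because the slides are a separate, labelled link rather than free text in a description, an importer can attach them to a video without any scraping.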
The PyVideo data is generated by volunteers. They index many conferences, but not all conferences. It all depends on whether someone volunteers or not.
The PyVideo search engine has a number of problems. They limit search results to 100 items. Some topics have more videos than that. They often return really old videos. They do not rank order the search results, so there is no way to tell which is best. They do have [tags](https://pyvideo.org/tags.html), but when last I counted there were over 2000 of them. Very hard to find the relevant tag. Also it is a folksonomy. Everyone uses a different set of tags. Not good.
The newest place to search for Python videos is [PythonLinks.info](https://PythonLinks.info/python).
PythonLinks organizes videos into a tree of categories, and ranks the videos in any branch of the taxonomy using either the Wilson Score, the votes to views ratio or the most recent ranking.
One problem with Google, YouTube, Facebook and Twitter is that they return an infinite list of results. A basic principle in Human Factors is that there should not be more than about 7 items in any list.
So PythonLinks sorts YouTube videos into a tree of categories. At the top level there are categories for Machine Learning, Data Science, Parallelism, People, Skills and Python Software. Let us explore the categories.
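The idea of ranking the videos in any branch of a category tree can be sketched in a few lines. This is only an illustration, not the PythonLinks code: the `Video` and `Category` classes, the video titles and the scores are all hypothetical, and only the category names come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Video:
    title: str
    score: float  # any ranking score, e.g. a Wilson score


@dataclass
class Category:
    name: str
    videos: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def all_videos(self):
        """Yield the videos in this category and in every subcategory."""
        yield from self.videos
        for child in self.children:
            yield from child.all_videos()

    def ranked(self, limit=7):
        """Best videos in this branch, capped at a short list."""
        best_first = sorted(self.all_videos(), key=lambda v: v.score, reverse=True)
        return best_first[:limit]


# Hypothetical videos placed in two of the top-level categories.
machine_learning = Category("Machine Learning", videos=[Video("ML talk", 0.91)])
data_science = Category("Data Science", videos=[Video("Pandas talk", 0.95),
                                                 Video("Plotting talk", 0.80)])
root = Category("Python", children=[machine_learning, data_science])

print([v.title for v in root.ranked()])
print([v.title for v in data_science.ranked()])
```

The default `limit=7` reflects the rule of thumb about short lists. Ranking at the root covers the whole tree, while ranking at a subcategory covers only that branch.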
(Here is where I demo the software. You can watch a slightly older version of the demo, soon to be updated at
If time allows I would like to talk more about some of the best videos in each category.
Detailed Discussion of Categories.
Depending on the time available I plan on including a more detailed discussion of the videos in each category.
There is an interesting section about applying Data science to improving democracy. I recommend [Small Data](https://pythonlinks.info/small-data?sortBy=bestMyScore) about how to monitor the government.
4b) Ranking Videos
So how should one rank the videos? The YouTube API provides the up votes, the down votes, and in most of the videos, the page views. The first obvious ranking solution is to rank videos based on upvotes - downvotes. Figure 3 shows a table with those numbers. But that approach has an obvious problem. The first row is a terrible video with 500 upvotes and 400 downvotes. The second row is a great video with 48 upvotes and 1 down vote. The simplistic upvotes - downvotes gets the scores all wrong.
The next, more sophisticated approach, upvotes divided by total votes, is better. It sorts these two videos correctly. (Slide: Scoring by Up Votes / Total Votes) But it is too sensitive. The top videos usually get no downvotes. Their up votes to total votes ratio is a perfect 100%. A single downvote could throw them off the top 10 list. That is not right. It is a symptom of a larger statistical problem with this up votes to total votes ratio. The solution is to use the Wilson score. It is used by Reddit and Hacker News for ranking comments.
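The two simple scores described above are easy to compare in code. This is a small sketch using the vote counts from the example; the function names are made up for illustration.

```python
def net_score(up, down):
    """Upvotes minus downvotes."""
    return up - down


def ratio_score(up, down):
    """Fraction of all votes that are upvotes; 0.0 when there are no votes."""
    total = up + down
    return up / total if total else 0.0


terrible = (500, 400)  # the terrible video: 500 upvotes, 400 downvotes
great = (48, 1)        # the great video: 48 upvotes, 1 downvote

# Net score ranks the terrible video first.
print(net_score(*terrible), net_score(*great))  # 100 47

# The ratio ranks the great video first.
print(round(ratio_score(*terrible), 3), round(ratio_score(*great), 3))  # 0.556 0.98

# But one downvote moves a video with few votes a long way.
print(ratio_score(10, 0), round(ratio_score(10, 1), 3))  # 1.0 0.909
```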
4c) Wilson Score
The next slide “Binomial Distribution” shows the problem.
(Before the conference I will probably do a new image using this software.
A binomial distribution is the probability distribution for the total number of heads when we do n flips of a coin with a probability of p of heads, and q of tails. It is natural to guess that the value of p is the percent of time a head is shown, but the total number of heads is actually a random variable. There are two binomial distributions. There is a good chance that the distribution with lower trials n and lower probability p, actually ends up with the larger number of heads. That is not good. Without going into the math, the Wilson score takes into account confidence intervals, so that you can be sure that the better Wilson score is indeed the better video.
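For readers who do want the math, here is a minimal sketch of the lower bound of the Wilson score interval, which is the usual way to turn vote counts into a ranking score. It assumes a 95% confidence level (z = 1.96); that choice, and the function name, are assumptions of this sketch rather than a description of what PythonLinks does.

```python
import math


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction.

    z=1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    denominator = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denominator


# A single upvote is a perfect 100% ratio, but there is very little evidence
# behind it, so it scores well below a video with 48 upvotes and 1 downvote.
for votes in [(1, 0), (48, 1), (500, 400)]:
    print(votes, round(wilson_lower_bound(*votes), 3))
```

The score rises as evidence accumulates. That is why one extra downvote no longer throws a well-supported video off the top of the list.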
The Wilson score dramatically improved results on my website, but it also has its problems. It favors main stream videos with lots of votes and thus little uncertainty. That penalizes the great talks from the smaller conferences. They may have a much higher rate of upvotes per views, but not enough upvotes to compete using the conservative Wilson score.
The Wilson score also does not take into account that people do not like to criticize. They are more likely to up vote than down vote. Without evidence, I think that down votes should be multiplied by 5 to compensate for people’s reluctance to criticize.
4d) Votes / View
So PythonLinks has started to offer a new scoring option.
PythonLinks now supports a ratio of (upVotes - 5 * downVotes)/viewCount. It recommends some very interesting videos. I particularly like
[Observe All Your Applications](https://pythonlinks.info/observe-all-your-applications?sortBy=bestMyScore). As a developer, I am so focussed on functionality, it is great to be reminded about the importance of making sure that everything is working.
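The votes per view formula is simple enough to show directly. This is a sketch, not the site's code: the function name and the two sample videos are hypothetical, and only the formula and the factor of 5 come from the text.

```python
def votes_per_view(up_votes, down_votes, view_count, down_weight=5):
    """(upVotes - 5 * downVotes) / viewCount, or 0.0 when there are no views."""
    if view_count <= 0:
        return 0.0
    return (up_votes - down_weight * down_votes) / view_count


# Hypothetical numbers: a heavily viewed talk and a small-conference talk.
mainstream = votes_per_view(up_votes=2000, down_votes=50, view_count=400_000)
small_conference = votes_per_view(up_votes=90, down_votes=1, view_count=3_000)

print(mainstream, small_conference)
print(small_conference > mainstream)
```

With these made-up numbers the small-conference talk comes out ahead, which is the effect this option is meant to have: it rewards a high rate of approval among the people who actually watched, rather than the raw number of votes.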
PythonLinks would like to include authors’ names next to Python talks, but sadly that would be a GDPR violation. So there is now a link next to each talk, where you can grant permission to add your name.
5) PythonLinks Architecture
ZODB is an object-oriented database written in Python and optimized in C. The ZODB makes it really easy to build persistent applications. Just subclass off of class Persistent and your objects, graphs of objects, and applications become persistent. (Slide: How to use the ZODB)
ZODB persistent containers look like dictionaries to the developer, but are stored as BTrees on the file system. So it is very easy to implement a tree of objects, a taxonomy, using the ZODB. One traverses to an object, (Slide: Traversal) and then displays a view on the object.
PythonLinks also implements canonical URLs. (Slide: Canonical URLs) From the root of the tree one can either traverse to an object, or use the canonical URL to jump directly to the object. This way as the tree evolves, and objects are moved, their canonical URL remains the same.
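Traversal and canonical URLs can be sketched with plain dictionaries standing in for the ZODB containers. Nothing here uses the ZODB or Cromlech APIs; the `Node` class, the `traverse` helper, the `canonical` index and all the names in the example are hypothetical.

```python
class Node(dict):
    """A dictionary-like container, standing in for a ZODB persistent container."""

    def __init__(self, canonical_id):
        super().__init__()
        self.canonical_id = canonical_id


def traverse(root, path):
    """Walk from the root to an object, one path segment at a time."""
    node = root
    for segment in path.strip("/").split("/"):
        if segment:
            node = node[segment]
    return node


# Build a tiny tree and an index from canonical id to object.
root = Node("root")
root["machine-learning"] = Node("machine-learning")
root["data-science"] = Node("data-science")
talk = Node("pandas-intro")
root["machine-learning"]["pandas-intro"] = talk
canonical = {talk.canonical_id: talk}

# Both routes reach the same object.
print(traverse(root, "/machine-learning/pandas-intro") is canonical["pandas-intro"])

# Move the talk to another branch: the traversal path changes,
# but the canonical lookup still finds it.
root["data-science"]["pandas-intro"] = root["machine-learning"].pop("pandas-intro")
print(traverse(root, "/data-science/pandas-intro") is canonical["pandas-intro"])
```

Keeping the canonical index separate from the tree is what lets an object be moved between categories without breaking links that point to it.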
PythonLinks.info is written using the Cromlech framework. Souheil Chelfou was a major Grok contributor, and spent the last 7 years cleaning up those libraries. He has done a gorgeous piece of work, which almost no one knows about.
6) Video Censorship.
“It is very hard to see what is not there.”
When I started running PythonLinks’ sister site, [ClimateVideos.info](ClimateVideos.info), I was quite surprised at how few votes and page views those videos get. Not on my site, on YouTube. There are six billion people affected by climate change, but at most six million Python developers. That is 1000 times more people than Python developers, but the Python videos get 10 times more views per month! Something is wrong. I do not think that anyone has any reason to censor Python videos. So that makes for a good baseline. What is going on here?
We all know that the great wall of China blocks information about Tiananmen Square. We know that in the US, the mainstream media, those owned by the oil lobby did not talk about climate change. There were reports of wildfires, even a show on natural catastrophes, where the words “climate change” were never mentioned. So when Meet the Press hosted Michael Bloomberg, so many were surprised. Up until then climate change information on TV had been censored. I wonder how much he paid for that slot.
6a) The Climate Change Problem
Why is this important? The huge problem with climate change is that the arctic sea ice is disappearing. (Slide: Sea Ice Volume) By 2032, maybe sooner, the September sea ice will be gone, and before then much of the frozen biomass under the arctic and in the tundra will turn to methane, 100 x more climate change gases than we currently have. (Slide: Clathrates Bubbling Methane)
6b) You Tube Search Results
Search on “Climate Change” on YouTube, and the top result is the recent NBC interview with Michael Bloomberg. I tried watching it, but it turned my stomach. It sounded like a commercial for a presidential candidate from the 0.001%. There was no sign that it is a paid advertisement.
President Trump is also in the Top 10 YouTube Climate Change videos. Really? Who upvoted his videos? The 11th talk is with Putin, another climate change denier. Climate change deniers would never get added to ClimateVideos.info.
In contrast, the excellent COP 24 talk by the Swedish girl Greta Thunberg is nowhere to be seen. Nor is the People’s talk by David Attenborough. Nor is the excellent Greenpeace video on Forest Fires, https://www.youtube.com/watch?v=SV86xky2pWs&t=411s
You will not find them unless you know to look for them specifically. All three are quite famous. Where are they?
6c) Page Views
The evidence is in the page views. (Slide: Climate Change Suppression) shows how few page views per month the best climate change videos get. And then compare that to how many page views the best Python videos get. The best Python videos get 10 x more views per month than the best climate videos. For 1/1000 as big an audience.
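The size of the gap becomes clearer when the two ratios are combined. The sketch below uses only the rough figures quoted in this talk; the variable names are made up.

```python
# Rough figures quoted in this talk, not measurements.
climate_audience = 6_000_000_000  # people affected by climate change
python_audience = 6_000_000       # upper estimate of Python developers
views_ratio = 10                  # Python views per month / climate views per month

audience_ratio = climate_audience / python_audience
# Views per potential viewer: how much more often a Python video is watched,
# relative to the size of its possible audience.
per_capita_gap = views_ratio * audience_ratio

print(audience_ratio, per_capita_gap)  # 1000.0 10000.0
```

Relative to the size of the audience, the best Python videos are watched about 10,000 times more often than the best climate videos.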
WTF? Why are the statistics so skewed in favor of Python? I do not know. But I think that we all need to be more aware of censorship.
6d) Reddit.com Suppression
The same thing happens on reddit.com. Take a look at the upvotes on reddit.com/r/python. Today I found one posting with 2.4K and another with 1.2K. On reddit.com/r/climatechange, the best scores were 45 and 43. For reddit.com/r/climate the best scores were 257 and 214. An order of magnitude less than for /r/python. Again, 1000 times more people are affected by climate change.
I could be wrong. Maybe the carbon lobby does not spend money to suppress climate change videos on the web. Maybe they spent all that money buying up TV stations, and they have no money left to spend on web censorship. I could be wrong. Still censorship is a topic which deserves more attention at a Python conference. Has anyone here written censorship software?
7. Seeking Volunteers
(Slide: Seeking Volunteers)
There is just too much great content out there, needing to be organized. Sadly I am not able to watch and review all of the videos listed on PythonLinks.info, let alone on the entire web.
Categorizing videos helps. Ranking them by scores helps. Getting access to the Watch Time would help. But the most valuable service is to have human editors, experts in their field, reviewing and organizing the videos. So I am looking for people to help me curate both PythonLinks.info and ClimateVideos.info. PythonLinks.info is a CMS, first cousin to Plone. So, I am particularly interested in experts who can manage a particular branch of the PythonLinks tree. People who know which are the best videos in their particular subject area, and who care about educating people about the wonderful Python tools and libraries that they use every day.
I am looking for editors/curators to help with branches of the tree. Please send me an email if you are interested.