Here is the talk I submitted to PyCon USA.
This talk surveys the Python video ecosystem. If you want to watch Python videos, what is the best way to find the good ones? The talk evaluates YouTube search, Google Search, PyVideo and the much newer PythonLinks.info. PythonLinks organizes videos into a tree of categories, a Python taxonomy, and ranks them by Wilson score, by votes per view, or by most recent. YouTube and PythonLinks search results are compared. The server is implemented in Python using the ZODB object database and the Cromlech framework. The talk is peppered with video recommendations. At the end I speak briefly about PythonLinks’ younger sibling ClimateVideos.info, and about internet censorship of climate-change-related information.
This talk is primarily targeted at people who want to watch Python videos, but are not sure how to figure out which videos to watch. No prior knowledge is expected. By the end of the talk they should have a much better understanding of the different Python video search engines, and of the strengths and weaknesses of each one.
Slides are at http://PythonLinks.info/presentations/bestvideos.pdf
1. Introduction
There is so much excellent knowledge about Python on the internet, but it is hard to find the best relevant content. It is particularly hard to find good videos. With text documents, one can scan them to determine relevance and quality. Certainly it takes a lot of time, but it is doable. With videos, you can’t scan them; you have to watch them to see if they are good. That is just too slow. Help is needed in selecting the best videos.
2. YouTube and Google
YouTube has a near monopoly on hosting Python videos. Many conferences, meetups and individuals publish channels or playlists there. DjangoCon Europe is the only conference I know of that lets you watch its videos elsewhere: they also host on the [Chaos Computer Club website](https://media.ccc.de/c/djangocon2018). Good for them. Other than that, essentially all of the Python videos are on YouTube, so the YouTube search engine would seem to be the obvious starting point.
One could search for “Python” on YouTube. It is hard to report on the results, because every time I do it I get different ones. The first time you do it, you will certainly get something on Monty Python. Often I get videos that are 4 years old. That is fine for many topics, but not for Python, and certainly not for data science or machine learning. Those subjects are evolving too fast.
Recently, I did the search again. At first glance the first two results (Slide: Two Best YouTube Python Videos) were quite reasonable. Let us take a look at the next four. The 3rd one was on succinct (very space-efficient) data structures. For the mass market that is not a very important topic, and it had only 111 views. The 4th one was in Polish. I live in Poland, so I can understand that. For many topics you want to watch local videos, but for technical topics geography should not be considered by the recommendation engine. The 5th was on Tkinter, with 732K views, but 4 years old. Not only has Python 3 changed since then, but machine learning and data science are now much more important than native app development. The 6th was in Hindi, with 252K views. There are a lot of developers in India, so I understand that as well, but my Hindi is limited to a few words. There must be a better way to discover Python videos.
2b) Google
Instead of YouTube, we could search for “Python” with Google. For me, Google only shows 4 pages of results (about 100 results per page?). That is fewer than 400 videos. When I last tried that, 5 of the top 10 results were search-engine-optimized paid courses. That also makes sense. But I do not want paid courses. I want the good free stuff.
2c) YouTube and Google
The larger problem is that we do not know the algorithm that YouTube and Google use to prioritize videos. We do know that YouTube tracks up votes, down votes, view count and watch time. (Slide: “Watch Time”) Presumably they also track how likely you are to then watch another video. And we believe that they use machine learning to serve the videos that you personally will watch the longest, and thereby click on the most ads. For a profit-maximizing corporation driven by ad revenue this is a plausible business model. More importantly, they want you to continue watching more videos after the current one ends. Say there is a video that recommends the world’s most useful software, and all the viewers quit watching YouTube and go off to download that piece of software. That is not the kind of video that is best for YouTube’s profits.
3. PyVideo.org
You could also search on [PyVideo.org](https://pyvideo.org). They currently index 13,245 videos. PyVideo started in 2008, when Python videos were scattered all over the web. It was a central searchable index for the small number of Python videos. The number of videos grew rapidly until it overwhelmed the authors, and so they open sourced the site. Here are [the events they index](https://pyvideo.org/events.html). By 2016 PyVideo was indexing 63 annual Python conferences and innumerable other meetups. But they are slowing down. In 2017 they indexed 60 annual conferences, and in 2018 just 53.
There are two parts to PyVideo: the data and the search engine. The data is licensed under Creative Commons; the search engine is GPL’d. The data is the most useful part, and PythonLinks.info frequently imports it. The great thing about the data is the rich JSON model. In particular, they often link to the slides for a presentation. YouTube and conference websites usually do not have that information. The PyVideo data is generated by volunteers. They index many conferences, but not all of them. It all depends on whether someone volunteers or not.
The PyVideo search engine has a number of problems. It limits search results to 100 items, and some topics have more videos than that. It often returns really old videos. It does not rank the search results, so there is no way to tell which is best. They do have [tags](https://pyvideo.org/tags.html), but when last I counted there were over 2000 of them, so it is very hard to find the relevant tag. It is also a folksonomy: everyone uses a different set of tags. Not good.
4. PythonLinks.info
The newest place to search for Python videos is [PythonLinks.info](https://PythonLinks.info/python). PythonLinks organizes videos into a tree of categories, and ranks the videos in any branch of the taxonomy using either the Wilson score, the votes to views ratio or the most recent ranking.
4a) Categories
One problem with Google, YouTube, Facebook and Twitter is that they return an endless list of results. A basic principle in Human Factors is that there should not be more than about 7 items in any list. So PythonLinks sorts YouTube videos into a tree of categories. At the top level there are categories for Machine Learning, Data Science, Parallelism, People, Skills and Python Software. Let us explore the categories. (Here is where I demo the software. A slightly older version of the demo, soon to be updated, can be watched at https://pythonlinks.info/introductory-video )
Detailed Discussion of Categories. Depending on the time available, I plan on including a more detailed discussion of the best videos in each category.
4b) Ranking Videos
So how should one rank the videos? The YouTube API provides the up votes, the down votes and, for most of the videos, the page views. The first obvious ranking solution is to rank videos based on up votes minus down votes. Figure 3 shows a table with those numbers. But that approach has an obvious problem. The first row is a terrible video with 500 up votes and 400 down votes. The second row is a great video with 48 up votes and 1 down vote. The simplistic up votes minus down votes gets the scores all wrong. The next, more sophisticated approach, up votes divided by total votes, is better. It sorts these two videos correctly. (Slide: Scoring by Up Votes / Total Votes) But it is too sensitive. The top videos usually get no down votes, so their ratio of up votes to total votes is a perfect 100%. A single down vote could throw them off the top 10 list. That is not right. It is a symptom of a larger statistical problem with the up votes to total votes ratio. The solution is to use the Wilson score. It is used by Reddit and Hacker News for ranking comments.
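To make the comparison concrete, here is a minimal sketch of the two simple scores applied to the two rows described above. The dictionary layout and the function names are illustrative assumptions, not PythonLinks code.

```python
# Hypothetical sample data: the two rows described in the text.
videos = [
    {"title": "terrible video", "up": 500, "down": 400},
    {"title": "great video", "up": 48, "down": 1},
]


def net_votes(video):
    """Simplistic score: up votes minus down votes."""
    return video["up"] - video["down"]


def up_ratio(video):
    """Up votes divided by total votes (0.0 when there are no votes)."""
    total = video["up"] + video["down"]
    return video["up"] / total if total else 0.0


by_net = sorted(videos, key=net_votes, reverse=True)
by_ratio = sorted(videos, key=up_ratio, reverse=True)

print([v["title"] for v in by_net])    # the terrible video is ranked first
print([v["title"] for v in by_ratio])  # the great video is ranked first

# Sensitivity: a single down vote pulls a small video off a perfect ratio.
print(up_ratio({"up": 10, "down": 0}), up_ratio({"up": 10, "down": 1}))
```

The last line shows the sensitivity problem: with only a handful of votes, one down vote is enough to change the ranking.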
4c) Wilson Score
The next slide, “Binomial Distribution”, shows the problem. (Before the conference I will probably do a new image using this software: http://www.astroml.org/book_figures/chapter3/fig_binomial_distribution.html ) A binomial distribution is the probability distribution for the total number of heads when we do n flips of a coin with probability p of heads and q = 1 - p of tails. It is natural to guess that the value of p is the fraction of flips that came up heads, but the total number of heads is a random variable, so that guess is uncertain. The slide shows two binomial distributions. There is a real chance that the distribution with fewer trials n and lower probability p actually ends up with the larger observed fraction of heads. That is not good. Without going into the math, the Wilson score takes confidence intervals into account, so that you can be more confident that the video with the better Wilson score is indeed the better video.
The Wilson score dramatically improved results on my website, but it also has its problems. It favors mainstream videos with lots of votes and thus little uncertainty. That penalizes the great talks from the smaller conferences. They may have a much higher rate of up votes per view, but not enough up votes to compete using the conservative Wilson score. The Wilson score also does not take into account that people do not like to criticize. They are more likely to up vote than down vote. Without evidence, I think that down votes should be multiplied by 5 to compensate for people’s reluctance to criticize.
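For readers who do want the math, here is a minimal sketch of the lower bound of the Wilson score interval, which is the form usually used for ranking. It is the textbook formula, not necessarily the code running on PythonLinks.info; the function name and the choice of z = 1.96 (roughly a 95% interval) are assumptions.

```python
import math


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for the up vote fraction.

    z = 1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    n = up + down
    if n == 0:
        return 0.0  # no votes, no evidence
    p = up / n  # observed fraction of up votes
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


great = wilson_lower_bound(48, 1)        # few votes, almost all positive
terrible = wilson_lower_bound(500, 400)  # many votes, barely positive
tiny = wilson_lower_bound(3, 0)          # perfect ratio, almost no evidence

print(great, terrible, tiny)
```

The great video still beats the terrible one, but a video with 3 up votes and no down votes no longer outranks it just because its raw ratio is 100%. The same conservatism is what penalizes talks from smaller conferences.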
4d) Votes / View
So PythonLinks has started to offer a new scoring option. PythonLinks now supports a ratio of (upVotes - 5 * downVotes)/viewCount. It recommends some very interesting videos. I particularly like [Observe All Your Applications](https://pythonlinks.info/observe-all-your-applications?sortBy=bestMyScore). As a developer, I am so focussed on functionality that it is great to be reminded about the importance of making sure that everything is working.
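The ratio above is simple enough to sketch directly. The function name, the handling of missing view counts and the sample numbers below are assumptions for illustration; only the formula itself comes from the text.

```python
def votes_per_view(up_votes, down_votes, view_count, down_weight=5):
    """(upVotes - 5 * downVotes) / viewCount, as described in the text.

    Returns None when the view count is missing or zero, since the
    YouTube API does not report views for every video.
    """
    if not view_count:
        return None
    return (up_votes - down_weight * down_votes) / view_count


# Hypothetical numbers: a small conference talk and a mainstream video.
small_talk = votes_per_view(48, 1, 1_000)
mainstream = votes_per_view(500, 400, 100_000)

print(small_talk, mainstream)
```

With these made-up view counts the small talk scores higher than the mainstream video, which is the effect this scoring option is meant to have.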
4e) Comparison of PythonLinks and YouTube Search Results.
4f) PythonLinks would like to include authors’ names next to Python talks, but sadly that would be a GDPR violation. So there is now a link next to each talk, where you can grant permission to add your name.
6) Climate Change
ClimateVideos.info is a younger sister site to PythonLinks.info.
6a) The Climate Change Problem
The huge problem with climate change is that the Arctic sea ice is disappearing. (Slide: Sea Ice Volume) By 2032, maybe sooner, the September sea ice will be gone, and before then much of the frozen biomass under the Arctic and in the tundra will turn to methane, 100 times more climate change gases than we currently have. (Slide: Clathrates Bubbling Methane)
6b) Early YouTube Search Results
When I first searched for “Climate Change” on YouTube, the top result was the recent NBC interview with Michael Bloomberg. I tried watching it, but it turned my stomach. It sounded like a commercial for a presidential candidate from the 0.001%. President Trump is also in the Top 10 YouTube Climate Change videos. Really? Who upvoted his videos? The 11th talk is with Putin, another climate change denier. Climate change deniers would never get added to ClimateVideos.info. In contrast, the excellent COP 24 talk by the Swedish girl Greta Thunberg is nowhere to be seen. Nor is the People’s talk by David Attenborough. Nor is the excellent Greenpeace video on Forest Fires, https://www.youtube.com/watch?v=SV86xky2pWs&t=411s unless you know to specifically look for them.
6c) More Recent Search Results
When I look for Climate Change videos on YouTube now, I get great results. My current theory is that, based on your clicks, the YouTube AI identifies you as a climate change believer or denier, and feeds you the appropriate results. Further testing is needed to check this theory.
6d) Suppression of Climate Change Information
The more I work with Climate Change videos, the more I suspect that they are suppressed. Not censored, just suppressed. We all know that the Great Firewall of China blocks information about Tiananmen Square. We know that in the US, the media owned by the oil lobby did not talk about climate change. There were reports of wildfires, even a show on natural catastrophes, where the words “climate change” were never mentioned. So when Meet the Press hosted Michael Bloomberg, many were surprised. Up until then climate change information on TV had been censored. But the situation on the internet is different. The information is out there, but somehow hard to find. There is a lot of anecdotal information about this. My friends, hardcore internet geeks, smart guys, just do not get good climate change information in their news feeds. A climate change researcher had a hard time finding good videos for his climate-change-denier brother-in-law.
But the real evidence is in the page views. (Slide: Climate Change Suppression) shows how few page views per month the best climate change videos get. Compare that to how many page views the best Python videos get. The best Python videos get 10 times more views per month than the best climate videos. And remember, there are six billion people affected by climate change, but at most six million Python developers. That is 1000 times more people than Python developers, yet the Python videos get 10 times more views per month! WTF? Why are the statistics so skewed in favor of Python? I do not know. But I think that we all need to be more suspicious about suppression of climate change information.
6f) Reddit.com Suppression
The same thing happens on reddit.com. Take a look at the upvotes on reddit.com/r/python. Today I found one posting with 2.4K and another with 1.2K. On reddit.com/r/climatechange, the best scores were 45 and 43. For reddit.com/r/climate the best scores were 257 and 214. That is an order of magnitude less than for /r/python. I could be wrong. Maybe the carbon lobby does not spend money to suppress climate change videos on the web. Maybe they spent all that money buying up TV stations, and then spent no money on the web. Still, information suppression is a topic which deserves more attention.
Seeking Volunteers (Slide: Seeking Volunteers)
There is just too much great content out there, needing to be organized. Sadly, I am not able to watch and review all of the videos listed on PythonLinks.info, let alone on the entire web. Categorizing videos helps. Ranking them by scores helps. Getting access to the Watch Time would help. But the most valuable service is to have human editors, experts in their field, reviewing and organizing the videos. So I am looking for people to help me curate both PythonLinks.info and ClimateVideos.info. PythonLinks.info is a CMS, first cousin to Plone. So I am particularly interested in experts who can manage a particular branch of the PythonLinks tree: people who know which are the best videos in their particular subject area, and who care about educating people about the wonderful Python tools and libraries that they use every day.
In the last few years I have spoken at Python meetups in Warsaw, Katowice, Bialystok, Krakow, Wroclaw, Brno, Prague and Olomouc. I have also spoken at PyCon PL, PyCon DE, PyCon Slovakia, PyCon UK and PyCon FR. https://www.youtube.com/watch?v=fdTdcHRJmr4 Prior to PyCon USA, I will be giving (and refining) this talk at multiple Python meetings in Eastern Europe. I am a native English speaker. At the risk of revealing my identity (maybe that is allowed in the second round), I have tremendous knowledge in the area of Python videos. I wrote the software for PythonLinks.info. I wrote the software to access the YouTube APIs. I catalogued the 962 videos on the site. The software is mature. (Even the GUI is starting to look good.) By the May conference, there will be a lot more videos indexed there. I am off to tell the world about it. A lot of people really like what I am doing with PythonLinks.info. It is a very useful and much needed resource for the Python community. It is now a thing. I hope that you will be kind enough to support my efforts in this area. Thank you for taking the time to review my proposal.
I am looking for editors/curators to help with branches of the tree. Please send me an email if you are interested.