— In the News —
OpenAI's new language model, reportedly so good that they're "afraid to release it," inspired a lot of discussion around the web last week. New resources are quickly popping up, including an open clone of the unreleased dataset scraper. This article from MIT's Technology Review takes a look at the language philosophies that drive NLP techniques and shows that even though GPT-2 can write like a human, "it doesn't have a clue what it's saying."
Nice thread that debunks a paper that got a lot of attention in the press last year. It's not long, but there's a lot to unpack here regarding statistics, bad science and the media.
— Insight —
Short and painful via xkcd!
— Sponsored Link —
Vettery specializes in tech roles and is completely free for job seekers. Interested? Submit your profile, and if accepted onto the platform, you can receive interview requests directly from top companies growing their data science teams.
— Tools and Techniques —
This is a great step-by-step tutorial of a Bayesian data analysis workflow using PyMC3 and ArviZ. It starts with data prep and exploration and then walks through a couple of modeling scenarios with comparisons. There are worthwhile linked references at the beginning, so even if you're not interested in the tutorial, it's worth skimming.
xaringan is a popular R package for creating slide presentations using R Markdown. This post by Yihui Xie walks through the latest live-preview capabilities, which should make it even more awesome to work with. If you're new to xaringan, the documentation is written as a xaringan presentation called "Presentation Ninja" and is a nice place to start.
Emily Gorcenski explores why versioning machine learning systems is particularly challenging and offers specific recommendations for simplifying the process.
Nice overview of scaling R applications with Docker and Kubernetes on GCP. Discusses a variety of scenarios and includes lots of sample code.
Bias comes in a variety of forms, all of them potentially damaging to the efficacy of your ML algorithm. Our Chief Data Scientist specializes in training data bias, the source of most headlines about AI failures. Our overview on the subject – Four Types of AI Bias – offers guidelines for detecting and mitigating bias.
— Resources —
Christoph Molnar recently released the first edition of his guide for making black box models explainable. The HTML version is free and an e-book version is available from Leanpub.
— Data Viz —
This new initiative aims to foster community and shape professional development for data visualization creators of all backgrounds. There's a lot of potential here and it's free to sign up.
Since the Observable platform's release last year, a lot of tutorials, references and examples have been created for it. This new "User Manual" collects everything into a single page of organized links. Sections include introductory concepts, Libraries, How To & Techniques, and Tips & Tricks. In case you're not familiar with it, Observable is a platform for creating interactive notebooks to work with data on the web. It was created by Mike Bostock, of D3.js fame.