— Insight —
This new paper explores eight major statistical ideas of the past 50 years, with an overview of each and a discussion of what they have in common, how they differ, and what to expect over the next few decades.
— Tools and Techniques —
Great follow-on article to the recent data quality series from Airbnb. In this article, Jeremy Stanley, founder of Anomalo, shows how to build and maintain high-quality data "without raising billions."
The dynamic nature of machine learning makes model governance particularly challenging, especially at scale. This best-practices article from Microsoft explores the issues and approaches.
Distill is a package for R Markdown that makes it easy to create technical articles, websites, and blogs in the style of the Distill Machine Learning Journal. Output is clean, interactive and engaging. Here are the highlights for the new 1.0 release, including links to key resources.
A high-performance 64-bit Python analytics engine for NumPy arrays with multi-threaded support. It is designed to enhance or replace NumPy or pandas, and claims to crunch numbers 1.5 to 10 times faster.
— Resources —
These short summaries of recent AI and Machine Learning research papers cover a wide variety of authors, topics and venues. Includes key points, diagrams and links for each paper.
— Data Viz —
Awesome ggplot2 tutorial with lots of examples. Includes a linked Table of Contents and useful resources at the end. Worth bookmarking.
Radial visualizations include circular layouts like pie charts, circular trees, sunbursts and weather wheels. They're an efficient way to present data, but they can also be counter-productive. Here's a visual orientation to circular visualization options, why you might use them, and the cases where there are clearly better alternatives.