— In the News —
Even though the reproduction number known as "R" is widely cited in the media, epidemiologists have been keen to downplay its importance. In this article in Nature, David Adam explores what that number really means and what we should be watching instead.
When a self-driving Uber car struck and killed a pedestrian in 2018, the company was cleared of criminal wrongdoing, but the human tasked with monitoring the system faces the prospect of manslaughter charges. The outcome could be a harbinger of what lies ahead.
— Tools and Techniques —
After her first year as a machine learning engineer, Shreya Shankar offers real-world insights into the field and why most machine learning projects don't make it to production — in spite of "crazy advances" in machine learning technology and expertise.
Shopify's Data Team is building for the long term, and it's paying off. Here's how the team leverages years of work in data warehousing and analysis to support Shopify's ecosystem of internal teams and partners.
Paper Projects provides a structure to implement experiments and learn from trending papers. It's intended to be an "opportunity for the community to come together, learn and help each other."
Great guide for setting up a new Python project with an automated workflow of unit tests, coverage reports, lint checks, and type checks that'll catch the majority of errors and facilitate collaboration.
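For a rough sense of what such a workflow exercises, here is a minimal sketch (not taken from the guide): a small type-annotated function plus two pytest-style unit tests. The module contents, the `mean` function, and the test names are all hypothetical, and the tests use only plain `assert` and the standard library so they run even without pytest installed.

```python
"""A tiny, hypothetical module of the kind an automated test/lint/type-check workflow covers."""
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        # Failing loudly on empty input gives the unit tests a concrete behavior to check.
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_simple_list() -> None:
    # pytest discovers functions whose names start with `test_` and runs them.
    assert mean([1.0, 2.0, 3.0]) == 2.0


def test_mean_rejects_empty_input() -> None:
    # Stdlib-only equivalent of pytest.raises(ValueError).
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

In a setup like the one described, a CI job might run `pytest --cov` (via the pytest-cov plugin) for tests and coverage, a linter such as flake8, and `mypy` for type checks; mypy, for example, would flag a call like `mean(["a", "b"])` before it ever ran. The guide's exact tool choices may differ.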
— Resources —
This is easily the best collection of free R learning resources I've come across. It covers a wide variety of topics, including getting-started guides, statistics, machine learning, text mining, package development, efficient programming, and ggplot2. Suitable for all levels, from beginner to advanced.
This short book explores the options for building analytics and BI stacks for the modern cloud. It's a very well-written, practical tour that walks through the various components and how they fit together. Great read.
— Data Viz —
Vaex and Dash are open-source libraries that make it easy to build interactive dashboards on the web for millions, and even billions, of data samples using just your Python skills. This tutorial shows what you can do with these libraries and how to use them.
COVID-19 data has largely been a success story for data visualization, but there are important lessons to be learned. Detailed dashboards, in particular, can be problematic and misleading. This short post outlines some of the key issues and suggests better approaches.