— In the News —
Before public health officials in the U.S. can manage the pandemic, they must deal with a broken data system that sends incomplete results in formats they can’t easily use. It's a messy problem that's not easily fixed.
— Tools and Techniques —
When experiments are cheap to launch but observations are expensive, it pays to halt unpromising experiments as early as possible. Here's how to think through such "experiment-rich" regimes.
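One well-known way to act on this idea is successive halving: give every experiment a small batch of observations, drop the worse half, and repeat. The Python sketch below is only an illustration under that assumption, and the linked piece may propose a different stopping rule. The function name, the simulated success rates, and the batch size are all hypothetical.

```python
import random


def successive_halving(true_rates, obs_per_round=50, seed=0):
    """Return (index of the surviving experiment, total observations spent).

    true_rates are hypothetical success probabilities used only to
    simulate observations; in practice they are unknown.
    """
    rng = random.Random(seed)
    alive = list(range(len(true_rates)))
    successes = [0] * len(true_rates)
    trials = [0] * len(true_rates)
    spent = 0

    while len(alive) > 1:
        # Spend one batch of observations on every experiment still running.
        for i in alive:
            for _ in range(obs_per_round):
                successes[i] += rng.random() < true_rates[i]
                trials[i] += 1
                spent += 1
        # Rank by observed success rate and halt the worse half.
        alive.sort(key=lambda i: successes[i] / trials[i], reverse=True)
        alive = alive[: max(1, len(alive) // 2)]

    return alive[0], spent


rates = [0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.30, 0.50]
winner, spent = successive_halving(rates)
print(f"kept experiment {winner} after {spent} observations")
```

With eight experiments and batches of 50, this spends 700 observations in total, versus 1,200 if every experiment ran for all three rounds. The trade-off is that a good experiment with an unlucky first batch can be halted too early.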
The Data Science Lifecycle Process is a Git-based framework that helps keep track of the business questions, data requirements, experiments, and model deployments involved in data science projects.
Here's the thinking behind Netflix's "data efficiency dashboard," which has helped reduce Netflix's overall storage footprint by more than 10%.
A great article on how the team at Mozilla uses machine learning to facilitate continuous integration at scale for Firefox development.
Machine learning with time series data can get complicated fast. Darts is an open-source library that aims to simplify the process: inspired by scikit-learn, it offers a consistent API across a powerful set of tools. This announcement explores its capabilities and motivations.
— Resources —
This free, self-paced course is a gold mine for anyone who wants to get started with production ML in the real world. It covers infrastructure and tooling, training and debugging, testing and deployment, and data management. There are lots of guest lectures by industry leaders, too.
— Data Viz —
A great resource for ggplot2 users and anyone interested in learning ggplot2. It is well organized, and each section includes a short discussion and examples with code walk-throughs.