— In the News —
DeepMind is now far beyond games like chess and Go. In its latest breakthrough, AlphaFold has been recognized as a solution to one of biology's grand challenges: the “protein folding problem.” Here's a great overview of the accomplishment and why it's such a big deal.
— Tools and Techniques —
In this second post of a series, Vaughn Quoss walks through the approach that Airbnb uses to ensure data quality throughout the organization. This post focuses on identifying data quality issues and the development of a data certification process called "Midas."
'Regression, principal components, machine learning... it's all just springs pulling the model toward the data. Why don't we learn it this way?!'
Replicate is an open-source Python library that helps you keep track of ML experiments. Just add two lines to your training code and then Replicate will start tracking everything: code, hyperparameters, training data, weights, metrics, Python dependencies, etc.
Bodywork is an open-source machine learning operations (MLOps) framework for automating the training and deployment of Python ML models on Kubernetes.
— Resources —
Figshare's annual report on open data summarizes survey results from 4,500 researchers on their practices and challenges in sharing data in 2020. The report covers a lot of ground and includes many interview snippets and visuals that give context and meaning to the results.
This survey paper extracts practical considerations from recent case studies of a variety of ML applications and is organized into sections that correspond to stages of a typical machine learning workflow: from data management and model learning to verification and deployment.
— Data Viz —
Apparently, "There are angry ladies all over Yankee Candle’s site reporting that none of the candles they just got had any smell at all." Kate Petrova dives in with some fun data exploration.