Data Elixir

ISSUE 314 · December 1, 2020

 

In the News

"It will change everything"

DeepMind is now far beyond games like chess and Go. In its latest breakthrough, AlphaFold has been recognized as a solution to one of biology's grand challenges, the “protein folding problem.” Here's a great overview of the accomplishment and why it's such a big deal.
Nature

 
 

Sponsored Link

Find Your Next Job Through Vettery

Connect with hiring managers from actively hiring startups and Fortune 500 companies that are using Vettery to grow their tech teams. Create a profile today to get interview requests as early as next week. Oh, and did we mention it’s free for job-seekers?

 


 
 

Tutorials, Projects & Opinions

Data Quality at Airbnb: Part 2 - A New Gold Standard

In this second post of a series, Vaughn Quoss walks through the approach that Airbnb uses to ensure data quality throughout the organization. This post focuses on identifying data quality issues and the development of a data certification process called "Midas."
Airbnb Engineering & Data Science

 
 
 

Least squares as springs

'Regression, principal components, machine learning... it's all just springs pulling the model toward the data. Why don't we learn it this way?!'
Joshua Loftus
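
For readers who want to poke at the spring picture, here is a minimal sketch in plain Python. It is not taken from the linked post: the data points, the function names, and the assumption that each spring stretches only vertically with the same stiffness k are all illustrative. Under those assumptions, the total spring energy is half of k times the sum of squared residuals, so the least-squares line is the resting position where the pulls balance (zero net force and zero net torque).

```python
# Illustrative data only; any (x, y) pairs with at least two distinct x values work.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]


def fit_line(xs, ys):
    """Closed-form simple least squares; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def spring_energy(intercept, slope, xs, ys, k=1.0):
    """Total potential energy of vertical springs joining each point to the line."""
    return sum(0.5 * k * (y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def net_force_and_torque(intercept, slope, xs, ys, k=1.0):
    """Net vertical pull on the line, and net torque about x = 0."""
    pulls = [k * (y - (intercept + slope * x)) for x, y in zip(xs, ys)]
    force = sum(pulls)
    torque = sum(x * p for x, p in zip(xs, pulls))
    return force, torque


intercept, slope = fit_line(xs, ys)
force, torque = net_force_and_torque(intercept, slope, xs, ys)
print("intercept, slope:", intercept, slope)
print("net force, net torque at the fit:", force, torque)
print("energy at the fit:", spring_energy(intercept, slope, xs, ys))
print("energy after nudging the slope:", spring_energy(intercept, slope + 0.1, xs, ys))
```

The force and torque should print as zero up to floating-point rounding, and nudging the line in any direction should raise the energy. Those two balance conditions are the usual normal equations for a straight-line fit, restated in mechanical terms.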

 

Code & Tools

Replicate - Version Control for Machine Learning

Replicate is an open-source Python library that helps you keep track of ML experiments. Just add two lines to your training code and then Replicate will start tracking everything: code, hyperparameters, training data, weights, metrics, Python dependencies, etc.
GitHub | replicate

 
 
 

Bodywork Machine Learning

Bodywork is an open-source machine learning operations (MLOps) framework, targeted at automating the training and deployment of Python ML models on Kubernetes.
Bodywork

 

Resources

The State of Open Data 2020

Figshare's annual report on open data summarizes survey results from 4,500 researchers on their practices and challenges in sharing data in 2020. The report covers a lot of ground and includes many interview snippets and visuals that give context and meaning to the results.
Digital Science

 
 
 

Challenges in Deploying ML: a Survey of Case Studies

This survey paper extracts practical considerations from recent case studies of a variety of ML applications and is organized into sections that correspond to stages of a typical machine learning workflow: from data management and model learning to verification and deployment.
arXiv | Andrei Paleyes, Raoul-Gabriel Urma, Neil D. Lawrence

 

Project Pick

Scented Candles: An unexpected victim of COVID-19

Apparently, "There are angry ladies all over Yankee Candle’s site reporting that none of the candles they just got had any smell at all." Kate Petrova dives in with some fun data exploration.
Twitter | @kate_ptrv

 

Data Elixir is curated and maintained by Lon Riesberg. For full-text search of prior issues, visit Data Elixir's Search Page. If you have suggestions or questions for the newsletter, just reply to this email.

 


 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308