ISSUE 314 · December 1, 2020

In the News

"It will change everything"
DeepMind is now far beyond games like Chess and Go. In its latest breakthrough, AlphaFold has been recognized as a solution to one of biology's grand challenges: the “protein folding problem.” Here's a great overview of the accomplishment and why it's such a big deal.

Sponsored Link

Find Your Next Job Through Vettery
Connect with hiring managers from actively hiring startups and Fortune 500 companies that are using Vettery to grow their tech teams. Create a profile today to get interview requests as early as next week. Oh, and did we mention it’s free for job-seekers?

Tutorials, Projects & Opinions

Data Quality at Airbnb: Part 2 - A New Gold Standard
In this second post of a series, Vaughn Quoss walks through the approach that Airbnb uses to ensure data quality throughout the organization. This post focuses on identifying data quality issues and the development of a data certification process called "Midas."

Least squares as springs
'Regression, principal components, machine learning... it's all just springs pulling the model toward the data. Why don't we learn it this way?!'

Code & Tools

Replicate - Version Control for Machine Learning
Replicate is an open-source Python library that helps you keep track of ML experiments. Just add two lines to your training code and then Replicate will start tracking everything: code, hyperparameters, training data, weights, metrics, Python dependencies, etc.

Bodywork Machine Learning
Bodywork is an open-source machine learning operations (MLOps) framework, targeted at automating the training and deployment of Python ML models on Kubernetes.

Resources

The State of Open Data 2020
Figshare's annual report on open data summarizes the survey results of 4500 researchers regarding their practices and challenges in sharing data in 2020. This covers a lot of ground and includes lots of interview snippets and visuals that give context and meaning to the results.

Challenges in Deploying ML: a Survey of Case Studies
This survey paper extracts practical considerations from recent case studies of a variety of ML applications and is organized into sections that correspond to stages of a typical machine learning workflow: from data management and model learning to verification and deployment.

Project Pick

Scented Candles: An unexpected victim of COVID-19
Apparently, "There are angry ladies all over Yankee Candle’s site reporting that none of the candles they just got had any smell at all." Kate Petrova dives in with some fun data exploration.

Data Elixir is curated and maintained by Lon Riesberg. For full-text search of prior issues, visit Data Elixir's Search Page. If you have suggestions or questions for the newsletter, just reply back to this email.

Sign up to get Data Elixir's data science newsletter in your Inbox.