Data Elixir logo

ISSUE 376 · March 1, 2022

 

In the News

If you've been following the news and wondering how a data/tech person can best help with the situation in Ukraine, the Twitter discussion that Andrew Therriault started is a good place to begin. Andrew is a data scientist and the founder of Civin, a data-for-good organization.

@therriaultphd thread
 

Insight

Why becoming a data-driven organization is so hard

A new survey of executives explores the challenges many organizations face when trying to become "data-driven." In this article, Randy Bean reviews the survey results and distills key steps organizations can take to succeed. For the detailed report, follow the link to the "NewVantage Partners annual survey."
Harvard Business Review | Randy Bean

 

Sponsored Link

30-Day Trial of Innodata’s New Data Annotation Platform

This new web-based annotation platform reduces the cost of AI/ML projects while enabling users to develop more accurate models. With easy-to-use workflows, customizable workbenches, real-time KPIs, and auto-annotation capabilities, your team can create high-quality training data on-prem or in the cloud. Sign up for a free 30-day trial today!

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 

Tutorials, Projects & Opinions

Notebooks In Production With Metaflow

Notebooks in production?! Here's how, and why, new Metaflow features let you orchestrate notebook execution in your DAGs and visualize and debug production workflows.
Outerbounds | Hamel Husain

 

Learning with limited data - part 2: Active Learning

Here's part 2 of a series on what to do when you have only a limited amount of labeled data for a supervised learning task. When the labeling budget is small or labeling is expensive, active learning helps you select the most valuable samples to label next.
Lilian Weng
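Uncertainty sampling is one of the simplest active learning query strategies. The sketch below is a minimal illustration, not code from the article: it assumes you already have a model's predicted class probabilities for each unlabeled sample (the `pool` values are made up), ranks the samples by predictive entropy, and returns the ones the model is least sure about as the next to label. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` unlabeled samples with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled samples (binary classification).
pool = [
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # quite uncertain
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain
]

print(select_most_uncertain(pool, budget=2))  # [3, 1]
```

In a real loop you would send the selected samples to annotators, retrain the model on the enlarged labeled set, recompute the probabilities, and repeat until the labeling budget runs out.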

 

Web Scraping with R

This step-by-step tutorial shows how to scrape web pages with R, starting with simple single-page tasks and moving on to scraping multiple pages.
Scrapingdog

 

It's Time to Rethink Your Media Diet

Founded by investment bankers, The Daily Upside is our favorite source for premium business news that’s actually worth reading. The Daily Upside delivers crisp insights on market-moving stories and teases out nuances you won't read elsewhere. It's completely free, and its sole purpose is to make you a sharp, well-informed investor.
// sponsored

 

Code & Tools

Statistical ⚡️ Forecast

This new package claims to be the fastest AutoARIMA implementation for Python yet: 20x faster than pmdarima and 500x faster than Meta's Prophet. It includes a collection of widely used univariate time series forecasting models, including exponential smoothing and automatic ARIMA modeling.
GitHub | Nixtla
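To give a feel for one of the model families mentioned, here is a minimal pure-Python sketch of simple exponential smoothing (no trend or seasonality). It is not StatsForecast's implementation or API: the level is initialized with the first observation, and the smoothing weight `alpha` is fixed by hand rather than estimated from the data. The function names are hypothetical.

```python
from typing import List, Sequence


def simple_exp_smoothing(y: Sequence[float], alpha: float) -> float:
    """Return the final smoothed level of the series.

    Recurrence: level_t = alpha * y_t + (1 - alpha) * level_{t-1},
    initialized with level_0 = y[0].
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if len(y) == 0:
        raise ValueError("series must not be empty")
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level


def ses_forecast(y: Sequence[float], alpha: float, h: int) -> List[float]:
    """Forecast h steps ahead; simple exponential smoothing gives a flat forecast."""
    return [simple_exp_smoothing(y, alpha)] * h


print(ses_forecast([10, 12, 11, 13], alpha=0.5, h=3))  # [12.0, 12.0, 12.0]
```

Larger values of `alpha` weight recent observations more heavily; with `alpha=1` the forecast is simply the last observed value.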

 

imodels

Interpretability is important in high-stakes fields like medicine and political science, but complex models are increasingly difficult to interpret. imodels is a new Python package for concise, transparent, and accurate predictive modeling.
Chandan Singh

 

Resources

Probabilistic Machine Learning: Advanced Topics

Kevin Murphy just released the Advanced Topics volume of his popular Probabilistic Machine Learning series. It's not quite finished, but it's close, and it's free to download. Topics include Bayesian inference, causality, reinforcement learning, distribution shift, and deep generative models, among others. For the other books in this series, see the pml-book website >>
Kevin P. Murphy

 

Data Visualization

Colors for all!

This new R package makes it easy to find the color palettes that will work best for your particular data and use case. Just select a few parameters that describe your data, and cols4all scores a variety of palettes on properties that make visualizations more effective.
GitHub | Martijn Tennekes

Sign up to get Data Elixir's data science newsletter in your inbox >>


Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.

Unsubscribe