Data Elixir logo

ISSUE 374  ·   February 15, 2022

 

In the News

ML Becomes a Mathematical Collaborator

Mathematicians often work together when they’re searching for insight into a hard problem. It’s a kind of freewheeling collaborative process that seems to require a uniquely human touch. But in two new results, the role of human collaborator has been replaced in part by a machine...
Quanta Magazine | Kelsey Houston-Edwards

 

Sponsored Link

High-Quality Training Data to Accelerate AI & ML

Delivering Accurate Ground Truth Data for AI/ML Models

30+ years of experience working with leading data-centric AI/ML models. With 3,500+ global SMEs and experience with any data type, we accelerate operations and advance models in record time. Scale your AI with your model's new secret weapon, Innodata.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 

Tutorials, Projects & Opinions

Data Distribution Shifts and Monitoring

Deploying a model to production isn't the end of the process because the model's performance will degrade over time. In this easy-to-follow deep dive, Chip Huyen explores the issues, including data distribution shifts, monitoring, and typical causes of ML failures. This post is intended for a machine learning systems design course at Stanford.
Chip Huyen

 

Privacy-preserving insurance quotes

Concrete Numpy is an open-source Python package that compiles various NumPy functions into their Fully Homomorphic Encryption (FHE) equivalents. In other words, Concrete Numpy allows models to work with sensitive data while it remains encrypted. This tutorial walks through the basic concepts of how Concrete Numpy works and how to use it.
Zama

 

Faster Python calculations with Numba

If you're writing array-oriented Python code that uses for loops, NumPy's speed doesn't help: the loops themselves run in Python, so the code is slow. Here's how Numba can get you a 13x speed increase with just two lines of code.
Itamar Turner-Trauring
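
For a rough idea of what "two lines" can look like, here is a minimal sketch. It is not the example from the linked post: the function name running_max and the sample data are made up for illustration, and the fallback decorator is an assumption added only so the snippet still runs where Numba isn't installed. The two Numba-specific lines are the import and the @njit decorator.

```python
import numpy as np

try:
    from numba import njit  # Numba line 1: import the JIT decorator
except ImportError:
    # Assumption for this sketch only: without Numba, fall back to a
    # do-nothing decorator so the function runs as plain Python.
    def njit(func):
        return func


@njit  # Numba line 2: compile the loop below to machine code
def running_max(values):
    """Return an array whose i-th entry is the max of values[0..i]."""
    out = np.empty_like(values)
    current = values[0]
    for i in range(len(values)):
        if values[i] > current:
            current = values[i]
        out[i] = current
    return out


data = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
result = running_max(data)
```

The first call includes compilation time, so any speedup only shows on later calls, and the size of the gain depends on the loop and the data.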

 

Read: The Best Tools for ETL in 2022

Learn to efficiently integrate your data sources and get to analysis
// sponsored

 

Code & Tools

D-Tale

D-Tale is a visualization tool that makes it easy to view and analyze pandas data structures. It supports a variety of pandas objects and works seamlessly with Jupyter notebooks and Python terminals. There's a lot of info here, including links to demos, tutorials, and articles.
GitHub | Man Group 

 

Ask HN: Tools to visualize data in SQL databases?

Nice discussion about the various tools that are available to visualize data in a SQL table. It covers a variety of use cases and weighs the pros and cons of both off-the-shelf products and open-source options.
Hacker News

 

Resources

The Effect: An Intro to Research Design and Causality

This new book is a great introduction to design-based causal inference. The first half takes an intuitive approach to developing an understanding of research design. The second half is more technical and introduces a standard toolset for doing causal inference. The entire book is written in a conversational style that's easy to follow. Free to read online.
Nick Huntington-Klein

 

Data Visualization

This map went viral. Here's how to make it.

There's a lot of data presented in this state-by-state streamgraph representation of population data in the U.S. It's part of a bigger project that was done for a law firm, and it's super effective. Here's a step-by-step tutorial showing how to make maps like this using R.
Data Stuff | Erin Davis

 
 


 
 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.
