Data Elixir logo

ISSUE 375 · February 22, 2022

 

Insight

Why we use Julia, 10 years later

Ten years ago this month, the Julia language was introduced to the world. It's now used by hundreds of thousands of people and taught at universities around the world. If you're not a user already, this post will give you a sense of what it can do and where things are going. And if you are a user, there are a lot of good Twitter feeds here to follow.
Julia Blog

 

Sponsored Link

Trust your data at scale with Castor

Castor is a collaborative and automated data catalog. It is designed for mass adoption within your company, regardless of data literacy. Deploy in 30 minutes. Explore data lineage. Trust your data.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 

Tutorials, Projects & Opinions

Data Diffs: Algorithms for explaining what changed in a dataset

Nice introduction to "explanation algorithms" and how asking "why?" at the SQL level can be a useful way to identify the changes in a dataset that resulted in a different outcome. Along with exploring approaches, this post introduces an open-source data-differ that, essentially, takes two SQL queries and explains what makes their results different.
Adam Marcus

 

Lyft and urban mobility

Fun post from Mark Huberty that combines ride data from Lyft with graph theory and clustering to learn about urban mobility.
Lyft Engineering | Mark Huberty

 

The Unbundling of Airflow

If the unbundling of Airflow means all the heavy lifting is done by separate tools, what's left behind?
Gorkem Yurtseven 

 

Get training data for ML in record time

Designed by engineers for engineers, Toloka combines cutting-edge technologies with the power of the crowd to deliver high-performing data for Machine Learning projects in record time. A built-in quality control system provides superb data accuracy at scale.
// sponsored

 

Code & Tools

ipycanvas - Interactive Canvas in Jupyter

ipycanvas is a lightweight, fast and stable library that exposes the browser's Canvas API to IPython. In other words, this toolset makes it easy to draw simple primitives such as text, lines, polygons, arcs, etc. directly from Python. For ideas, check out the examples.
GitHub | Martin Renou

 

Mito

Mito is a spreadsheet that lives inside your JupyterLab notebooks. It allows you to edit Pandas dataframes like an Excel file, and generates Python code that corresponds to each of your edits.
GitHub | Mito-DS

 

Career

Red Flags to Look Out for When Joining a Data Team

Thinking about changing jobs? This is a nice collection of insights to help you steer clear of bad surprises. It's tailored for data teams but the ideas here generally apply to most tech roles. For more, check out the Twitter discussion that started it.
Eugene Yan

 

Data Visualization

How to use fewer colors in your data visualizations

If you've ever tried to decipher a chart with dozens of colors, you know that color isn't always a good thing. Here are 10 ways to use fewer colors in your charts while also making them more understandable.
Datawrapper | Lisa Charlotte Muth

 

Increasing Flexibility & Robustness of Plots in ggplot2

In this step-by-step tutorial, Meghan Hall shows how to make your ggplot2 visualizations more flexible and robust to accommodate data that's changing.
Meghan Hall

 

Outlier

Forgotten Books

Scholars who study medieval literature are faced with lots of missing evidence. Manuscripts degrade over time; libraries burn. And what's left is barely a sketch of what the scholars are interested in. But by borrowing statistical concepts from ecology, researchers are able to estimate the data that isn't there. This is an awesome project that explores the issues & approaches, with a new statistics package to boot.
Science | Forgotten Books

 
 


 
 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.
