Data Elixir

ISSUE 349  ·   August 17, 2021

 

Trends

The one data platform to rule them all... but according to whom?

Great post that breaks down many of the recent startups and organizes them by their most common end users (e.g., data scientist, analyst, ML engineer). Since many data tooling startups eventually want to become "data platforms," this post makes it easy to understand where they all fit.
Leigh Marie Braswell

 

Data Quality Unpacked

Here's a broad look at the trends, jobs, and players in the increasingly important and complex field of data quality solutions.
Gradient Flow | Kenn So and Ben Lorica

 

Sponsored Link

Accelerate development with healthcare data


Join a panel discussion with thought-leaders from AWS and healthcare powerhouse companies like QIAGEN and IBM Watson Health. Learn how healthcare datasets can accelerate the bench-to-bedside cycle through enhanced visualizations, machine learning algorithms, and curated genomics data.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 
 

Tutorials, Projects & Opinions

Bootstrapping Labels

Most machine learning tutorials and papers assume the availability of training labels. But what if labeled datasets aren't available? In his latest post, Eugene Yan explores ways to generate labels from scratch using semi-supervised, active, and weakly supervised learning. There's a lot here, including examples from DoorDash, Facebook, Google, and Apple.
Eugene Yan

 

Making better decisions with statistics

To make better decisions with statistics, here are three key principles to keep in mind.
Thomas Vladeck

 

Filter data before reading with awk and R

Unix command line tools might be "old school," but you can do a lot with them. This tutorial shows how to use the awk utility to quickly pre-process and read many bigger-than-RAM CSV files without crashing your computer.
Luis D. Verde Arregoitia

 

Invitation - how to make data consumable w/ AWS, Wawa, and Facebook

Join this webinar with data & analytics leaders from Wawa, AWS, and Facebook on how to make data consumable for teams, at scale. Each panelist will share strategies to help you transform cloud data into valuable insights, reduce operational complexity, and scale data access and governance across BI tools. Sign up here.
// sponsored

 

Resources

Modern Statistics with R

From wrangling and exploring data to inference and predictive modeling, this new text introduces the key parts of the modern statistical toolkit. Free to download.
Måns Thulin

 

Causal Inference for The Brave and True

This open-source book is a light-hearted yet rigorous approach to learning impact estimation and sensitivity analysis. Part 1 covers core concepts and models for causal inference. Part 2 covers modern developments and applications of causal inference in industry.
Matheus Facure Alves

 

Career

Changes to the data science role and tools

This discussion with Sean Taylor covers a lot of ground, and his insights into data science roles and how the field has evolved over the past few years make it a worthwhile listen. Sean is a Data Science Manager at Lyft and was previously a research scientist at Facebook, where he was key to the development of the open source forecasting library Prophet.
Data Exchange Podcast

 

Data Visualization

An Overview of Dataviz for Categorical Data

Categorical data is data that's bucketed into categories. Although it can be plotted using traditional chart types like line plots and scatter plots, other techniques are easier to read and are better at showing the relationships and flows between categorical data points. In this post, Jonathan Dunne explores four chart types to consider.
Nightingale | Jonathan Dunne

 

Data-Driven Album Covers

There are a few well-known album covers that feature data visualization but not many that visualize the music itself. In this post, three designers explore the mystery of finding the perfect marriage of music and image. With lots of examples.
Nightingale | Carni Klirs

 
 


 
 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.
