Data Elixir

ISSUE 403  ·   September 6, 2022

 

Join the Data Elixir Talent Collective

The Data Elixir Talent Collective is a reverse job board where top companies apply to you. Members control all communication, so you won't get the noise that's typical on other recruiting channels. Choose to be anonymous or public and get matched with opportunities that fit your specific interests, whether that's a career move, more pay, remote work, or something else.

The Collective is off to a great start with 75 members in its first week.

This is a free resource, but membership is limited to professionals with 3+ years of experience in data science, analytics, ML, visualization, and related fields. The plan is to create the highest signal-to-noise hiring resource for roles in the data ecosystem. For more info, apply here.

 

Insights

Down the Semantic Rabbit Hole

The purpose of a semantic layer is to close the gap between the “business language” and the “data language” and offer a unified and consistent view of the business. This deep dive into the topic is one of the more complete explainers I’ve come across.
JP Monteiro

 

Be Good-Argument-Driven, Not Data-Driven

When you begin to favor *bad* arguments that involve data over *good* arguments that don’t, then you’ll notice things start to go awry. Richard Marmorstein offers a reminder for us data folks to consider foundational elements of the argument itself before leaning on metrics.
Richard Marmorstein 

 

Reach Data Elixir readers by sponsoring an issue.

 

Tutorials, Projects & Opinions

Communicating A/B Test Results with Ratios and Uncertainty Intervals

If you work with A/B tests frequently, then the question of how to communicate uncertainty has likely come to mind. This argument for presenting A/B test results for conversion rates as ratios, with uncertainty intervals around those ratios, makes a lot of sense to me.
Harlan D. Harris
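
To make the idea concrete, here is a minimal sketch of reporting a conversion-rate ratio with an uncertainty interval. It assumes a normal approximation on the log of the ratio, which is one common choice and not necessarily the method used in the linked post. The function name and the visitor and conversion counts are made up for illustration.

```python
import math


def conversion_ratio_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """Ratio of conversion rates (B relative to A) with an approximate interval.

    Assumption of this sketch: a normal approximation on log(ratio) for two
    independent binomial proportions. z=1.96 gives a roughly 95% interval.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    ratio = rate_b / rate_a
    # Standard error of log(ratio) for two independent binomial proportions.
    se_log = math.sqrt(1 / conv_a - 1 / n_a + 1 / conv_b - 1 / n_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts: 200 of 10,000 visitors convert on A, 230 of 10,000 on B.
ratio, lower, upper = conversion_ratio_interval(200, 10_000, 230, 10_000)
print(f"B converts at {ratio:.2f}x the rate of A "
      f"(approx. 95% interval: {lower:.2f}x to {upper:.2f}x)")
```

An interval that straddles 1.0x, as this hypothetical one does, tells the reader that the data are compatible with B being slightly worse as well as noticeably better than A.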

 

How Instacart Uses Machine Learning-Driven Autocomplete to Help People Fill Their Carts

Search is a big deal for eCommerce platforms like Instacart. This post describes how the team generates and ranks query suggestions in autocomplete and how this shapes a user’s search behavior — translating into larger basket sizes.
Tech at Instacart | Esther Vasiete

 

Measuring Downstream Impact on Social Networks by Using an Attribution Framework

It’s not easy to quantify the impact of an action in a social network because actions have secondary effects on other members. Here, LinkedIn introduces a practical framework for running experiments in these types of environments.
LinkedIn Engineering | Qiannan Yin, Derek Koh, and Jenny Wu

 

Bayesian Age/Period/Cohort Models in Python with PyMC

PyMC can be a powerful tool in the right hands. Austin Rochford does a good job of showcasing this through Bayesian APC models, including lots of code snippets for getting you ready to tackle the inferential challenges these models pose.
Austin Rochford

 

Creating a Real-Time Feature Store with Materialize

Why is everyone talking about feature stores, yet so few are (successfully) implementing them? In this post, we walk you through how Materialize can be used to create a real-time feature store for a fraud detection use case.
// sponsored

 

Code & Tools

R Shiny Now Available in Python

For the uninitiated, Shiny is a package that makes it easy to build interactive web applications and dashboards. It was previously limited to the R programming language, but the creators of Shiny recently announced Shiny for Python as well!
Dario Radečić

 

DocQuery: Document Query Engine

DocQuery is a Python library that makes it easy to extract information from documents. It works with semi-structured and unstructured documents, such as PDFs and scanned images. Simply point DocQuery at one or more documents and specify a question you want to ask.
GitHub | Impira

 

Data Visualization

The Magic of Matplotlib Stylesheets

The “out of the box” visualizations in Matplotlib aren’t fantastic. Luckily, stylesheets can level up your plots in a very accessible manner. This overview takes you from fairly uninspiring visualizations to something much more compelling.
datafantic | Robert Ritz
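
For a quick taste of what a stylesheet does, here is a minimal sketch that draws a toy plot under one of Matplotlib's built-in styles. It is not taken from the linked post: the function name and data are made up for illustration, and "ggplot" is just one of the styles that ship with Matplotlib.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def styled_figure(style_name):
    """Draw the same toy line plot under the given stylesheet and return it."""
    # The style applies only inside this block; global settings are
    # restored when the block exits.
    with plt.style.context(style_name):
        fig, ax = plt.subplots()
        ax.plot([1, 2, 3, 4], [1, 4, 9, 16], label="toy data")
        ax.set_title(f"style: {style_name}")
        ax.legend()
    return fig


fig = styled_figure("ggplot")
fig.savefig("ggplot_example.png")
```

`plt.style.available` lists the style names installed on your machine, and `plt.style.use(...)` applies a style globally instead of within a single block.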

 
 


Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

This week's issue was curated by Conor Dewey. Conor is a data scientist and product growth operator, currently at Metabase. Data Elixir is edited and maintained by Lon Riesberg. If you have questions or suggestions, send a note!