Data Elixir logo

ISSUE 422  ·   January 31, 2023

 

Resources

Pandas Illustrated: The Definitive Visual Guide

Great, in-depth guide to pandas that's part tutorial and part reference, with lots of visuals, code samples and links to additional resources. Starts with a good discussion of how pandas is useful and continues with sections on Series & Index, DataFrames, and MultiIndexing. 
Better Programming | Lev Maximov
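
For a quick taste of the three structures the guide covers, here is a minimal sketch. It is not taken from the guide: the fruit and store data and all variable names are made up for illustration, and only standard pandas calls are used.

```python
import pandas as pd

# A Series is a 1-D array of values paired with an Index of labels.
prices = pd.Series([1.5, 2.0, 0.5], index=["apple", "banana", "cherry"])

# A DataFrame is a table whose columns share a single Index.
# (Hypothetical sales data, for illustration only.)
sales = pd.DataFrame(
    {
        "store": ["north", "north", "south", "south"],
        "fruit": ["apple", "banana", "apple", "banana"],
        "units": [10, 4, 7, 12],
    }
)

# Passing two columns to set_index builds a two-level MultiIndex (store, fruit).
by_store = sales.set_index(["store", "fruit"])

# Label-based lookups: one label on a Series, a tuple on a MultiIndex.
banana_price = prices.loc["banana"]
south_apples = by_store.loc[("south", "apple"), "units"]

# Selecting on the outer level alone returns that store's rows,
# indexed by the remaining level (fruit).
north_rows = by_store.loc["north"]
```

The guide goes much deeper into each of these, including how index alignment and MultiIndex slicing behave.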

 

Sponsored Link

What do developers want the most in 2023? Is it better documentation or different tools?

Take part in the new Developer Nation survey and shape the ecosystem! For every survey response, Developer Nation will donate to one of the charities of respondents’ choosing. Hurry up, the survey is open until February 3! Start here.

 

Reach Data Elixir readers by sponsoring an issue.

 

Tutorials, Projects & Opinions

Simpson's Paradox and Existential Terror

Great post about Simpson's paradox — a widely misunderstood statistical phenomenon.
Wilde Truth | Vera Wilde
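
If the paradox is new to you, a small worked example may help. The sketch below uses illustrative counts that are not from the linked post: treatment "A" has the higher success rate in every subgroup, yet "B" looks better once the subgroups are pooled, because the two treatments were applied to very different mixes of cases.

```python
# (successes, trials) per treatment and subgroup; illustrative counts only.
outcomes = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    """Success rate for one (successes, trials) pair."""
    return successes / trials


def overall(groups):
    """Success rate after pooling all subgroups of one treatment."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return rate(total_successes, total_trials)


# Which treatment has the higher success rate within each subgroup?
subgroup_winner = {
    group: max(outcomes, key=lambda t: rate(*outcomes[t][group]))
    for group in ("small", "large")
}

# Which treatment has the higher success rate once subgroups are pooled?
overall_winner = max(outcomes, key=lambda t: overall(outcomes[t]))

print(subgroup_winner)  # A wins in both subgroups
print(overall_winner)   # B wins on the pooled data
```

The reversal comes from the uneven split: most of A's trials are in the harder "large" subgroup, while most of B's are in the easier "small" one.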

 

Accelerating our A/B experiments with ML

This post explores a new testing approach that uses machine learning to predict the probable value of a customer. The predicted metric is called "expected revenue," and it lets Dropbox draw conclusions from A/B tests in days rather than months, which means they can run experiments more often.
Dropbox Tech | Michael G. Wilson

 

Replacing a SQL analyst w/ 26 recursive GPT prompts

This probably won't replace people, but it's a nice demonstration of how GPT could be used to augment analytic workflows. The post is easy to follow, and there's enough detail here that it could be a good starting point for your own needs and experiments.
Patterns | Ken Van Haren

 

Data is the lifeblood of modern businesses.

Innovating and delivering products quickly is essential to any company’s survival, but taking shortcuts on data security and privacy is very costly in the long run. As you aim to balance speed and security, don’t lose sight of the most common data privacy pitfalls and how you can avoid them. Download the free white paper today. 
// sponsored

 

Tools & Code

ipyflow

ipyflow is a next-generation Python kernel for Jupyter and other notebook interfaces that tracks dataflow relationships between symbols and cells during a given interactive session. It offers reactivity, execution suggestions, syntax extensions, and more.
GitHub | ipyflow

 

tidypolars

tidypolars is a data frame library built on top of the blazingly fast polars library that gives access to methods and functions that are familiar to R tidyverse users.
GitHub | Mark Fairbanks

 

Career

What we look for in a resume

If you're looking for work, this post will help you understand how resumes are read and what someone on the other side of the table might be looking for. There's a lot here, including how to show you have the right skills, resume length, cover letters, how startups are different, and more. Includes useful links.
Chip Huyen

 

Outlier

K-Means Clustering
xkcd | Randall Munroe

 
 


 
 
 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!