Data Elixir logo

ISSUE 432 · April 18, 2023

 

Trends

The Data Delusion

From the early days of encyclopedias to the influence of data science in modern society, data and information storage have had a staggering impact on the world. This article explores the rise of numbers as an instrument of state power, the Technocracy movement, and the move from a culture of numbers to a culture of data.
The New Yorker | Jill Lepore

 

Sponsored Link

Webinar: How to expedite data analytics insights and reduce time-to-value with AWS & Rearc

You're invited!

Leaders from AWS and Rearc showcase how incorporating third-party data in your analytics strategy streamlines data ingestion and saves time. Join our webinar to explore how data enrichment helps define business problems, find data dependencies, and accelerate insights. 

Date: April 19th 
Time: 10:00 AM (PST), 10:00 AM (BST), and 10:00 AM (SGT)

Register now >

 
 
 

Reach Data Elixir readers by sponsoring an issue.

 

Tutorials & Opinions

A Framework for Making Decisions with Data

Great post that walks through a five-step framework for making better, faster, and more data-informed decisions.
Caitlin Hudon

 

Detecting heart murmurs from time series data in R

Nice tutorial that shows how to use machine learning algorithms to make predictions from time series data. It shows how to calculate features, compare them between subjects, and fit a model with {tidymodels}. The post is well organized and includes lots of code samples and visualizations along the way.
Nicola Rennie

 

Equality Of Odds

This new post from the popular MLU-Explain blog explores the concept of "Equalized Odds" (EO). EO is a fairness criterion that takes into account the underlying ground truth and aims to ensure that models make errors at similar rates within different groups of people. Using EO, you can measure and mitigate bias in a model's predictions.
MLU-Explain | Mia Mayer & Jared Wilber
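
To make the criterion concrete, here is a minimal sketch (not taken from the MLU-Explain post) of how one might check Equalized Odds for a binary classifier: compute the true positive rate and false positive rate separately for each group, then look at how far apart the groups are. The function names `group_error_rates` and `equalized_odds_gaps` and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def group_error_rates(y_true, y_pred, groups):
    """Return {group: (true_positive_rate, false_positive_rate)} for binary labels."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        # A rate is undefined (NaN) when a group has no examples of that class.
        tpr = c["tp"] / positives if positives else float("nan")
        fpr = c["fp"] / negatives if negatives else float("nan")
        rates[group] = (tpr, fpr)
    return rates


def equalized_odds_gaps(rates):
    """Return (TPR gap, FPR gap): the largest difference between any two groups."""
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Hypothetical toy data: ground truth, model predictions, and group membership.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_error_rates(y_true, y_pred, groups)
tpr_gap, fpr_gap = equalized_odds_gaps(rates)
print(rates)
print(tpr_gap, fpr_gap)
```

A model satisfies Equalized Odds exactly when both gaps are zero. In practice, smaller gaps indicate that the model's error rates are more similar across groups.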

 

Building LLM applications for production

Applications based on large language models (LLMs) have suddenly become super popular, but the ambiguity and flexibility of LLMs present many challenges for making these applications "production-ready." This post explores strategies for mitigating these challenges and provides examples of use cases worth exploring.
Chip Huyen

 

Awesome Twitter Algo

For anyone interested in recommendation systems or, more specifically, how Twitter works under the hood, this is an awesome exploration of the Twitter code that was recently released. It discusses the key components of Twitter's recommender system and includes links to key papers and related resources along the way.
GitHub | Igor Brigadir and Vicki Boykis

 

Sharpen your math, CS and data skills in 15 minutes a day

For professionals and lifelong learners alike, Brilliant is one of the best ways to learn. The deets: Bite-sized interactive lessons make it easy to level up in everything from math and data science to AI and beyond. Join 10+ million people building skills every day. Start your 30-day free trial today!
// sponsored

 

Tools & Code

Introduction to rpolars

Polars is one of the fastest data table query libraries you can get. It optimizes queries for memory consumption and speed and is easy to use. rpolars is a package that lets R users enjoy Polars' benefits, potentially speeding up data processing tasks by 5-10x compared to dplyr. This post walks through its features and how to use it.
rpolars | Grant McDermott

 

AutoProfiler

AutoProfiler is a tool that automatically updates and visualizes Pandas dataframes in Jupyter notebooks, providing useful profiling information like column distributions, sample values, and summary statistics. 
GitHub | CMU Data Interaction Group

 

Resources

Building reproducible analytical pipelines with R

This book teaches data scientists, analysts, and researchers how to apply best practices from software engineering and DevOps to make projects robust, reliable, and reproducible. Ultimately, the ideas presented here will improve the quality of your work. Free to read.
Bruno Rodrigues

 
 

Sign up to get Data Elixir's data science newsletter in your inbox >>

 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir® is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!