Data Elixir logo

ISSUE 445 · July 25, 2023

 

Podcasts

Test Driven Data Analysis - with Nick Radcliffe

Test-Driven Data Analysis (TDDA) is an approach to quality for data and for data analysis pipelines. In this podcast interview, Nick Radcliffe introduces the philosophy and the associated Python library, TDDA, that he's been developing since 2016.
Code for Thought podcast

 

Sponsored Link

Complete customer profiles in your data warehouse

RudderStack Profiles takes the SQL grunt work out of building customer profiles. You specify the customer traits, then Profiles runs the joins and computations for you to create complete profiles, so you can build better models, faster. And that’s just one use case. Get all the details.

 
 
 

Reach Data Elixir readers by sponsoring an issue.

 

Posts & Tutorials

Getting started with Vector DBs in Python

With the popularity of LLMs, vector databases are all the rage these days. Which should you choose? Here's a great overview of nine popular options for Python, including the strengths of each, sample code, and useful links along the way.
Daniel Doubrovkine

 

Emphasize what you want readers to see with color

One of the biggest superpowers of visualization is to lead a reader’s eye to the data you want to emphasize. By using color to create a visual hierarchy, you can decide what your readers see first, second, third, and last. This is a great post with lots of examples along the way.
Datawrapper | Lisa Charlotte Muth

 

From zero to hero: end to end data applications with SQL and Jupyter

In this online course, you'll learn how to develop and deploy an end-to-end data application with SQL, Python and Jupyter notebooks. Covers exploratory data analysis, SQL basics, workflow reproducibility, data pipelines, deployment, and more.
ploomber

 

Introduction to Cloud-Based Geospatial Analysis 

Nice introduction to cloud-based geospatial analysis using Google's Earth Engine and the geemap Python package. Covers the basics of Earth Engine data types and how to visualize, analyze, and export Earth Engine data in a Jupyter environment using geemap.
SciPy 2023 | Qiusheng Wu and Steve Greenberg

 

Career

Salary Calculator

Awesome salary calculator, based on a compensation prediction model that was built for a Kaggle competition in 2022. It was made specifically for people who work with data, and there are a lot of options, including a variety of positions, locations, experience levels, and more.
Jobs In Data

 

Resources

Cookbook Polars for R

This cookbook for R users offers a side-by-side comparison of polars, base R, dplyr, tidyr, and data.table for common tasks and problems.
Damien Dotta

 

Comprehensive Python Cheatsheet

Exhaustive and concise — a truly Pythonic cheat sheet for the Python programming language.
Jure Šorn

 

Outlier

LLM Training Puzzles

This is a collection of 8 challenging puzzles about training large language models (or really any NN) on many, many GPUs. The goal of these puzzles is to get hands-on experience with the key primitives and to understand the goals of memory efficiency and compute pipelining.
GitHub | Sasha Rush

 
 


Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir® is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!