Data Elixir logo

ISSUE 420  ·   January 17, 2023

 

Resources

Modern Polars

If you're not already familiar with it, Polars is a multithreaded, memory-efficient, and very fast DataFrame library. This online book is a side-by-side comparison of Polars and Pandas, based on the Modern Pandas series by Tom Augspurger. This looks like a great resource.
Kevin Heavey

 

Sponsored Link

Turn documents into structured data with Sensible

Sensible is the developer-first platform that makes accessing the data in documents as easy as calling an API. Avoid the complexities and headaches of PDF parsing. Learn more about Sensible →

 

Reach Data Elixir readers by sponsoring an issue.

 

Tutorials, Projects & Opinions

Let's build GPT: from scratch, in code, spelled out.

In this awesome explainer, Andrej Karpathy shows how to build and train a Transformer following the "Attention Is All You Need" paper. This is very approachable and there are a lot of useful links in the notes.
YouTube | Andrej Karpathy

 

How to get computational superpowers via ChatGPT

How does the impressively human-like ChatGPT get computational knowledge superpowers? Give it a Wolfram|Alpha neural implant!
Stephen Wolfram

 

Making predictions from a mixed model using R

Nice introduction to making predictions from mixed models in R. Starts with a simple linear regression and then walks through using the mixed model package {lme4} to extract confidence intervals and prediction intervals.
Optimum Sports Performance | Patrick Ward

 

Tools & Code

balance

balance is a new Python package that makes it easy to adjust biased data samples. In this announcement post, Roee Eilat describes the problem of biased data, how it occurs, and how this new package works. 
Meta Platforms | Roee Eilat

 

A Jupyter kernel for GNU Octave

GNU Octave is a high-level programming language that's primarily intended for scientific computing and numerical computation. It has a mathematics-oriented syntax that's mostly compatible with MATLAB and it helps solve linear and nonlinear problems numerically. In this post, Giulio Girardi introduces a GNU Octave kernel for Jupyter.
Jupyter Blog | Giulio Girardi

 

Career

Machine Learning in Weather & Climate MOOC

This free course introduces ML and its applications for weather and climate work. It covers a variety of research and operations areas in forecasting, ocean & climate modeling, meteorology and more. It includes expert speakers throughout and looks like a great way to learn about the range of careers that use ML for weather and climate.
ECMWF

 

Data Visualization

Analyzing labor markets in Python with LODES data

In his latest post, Kyle Walker shows how to analyze and map commute patterns with the Python pygris package and LODES data. This is the second of a series where Kyle translates his favorite sections from his book, Analyzing US Census Data, to Python.
Kyle Walker

 

Graphic Walker

Graphic Walker is an open-source alternative to Tableau that's built as a React component for easy installation on a website. The interface uses simple drag-and-drop operations and is based on the Grammar of Graphics using Vega-Lite.
GitHub | Kanaries

 
 


 
 
 
 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!