Data Elixir logo

ISSUE 424  ·   February 14, 2023

 

Insights

Big Data Is Dead

Most data isn't very big. And if it is, you probably don't query it all at once. And even if you do, the data probably still fits on a single machine. And if it doesn't, there's a good chance you don't actually use it all. Why are data tools built for the remaining 1%? There are a lot of gems here.
MotherDuck | Jordan Tigani

 

ChatGPT Is a Blurry JPEG of the Web

This essay by Ted Chiang offers a useful perspective for understanding the tech that drives ChatGPT. OpenAI’s chatbot offers paraphrases, whereas Google offers quotes. Which do you prefer?
The New Yorker | Ted Chiang

 

Reach Data Elixir readers by sponsoring an issue.

 

Tutorials, Projects & Opinions

Data-Free Disneyland

Between browsers, cell phones, credit card tracking, facial recognition, and license plate readers, it's hard to be incognito in 2023 — even at Disneyland! This is a great post that walks through a typical family's data exhaust and ways to obfuscate it.
The Opt Out Project

 

GPT in 60 Lines of NumPy

This GPT-from-scratch tutorial is a technical introduction to GPT that's intended to be as simple — and hackable — as possible. There's a lot here, including a code repo and clear descriptions of GPT architecture and how everything works. 
Jay Mody

 

Data Mishaps Night - February 23, 7:00 pm CST

The third annual Data Mishaps Night is coming up in just two weeks! This free online event features a lineup of data mistake stories that focus on the human aspect of data work with lessons learned along the way. It looks like it will be worthwhile — check out the lineup!
Data Mishaps Night

 

Tools & Code

PySports

PySports is a curated collection of sports-related open-source analytics projects. It includes Python libraries and R packages for accessing, analyzing, modeling, and visualizing data from a wide variety of sports, such as baseball, basketball, football, hockey, cycling, and more.
PySports Open Source

 

Data Visualization

Population Around a Point

This site calculates the human population within a selected radius of any point in the world. It includes a detailed rundown of where the data comes from and how the site works. There are ~95K people within a 5 km radius of me. How about you?
Tom Forth
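
For a rough sense of what a calculation like this involves, here is a minimal sketch that sums the population of grid cells whose centers fall within a radius of a point. It is not the site's actual method: the cell list, the coordinates, and the function names are all hypothetical, and it assumes a spherical Earth with great-circle (haversine) distances.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def population_within(cells, center_lat, center_lon, radius_km):
    """Sum the population of every cell whose center lies within radius_km."""
    return sum(
        pop
        for lat, lon, pop in cells
        if haversine_km(center_lat, center_lon, lat, lon) <= radius_km
    )


# Hypothetical toy grid: (latitude, longitude, population) per cell center.
cells = [
    (40.015, -105.270, 30000),  # the center cell itself
    (40.020, -105.250, 25000),  # a little under 2 km away
    (40.100, -105.270, 12000),  # roughly 9.5 km north, outside the radius
]

total = population_within(cells, 40.015, -105.270, 5.0)
print(total)
```

A real population grid has far more cells, so a production version would likely filter candidates with a spatial index before computing distances.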

 

The mapping software you didn't know you needed

A great introduction to QGIS, the open-source geospatial data system. It starts with a real-world use case that shows why QGIS is worth knowing about, then continues with a detailed tutorial that covers the available datasets, how to access that data, and how to build maps, add layers, customize maps, and take your maps into the field.
Christian Hollinger

 
 


Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!