ISSUE 430 · March 28, 2023
Last week's SQL Tutor 🤖 wasn't perfect, but it sure did stir up a lot of interest. It's still live if you haven't checked it out yet. SQL Tutor is the first of what will become a series of Data Elixir experiments and tools. Stay tuned...
Meanwhile, Data Elixir is taking next week off and will be back in your inbox on April 11. Have a great couple of weeks!
Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI
In his latest interview, Lex Fridman talks with Sam Altman, the CEO of OpenAI. Given Sam's prominence in the startup community and how quickly OpenAI has been releasing groundbreaking new products, this is a good one to watch. The interview covers a lot of ground, including competition, AI fears, future applications, and advice for young people.
Explaining machine learning: A unique role at Amazon Web Services
As both a storyteller and AWS data scientist, Jared Wilber works on the Machine Learning University team where he creates visual explainers to help others learn about machine learning. His team launched MLU Explain, a public website containing fun animations to explain machine-learning concepts in an accessible manner.
Have a product, service, job, or event you'd like to share with Data Elixir readers?
Sponsor an Issue | Talent Collective
Tutorials & Opinions
Finding Undocumented APIs
It's not uncommon for data journalists to build their own datasets, and undocumented APIs can be key. In this step-by-step tutorial, Leon Yin from The Markup explores how undocumented APIs have been used in investigations, how to find them in the wild, and how to tap into them for robust data collection.
How the heck does one measure color?
Great rabbit hole about how to measure color. This is a fun read that includes historical info, science, and lots of useful links along the way. Color science is HARD!
Replacing an A/B Test with GPT
Could GPT be used to accurately predict the winner of a simple A/B test? This post explores the possibilities and, along the way, shows how to build a model that can predict the difference between two vectors.
The Beginner's Guide to Databases
There are a lot of different types of databases out there, and they all do something slightly different. From Postgres to Elastic to Cassandra, there are virtually unlimited ways to store and query your data, and most companies will use several of them in tandem. This post makes it easy to understand what they all do.
Data for Good: Study Data Science at the Hertie School with a Full Scholarship
The Hertie School has launched the Data for Good scholarship, funded by the Dieter Schwarz Foundation. It covers 100% of tuition costs and will be awarded to up to 5 students. Applications are open to candidates for the Master of Data Science for Public Policy (MDS) who are passionate about data and its power to improve policymaking. The deadline to apply is 1 May.
Code & Tools
Data validation in Python: Pandera and Great Expectations
Pandera and Great Expectations are popular Python libraries for performing data validation, but which should you choose? This post provides a broad overview of both libraries, shows how to create basic validation tests with each of them, and will help you decide which is best for your particular needs.
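If you haven't used a validation library before, here's a rough idea of the kind of check they automate. This is a plain-Python sketch, not the Pandera or Great Expectations API. The function name, column names, and rules below are all made up for illustration.

```python
# Plain-Python illustration of column-level data validation.
# NOT the Pandera or Great Expectations API; every name here is hypothetical.

def validate_rows(rows, checks):
    """Return (row_index, column, value) for every value that fails its check."""
    failures = []
    for i, row in enumerate(rows):
        for column, check in checks.items():
            value = row.get(column)
            if not check(value):
                failures.append((i, column, value))
    return failures

# Made-up sample data with two deliberate problems.
rows = [
    {"user_id": 1, "age": 34, "country": "US"},
    {"user_id": 2, "age": -5, "country": "DE"},      # age out of range
    {"user_id": None, "age": 41, "country": "FR"},   # missing user_id
]

# One rule per column: a type check plus a simple range or length constraint.
checks = {
    "user_id": lambda v: isinstance(v, int),
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "country": lambda v: isinstance(v, str) and len(v) == 2,
}

failures = validate_rows(rows, checks)
for row_index, column, value in failures:
    print(f"row {row_index}: {column}={value!r} failed validation")
```

Libraries like the two covered in the post let you declare rules like these once and reuse them, which tends to scale better than hand-rolled checks.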
Eulogy for Dark Sky, a data visualization masterpiece
Great walk-through of the iOS Dark Sky app and the visualization considerations that led to its huge popularity. Dark Sky presented a lot of data to meet a variety of user needs and it did so in intuitive and compact ways. The insights here are useful for anyone presenting data, from apps and dashboards to papers and slide decks.