Data Elixir logo

ISSUE 438 · May 30, 2023

 

Favorite Data Podcasts?

If you have a favorite data podcast, cast your vote here and we'll report the top picks in an upcoming issue of the newsletter 👉

 

Talks & Conferences

State of GPT

In this session from last week's Microsoft Build, Andrej Karpathy describes the pipeline for training bots like ChatGPT. From there, he dives into practical techniques for using GPT effectively, including prompting techniques, finetuning, tools, and things to expect. This is a great talk, but if you're short on time, see Alex Volkov's notes 👉
Microsoft Build | Andrej Karpathy — 43 minutes

 

Malloy - An Experimental Language for Data

Forcing data through a rectangle shapes the way we solve problems (e.g. dimensional fact tables, OLAP cubes). But most data isn't rectangular; it's hierarchical. In this talk, Lloyd Tabb describes a new data programming language that transcends the rectangle paradigm and breaks long-held misconceptions in the way we analyze data.
Data Council | Lloyd Tabb — 34 minutes

 

Sponsored Link

Datalore. A collaborative data science platform.

Data science teams face many challenges when trying to optimize their processes and ship research results and machine learning models faster. Datalore has become a game-changing solution for data teams across industries, enabling ergonomic data access, effortless collaboration, and easy reporting via Jupyter notebooks. Try Datalore for free

 
 
 

Reach Data Elixir readers by sponsoring an issue.

 

Posts & Tutorials

Intro to Vega-Lite

Vega-Lite is a high-level language for rapidly creating interactive visualizations. It includes support for a variety of data and visual transformations and doesn't need a lot of code. This multi-part tutorial introduces Vega-Lite and offers a variety of step-by-step examples.
Observable | Jon E. Froehlich

 

Choosing a good file format for Pandas

There are plenty of data formats supported by Pandas. Which should you choose and why?
Python⇒Speed | Itamar Turner-Trauring

 

Capturing Output to External Files

The sink() function in R diverts R output to an external connection. This is useful for tasks such as exporting data to a file, logging R output, or debugging code. Here's how it works.
Steve’s Data Tips and Tricks
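
sink() itself is an R function. For readers who work mostly in Python, a rough analogue is contextlib.redirect_stdout from the standard library. The sketch below is only an illustration of the same idea and is not taken from the linked post; the helper name capture_output_to_file, the report function, and the sink_demo.log file name are all hypothetical.

```python
import contextlib
import os
import tempfile


def capture_output_to_file(path, func):
    """Run func() while everything it prints goes to `path` instead of the console."""
    with open(path, "w", encoding="utf-8") as log_file:
        with contextlib.redirect_stdout(log_file):
            func()
    # Leaving the with-blocks restores normal printing,
    # much like calling sink() with no arguments ends a diversion in R.


def report():
    # Hypothetical output we want to log rather than display.
    print("rows: 3")
    print("mean: 2.0")


# Hypothetical log location in the system temp directory.
log_path = os.path.join(tempfile.gettempdir(), "sink_demo.log")
capture_output_to_file(log_path, report)

# Read the file back to confirm what was captured.
with open(log_path, encoding="utf-8") as f:
    captured = f.read()
```

Only standard output is redirected here; contextlib.redirect_stderr would be needed to capture error messages as well.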

 

Some Intuition on Attention and the Transformer

As ChatGPT and other LLMs get thrust into the mainstream, more people outside of ML and NLP circles are trying to better understand Attention and the Transformer. Here are some answers to common questions, with a focus on conveying the intuition.
Eugene Yan
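
For a concrete feel of what attention computes, here is a minimal sketch of scaled dot-product attention in plain Python. It is not taken from the linked post, it leaves out the learned projections, multiple heads, and masking of a full Transformer, and the function names and toy vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy example: both keys are identical, so the query attends to them equally
# and the output is the plain average of the two value vectors.
result = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

The intuition: each query scores every key, the scores become weights, and the result is a blend of the values in which better-matching keys contribute more.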

 

Google Advanced Data Analytics Certificate

The Google Advanced Data Analytics Professional Certificate is a 7-course series that focuses on building regression and machine learning models, applying statistical methods to investigate data, creating data visualizations, and communicating insights from data analysis to stakeholders. The course is run by Coursera and is free to get started.
Data Elixir Partner

 

Papers

Tree of Thoughts: Problem Solving with LLMs

Language models fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. You may have heard of "Chain of Thought" prompting to help overcome these issues. The authors report that "Tree of Thoughts" performs substantially better on such tasks.
arXiv | Shunyu Yao, et al.

 

LIMA: Less Is More for Alignment

Researchers at Meta have shown that remarkably capable LLMs can be achieved by fine-tuning a pretrained model on only 1,000 carefully curated examples. This could be a game-changer for researchers and small-scale developers.
arXiv | Chunting Zhou, et al.

 

Resources

Data Science Interview - Questions & Answers

Nice collection of data science interview questions and answers. There are 100+ questions here, covering machine learning, statistics, probability, Python, SQL, and more.
GitHub | Youssef Hosni

 
 


 
 
 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir® is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!