Data Elixir

ISSUE 443 · July 11, 2023

 

Talks & Conferences

Generative AI with LLMs: Hands-On Training feat.

This presentation by Jon Krohn is a great introduction to large language models. It's easy to follow and covers a lot of ground, starting with the technologies that got us to where we are today. The discussion continues with what modern LLMs are capable of, how to train and deploy LLMs, and ideas for getting commercial value from LLMs. 
ODSC | Jon Krohn

 

Sponsored Link

Complete customer profiles in your data warehouse

RudderStack Profiles takes the SQL grunt work out of building customer profiles. You specify the customer traits, then Profiles runs the joins and computations for you to create complete profiles, so you can build better models, faster. And that’s just one use case. Get all the details.

 
 
 

Reach Data Elixir readers by sponsoring an issue.

 

Posts & Tutorials

How to Peek at Statistical Tests w/o Breaking Them

Continuously monitoring A/B test results and stopping as soon as they look significant considerably overestimates statistical significance, so many tests that should not be considered significant get misinterpreted. A p-value computed for a single, pre-planned look stops meaning what it claims once you check it repeatedly. This post explores the problems of peeking and ways to alleviate them.
Brandon Rohrer 
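
To get a feel for the size of the peeking problem, here is a minimal simulation sketch. It is not taken from the post. It assumes an A/A test (both groups drawn from the same standard normal distribution, so every "significant" result is a false positive), a two-sided z-test with known variance, and made-up settings for sample size and peek frequency. The function names are hypothetical.

```python
import math
import random


def two_sided_p_value(mean_diff, n, sigma=1.0):
    """Two-sided z-test p-value for a difference in means of two groups of size n."""
    standard_error = sigma * math.sqrt(2.0 / n)
    z = abs(mean_diff) / standard_error
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def simulate_peeking(n_experiments=1000, n_max=500, peek_every=25, alpha=0.05, seed=0):
    """Return (false-positive rate with peeking, false-positive rate with one final look)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(n_experiments):
        sum_a = sum_b = 0.0
        significant_at_any_peek = False
        for n in range(1, n_max + 1):
            # Both groups share the same distribution: there is no real effect.
            sum_a += rng.gauss(0.0, 1.0)
            sum_b += rng.gauss(0.0, 1.0)
            if n % peek_every == 0 and not significant_at_any_peek:
                if two_sided_p_value(sum_a / n - sum_b / n, n) < alpha:
                    significant_at_any_peek = True
        final_p = two_sided_p_value(sum_a / n_max - sum_b / n_max, n_max)
        peeking_hits += significant_at_any_peek
        final_hits += final_p < alpha
    return peeking_hits / n_experiments, final_hits / n_experiments
```

Calling `simulate_peeking()` returns the two rates side by side. With these assumed settings there are 20 looks per experiment, and the peeking rate should come out well above the nominal 5% while the single-look rate stays close to it.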

 

Joins 13 Ways

Inner joins are everywhere in the world of databases, yet everyone seems to have a different idea of what they are. In this post, Justin Jaffray explores different definitions of inner joins, ways to think about them, and ways to implement them. Some are arguably the same, but they’re all interesting nonetheless.
Justin Jaffray
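
As a small illustration of how two views of the same join can differ, here is a sketch of two common ones: a filtered cross product (nested loops) and a hash join. These are not necessarily among the definitions in the post, and the sample tables and function names are invented for the example.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Inner join as a filtered cross product: keep every pair of rows whose keys match."""
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if l_row[key] == r_row[key]
    ]


def hash_join(left, right, key):
    """Inner join that indexes one side by key, then probes the index with the other side."""
    index = defaultdict(list)
    for r_row in right:
        index[r_row[key]].append(r_row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[key], [])
    ]


# Hypothetical sample data: user 2 has no orders, and user 4 has no profile.
users = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Grace"},
    {"user_id": 3, "name": "Linus"},
]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "lamp"},
    {"user_id": 4, "item": "desk"},
]
```

Both functions return the same three rows for this data. The nested-loop version compares every pair of rows, while the hash version makes one pass over each table.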

 

Demystifying Text Data

Text data isn't structured in neat rows and columns like numerical data, and it can be tricky to work with. In this tutorial, Saeed Esmaili walks through a few options and, in particular, shows how to use the Python data transformation library called "unstructured."
Saeed Esmaili

 

CLI tools hidden in the Python standard library

Typical Python installations contain a lot of little tools that most people don't know about. This post shows how to find those hidden tools and discover what they do.
Simon Willison
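
For a rough idea of what that search can look like, here is a sketch. The post may use a different approach. This one relies on a heuristic of my own: scan the standard library for modules that contain an `if __name__ == "__main__"` guard. It will miss packages that expose their tool through a `__main__.py` file, and not every match is a useful tool. It also shows one well-known example, `python -m json.tool`, being run from Python. The function names are hypothetical.

```python
import re
import subprocess
import sys
import sysconfig
from pathlib import Path

# Matches a top-level main guard written with single or double quotes.
MAIN_GUARD = re.compile(r"""^if __name__ == ['"]__main__['"]""", re.MULTILINE)


def find_runnable_modules():
    """List stdlib modules with a main guard, which usually means `python -m <module>` does something."""
    stdlib = Path(sysconfig.get_paths()["stdlib"])
    found = []
    for path in stdlib.rglob("*.py"):
        # Skip third-party packages and the test suite.
        if "site-packages" in path.parts or "test" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        if MAIN_GUARD.search(text):
            module = path.relative_to(stdlib).with_suffix("").as_posix().replace("/", ".")
            found.append(module)
    return sorted(found)


def pretty_print_json(raw):
    """Run the stdlib's `json.tool` CLI on a JSON string and return its formatted output."""
    result = subprocess.run(
        [sys.executable, "-m", "json.tool"],
        input=raw,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Printing the result of `find_runnable_modules()` gives a list of candidates to try with `python -m <module> --help`, though not every module accepts that flag.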

 

Webinar: How to generate business intelligence leveraging Yelp's rich first-party data on AWS

Discover how to create actionable insights using Yelp's robust data sets to analyze your marketplace, your customers, and grow your business. Explore use cases on how businesses leverage this rich data with AWS Data Exchange to make strategic business decisions. Register now. 

Date: July 19, 2023
Time: 10:00 AM (PST)
              10:00 AM (BST)
              10:00 AM (SGT)

// sponsored

 

Tools & Code

Makie

Makie is an open-source, modern plotting library for Julia. It's a general-purpose tool that's powerful and packed with features that make it easy to explore large datasets, create graphics for publications, build dashboards, create interactive visualizations for the web, and more.
Makie

 

Messy Dates

Most packages for working with dates in R expect the dates to be in the standard yyyy-mm-dd format. But dates are often messy. {messydates} is an R package that allows for date translations from unstructured text, such as “Second of April, two thousand and twenty”, “26 BC”, as well as inputs that have uncertain, approximate, and sets/ranges of dates.
messydates

 

Career

How to Do Great Work

If you collected lists of techniques for doing great work in a lot of different fields, what would the intersection look like? How do ambitious people decide what to work on? What do they avoid? What are their common stumbling blocks? How important is luck? Why do nerds have an advantage? Why do rule-breakers rule?... Great post!
Paul Graham

 

Resources

Unraveling Principal Component Analysis (PCA)

PCA is a popular technique for analyzing datasets that have lots of dimensions for each data point and is commonly used in machine learning. This new book starts with the basics and then dives deep into the fundamentals, including eigenvectors, the spectral theorem, singular value decompositions, low-rank decompositions and more.
Peter Bloem

 

For Python! An Introduction to Statistical Learning

The newest edition of this popular statistics text was released a few days ago and this time, it's tailored for Python users. The book covers some of the most important modeling and prediction techniques and is intended to be accessible to a broad audience. Follow the links to download the PDF.
Gareth James, Daniela Witten, et al.

 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir® is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!