Data Elixir logo

ISSUE 417  ·   December 20, 2022

 

Happy Holidays, everyone! Data Elixir is taking a break next week and will be back in your inbox on January 3rd. However you celebrate the coming new year, I wish the very best for you and your families!

-Lon

 

Sponsored Link

Do more with data, together

Bring SQL, Python, no-code, and R together in one UI 

From exploratory analyses to beautiful data apps to ML modeling and data science, Hex streamlines the entire analytics workflow so your team can focus on generating insights, driving decisions, and moving things forward. No more jumping between tools, struggling with versions, or sharing via screenshot. Try Hex free with a 14-day trial.

 

Reach Data Elixir readers by sponsoring an issue.

 

Tutorials, Projects & Opinions

TikTok’s Secret Sauce

It's easy to think that the most important part of a recommender is the underlying algorithm, but in TikTok's case, the real innovation is less obvious. Even if recommenders on other social media platforms are equally accurate, TikTok's feels more accurate. Here's why.
Columbia University | Arvind Narayanan

 

Mitigating the winner’s curse in online experiments

Experimentation plays a key role in testing and deploying new ideas at Etsy, but it can lead to overestimating the impact of winning treatments. It's a phenomenon that's known as the "winner's curse." This post explores how Bayesian statistics can be used to mitigate this effect and more accurately account for the business impact of experiments.
Etsy Code as Craft | Stephane Shao

 

NormConf: Selected talks and lessons learned

In case you missed NormConf last week or just didn't have time for everything, here's a nice set of notes and replays from a handful of selected presentations. Includes Vicki's opening keynote, Katie Bauer's talk on working with PMs, Josh Wills's talk on model training, and Chris Albon's talk on invisible work.
Prefect Blog | Anna Geller

 
 

A Failed Machine Learning Project

Projects generally only show up in public if they work. This post takes a different approach and explores how a simple ML project failed.
Robert Ritz

 

Code & Tools

Sematic

Sematic is an open-source development toolkit for prototyping and productionizing machine learning pipelines. It uses native Python, helps speed up your workflow, monitors all inputs and outputs, and can execute your pipelines locally or in the cloud.
GitHub | sematic-ai

 

Resources

High-Dimensional Probability

This course by Roman Vershynin explores high-dimensional probability and its applications in data science. There are 41 lectures here with detailed notes and homework assignments. The series is loosely based on his book with the same title, which can be downloaded for free.
UC Irvine | Roman Vershynin

 

List of AI and ML Conferences in 2023

Looking for a conference to attend, sponsor or submit a talk to? This latest list from Tryolabs includes 42 machine learning conferences around the world. Each listing is linked to the associated website and filters make it easy to search by continent or category.
Tryolabs

 

Data Visualization

R packages for visualising spatial data

Nice write-up of R packages that are commonly used for visualizing spatial data. Covers a variety of uses, such as creating and using themes, interactive maps, working with elevation data, plotting raster objects, shading for 3D effects and more.
Nicola Rennie

 
 


 
 
 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!