Data Elixir

ISSUE 368  ·   January 4, 2022

 

Insight

The Case Of The Abandoned Metrics

The world we see is largely painted by numbers these days and so the numbers have become politicized. But be careful. "Changing our metrics to suit our narratives has caused confusion, frustrated the honest, and destroyed public trust..."
Marginally Compelling | Matt Shapiro 

 

Sponsored Link

Free Course: Natural Language Processing (NLP) for Semantic Search

Learn how to build semantic search applications! This free course from Pinecone covers everything you need to build state-of-the-art language models for semantic search.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 
 

Tutorials, Projects & Opinions

Ways I Use Testing as a Data Scientist

Writing tests is more than just making sure your code works. Here are practical ways to think about and approach testing for different types of code and data. With examples and links to key resources along the way.
Peter Baumgartner

 

Real-time machine learning: challenges and solutions

Although mainstream adoption of continual learning is still a few years away, there's a lot you can do now. In this post, Chip Huyen explores the challenges and solutions for online prediction and continual learning, with step-by-step use cases, considerations, and examples.
Chip Huyen

 

PostHog - Host Your Own Product Analytics

PostHog is the all-in-one platform for building better products. Heatmaps, funnel analysis, feature flags, session replays and more in one powerful, open-source platform. Try PostHog today for free, with no time limits.
// sponsored

 

Code & Tools

ETNA Time Series Library

ETNA is an easy-to-use time series forecasting framework that includes built-in toolkits for time series preprocessing, feature generation, a variety of predictive models (from classic machine learning to SOTA neural networks), model combination methods, and smart backtesting.
GitHub | tinkoff-ai

 

Zingg

Zingg is an ML-based tool that provides scalable data mastering, deduplication, and entity resolution.
GitHub | ZinggAI

 

Resources

⚽ Analytics 2021 Review

Awesome roundup of ⚽ analytics content from 2021. Covers research papers, blog posts, podcasts, webinars, and more!
Jan Van Haaren

 

📺 ML YouTube Courses

Here are some of the best and most recent machine learning courses that are available on YouTube. If you're not already familiar with courses on YouTube, the quality of the content may surprise you. There is a wide variety of ML courses here, ranging from introductory to advanced, from Stanford, UC Berkeley, CMU, UMass, NYU, and MIT.
GitHub | DAIR.AI

 

Data Visualization

100 Beautiful and Informative Notebooks of 2021

Nice selection of top Observable notebooks, covering a wide range of topics including mathematics, sports, science, maps, tutorials, and much more. It's worth skimming first and then picking a few projects to dive into later. There's a lot here that's worthwhile.
Observable | Tom Larkworthy

 

Science visualization trends of 2021

Helena Jambor went through a collection of 2021 Nature articles and distilled 10 key visualization trends. This is well organized and includes screenshots, clear descriptions, and links to key resources.  
Helena Jambor

 

Outlier

Confounding Variables

OMG-stats! Not sure what it means? Here's an explainer.
xkcd.com

 
 

 
 
 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.
