Data Elixir logo

ISSUE 372  ·   February 1, 2022

 

Insight

Six Statistical Critiques That Don’t Quite Work

Skepticism about statistics is generally healthy but, just as blind belief can lead you to believe things that aren’t true, an overabundance of skepticism can lead you to disbelieve things that are actually true. Here are six common fallacies to watch out for.
Nick Huntington-Klein

 

We’ve only scratched the surface of the full potential for the data warehouse

Although it may feel like we’re at the peak of the data warehouse, there's a good argument here that we've barely scratched the surface. Here's how data warehouses are evolving to eventually become the control center for modern companies.
Inside Data | Mikkel Dengsøe

 

Sponsored Link

Solve Your Data Challenges Using AI & Human Expertise

Innodata offers data annotation, transformation, collection, synthetic generation, and intelligent automation with industry-leading platforms and managed services. With 30+ years of experience and 3,500+ global SMEs, we accelerate operations and advance AI/ML models to help companies scale faster. Get started with Innodata today!

 


 
 

Tutorials, Projects & Opinions

Experiment without the wait: Speeding up the iteration cycle with Offline Replay Experimentation

Online experimentation is often used to evaluate product ideas, but it's costly and time-consuming. To help optimize the process, Pinterest developed a framework they call "Offline Replay Experimentation," which helps them predict outcomes without even running an experiment. Here's how it works.
Pinterest Engineering | Maxine Qian

 

A Modern Introduction to Probabilistic Programming

Probabilistic programming is a technique for translating mathematical models into executable code. This tutorial is an awesome introduction to how it works, including lots of examples, code samples, and links to important references along the way.
Austin Rochford 

 

A practical intro to Discrete Wavelet Transformation

Discrete wavelet transformations (DWT) can be used to remove noise from a signal, reduce the dimensionality of data, and support tasks such as clustering and classification. This interactive post uses simple examples to show how DWT works and how to use it.
Aurimas Račas
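
To make the noise-removal idea concrete, here is a minimal sketch of a single-level DWT. It is not taken from the linked post: the choice of the Haar wavelet, hard thresholding, the function names, the sample signal, and the threshold value are all assumptions made for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]  # scaled pairwise sums (smooth part)
    detail = [(a - b) / SQRT2 for a, b in pairs]  # scaled pairwise differences (local change)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from both coefficient lists."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def denoise(signal, threshold):
    """Hard thresholding: zero out small detail coefficients, then invert."""
    approx, detail = haar_dwt(signal)
    kept = [d if abs(d) > threshold else 0.0 for d in detail]
    return haar_idwt(approx, kept)

# Hypothetical sample: small wiggles are treated as noise, the jump from 1.0 to 5.0 as signal.
noisy = [4.0, 4.2, 8.0, 7.9, 1.0, 5.0, 6.1, 5.9]
smoothed = denoise(noisy, threshold=0.5)
print(smoothed)
```

Small differences within a pair are averaged away, while the large jump survives because its detail coefficient exceeds the threshold. Keeping only the approximation coefficients halves the length of the signal, which is the dimensionality-reduction use mentioned above.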

 

Predicting When Kickers Get Iced with {tidymodels}

This step-by-step tutorial explores data from the College Football Database, the modeling process using tidymodels, and how to explain the model using tools such as variable importance plots, partial dependence plots, and SHAP values. If you've been looking for a nice introduction to tidymodels, this is it!
JLaw's R Blog

 

Wordle 1/6 🟩🟩🟩🟩🟩

Everyone seems to be playing Wordle these days and many post their ⬛🟨🟩 scores on Twitter. There aren't answers in those squares but by using frequency distributions, Ben Hamner explores a clever approach for guessing the correct word on the first attempt — every time.
Kaggle | Ben Hamner
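
For a feel of why shared squares leak information, here is a simplified sketch. It is not the notebook's method: instead of comparing frequency distributions, it only checks whether each observed pattern is reachable for a candidate answer. The function names and the two-word toy vocabulary are assumptions made for illustration.

```python
from collections import Counter

def score(guess, answer):
    """Return the Wordle pattern for a guess: 🟩 right spot, 🟨 wrong spot, ⬛ absent."""
    pattern = ["⬛"] * len(guess)
    remaining = Counter()
    # First pass: mark exact matches and count the answer letters left over.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "🟩"
        else:
            remaining[a] += 1
    # Second pass: mark misplaced letters, using each leftover letter at most once.
    for i, g in enumerate(guess):
        if pattern[i] != "🟩" and remaining[g] > 0:
            pattern[i] = "🟨"
            remaining[g] -= 1
    return "".join(pattern)

def consistent_answers(observed_patterns, candidates, guesses):
    """Keep answers for which every observed pattern could come from some allowed guess."""
    keep = []
    for answer in candidates:
        reachable = {score(g, answer) for g in guesses}
        if all(p in reachable for p in observed_patterns):
            keep.append(answer)
    return keep

# Toy vocabulary (hypothetical); a real run would use the full word lists.
words = ["abide", "speed"]
print(consistent_answers(["⬛⬛🟨⬛🟨"], words, words))
```

With enough shared patterns, fewer and fewer candidate answers remain consistent with all of them. Weighting candidates by how often each pattern appears, as the blurb describes, would sharpen this further.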

 

Get training data for ML in record time

Designed by engineers for engineers, Toloka combines cutting-edge technologies with the power of the crowd to deliver high-performing data for Machine Learning projects in record time. A built-in quality control system provides superb data accuracy at scale.
// sponsored

 

Code & Tools

SpyQL - SQL with Python in the middle

SpyQL is a query language that combines the simplicity and structure of SQL with the power and readability of Python. It's lightweight, easy to use and will feel familiar if you already work with Python or SQL.
GitHub | Daniel Moura

 

explainerdashboard

This package makes it easy to deploy a web app that explains the inner workings of a (scikit-learn compatible) machine learning model. Provides interactive plots on model performance, feature importances, feature contributions to individual predictions, "what if" analysis, partial dependence plots, SHAP (interaction) values and more.
GitHub | Oege Dijk

 

Resources

📢 Welcome to ar5iv.org

ar5iv offers a modern web view for arXiv's preprints. Change the "X" in any arXiv article link to a "5" and get a modern HTML5 document. This thread walks through what's included, why now, and how the project hopes to merge back into arXiv.
Twitter | @dginev
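
The link swap is easy to script. Here is a minimal sketch using only the standard library. The helper name `to_ar5iv` is hypothetical, the sample article ID is just an example, and the code assumes the link points at the arxiv.org host.

```python
from urllib.parse import urlsplit, urlunsplit

def to_ar5iv(url):
    """Rewrite an arxiv.org article link to its ar5iv.org counterpart."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError("not an arxiv.org link: " + url)
    # Swap only the host; path, query, and fragment are kept unchanged.
    return urlunsplit((parts.scheme, "ar5iv.org", parts.path, parts.query, parts.fragment))

print(to_ar5iv("https://arxiv.org/abs/1706.03762"))
```

Checking the host first keeps the helper from silently rewriting unrelated links.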

 
 


 
 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.
