ISSUE 418 · January 3, 2023
Soccer Analytics 2022 Review
Awesome roundup of ⚽ analytics content from 2022. Covers research papers, blog posts, podcasts, books, code repos and more!
Statistical Rethinking 2023
The latest version of Richard McElreath's Statistical Rethinking course starts this week. This popular online course teaches data analysis with a focus on scientific models. It prioritizes conceptual, causal models and uses Bayesian data analysis to connect scientific models to evidence. Lectures are posted online each week.
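For a feel of what "connecting a model to evidence" looks like in Bayesian terms, here is a minimal grid-approximation sketch of a posterior for a binomial proportion. The flat prior, the data (6 successes in 9 trials), and the function name `grid_posterior` are illustrative assumptions, not material taken from the course.

```python
def grid_posterior(successes, trials, grid_size=1000):
    """Grid-approximate the posterior of a binomial proportion p under a flat prior."""
    # Evenly spaced candidate values of p, including both endpoints 0 and 1
    p_grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Flat prior: every candidate value is equally plausible before seeing data
    prior = [1.0] * grid_size
    # Binomial likelihood without the binomial coefficient (it cancels when normalizing)
    likelihood = [p**successes * (1 - p) ** (trials - successes) for p in p_grid]
    unnormalized = [lik * pr for lik, pr in zip(likelihood, prior)]
    total = sum(unnormalized)
    # Normalize so the posterior weights sum to 1
    posterior = [u / total for u in unnormalized]
    return p_grid, posterior


p_grid, posterior = grid_posterior(successes=6, trials=9)
mode = p_grid[posterior.index(max(posterior))]
mean = sum(p * w for p, w in zip(p_grid, posterior))
print(mode)  # posterior mode, close to 6/9
print(mean)  # posterior mean, close to (6 + 1) / (9 + 2) under a flat prior
```

Grid approximation only scales to a handful of parameters, which is why courses on Bayesian analysis typically move on to sampling-based methods for larger models.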
Webinar: Don’t Get Lost in the Semantics
Join us for a dynamic conversation between Anna Filippova, Director of Community & Data at dbt Labs, and Benn Stancil, co-founder and Chief Analytics Officer at Mode, about how data teams can make the most of the Semantic Layer. RSVP Now.
Tutorials, Projects & Opinions
Computing the Eigendecomposition and the Singular Value Decomposition
In part 5 of this series on Principal Component Analysis, Peter Bloem walks through three methods for computing the eigendecomposition and the singular value decomposition. Building these simple algorithms from scratch can teach a lot about what PCA actually does.
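To give a flavor of these from-scratch algorithms, here is a minimal sketch of power iteration, one standard way to find the dominant eigenvector of a symmetric matrix, plus its use for the top singular value via the eigendecomposition of XᵀX. It may not match any of the three methods in the post, and the helper names (`matvec`, `power_iteration`, `top_singular_value`) are hypothetical. It assumes the dominant eigenvalue is strictly larger in magnitude than the rest and that the start vector is not orthogonal to its eigenvector.

```python
import math
import random


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(A, iters=500, seed=0):
    """Approximate the dominant eigenvalue and eigenvector of a symmetric matrix A."""
    rng = random.Random(seed)
    v = normalize([rng.random() for _ in range(len(A))])
    for _ in range(iters):
        # Repeatedly applying A amplifies the component along the dominant eigenvector
        v = normalize(matvec(A, v))
    # Rayleigh quotient v^T A v (v has unit length) estimates the eigenvalue
    eigenvalue = sum(x * y for x, y in zip(v, matvec(A, v)))
    return eigenvalue, v


def top_singular_value(X, iters=500):
    """Top singular value and right singular vector of X via the eigenvector of X^T X."""
    rows, cols = len(X), len(X[0])
    # X^T X is symmetric positive semidefinite; its top eigenvalue equals sigma_1 squared
    xtx = [[sum(X[k][i] * X[k][j] for k in range(rows)) for j in range(cols)]
           for i in range(cols)]
    eigenvalue, v = power_iteration(xtx, iters)
    return math.sqrt(eigenvalue), v


value, vector = power_iteration([[2.0, 1.0], [1.0, 2.0]])
print(value, vector)  # eigenvalue near 3, eigenvector near [0.707, 0.707]
sigma, _ = top_singular_value([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(sigma)  # near sqrt(3)
```

Forming XᵀX explicitly squares the condition number, so numerical libraries usually compute the SVD directly; the detour here is only to show how the two decompositions relate.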
How Shapley Values Work
Shapley values (and their popular extension, SHAP) are machine learning explainability techniques that are easy to use and interpret, but their theory can be intimidating to learn. This post explores how Shapley values work through code and simplified explanations rather than cryptic formulae.
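In the same code-first spirit, here is a minimal sketch that computes exact Shapley values by enumerating every coalition of features. It assumes a simple value function in which features outside a coalition are replaced by a fixed baseline; SHAP and the post may define the value function differently (for example, by averaging over background data). The names `shapley_values`, `model`, and `baseline` are illustrative.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction, using baseline replacement."""
    n = len(x)

    def value(coalition):
        # Assumed value function: coalition features keep x's values, the rest use baseline
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def f(z):
    # Toy model with an interaction term between the two features
    return 2 * z[0] + 3 * z[1] + z[0] * z[1]


print(shapley_values(f, [1.0, 1.0], [0.0, 0.0]))  # prints [2.5, 3.5]
```

The interaction term is split evenly between the two features, and the values sum to f(x) − f(baseline) = 6. Exact enumeration costs on the order of 2ⁿ model calls, which is why practical tools rely on approximations.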
Code & Tools
Top Python libraries of 2022
Tryolabs' annual list of top Python libraries is consistently a must-read post. This well-researched list includes tools for distributed computing, putting notebooks in production, monitoring ML models, fast linting, profiling memory, interpretability, anomaly detection, and much more.
RTutor - Talk to your data via AI
RTutor enables users to interact with data via natural language. After uploading a dataset, users can ask questions about it or request analyses in plain English. The app generates and runs R code to answer the questions with plots and numeric results. It can also explain statistical concepts and help users decide which tests to use. It's experimental, but it looks like a good tool for learning.
Why Business Data Science Irritates Me
A recent post called "Goodbye, Data Science" struck a nerve for a lot of people. In this post, another insider shares his own journey and frustrations. There's a lot of thoughtful reflection here, including practical ways to approach similar issues in your own career and workplace.
Annotated Forest Plots using ggplot2
Great tutorial on making annotated forest plots with ggplot2. It builds them from scratch, without packages like forester, forestplot, or ggforestplot, and the approach gives you a lot of control and flexibility.
Exploratory spatial data analysis with Python
Kyle Walker's book, Analyzing US Census Data, has been a popular free download amongst Data Elixir readers, but it's written specifically for R users. In this first post of a new series, Kyle shows how to recreate some of his favorite examples from the book using Python.