
ISSUE 344   ·   July 13, 2021

 

Insight

Top 10 ideas in statistics over past 50 years

Andrew Gelman, a statistics professor at Columbia, and Aki Vehtari, a computer science professor at Finland’s Aalto University, recently published a list of the most important statistical ideas in the last 50 years. Here, they break it down in easy-to-understand terms.
Columbia News | Kim Martineau

 

Roadmap: Data Infrastructure

A wave of startups is enabling the next generation of data-driven businesses. They offer better and easier-to-use infrastructure for accessing, analyzing, and furthering the use of data. Here's how the team at Bessemer Venture Partners thinks about this space and the key trends that are driving innovation.
Bessemer Venture Partners

 

Sponsored Link

A/B Test Pitfalls and How to Avoid Them | Mode

 

A/B tests can measurably improve our product and create a better experience for the majority of our users. But if we’re not careful, they can keep us chasing product changes that do almost nothing at all.
Here's what to keep in mind.

 


 
 

Tutorials, Projects & Opinions

Against SQL

SQL was first developed 50 years ago and, despite its flaws, has become the universal language for data. This post explores SQL's many imperfections and what an ideal replacement might look like. There's a lot to consider here and, not surprisingly, the post stirred up a lot of discussion around the web. See also the discussion on Hacker News.
Jamie Brandon

 

Self-serve is a feeling

Self-serve analytics tools can empower business users to answer their own questions, but what exactly does "self-serve" mean? And how does it differ from one user to the next? Understanding that is key to building tools that are actually useful. Enter the self-serve Turing Test...
Benn Stancil

 

Causal Inference in the Wild: Elasticity Pricing

There are a lot of good articles about the theory of causal inference but not many examples of real-world applications. In this tutorial, Lars Roemheld shows how machine learning can improve causal inference in industry, using elasticity pricing as the example. It assumes you already have some understanding of causal inference, but overall it's pretty approachable.
Lars Roemheld

 

Parameterizing and automating Jupyter notebooks

There are commonly used hacks for changing Jupyter notebook parameters, but a tool like papermill offers a much cleaner and more reproducible approach. This tutorial starts with an introduction to papermill, walks through an example of parameterization, and discusses ways to schedule and fully automate notebook execution using cron.
Matt Wright
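
For a rough sense of what that workflow can look like, here is a minimal sketch (not taken from the tutorial) that builds a papermill command line and wraps it in a crontab entry. The notebook paths, parameter names, and schedule are all hypothetical, and the sketch assumes the input notebook has a cell tagged `parameters` for papermill to override.

```python
import shlex


def papermill_command(input_nb, output_nb, params):
    """Build a shell-safe papermill CLI call that passes `params` to a notebook."""
    argv = ["papermill", input_nb, output_nb]
    for name, value in params.items():
        # Each `-p NAME VALUE` pair sets one notebook parameter.
        argv += ["-p", name, str(value)]
    return shlex.join(argv)


def cron_line(schedule, command):
    """Combine a five-field cron schedule with a command into one crontab line."""
    return f"{schedule} {command}"


# Hypothetical notebook, output path, and parameters, for illustration only.
cmd = papermill_command(
    "report.ipynb",
    "out/report_run.ipynb",
    {"region": "emea", "top_n": 10},
)

# Hypothetical schedule: every day at 06:00.
print(cron_line("0 6 * * *", cmd))
```

The printed line could be pasted into a crontab. Writing each run to its own output notebook keeps the executed copy, with its injected parameters, as a record of that run.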

 

Democratize data across your organization with semantic layer

Join this webinar panel for practical advice from data and analytics leaders at Verizon, Snowflake, EverQuote, and Affinity Federal Credit Union on how to build and foster a data-literate, self-service analysis culture at scale using a semantic layer. Register here.
// sponsored

 

Code & Tools

Tuplex: Blazing Fast Python Data Science

Tuplex is a data processing framework that runs Python data science pipelines at the speed of compiled code. It works interactively in the Python toplevel, integrates with Jupyter Notebooks, and provides familiar APIs. Follow the links for downloads and recent papers. 
CS at Brown University

 

Risk Assessment of GitHub Copilot

It's been a couple of weeks since GitHub announced its AI-powered code-completion tool, "Copilot." The demos look amazing, but there are a lot of open questions. This post dives into the issues and takes a hands-on approach to exploring Copilot's strengths and where it could get you into trouble.
0xabad1dea

 

Resources

Concise and beautiful algorithms written in Julia

This collection of Julia examples covers a wide variety of algorithms for machine learning, optimization, reinforcement learning, decision-making, and more. These aren't for production use but are clean examples to learn from. All implementations are working and self-contained, with tests.
GitHub | Robert Moss

 
 


 
 

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.

 


 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308
