ISSUE 344 · July 13, 2021

Insight

Top 10 ideas in statistics over past 50 years
Andrew Gelman, a statistics professor at Columbia, and Aki Vehtari, a computer science professor at Finland’s Aalto University, recently published a list of the most important statistical ideas of the last 50 years. Here, they break it down in easy-to-understand terms.

Roadmap: Data Infrastructure
A wave of startups is enabling the next generation of data-driven businesses. They offer better, easier-to-use infrastructure for accessing, analyzing, and extending the use of data. Here's how the team at Bessemer Venture Partners thinks about this space and the key trends driving innovation.

Sponsored Link

A/B Test Pitfalls and How to Avoid Them
A/B tests can measurably improve our product and create a better experience for the majority of our users. But if we're not careful, they can keep us chasing product changes that do almost nothing at all.

Tutorials, Projects & Opinions

Against SQL
SQL was first developed nearly 50 years ago and, in spite of its flaws, it has become the universal language for data. This post explores SQL's many imperfections and what an ideal replacement might look like. There's a lot to consider here and, not surprisingly, the post stirred up a lot of discussion around the web. Also, see the discussion on Hacker News.

Self-serve is a feeling
Self-serve analytics tools can empower business users to answer their own questions, but what does "self-serve" mean exactly? And how does it differ from one user to another? Understanding that is key to building tools that are actually useful. Enter the self-serve Turing Test...

Causal Inference in the Wild: Elasticity Pricing
There are a lot of good articles about the theory of causal inference, but not many examples of real-world applications. In this tutorial, Lars Roemheld shows how machine learning can improve causal inference in industry, using elasticity pricing as the example. It assumes some prior understanding of causal inference but, overall, it's pretty approachable.

Parameterizing and automating Jupyter notebooks
There are commonly used hacks for changing Jupyter notebook parameters, but a tool like papermill offers a much cleaner and more reproducible approach. This tutorial starts with an introduction to papermill, walks through an example of parameterization, and discusses ways to fully schedule and automate notebook execution using cron.

Democratize data across your organization with semantic layer
Join this webinar panel for practical advice from data and analytics leaders at Verizon, Snowflake, EverQuote, and Affinity Federal Credit Union on how to build and foster a data-literate, self-service analysis culture at scale using a semantic layer. Register here.

Code & Tools

Tuplex: Blazing Fast Python Data Science
Tuplex is a data processing framework that runs Python data science pipelines at the speed of compiled code. It works interactively in the Python toplevel, integrates with Jupyter Notebooks, and provides familiar APIs. Follow the links for downloads and recent papers.

Risk Assessment of GitHub Copilot
It's been a couple of weeks since GitHub announced its AI-powered code-completion tool, "Copilot." The demos look amazing, but there are a lot of questions.
This post dives into the issues and takes a hands-on approach to exploring Copilot's strengths, as well as where it could get you into trouble.

Resources

Concise and beautiful algorithms written in Julia
This collection of Julia examples covers a wide variety of algorithms for machine learning, optimization, reinforcement learning, decision-making, and more. These aren't meant for production use, but they are clean examples to learn from. All implementations are working and self-contained, with tests.

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email. To find specific content from prior issues or to research topics, check out the catalogued archives on Data Elixir's Search Page.