— Insight —
Great discussion with Hilary Mason about the current state of data science, lessons learned, and practical ways that businesses can develop useful products from machine learning technologies.
In this post, Dimitri Masin walks through the initial vision for Monzo's data team that helped them grow "a world class" data organization from zero to over 30 people in just three years. Includes an overview of their tech stack and team roles.
— Tools and Techniques —
Amundsen is a metadata-driven application that indexes data resources and powers a search system based on usage patterns. Think of it as a Google-like search for data. It was designed and built at Lyft and was recently released as an open-source project. This post describes how it works, the road ahead, and how to get started.
SQL may be old, and maybe it isn't so "cool" to use these days, but it has an important role in modern data pipelines. In this post, Florents Tselai makes a strong case for SQL and why it's important for data science.
So far, every stack in this Hacker News discussion is different!
This half-hour presentation by Alexander Lew is an awesome introduction to probabilistic programming. It's "programming", but the vehicle for the presentation is a simple data cleaning task that clearly shows how probabilistic programming works and how it's useful.
— Data Viz —
This open-access book by Nicolas P. Rougier introduces OpenGL and shows how to create fast, scalable and beautiful visualizations using Python. It's easy to follow and includes lots of code examples and images along the way.
Great tutorial by Mike Bostock that shows how animated bar chart races work. This is also a nice example of a well-crafted Observable notebook that includes clear comments and is easy to tinker with. Or fork it and swap in your own data.