— In the News —
Google has a massive impact on the tools, applications and research that help steer the data science community. As in prior years, this retrospective by Jeff Dean is amazing in its scope. It includes useful summaries, screenshots, videos and linked references throughout.
When Tim Berners-Lee helped create the World Wide Web, there was no way to know how far it would go. Now, three decades later, the big tech companies control vast troves of data and have become surveillance platforms and gatekeepers of innovation. Here's how his new startup aims to fix that.
— Tools and Techniques —
Eugene Yan's latest article explores how real-time machine learning looks in practice, especially regarding recommendations. When does real-time recommendation make sense? When does it not? How should you approach an MVP? This is an awesome article that's easy to follow.
Especially if you're new, the suggestions here may surprise you.
The Cerberus library provides data validation functions for Python and is designed to be simple to use and extensible. This introduction walks through some examples for getting started.
— Resources —
The Data is Plural newsletter by Jeremy Singer-Vine is a great resource for learning about obscure datasets, but the archives have been hard to browse. Not anymore. This new search page by Amelia Wattenberger makes it easy to search the archives using keywords and topics.
This popular text by Kevin P. Murphy has gotten a big upgrade this year. The updated book was expected to run over 1,600 pages, so to keep it manageable it has been split into two volumes. This is the electronic version of Volume 1. The second volume, Advanced Topics, is planned for 2022.