ISSUE 318 · January 12, 2021

In the News

Google Research: Looking Back at 2020, and Forward to 2021
Google has a massive impact on the tools, applications and research that help steer the data science community. As with prior years, this retrospective by Jeff Dean is amazing in its scope. Includes useful summaries, screenshots, videos and linked references throughout.

He Created the Web. Now He’s Out to Remake the Digital World.
When Tim Berners-Lee helped create the World Wide Web, there was no way to know how far it would go. Now, three decades later, the big tech companies control vast troves of data and have become surveillance platforms and gatekeepers of innovation. Here's how his new startup aims to fix that.

Sponsored Link

Online Data Science Programs at Drexel University
Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career. Learn more.

Tutorials, Projects & Opinions

Real-time Machine Learning For Recommendations
Eugene Yan's latest article explores how real-time machine learning looks in practice, especially regarding recommendations. When does real-time recommendation make sense? When does it not? How should you approach an MVP? This is an awesome article that's easy to follow.

What do you wish you knew before you deployed your first ML model?
Especially if you're new, the suggestions here may surprise you.

Code & Tools

Validating Data in Python with Cerberus
The Cerberus library provides data validation functions for Python and is designed to be simple to use and extensible. This introduction walks through some examples for getting started.

Simple Graph
Simple Graph is a graph database for SQLite, inspired by the article "SQLite as a document database".

Resources

Data is Plural archive
The Data is Plural newsletter is a great resource for learning about obscure datasets, but the archives have been hard to browse. Not anymore. This project by Amelia Wattenberger makes it easy to search the archives using keywords and topics.

Probabilistic Machine Learning: An Introduction
This popular text by Kevin P. Murphy has gotten a big upgrade this year. The updated book was expected to be over 1,600 pages, so to make it manageable it's been split into two volumes. This is the electronic version of Volume 1. The second volume, Advanced Topics, is planned for 2022.

Project Pick

Parler video GPS data
The social media app called Parler has taken a lot of heat recently for its role in the planning of domestic terrorist attacks in the U.S. Between the App Stores and AWS, it's been shut down, but not before nearly all of its data was downloaded via some hacking heroics. This repo contains GPS metadata and code for working with it. Also, see this thread >>

Data Elixir is curated and maintained by Lon Riesberg. For full-text search of prior issues, visit Data Elixir's Search Page. If you have suggestions or questions for the newsletter, just reply to this email.