— In the News —
Machine learning startups are fascinating to track, and there sure are a lot of them. Crunchbase lists over 5,000 startups that rely on machine learning, and most have had fewer than two funding rounds. The 25 startups highlighted in this Forbes article cover a wide variety of applications with substantial market potential.
In HBO's Silicon Valley, the Weissman Score measures how close a compression algorithm comes to the ideal. The score itself is fictional, but the goal it points to is often called the Holy Grail of the compression world. In this article, Tsachy Weissman, Stanford professor and creator of the Weissman Score, talks about real-world data compression.
— Sponsored Link —
Mode Studio combines a SQL editor, Python & R notebooks, and a visualization builder in one platform. And it's free forever. Connect data from anywhere and analyze with the best language for the job, without having to jump between tools. Build custom visualizations or use our out-of-the-box charts. Share your analysis with a click—every report lives at a URL.
— Tools and Techniques —
JupyterCon inspired a lot of activity around the web last week. In this post, Will Crichton highlights three major trends from the conference.
This second part of the Netflix "Notebook Innovation" series describes how the Netflix Data Platform uses notebooks to schedule and execute jobs. Includes rationale for using notebooks as an integral part of their data pipeline, challenges, key utilities, workflows, and discussion of how it all fits together.
There's definitely a lot going on with Jupyter these days, but as a counterpoint to all the enthusiasm, this is a great slide deck by Joel Grus that lays out the downsides of Jupyter notebooks and argues for avoiding them.
Deep learning has made enormous strides in NLP, but state-of-the-art models are still spurious and brittle. In this article, Ana Marasović explores the issues and offers suggestions for how they might be solved.
This step-by-step tutorial is a great walk-through of a practical application for text mining and sentiment analysis. It includes lots of code snippets and visualizations along the way.
— Data Viz —
After finishing “The Visual Display of Quantitative Information” by Edward Tufte, the Data Vis Book Club sponsored by Datawrapper is dividing into two tracks. The novice track will start with Alberto Cairo’s “The Truthful Art” and the deep-divers track will read Tamara Munzner’s “Visualization Analysis & Design.” Check out the announcement for details.
— Career —
Product management isn't a "data science" role, but it's key to defining what your teams should be building. In this post, Clemens Mewald from Google explores what product managers do and what, exactly, they need to know to work on machine learning products. The post is aimed mainly at people already working as product managers, but it also has useful insights on applying data science skills to a very different career path.