— In the News —
Three years ago, IBM started selling Watson for Oncology as a tool for recommending the best cancer treatments to doctors around the world. This long read explores the technology, the hype, and the challenges of innovating with data in healthcare. Even if the technology worked perfectly, the Watson for Oncology story would still be complicated.
— Tools and Techniques —
Tom Augspurger's latest writing project is a series about scalable machine learning. This first part covers what to do when your training dataset is small enough to fit in memory but you need to predict on a much larger dataset. Tom is a very clear writer, and if this series turns out anything like his Modern Pandas series, you'll definitely want to watch it develop.
This is a great post on Uber's engineering blog about their internal ML-as-a-service platform. It covers the problems the platform solves, its system architecture, and the design decisions that make their machine learning workflow scalable, reliable, reproducible, easy to use, and automated.
One way to deal with inadequate computing power is to move your work to the cloud. But configuring resources is a hassle that can be time-consuming and costly. This tutorial shows how to automate much of the process of configuring new VMs on AWS so you can minimize the time spent on trivial setup tasks.
— Data Viz —
This online book is a practical introduction to data visualization using R and ggplot2. It is well organized and covers a lot of ground. Topics include basic principles of data visualization, key things to avoid, and how to create a wide variety of plot types.
Here's a hands-on tutorial for creating visualizations that look similar to those published by FiveThirtyEight. Uses Python’s matplotlib and pandas.
— Career —
Train for your next data science interview with 1000+ real interview questions.
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...