— In the News —
Without public approval, advances in how we use data will stall. That is why a regulator's ruling against the operator of three London hospitals is about more than mishandling records from 1.6 million patients. It is a missed opportunity to have a conversation with the public about appropriate uses for their data...
The search for Malaysia Airlines flight MH370 involved the collection and analysis of large volumes of marine data from the southern Indian Ocean. This is an amazing presentation of that data, including details about how it was collected, how it's useful, and what's next. Especially worthwhile for readers who are interested in the sciences.
A how-to guide for data science - as told via xkcd comic strips!
— Sponsored Link —
Are you data curious? An aspiring data scientist? On 9/27, join Metis for a free, online event featuring 25+ incredible speakers from the data science field, including Kirk Borne, Deborah Berebichez, Chris Albon, Rachel Thomas, Bob Hayes, and Lillian Pierson. Speakers will demystify data science and discuss the training, tools, and career path to one of the world's hottest jobs.
— Tools and Techniques —
This detailed post explores the mathematics of decision trees, along with techniques for dramatically improving their prediction accuracy.
Cosette, a new tool from the University of Washington, automatically checks whether SQL queries are semantically equivalent. It's a "SQL Solver" that lets you verify the correctness of SQL rewrite rules, find errors in buggy SQL rewrites, develop SQL optimizers, etc. This article is a good overview of how Cosette works and how it's useful.
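To make "semantically equivalent" concrete, here is a minimal Python sketch using the standard-library sqlite3 module. It is not Cosette and does not use its API; the `emp` table, its rows, and the three queries are invented for illustration. Running queries on sample data can only expose a bad rewrite on that particular data, whereas a solver aims to decide equivalence for every possible input.

```python
import sqlite3

# Hypothetical table and rows, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "eng", 100), (2, "eng", 80), (3, "ops", 100), (4, "ops", 60)],
)

def rows(sql):
    """Run a query and return its rows sorted, so row order is ignored."""
    return sorted(conn.execute(sql).fetchall())

original = "SELECT id FROM emp WHERE dept = 'eng' AND salary > 90"

# A valid rewrite: apply one predicate in a subquery, the other outside it.
good_rewrite = (
    "SELECT id FROM (SELECT * FROM emp WHERE salary > 90) AS t "
    "WHERE dept = 'eng'"
)

# A buggy rewrite: AND was accidentally turned into OR.
bad_rewrite = "SELECT id FROM emp WHERE dept = 'eng' OR salary > 90"

same_as_good = rows(original) == rows(good_rewrite)
same_as_bad = rows(original) == rows(bad_rewrite)
print(same_as_good, same_as_bad)  # prints: True False
```

Agreement on this one table does not prove the good rewrite is correct in general; the mismatch, however, is enough to show the buggy one is wrong.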
Real-time databases make it easy to implement reactive applications because they keep your critical information up to date. But how do they work and how do they scale? This is a great exploration of the options, including details about the architectures of a variety of real-time databases, their limitations, and how Baqend ultimately created their own solution.
— Deep Learning —
This post by Sebastian Ruder is a well-considered collection of best practices for using neural networks in Natural Language Processing.
In this post, Anish Athalye shows how small, carefully crafted perturbations to inputs can cause neural networks to misclassify them in arbitrarily chosen ways. Apparently, the attack is 100% effective. Includes a Jupyter notebook so you can easily experiment with the inputs yourself.
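For a feel of why tiny perturbations can flip a prediction, here is a minimal pure-Python sketch. It is not the method or code from the post: a hand-built linear classifier stands in for a neural network, and the weights, bias, input, and `epsilon` are all made-up values. The idea is a single gradient-sign step: nudge every input coordinate slightly in the direction that raises the score of the class you want.

```python
# Toy linear classifier (an assumption standing in for a neural network):
# predict class 1 when weights . x + bias > 0, otherwise class 0.
n = 100
weights = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]  # made-up weights
bias = -1.0

def score(x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(x):
    return 1 if score(x) > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

# A made-up input that the model assigns to class 0.
x = [0.5] * n

# For a linear model, the gradient of the score with respect to x is just
# the weight vector, so stepping along sign(weights) raises the score.
epsilon = 0.02  # maximum change allowed per coordinate
x_adv = [xi + epsilon * sign(w) for xi, w in zip(x, weights)]

max_change = max(abs(a - b) for a, b in zip(x_adv, x))
print(predict(x), predict(x_adv))  # prints: 0 1
```

No coordinate moves by more than about 0.02, yet the 100 small nudges add up and the predicted class changes. Real attacks on deep networks typically take many such steps against a non-linear model.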
— Data Viz —
This is an awesome tutorial about the nuances of histograms.
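One nuance that often comes up with histograms is the choice of bin width (assumed here as an example; it is not quoted from the tutorial). The pure-Python sketch below bins the same made-up data twice. The `histogram` helper and the data are hypothetical, and the helper assumes every value is at or above `start`.

```python
def histogram(values, bin_width, start=0.0):
    """Count values per bin; bin k covers [start + k*w, start + (k+1)*w)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    # Fill in empty bins so gaps in the data stay visible.
    return [counts.get(k, 0) for k in range(max(counts) + 1)]

# Made-up data with two separate clusters.
data = [1, 2, 2, 3, 3, 3, 11, 12, 12, 13, 13, 13]

wide = histogram(data, bin_width=20)
narrow = histogram(data, bin_width=5)

for label, counts in (("width 20", wide), ("width 5", narrow)):
    print(label)
    for k, c in enumerate(counts):
        print(f"  bin {k}: {'#' * c}")
```

With a width of 20 everything lands in a single bar (`[12]`), while a width of 5 gives `[6, 0, 6]` and reveals the two clusters with a gap between them.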
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...