— In the News —
Why do so many organizations make bad decisions in spite of having access to enormous amounts of data? In this TED Talk, Tricia Wang identifies the missing piece, which she calls "Thick Data." It's difficult to quantify but vital for making Big Data work.
There's a lot of hype around AI these days, but it still has a long way to go. Gary Marcus, an expert in human learning, has an idea to speed things along.
It turns out that it doesn't take much to identify specific users in anonymous browsing data. Using data that was easily obtained by online brokers, researchers were able to identify people and discover their entire month-long browsing history from an anonymous dataset. "These companies have lost their minds when it comes to data collection."
The dream, says a bioengineer, is to have cancer-relevant medical data flow unimpeded around the world, so that everyone, wherever they are, can see and use this information.
This is a great idea and would probably be super useful for advancing health care. But one of the problems, of course, is the assumption that people will freely share health-related data at a time when pre-existing conditions might be used against them. Even "anonymizing" the data wouldn't guarantee privacy. There are important goals here, and important issues that need to be figured out first.
— Tools and Techniques —
In spite of all the excitement, most organizations are not ready for AI. In this post, Monica Rogati explains why that is and offers a super clear path for creating the right foundation.
This tutorial shows how to design a complete machine learning pipeline, including the implementation decisions you need to make if you want to do real work. There's code here, but this isn't a copy/paste tutorial. It's intended to get you thinking.
This step-by-step tutorial shows how to build a sound recognition pipeline for detecting bats. It covers sound clip management, analysis, and a simple neural network. None of it is complicated, and the ideas here could easily be extended.
fast.ai just released the second part of its popular deep learning course. This part covers a variety of cutting-edge topics, including neural translation, generative models, memory networks, and applications for structured data and time series. This is a free online course, and the prerequisites for the series are fairly modest.
— Resources —
Essential cheat sheets for deep learning and machine learning. Includes cheat sheets for TensorFlow, Keras, NumPy, SciPy, Pandas, scikit-learn, Matplotlib, and more.
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...