Tools and Techniques
Docker containers are standalone packages that bundle an application with its dependencies, so it runs the same way regardless of the infrastructure underneath. They're widely used by software engineers, and data scientists are increasingly finding them useful too. This tutorial covers what you need to get started with Docker: why it's useful for data science, key definitions, common commands, and how to create your own containers. A must-read.
Bias can be a big problem in machine learning, and, as Kate Crawford argues in this keynote from NIPS, treating it as a purely technical problem misses the point.
Sure, big data is useful, but when was the last time you heard a story about a great innovation that began with “I built a killer spreadsheet”? Sometimes, it's the small data that ultimately matters.
Jake VanderPlas's latest post is a fun read, and there's a lot to learn along the way. It covers Markov chains, probability distributions, matrices, entropy, and some visualizations too.
A great tutorial that shows how to build and train a simple neural network to solve CAPTCHAs automatically. It's easy to follow and includes a code repo, training data, and links to references.
Luke Oakden-Rayner is a radiologist with a particular interest in applying machine learning to medical images. When he dug deep into a dataset that's being used to train medical AI systems, he discovered several problems. This fantastic deep dive ultimately shows why it's important to have domain experts on your data science team.
A great roundup of important deep learning advances in NLP from 2017, with summaries and links for each of the tools and techniques mentioned. If you follow this space, it's a worthwhile reference; if you don't, it's a nice overview of what's been going on.