— In the News —
I missed this story when it came out in September, but it's still being talked about and was recently shared on NPR. It's an intriguing article about a new niche industry that answers business questions with creative analytics and with data that's created on the fly.
— Sponsored Link —
Springboard’s Data Science Career Track is designed to get you hired in data science. Master the data science process by working on real capstone projects. You’ll benefit from 1-on-1 mentorship from a data science expert and personalized career coaching. Recently, Springboard’s students have been placed in senior data science roles at Kaiser Permanente, Nielsen, Ford, the Federal Reserve Bank of Richmond, and more. Get a data science job or your money back.
— Tools and Techniques —
Docker containers are standalone packages that bundle an application with its dependencies, so it runs the same way regardless of the infrastructure underneath. They're widely used by engineers, and data scientists are increasingly finding them useful too. This must-read tutorial covers everything you need to get started with Docker, including how it's useful for data science, key definitions, common commands, and how to create containers.
Bias can be a big problem in machine learning, and, as Kate Crawford argues in this NIPS keynote, treating bias as only a technical problem misses the point.
Sure, big data is useful, but when was the last time you heard the story of a great innovation that began with “I built a killer spreadsheet”? Sometimes it's the small data that ultimately matters.
Jake VanderPlas's latest post is a super fun read, and there's a lot to learn along the way. It covers Markov chains, probability distributions, matrices, entropy, and some visualizations too.
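If those topics are new to you, here's a minimal Python sketch (not taken from the post) of how they connect: a transition matrix moves a probability distribution forward one step, repeating that converges to a stationary distribution, and entropy summarizes how spread out that distribution is. The 3-state matrix `P` and the helper names are hypothetical, chosen so the answer is easy to check by hand.

```python
import math

# Hypothetical 3-state chain: row i holds P(next = j | current = i), so each row sums to 1.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

def step(dist, P):
    """Advance a distribution one step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution by repeated stepping (power iteration)."""
    n = len(P)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    return dist

def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

if __name__ == "__main__":
    pi = stationary(P)
    print([round(p, 4) for p in pi])  # [0.25, 0.5, 0.25]
    print(round(entropy(pi), 4))      # 1.5
```

Power iteration is used here only because it's the easiest method to follow; solving the linear system directly or taking the leading eigenvector of the transposed matrix gives the same answer for a chain like this one.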
Great tutorial that shows how to build and train a simple neural net to solve CAPTCHAs automatically. This is easy to follow and includes a repo of code, training data, and linked references.
Luke Oakden-Rayner is a radiologist with a particular interest in applying machine learning to medical images. When he dug deep into a dataset that's being used to train medical systems, he discovered several problems. This fantastic deep dive ultimately shows why it's important to have domain experts on your data science team.
Great roundup of important deep learning advances for NLP in 2017. It includes summaries and links for each of the tools and techniques mentioned. If you follow this space, it's a worthwhile reference; if you don't, it's a nice overview of what's been going on.
— Data Viz —
The rainbow color scale is notoriously ineffective, and research on the topic goes back more than 20 years. It's not a matter of taste. It's science. This article explores the issues and shows why, and how, to choose a better color scale for your particular needs. If you're not convinced by the end of the article, definitely watch the video.
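One commonly cited problem is easy to check yourself: as you move along a rainbow, lightness rises and falls instead of changing steadily, so equal steps in the data don't look like equal steps in color. Here's a minimal Python sketch, not taken from the article, under two stated assumptions: a plain HSV hue sweep from red to blue stands in for a rainbow colormap (real ones like "jet" differ in detail), and sRGB relative luminance serves as a rough proxy for perceived lightness (CIELAB L* is the more careful measure).

```python
import colorsys

def srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(r, g, b):
    """Relative luminance of an sRGB color (Rec. 709 weights), a rough proxy for lightness."""
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))

def rainbow(n):
    """Assumed stand-in for a rainbow scale: hue from red (0) to blue (2/3), full saturation and value."""
    return [colorsys.hsv_to_rgb(2 / 3 * i / (n - 1), 1.0, 1.0) for i in range(n)]

def grayscale(n):
    """n evenly spaced grays from black to white."""
    return [(i / (n - 1),) * 3 for i in range(n)]

def luminances(colors):
    """Luminance of each (r, g, b) color in a list."""
    return [relative_luminance(*c) for c in colors]

def is_monotonic(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(b >= a for a, b in pairs) or all(b <= a for a, b in pairs)

if __name__ == "__main__":
    print("rainbow monotonic:", is_monotonic(luminances(rainbow(256))))      # False
    print("grayscale monotonic:", is_monotonic(luminances(grayscale(256))))  # True
```

The rainbow sweep gets brighter toward yellow, dips at green, rises again at cyan, and falls sharply at blue, while the gray ramp only ever gets lighter.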
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...
— About —
Data Elixir is curated and maintained by @lonriesberg. If some awesome person forwarded this issue to you, subscribe for free at dataelixir.com and get it delivered every week.