Tools and Techniques
In a 2014 paper, Hadley Wickham showed how to organize datasets so that they're easy to manipulate, model, and visualize. He called this "Tidy Data" (each variable is a column, each observation is a row) and used R to show how to wrangle datasets into that standard format. In this post, Jean-Nicholas Hould shows how to make Tidy Data using Python.
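To give a flavor of what tidying looks like in Python, here is a minimal sketch using pandas. It is not taken from the linked post: the table, the column names, and the values are made up for illustration. It shows one common fix, moving values that are stored in column headers (years) into a column of their own with `melt`.

```python
import pandas as pd

# Hypothetical "messy" table (illustrative data only): there is one column
# per year, so the year variable lives in the headers rather than in the rows.
messy = pd.DataFrame({
    "country": ["A", "B"],
    "2013": [10, 20],
    "2014": [15, 25],
})

# melt() unpivots the year columns into rows, giving one row per
# (country, year) observation and one column per variable.
tidy = messy.melt(id_vars="country", var_name="year", value_name="cases")

# The years came from column labels, so they are strings; convert to integers.
tidy["year"] = tidy["year"].astype(int)

print(tidy)
```

The result has three columns (`country`, `year`, `cases`) and four rows, which is the shape most pandas grouping and plotting operations expect.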
Part 2 of a hands-on introduction to Pandas, a Python package that makes it easy to import and analyze data.
Here's how data scientists at Pachyderm created a pipeline to annotate and analyze the recent World Chess Championship. This is very well done and includes their final game analyses, linked references, and a GitHub repo with their code.
Awesome tutorial! Demonstrates a basic workflow for building prediction models with R. Starts with how the data is structured and continues with feature selection, handling missing values, training, and algorithm comparison.