— In the News —
Great post from the data engineering team at Slack describing their data infrastructure and the rationale behind their decisions.
This article inspired a lot of discussion around the web this week about the practical side of AI's potential.
Interested in developing AI? Check out these two new playgrounds:
— Tools and Techniques —
In a 2014 paper, Hadley Wickham showed how to organize datasets so that they're easy to manipulate, model, and visualize. He called this "Tidy Data" and used R to show how to wrangle datasets into a standard format. In this post, Jean-Nicholas Hould shows how to make Tidy Data using Python.
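To make the idea concrete, here is a minimal sketch of the most common tidying step: reshaping a "wide" table, where values of a variable are spread across column headers, into one row per observation. It is not taken from the post. The `melt` helper, the column names, and the sample rows are all made up for illustration, and it uses only the standard library.

```python
def melt(rows, id_column, var_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row.

    Hypothetical helper for illustration; every column other than
    `id_column` is treated as a level of the variable `var_name`.
    """
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy.append({
                id_column: row[id_column],
                var_name: column,
                value_name: value,
            })
    return tidy


# Hypothetical wide table: one row per person, one column per treatment.
wide = [
    {"name": "Ann", "treatment_a": 4, "treatment_b": 7},
    {"name": "Bob", "treatment_a": 9, "treatment_b": 2},
]

tidy = melt(wide, id_column="name", var_name="treatment", value_name="result")
for record in tidy:
    print(record)
```

In the tidy result each variable (`name`, `treatment`, `result`) is a column and each observation is a row. If you work in pandas, `pandas.melt` with `id_vars`, `var_name`, and `value_name` performs the same reshaping on a DataFrame.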
Part 2 of a hands-on introduction to Pandas, a Python package that makes it easy to import and analyze data.
Here's how data scientists at Pachyderm created a pipeline to annotate and analyze the recent World Chess Championship. This is very well done and includes their final game analyses, linked references, and a GitHub repo with their code.
Awesome tutorial! Demonstrates a basic workflow for building prediction models with R. Starts with how the data is structured and continues with feature selection, handling missing values, training, and algorithm comparison.
— Resources —
Andrew Ng, Chief Scientist of Baidu, is writing a book that will help you make practical decisions about building machine learning systems. Sign up for his mailing list to receive free draft chapters as they're completed.
— Podcasts —
data.world is a new platform for sharing, exploring, and collaborating with data. This is a great discussion with data.world's Chief Product Officer, Jon Loyens, about what this rapidly growing platform is all about and where it's going.
Causal Impact is a technique for estimating the impact of a particular event on a time series. This episode of Data Skeptic explores what it is and how it's useful.
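The core idea can be sketched in a few lines: fit a model to the data before the event, project it forward as a "what would have happened" baseline, and compare that with what was actually observed. The sketch below is a deliberately simplified stand-in, not the method discussed in the episode. It swaps in a plain least-squares trend line for the statistical model, and the function names and sample numbers are made up for illustration.

```python
def fit_line(ys):
    """Ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_impact(series, event_index):
    """Compare post-event values with a trend projected from pre-event data.

    Assumption: the pre-event trend would have continued unchanged
    without the event. Real analyses need a much richer model.
    """
    pre = series[:event_index]
    post = series[event_index:]
    slope, intercept = fit_line(pre)
    # Counterfactual: the pre-event trend extended over the post-event period.
    counterfactual = [intercept + slope * t
                      for t in range(event_index, len(series))]
    pointwise = [actual - expected
                 for actual, expected in zip(post, counterfactual)]
    return sum(pointwise), pointwise


# Hypothetical daily metric with an event starting at index 5.
series = [10, 12, 14, 16, 18, 25, 27, 29]
total, pointwise = estimate_impact(series, event_index=5)
print("pointwise effect:", pointwise)
print("cumulative effect:", total)
```

The estimate is only as good as the baseline, which is why a trend line alone is rarely enough in practice: it gives no uncertainty interval and cannot account for seasonality or related control series.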