— In the News —
Ever wonder how music recommendation systems really work?
Geoff Hinton, one of the godfathers of deep learning and neural network research, did a fascinating Ask Me Anything on Reddit last week. Read the highlights in this article or see the entire AMA here.
Brilliant! This is a well-made video about a team from Harvard that's using courtside cameras to extract data from basketball games. It's adapted from Faster, Higher, Stronger: How Sports Science Is Creating a New Generation of Superathletes by Mark McClusky. The production values are high and it's just 4 minutes long. Check it out!
— Tools and Techniques —
Interested in mining documents? Overview is a tool for reading and analyzing thousands of documents quickly. It offers full-text search, topic modeling, coding, tagging, visualizations and more. Several investigative stories have been reported using this tool, including one that won a Pulitzer. There's a lot to explore here and it's free to create your own account.
Yay! A plotting library for D3. MetricsGraphics.js is optimized for visualizing time-series data and currently supports line charts, scatterplots and histograms, as well as features like rug plots and basic linear regression. I'd like to see more interactive options, but it looks like this library will continue to evolve and is definitely worth watching.
Here's a repository of small datasets - for those times when you don't want to manage something big. It's intended to help with rapid prototyping but I can imagine a lot of uses for something like this. Currently, you'll find data under broad headings like "animals," "governments," "geographies," "plants," and "words." The data has been cleaned and is formatted as JSON - looks super useful.
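For a sense of how a small JSON dataset like these might slot into a quick prototype, here is a minimal sketch using only Python's standard library. The inline sample and its key names (`description`, `animals`) are made up for illustration and are not copied from the repository; a real file would be read from disk or downloaded first.

```python
import json

# Hypothetical stand-in for one small dataset file; the structure and values
# here are assumptions, not taken from the repository.
SAMPLE = """
{
  "description": "A few common animals",
  "animals": ["aardvark", "badger", "cat", "dog"]
}
"""

def load_items(raw_json, key):
    """Parse a JSON document and return the list stored under `key`."""
    data = json.loads(raw_json)
    return list(data[key])

animals = load_items(SAMPLE, "animals")
print(len(animals), "items, first:", animals[0])
```

Because the data is already clean, the parsed list can go straight into whatever is being prototyped, with no munging step in between.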
FnordMetric allows you to write SQL queries that return SVG charts rather than tables. Turning a query result into a chart is literally one line of code.
— Resources —
A nice collection of tutorials, courses, Meetups, and books covering data science.
— Data Viz —
Maps can be super useful for giving data context, making sense of events that are moving geographically, and telling stories. Here are some of the best free tools.
Seaborn is a Python visualization library based on matplotlib and is intended to help create great-looking statistical graphics. A new version was recently released that includes support for heatmaps and clustering algorithms. There's a nice gallery of examples here with sample code that makes it easy to get started.
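To give a flavor of the heatmap support, here is a minimal sketch that assumes seaborn, matplotlib and NumPy are installed. The data is random and made up purely for illustration, and the gallery examples may be structured differently.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is required
import numpy as np
import seaborn as sns

# Made-up data: 6 rows by 4 columns of random values.
rng = np.random.RandomState(0)
values = rng.rand(6, 4)

# Draw the matrix as a color-coded grid; heatmap returns the matplotlib Axes.
ax = sns.heatmap(values)
ax.set_title("Random 6 x 4 matrix")
```

Calling `ax.figure.savefig("heatmap.png")` would write the chart to disk (the filename is arbitrary). The clustered variant, `sns.clustermap(values)`, reorders rows and columns by hierarchical clustering and additionally needs SciPy.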