— In the News —
The SS Central America, a steamer carrying a huge cache of gold, sank off the southeast coast of the United States in 1857. Part mystery, part adventure story, this 20-minute video recounts the tale of a group who used Bayesian theory to find the ship and its gold.
Classifying a painting by artist and style is tricky for humans; spotting the links between different artists and styles is harder still. So it should be impossible for machines... right?
— Sponsored Link —
This two-day event in San Francisco will begin with a day of presentations by Data Science experts such as UC Berkeley Professors Mike Franklin and Mike Jordan, Stanford Professor Rob Tibshirani, UW Professor Carlos Guestrin, CMU Professor Alex Smola, and more. See the complete list of speakers here.
The second day, 7/21/15, will be the 4th year of the Dato conference (previously known as the GraphLab Conference), an event focused on machine learning implementation with GraphLab Create, including data science tutorials and case studies.
Data Elixir readers can save 35% by using "DataElixir" as the discount code during registration! Offer expires on June 1st. Register Now and Save!
— Tools and Techniques —
This infographic pitting R against Python has gotten a lot of attention around the web this week. There's a lot of worthwhile information here and the presentation is super easy to follow. This is a great contribution to the R vs Python debate. Highly recommended.
Data scientists typically don't have much, if any, formal computer science or software development experience. That may be fine for solo analysis work but can lead to problems when trying to collaborate. This is a great run-down by Trey Causey of what data scientists really need to know about writing code for quality, understandability, and reusability. It's a short list of essential practices and is a MUST READ.
Deep neural network implementation without the learning cliff! This library implements multi-layer perceptrons as a wrapper around the powerful pylearn2 library, and it's compatible with scikit-learn for a more user-friendly and Pythonic interface. Includes automated tests, benchmarks, a visualization, examples, and good documentation. This has gotten a lot of attention quickly.
— Resources —
Great curation of useful R packages, including descriptions, sample commands and links to tutorials and GitHub repos. This is a great resource for R users.
Andrew Ng's Machine Learning MOOC is a famous introduction to the topic. Here's a fantastic set of notes to go along with the course by a PhD student named Alex Holehouse. If you've been interested in taking the course but haven't been able to fit it into your schedule, now's a great time to think about it. Besides having Alex's notes to help, the course was recently opened up as an on-demand course on Coursera and is entirely self-paced.
— Data Viz —
This is an interesting look at an evolving and rapidly growing trend in the art world. Data is the new medium and, along with satisfying an innate desire for self-expression, artists are helping scientists see their research from new angles. Great read, with lots of linked references.
Along with being an AI researcher at the University of Pennsylvania, Randy Olson leads the largest online community dedicated to data visualization on Reddit, /r/DataIsBeautiful, which serves over 2,000,000 unique monthly visitors. This isn't a long AMA, but there are some great answers here, and it has drawn over 8,600 views so far. You'll find tips for getting started with data viz, common mistakes to avoid, and lots of worthwhile links.
Anyone interested in market analysis will definitely want to check out this new D3 library. Components include Bollinger bands, candlestick series, moving averages, sparklines, relative strength indicators, and more. It's well documented, with examples. It's great to see libraries like this being built with D3. This is super useful!