— In the News —
A group of researchers at the University of California, Berkeley has made some startling advances recently. They've built a robot that learns how to do a variety of tasks that previously took months of programming to accomplish. This is a nice rundown of what they're up to, with real-time videos for the non-believers.
An article published in Science magazine was recently retracted by one of its authors because the data was apparently contrived. This has been a hot topic in the news this week because the article had groundbreaking implications and was becoming influential. This article from FiveThirtyEight is a great overview of the story and its implications.
Organizations that have managed to build data science teams are discovering an unexpected limitation of human-powered data science: it's not scalable. This is a good overview of the issues and potential solutions which, ironically, will require data scientists to implement.
Ben Porterfield leads a group of engineers who are helping companies make better decisions. In this interview, Porterfield explains the importance of an analytics infrastructure and shares his wisdom on where to store data, the best tools to use, common mistakes to dodge, and what you should measure to start making the right moves today.
— Tools and Techniques —
Great article exploring the NBA's SportVU motion tracking data. "We're spoiled for choice, with literally hundreds of thousands of data points, per game, begging to be dissected and analyzed." There are LOTS of possibilities for this data, and this article is a great inspiration for digging deeper. Highly recommended.
Well-written article describing the Mean Shift clustering algorithm. The diagrams, code snippets, and comparison with other techniques make it easy to understand how this algorithm works.
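For readers who want a feel for the idea before clicking through, here is a minimal sketch of mean shift in plain Python. It is not taken from the linked article: the Gaussian kernel, the `bandwidth` value, the merge threshold of half the bandwidth, and the toy points are all assumptions chosen for illustration. Each point is repeatedly moved to the kernel-weighted mean of the data until it stops moving, and points that settle near the same spot are treated as one cluster.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight: nearby points count more than distant ones."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def shift_point(point, data, bandwidth):
    """Move one point to the kernel-weighted mean of all data points."""
    weights = [gaussian_weight(math.dist(point, other), bandwidth) for other in data]
    total = sum(weights)
    return tuple(
        sum(w * other[i] for w, other in zip(weights, data)) / total
        for i in range(len(point))
    )


def mean_shift(data, bandwidth, max_iter=100, tol=1e-6):
    """Return (centers, labels) for a list of equal-length coordinate tuples."""
    modes = []
    for point in data:
        current = point
        for _ in range(max_iter):
            shifted = shift_point(current, data, bandwidth)
            moved = math.dist(current, shifted)
            current = shifted
            if moved < tol:  # the point has settled on a density peak
                break
        modes.append(current)

    # Merge modes that landed close together (threshold is an assumption).
    centers, labels = [], []
    for mode in modes:
        for idx, center in enumerate(centers):
            if math.dist(mode, center) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels


# Toy data: two well-separated groups of 2-D points (made up for this sketch).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
centers, labels = mean_shift(points, bandwidth=1.0)
```

Note that the number of clusters is never specified up front; it falls out of the bandwidth, which is the main knob to tune. For real work, a library implementation is a better choice than this sketch.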
— Resources —
It's MOOC season! This is a nice overview of upcoming data-related courses from edX, Udacity, and Coursera. If you don't have time for MOOCs, this post also includes a list of data science journals that are worth paying attention to.
Statistics Done Wrong is a guide to the most common statistical errors and slip-ups committed by scientists every day. It assumes no prior knowledge of statistics, so "you can read it before your first statistics course or after thirty years of scientific practice."
Simple and comprehensive search engine for R packages. This is a newly released "alpha" version, but it is very well done and worth using. The code behind it is available on GitHub and contributions are welcome, so expect this resource to keep getting better.
— Data Viz —
Brilliant! A key objective of data visualization is to enable and motivate people to engage with the data. In this piece from the New York Times, readers must think about the data before they're even able to access it. This is an interesting approach that works really well.
This is my favorite Data Stories episode yet! Miriah Meyer discusses her work building interactive data analysis and discovery tools for scientists. If you're interested in building exploratory tools and/or working with scientists, this episode is a MUST LISTEN! The links in the show notes are also highly recommended.