— In the News —
A huge project unexpectedly led to a way of finding disease genes without needing to know about diseases (image by Lollyknit).
New York Times op-ed about the challenges inherent in data-driven stories.
— Tools and Techniques —
Great overview of key streaming algorithms and how to work with them. This article was adapted from a talk given by Ted Dunning, the Chief Applications Architect for MapR.
Compared to centroid-based clustering like K-Means, density-based clustering works by identifying "dense" clusters of points, allowing it to learn clusters of arbitrary shape and identify outliers in the data. This is a good overview of popular density-based clustering algorithms with lots of sample code and screenshots.
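To make the idea concrete, here is a minimal sketch of DBSCAN, one well-known density-based algorithm, in plain Python. It is not taken from the linked article; the function name `dbscan`, the parameters `eps` and `min_pts`, and the toy data are illustrative assumptions, and the brute-force neighbor search is only sensible for small inputs.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used for outliers


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned by an earlier expansion
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep growing
    return labels


# Toy data (hypothetical): two tight squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Unlike K-Means, nothing here asks for the number of clusters up front: clusters grow outward from any point with enough close neighbors, and the isolated point is left labeled as an outlier.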
If you've ever comparison-shopped on Craigslist, you'll appreciate this. This tutorial shows how to build a Craigslist scraper and analyze the results. Web scraping isn't the most robust approach for finding things on Craigslist, but it's a nice tutorial and should work for personal use.
— Resources —
Data Science MOOCs
It's MOOC season! Here's a nice overview of each of the best courses related to data science, including pre-reqs, dates, and expected workloads. It's also worth checking out these new specializations from Coursera:
— Data Viz —
Great interview with Bryan Van de Ven, the project maintainer for Bokeh. In this podcast, Bryan talks about Bokeh's history, some interesting use cases for it, what its near future looks like, and how it compares to other visualization libraries.
— Byte Sized —
The Knight News Challenge on Data is now open for applications, asking people to submit ideas that address the question: How might we make data work for individuals and communities? A collaboration between Knight, Data & Society and the Open Society Foundations, the challenge is offering a pool of $3 million for innovative ideas.