— In the News —
Should prison sentences be based on crimes that haven’t been committed yet? An excellent article that explores the profound impact data can have on society.
Deep learning is good at taking dictation and recognizing images. Can it also master human language? This is a great read that covers the current challenges and progress being made.
Publishing reproducible data analysis is an expectation in many domains and is growing in popularity. Here's a good overview of what that means exactly and how one news team is accomplishing it.
— Sponsored Link —
Only Localytics brings app marketing and analytics together. If you have an app of your own or are thinking about building one, this set of templates will help you define your app feature roadmap, now and over time.
— Tools and Techniques —
Interested in what's in store for Python data tools? These tools were recently identified by data science instructors at Galvanize as the ones to know.
Although Generalized Additive Models (GAMs) aren't widely used by the data science community, they are powerful and simple to use. Here's a great overview of what GAMs are, how to use them, and how they compare to other techniques.
The machine learning APIs that made it to this top-10 list offer a wide range of capabilities including image tagging, face recognition, document classification, speech recognition, predictive modeling, sentiment analysis, and pattern recognition. This is a great list for anyone interested in developing machine learning applications on the web.
Getting Started with Spark
Spark is all the rage these days for working with large, distributed datasets. These two articles provide great background for understanding when Spark is worthwhile and how to get started.
— Resources —
Definitely bookmark this one! This is a curated collection of computer science projects, including associated research papers on arXiv, GitHub repos, and related links. It spans a wide variety of categories, such as machine learning, genetic algorithms, deep learning, natural language processing, robotics, visualization, and many more.
Here's a great collection of graph and network datasets. There are hundreds of datasets here in a variety of domains that should be useful for research and/or benchmarking. This site also includes interactive analytics that enable users to visualize the statistical characteristics and structure of the networks.
— Data Viz —
Nice Python data visualization tutorial. This is a detailed walk-through with lots of code snippets and linked references.
Interesting project that uses existing R packages to replicate the data visualization practices developed by Edward Tufte.