— In the News —
More and more, decisions are made by algorithms that have been carefully crafted to act on their own. It's one thing when the algorithms are optimizing ads, but what about when they're making decisions about loan approvals, crime prevention, and medical diagnoses? Interesting read.
DJ Patil and Ruslan Belkin recently kicked off First Round's CTO Summit and offered their most salient lessons learned from building and launching data products. This is a fantastic recap, including a well-reasoned pre-launch checklist. Regardless of the domain you're working in, if you're working with data, this is a MUST READ.
Amazon just announced a new AWS service for machine learning that will let users make real-time predictions at scale. It looks fairly capable and is apparently designed for users who aren't necessarily machine learning experts. This post on the AWS blog offers a good overview of what you can do with the new service and how to get started.
— Tools and Techniques —
Nice collection of Python utilities that are conspicuously missing from the standard library. This collection is very well documented and is gaining traction quickly. Highly recommended.
If you're interested in learning how to use Python for machine learning, this looks like a great place to start. Kevin Markham is hosting a video series about Python's scikit-learn on Kaggle's No Free Hunch blog. This week's video is just 10 minutes, and the blog post includes supporting material and links.
— Resources —
Free download of the popular and highly regarded An Introduction to Statistical Learning. This book covers a broad range of topics in statistical learning methods and is not heavy in mathematical theory. Advanced readers should check out The Elements of Statistical Learning: Data Mining, Inference, and Prediction, which is also free from the authors' website. If you prefer print, both are available from Amazon, where they have very strong reviews.
— Data Viz —
This visualization by Martin Bellander caused quite a stir on social networks this week. It's based on 94,526 paintings and shows a clear trend towards blue over the past 100 years. This is a great post that details his data extraction and analysis process using R.
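For a rough feel of what this kind of color analysis involves, here is a minimal Python sketch. It is not Bellander's method (his post uses R). The hue band treated as "blue", the saturation and brightness cutoffs, the function names, and the toy pixel data are all assumptions made up for illustration.

```python
import colorsys
from collections import defaultdict


def is_blue(rgb, min_saturation=0.2, min_value=0.2):
    """Return True if an (r, g, b) pixel with 0-255 channels falls in a blue hue band."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    # Skip near-grey and near-black pixels, whose hue carries little meaning.
    if saturation < min_saturation or value < min_value:
        return False
    # Hue is in [0, 1); roughly 180-260 degrees counts as blue here (an assumption).
    return 0.5 <= hue <= 0.72


def blue_share_by_decade(paintings):
    """Take (year, pixels) pairs and return {decade: fraction of blue pixels}."""
    blue = defaultdict(int)
    total = defaultdict(int)
    for year, pixels in paintings:
        decade = (year // 10) * 10
        for pixel in pixels:
            total[decade] += 1
            blue[decade] += is_blue(pixel)
    return {decade: blue[decade] / total[decade] for decade in sorted(total)}


# Made-up toy data: two "paintings" with four sampled pixels each.
sample = [
    (1905, [(200, 120, 60), (180, 90, 40), (30, 60, 200), (150, 140, 130)]),
    (1995, [(20, 40, 220), (60, 120, 230), (210, 100, 50), (10, 80, 180)]),
]
shares = blue_share_by_decade(sample)
print(shares)
```

With real images you would sample pixels from each file and would probably want finer hue bins than a single blue/not-blue split.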
The ComplexHeatmap package (R) provides a flexible way to arrange multiple heatmaps while supporting self-defined annotation graphics. This is a good writeup of its features and how to work with it.
Gestalt Principles for Data Visualization
Interesting series by Elijah Meeks. There's a lot to think about here and the series is very well organized and presented. Highly recommended.
— Archive Pick —
Focused on workflows and development tools, this is a great tutorial for getting started with Python, by Benjamin Bengfort.