— In the News —
Jeffrey Hammerbacher is a number cruncher: a Harvard math major who went from a job as a Wall Street quant to a key role at Facebook to founding a successful data start-up. Then he was given a medical diagnosis that changed everything. Great read.
Everyone tells early-stage startups to use data for big strategic decisions. But does that really work, and whatever happened to vision?
— Tools and Techniques —
Nice hack for reverse engineering the original values from a PNG image using R.
An autoencoder is an unsupervised learning technique that uses a neural network to produce a low-dimensional representation of a high-dimensional input. This article stays high-level but gives a good overview of how autoencoders work.
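To make the idea concrete, here is a minimal sketch of a linear autoencoder in NumPy. It is not taken from the linked article. The function name `train_linear_autoencoder`, the toy data, and the hyperparameters are all assumptions chosen for illustration. The encoder squeezes 10-dimensional inputs down to a 2-dimensional code, and the decoder tries to rebuild the input from that code.

```python
import numpy as np

def train_linear_autoencoder(X, code_dim=2, lr=0.05, epochs=1000, seed=0):
    """Hypothetical helper: fit encoder/decoder weights by gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, code_dim))  # input -> code
    W_dec = rng.normal(scale=0.1, size=(code_dim, d))  # code -> reconstruction
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc        # low-dimensional code
        X_hat = Z @ W_dec    # reconstruction of the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared reconstruction error
        grad_out = 2.0 * err / (n * d)
        grad_dec = Z.T @ grad_out
        grad_enc = X.T @ (grad_out @ W_dec.T)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses

# Toy data (made up): 200 points in 10 dimensions that lie near a 2-D subspace
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

W_enc, W_dec, losses = train_linear_autoencoder(X)
codes = X @ W_enc  # the learned low-dimensional representation
print(codes.shape, losses[0], losses[-1])
```

Real autoencoders usually add nonlinear activations and more layers. With purely linear layers like these, the learned code spans roughly the same subspace that PCA would find.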
— Resources —
Interested in building a system that's optimized for deep learning? This article explains what you need and why. It's comprehensive, with suggestions for RAM, CPUs, GPUs, cache, drives, and more. Think you have what you need already? Read this to be sure.
It isn’t long, but this is a great slide deck by Wes McKinney that outlines key tools and libraries for working with data. Wes started the Python pandas project and is the author of Python for Data Analysis. He knows this space well. There’s a bit of everything here, including an essential Python stack, tools for R, Jupyter, Spark, and industry trends.
— Data Viz —
If you ever have a document with time-oriented data or events that you'd like to display on a timeline, this is a great new tool to know about. TimeLineCurator automatically extracts temporal references from freeform text and generates a timeline that you can easily edit and export to TimelineJS. This is very well laid out with lots of example timelines to play with.
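For a rough feel of what extracting temporal references from freeform text involves, here is a small Python sketch. It is not how TimeLineCurator works internally. It is a simplified stand-in that only recognizes four-digit years and dates written like "March 5, 2013". The names `extract_timeline` and `DATE_RE` and the sample text are made up for illustration.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Matches "March 5, 2013" or a bare year such as "2009" (years 1900-2099 only)
DATE_RE = re.compile(
    r"\b(?:(?P<month>" + "|".join(MONTHS) + r")\s+(?P<day>\d{1,2}),\s+)?"
    r"(?P<year>(?:19|20)\d{2})\b"
)

def extract_timeline(text):
    """Hypothetical helper: return (date, sentence) pairs in chronological order."""
    events = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for m in DATE_RE.finditer(sentence):
            # Simplification: a year-only mention is placed on January 1
            month = MONTHS.get(m.group("month"), 1)
            day = int(m.group("day") or 1)
            try:
                events.append((date(int(m.group("year")), month, day), sentence))
            except ValueError:
                continue  # skip impossible dates such as "February 31"
    return sorted(events)

# Made-up sample text
text = ("The project started in 2009. A second version shipped on "
        "March 5, 2013. The team first met in 2007.")
timeline = extract_timeline(text)
for when, sentence in timeline:
    print(when.isoformat(), "|", sentence)
```

A real tool has to cope with relative expressions ("two years later"), date ranges, and ambiguous formats, which is why being able to edit the generated timeline by hand matters.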
Nice collection of data journalism interactives by the New York Times and Guardian. This is well organized and a great source of ideas.
— Archive Pick —
Kaggle ran a competition in 2014 to predict how Galaxy Zoo users would classify images from the Sloan Digital Sky Survey. Sander Dieleman wrote this very readable explanation of his winning solution, which is based around convolutional neural networks. With links to code, forums and other resources, this is worth spending some time with.
— About —
Data Elixir is curated and maintained by @lonriesberg. If you find this newsletter worthwhile, please help spread the word! Forward it to your colleagues or click here to tweet. Thanks.