— In the News —
Scientists working in a little-known branch of psychology called perceptual learning have been developing techniques to help people intuitively find patterns in large amounts of data. This is a fascinating read about an aspect of Big Data that doesn't get a lot of attention.
— Tools and Techniques —
Researchers recruited 61 analysts and asked them specific questions about the same set of data. What happened next suggests some worthwhile approaches for both small and large teams.
Continuation of a three-part series that explores the massive landscape of tools available to data scientists. This week's focus: data wrangling, the necessary dirty work that's said to take up to 80% of data scientists' time.
A number of researchers have recently shown that convolutional neural networks can be easily fooled. This article is a good discussion of what's really going on and offers suggestions for moving forward.
Sentiment analysis is a type of natural language processing that extracts subjective information, such as emotion, from text. This tutorial by District Data Labs provides a great overview of current strategies and shows how to work with two commonly used models: Word2Vec and Doc2Vec. Highly recommended.
Much of the analysis that used to require a traditional GIS can now be done in R. This tutorial by Oscar Perpiñán is a good place to get started, using the raster and rasterVis packages.
— Resources —
This is a well-organized and comprehensive introduction to doing statistical analysis with Python. It's clearly written and includes lots of sample code and screenshots. Definitely worth bookmarking.
— Data Viz —
Nice cheatsheet for data visualization with ggplot2 (pdf).
Thoughtful essay by Fernanda Viégas and Martin Wattenberg on data visualization design and the importance of critique. I like articles that make me think, and I found myself coming back to this one a few times this week. It begins with Edward Tufte's redesign of the charts used in the decision process that led to the explosion of the Space Shuttle Challenger in 1986. Several great examples follow, and the essay concludes with suggested "Rules of Engagement."
— Archive Pick —
This might not be the most rigorous research project, but it sure is a good idea! Sigmond Axel looks for correlations between two datasets that have been collected over the past century: missing persons reports and UFO sightings. This is a great read that describes his entire process and some of his most interesting findings. Beyond his clever handling of missing people who turned up a couple of years later, he also made his cleaned data available on GitHub.
— Career —
Interviewing is a two-way street. If you're looking for data science work, here are three key things to consider in a potential new employer.
— About —
Data Elixir is curated and maintained by @lonriesberg. If you find this newsletter worthwhile, please help spread the word! Forward it to your colleagues or share it on your favorite network.