In the News
The Computers Are Listening - How The NSA Converts Spoken Words Into Searchable Text
Though perfect transcription of natural conversation apparently remains the Intelligence Community’s “holy grail,” the Snowden documents describe extensive use of keyword searching as well as computer programs designed to analyze and “extract” the content of voice conversations, and even use sophisticated algorithms to flag conversations of interest... This isn't a technical read, but it's a good rundown of the progress and challenges facing the intelligence community.
How Not to Drown in Numbers
There is a special sauce necessary to making big data work: surveys and the judgment of humans — two seemingly old-fashioned approaches that we will call small data... This is an insightful article by two data scientists, one from Facebook and one from Google, about how to find meaning in the never-ending deluge of big data. It's short and very worthwhile.
Artificial Intelligence is the Next Big Thing for Hedge Funds Seeking an Edge
Computer-assisted trading is nothing new, but machine learning is radically changing the game. Here's how that's going.
Court Docs Show How Google Slices Users Into “Millions Of Buckets”
Google watches most everything you do and say online. It reads your email, peers over your shoulder while you browse, knows what you watch on YouTube, and — by tracking your devices — even knows where you are at this very moment. And that's just the beginning. By using advanced statistics and data mining techniques, Google knows far more about you than you likely ever imagined. Fascinating read.

Sponsored Link

Check out this lineup!
This two-day event in San Francisco will begin with a day of presentations by Data Science experts such as UC Berkeley Professors Mike Franklin and Mike Jordan, Stanford Professor Rob Tibshirani, UW Professor Carlos Guestrin, CMU Professor Alex Smola, and more. See the complete list of speakers here.
The second day, 7/21/15, will be the 4th annual Dato Conference (previously called the GraphLab Conference), an event focused on implementing machine learning with GraphLab Create, including data science tutorials and case studies.
Data Elixir readers can save 35% by using "DataElixir" as the discount code during registration! Offer expires on June 1st. Register Now and Save!

Tools and Techniques
Rendering Notebooks on GitHub
Jupyter/IPython notebooks (.ipynb) now render directly on GitHub. These notebooks make it easy to capture data-driven workflows that combine code, equations, text and visualizations and share them with others. Having them render directly on GitHub is huge.
Emojineering - Discovering The Hidden Semantics Of Emoji
👍😍 The Instagram Engineering team posted two great articles this week about how they handle Emoji characters. They describe how Instagram uses machine learning and natural language processing to discover the hidden meanings in this relatively new, visual language. Even if you don't think you're interested in machine learning, these are MUST READS! They're really well written and pitched at an easy-to-understand level.
Large-Scale Social Phenomena - Data Mining Demo
Great text data mining tutorial contained in a set of IPython notebooks. It's self-contained, includes its data, and is very well documented. It demonstrates the basic functionality of several libraries, including NumPy, SciPy, Matplotlib, scikit-learn, pandas, NetworkX, and NLTK. Fun little evening project.
Predicting a Billboard Music Hit with Shazam Data
Fascinating 10-minute presentation about the insights Shazam derives from its users' data. Highly recommended.
Clusterize.js
Nice, small JavaScript library that makes it easy to display large datasets. It was released only recently and already has 2,300 stars on GitHub. It's well documented and includes lots of examples.

Resources

NASA's Data Portal
NASA's data portal includes thousands of datasets, has well-documented APIs, and is easily searchable. If you're interested in the sciences, this is a highly recommended resource.
Automate the Boring Stuff with Python
This is a free HTML version of a new Python book that's been getting rave reviews. It's billed as a guide to "practical programming for total beginners," so you won't find things like NumPy and SciPy here, but you will find practical stuff like web scraping, parsing PDFs and Word docs, updating spreadsheets, working with email, scheduling tasks, pattern matching, and manipulating images.

Data Viz

What Killed The Infographic?
If you've seen fewer experimental data visualizations lately, it's only because the medium has grown up and gotten a job... This is a super interesting article that has generated quite a buzz around the web this week. There's some great viz work here too. This is a MUST READ for everyone interested in the field of data visualization and where it's going.
Best Of The Visualisation Web
And for insights into how the above article translates to the real world, check out this monthly roundup by Andy Kirk of the best data visualization work around the web. There's a lot here to learn from and be inspired by. Highly recommended.
To the Point: 7 Reasons You Should Use Dot Graphs
Points go beyond where lines and bar graphs stop. Here are 7 good reasons to consider using points instead of lines.
Data Elixir is curated and maintained by @lonriesberg. If you find this newsletter worthwhile, please help spread the word! Forward it to your colleagues or share it on your favorite network.
Thanks!