— In the News —
Though perfect transcription of natural conversation apparently remains the Intelligence Community’s "holy grail," the Snowden documents describe extensive use of keyword searching as well as computer programs designed to analyze and “extract” the content of voice conversations, and even use sophisticated algorithms to flag conversations of interest... This isn't a technical read but is a good run-down of the progress and challenges facing the intelligence community.
There is a special sauce necessary to making big data work: surveys and the judgment of humans — two seemingly old-fashioned approaches that we will call small data... This is an insightful article by two data scientists, one from Facebook and one from Google, about how to find meaning in the never-ending deluge of big data. This is short and very worthwhile.
Computer-assisted trading is nothing new, but machine learning is radically changing the game. Here's how that's going.
Google watches most everything you do and say online. It reads your email, peers over your shoulder while you browse, knows what you watch on YouTube, and — by tracking your devices — even knows where you are at this very moment. And that's just the beginning. By using advanced statistics and data mining techniques, Google knows far more about you than you likely ever imagined. Fascinating read.
— Sponsored Link —
This two-day event in San Francisco will begin with a day of presentations by Data Science experts such as UC Berkeley Professors Mike Franklin and Mike Jordan, Stanford Professor Rob Tibshirani, UW Professor Carlos Guestrin, CMU Professor Alex Smola, and more. See the complete list of speakers here.
The second day, 7/21/15, will be the 4th year of the Dato conference (previously known as the GraphLab Conference), an event focused on machine learning implementation with GraphLab Create, including data science tutorials and case studies.
Data Elixir readers can save 35% by using "DataElixir" as the discount code during registration! Offer expires on June 1st. Register Now and Save!
— Tools and Techniques —
Jupyter/IPython notebooks (.ipynb) now render directly on GitHub. These notebooks make it easy to capture data-driven workflows that combine code, equations, text, and visualizations, and to share them with others. Having them render directly on GitHub is huge.
Emojineering - Discovering The Hidden Semantics Of Emoji
👍😍 The Instagram Engineering team posted two great articles this week about how they handle emoji characters. The posts describe how Instagram uses machine learning and natural language processing to discover the hidden meanings in this relatively new, visual language. Even if you don't think you're interested in machine learning, these are MUST READS! Really well-written at an easy-to-understand level:
Great text data mining tutorial contained in a set of IPython notebooks. It's self-contained, with data included, and is very well-documented. It demonstrates the basic functionality of several libraries, including NumPy, SciPy, Matplotlib, scikit-learn, pandas, NetworkX, and NLTK. Fun little evening project.
Fascinating 10-minute presentation about insights that Shazam derives from its users' data. Highly recommended.
— Resources —
NASA's data portal includes thousands of datasets, has well-documented APIs, and is easily searchable. If you're interested in the sciences, this is a highly recommended resource.
This is a free HTML version of a new Python book that's been getting rave reviews. It's billed as a guide to "practical programming for total beginners," so you won't find things like NumPy and SciPy here, but you will find practical stuff like web scraping, parsing PDFs and Word docs, updating spreadsheets, working with email, scheduling tasks, pattern matching, and manipulating images.
— Data Viz —
If you've seen fewer experimental data visualizations lately, it's only because the medium has grown up and gotten a job... This is a super interesting article that has generated quite a buzz around the web this week. There's some great viz work here too. This is a MUST READ for everyone interested in the field of data visualization and where it's going.
And for insights into how the above article translates to the real world, check out this monthly roundup by Andy Kirk of the best data visualization work around the web. There's a lot here to learn from and be inspired by. Highly recommended.
Points go beyond where lines and bar graphs stop. Here are 7 good reasons to consider using points instead of lines.