— Profiles —
In this interview, Matt Turck of FirstMark Capital discusses the way that machine learning-first startups are built, which ones are pushing the ecosystem forward, and why they look so different from the SaaS startups that came before them. Matt is super knowledgeable in this space, and if you're at all interested in the business of AI, this is definitely worthwhile.
— Sponsored Link —
SignalBox is a preconfigured deep learning platform with a drag-and-drop web interface for model building, automated data ingestion, and one-click promotion to production APIs.
This is a super useful platform that I've become personally involved with. If you have a business case for deep learning, SignalBox lowers your cost of experimentation and reduces the need for engineering support.
If you're interested, I'd be happy to discuss your specific use cases. Send me a note at [email protected] or set up a time on my calendar for a short call.
— Tools and Techniques —
Great article that explores how the locations of new businesses are determined. Using data from dozens of different sources and techniques from a wide range of disciplines, including game theory, machine learning, natural language processing, statistics, and urbanism, the authors create a model that predicts optimal coffee shop locations in New York City.
Another day, another breach. Here's a practical guide for protecting your data online and, even more importantly, everyone else's.
A recent study by Stack Overflow showed that Python is the fastest-growing major programming language. In this article, David Robinson explores why. He considers the specific packages being discussed on Stack Overflow, industries that users are coming from, and user job roles. It turns out that Python's growth is largely about data science.
This tutorial shows how to use Twitter's API to access a user's Twitter history and perform basic sentiment analysis using Python's textblob package. It includes lots of code snippets, and it's trivial to swap in any Twitter user you might be interested in.
Want to build or improve a search experience? Start here. This is a fantastic overview, including design considerations, algorithms, practical suggestions, existing search products, and lots of linked references. This post is part of an open, collaborative effort to build an online reference, The Open Guide to Practical AI.
— Data Viz —
Sparklines are small, word-sized graphics that offer a quick representation of some set of numbers. Edward Tufte calls them "datawords." This new font by the After the Flood studio enables you to create sparklines in a block of text without using code. It's free and it works in a variety of applications, including MS Word, Illustrator, and popular browsers.
Visualizing the bytes in a binary file can reveal information about the file's hidden structure. Sometimes that's useful, and even when it isn't, the results are compelling. This article shows how to create simple binary data visualizations in R using ggplot2. It includes code, screenshots, and a discussion of the structural differences between file types (e.g., audio, PDF, executables, and compressed files).
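If you'd like a feel for the idea before reading the article, here's a minimal sketch in Python rather than the R and ggplot2 approach the article uses. It lays a file's bytes out in fixed-width rows and maps each byte value to a text shade. The function names, the 16-byte row width, and the shade characters are all my own assumptions for illustration, not anything taken from the article.

```python
def bytes_to_grid(data: bytes, width: int = 16) -> list[list[int]]:
    """Arrange raw bytes into rows of `width` values (0-255).

    The final row is padded with zeros so every row has the same length.
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # pad a short final row
        rows.append(row)
    return rows


def render_ascii(grid: list[list[int]]) -> str:
    """Map each byte value to one of ten characters, light to dark."""
    shades = " .:-=+*#%@"
    lines = []
    for row in grid:
        # Scale 0-255 down to an index in 0-9.
        lines.append("".join(shades[value * len(shades) // 256] for value in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative input: the bytes 0..255 in order, standing in for a real file.
    sample = bytes(range(256))
    print(render_ascii(bytes_to_grid(sample)))
```

To try it on a real file, read the bytes with open(path, "rb").read() and pass them to bytes_to_grid. Runs of repeated characters tend to indicate padding or uniform regions, while noisy regions often correspond to compressed or encrypted data.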
— Career —
The list of skills that organizations look for can be daunting: statistics, computer science, machine learning, data visualization, TensorFlow, clear writing, domain expertise, and more. Ever feel like you're not a real data scientist?
Here's a "required skills" summary of the first 1000 results when you scrape Indeed.com for jobs that are related to each of the following: "data scientist," "machine learning," "data engineer," and "data analytics."
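For a rough idea of how a summary like this can be tallied once the posting text has been collected, here's a small sketch. It counts how many postings mention each skill at least once. The skill list, the sample postings, and the count_skills helper are hypothetical and only for illustration. They are not the linked author's method or data, and the scraping step itself is left out.

```python
import re
from collections import Counter

# Hypothetical skill keywords; a real list would be much longer.
SKILLS = ["python", "sql", "r", "spark", "tensorflow"]


def count_skills(postings: list[str], skills: list[str] = SKILLS) -> Counter:
    """Count the number of postings that mention each skill at least once.

    Matching is done on whole lowercase tokens, so the skill "r" does not
    match the letter r inside a word like "required".
    """
    counts = Counter()
    for text in postings:
        tokens = set(re.findall(r"[a-z0-9+#]+", text.lower()))
        for skill in skills:
            if skill in tokens:
                counts[skill] += 1
    return counts


if __name__ == "__main__":
    # Made-up posting text, standing in for scraped job descriptions.
    postings = [
        "Python and SQL required",
        "Experience with R, Python, and Spark",
        "Strong SQL skills",
    ]
    for skill, n in count_skills(postings).most_common():
        print(f"{skill}: {n} of {len(postings)} postings")
```

Counting postings rather than total mentions keeps one keyword-stuffed listing from skewing the results.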
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...
— About —
Data Elixir is curated and maintained by @lonriesberg. If some awesome person forwarded this issue to you, subscribe for free at dataelixir.com and get it delivered every week.