Issue 161
— In the News —
There’s No Such Thing As ‘Sound Science’
It might seem easy to sort good science from bad, but the reality is complicated. For results to be believable, everything needs to be out in the open, including the raw data, the code, and the details of how the data was processed and analyzed. From there, decision-makers get to figure out how to interpret the uncertainties. This article in FiveThirtyEight explores the issues and how science is being turned against itself.
Using Artificial Intelligence to Augment Human Intelligence
Great article in Distill that explores how AI-powered user interfaces can give people new tools for reasoning. This is a worthwhile long read that includes interactives to play with.
— Profiles —
Meet the man behind the most important tool in data science
Wes McKinney built the basics of pandas in 2008 and made the project public in 2009. The following year, it started being discovered and Wes made the decision to drop out of grad school to work on pandas full-time. That turned out to be an important move for Wes and, for the data science community, it was pivotal. Here's the backstory to what has become one of the most important tools in data science.
— Tools and Techniques —
Machine Learning 101
If you're new to machine learning, definitely check out this slide deck by Jason Mayes at Google. It's a fun and well-organized approach to learning and you decide how deep to go. Do you want the green pill or the blue pill?
SQL Window Functions Tutorial
SQL Window Functions offer a lot of flexibility in cases where you might otherwise be tempted to write hack-ish workarounds. Window functions are readable, performant, and easy to debug. This tutorial describes what they are and shows how to use them.
Designing An Analytics Stack Like We Design Software
The typical analytics stack is moving away from monolithic solutions that try to do everything. This post on Mode's blog explores a better approach.
Using Excel with pandas
Even if you don't work with it every day, Excel is a widely used data tool that's worth being familiar with. This tutorial shows how to extract the data from Excel files into pandas so you can work with it, and how to write data back to Excel files so your boss can understand it. Covers a variety of Excel topics like multiple sheets, headers, how to skip records, how to read a subset of columns, pivot tables, etc.
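For a quick taste of the window functions item in this section, here is a minimal sketch in Python using the standard library's sqlite3 module. It is not taken from the linked tutorial. The sales table, its columns, and the sample rows are all made up for illustration, and it assumes your Python is built against SQLite 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Hypothetical sample data, invented for this sketch: (region, day, amount).
rows = [
    ("east", "2017-12-01", 100),
    ("east", "2017-12-02", 150),
    ("east", "2017-12-03", 50),
    ("west", "2017-12-01", 200),
    ("west", "2017-12-02", 75),
]

# Two window functions over the same partition:
#   running_total: cumulative sum of amount within each region, ordered by day
#   amount_rank:   rank of each amount within its region, largest first
QUERY = """
SELECT region,
       day,
       amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY region, day
"""


def run_window_query(data):
    """Load the rows into an in-memory SQLite table and run the window query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, day TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", data)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


results = run_window_query(rows)
for row in results:
    print(row)
```

Each output row keeps its original detail and gains the per-region running total and rank, with no self-join or GROUP BY workaround needed.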
— Deep Learning —
The Case for Learned Index Structures
A key research team at Google just released a research paper that shows how machine-learned indexes can replace B-Trees, Hash Indexes, and Bloom Filters. It's significantly faster than current methods, requires far less space, and runs on GPUs. Ultimately, "replacing core components of a data management system through learned models has far reaching implications for future systems designs." This is ground-breaking work.
NIPS 2017
Twitter was on fire last week with comments and announcements from the thirty-first Annual Conference on Neural Information Processing Systems (NIPS 2017). Here are the highlights, including lots of linked references.
— Data Viz —
Multivariate Map Collection
There are many types of maps that are used to display data. Common strategies often focus on a particular variable, but those don't always work when you have multiple variables to present at the same time. In this post, Jim Vallandingham explores the options for multivariate data presentation.
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...
— About —
Data Elixir is curated and maintained by @lonriesberg. If some awesome person forwarded this issue to you, subscribe for free at dataelixir.com and get it delivered every week.