— Insight —
In many ways, GPT-2 works remarkably well, so does it matter that it doesn't know what it's talking about? In this essay, Gary Marcus explores what GPT-2 reveals about the nature of intelligence.
— Tools and Techniques —
More and more, researchers are acknowledging that training big models requires a non-trivial amount of energy. How much energy? This interactive site estimates the carbon impact of your research based on your specific hardware, runtime, and cloud provider. Also, see the related paper, Quantifying the Carbon Emissions of Machine Learning.
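For a rough sense of the arithmetic behind this kind of estimate, here is a minimal Python sketch. It is not the site's actual method. It assumes emissions are approximately the energy drawn (hardware power times runtime) multiplied by the carbon intensity of the local grid. The function name, its parameters, and the numbers in the usage line are hypothetical placeholders, not measured values.

```python
def estimate_emissions_kg(power_watts, hours, grid_kg_co2_per_kwh, overhead=1.0):
    """Back-of-the-envelope CO2 estimate for a training run.

    All inputs are assumptions supplied by the caller:
      power_watts          average draw of the hardware, in watts
      hours                wall-clock runtime, in hours
      grid_kg_co2_per_kwh  carbon intensity of the electricity used
      overhead             multiplier for datacenter overhead (1.0 = none)
    """
    if min(power_watts, hours, grid_kg_co2_per_kwh, overhead) < 0:
        raise ValueError("inputs must be non-negative")
    # Convert watt-hours to kilowatt-hours, then apply the overhead factor.
    energy_kwh = power_watts * hours / 1000 * overhead
    # Emissions scale linearly with energy and grid carbon intensity.
    return energy_kwh * grid_kg_co2_per_kwh


# Illustrative placeholder inputs only: a 300 W device running for 100 hours
# on a grid emitting 0.5 kg CO2 per kWh.
example_kg = estimate_emissions_kg(power_watts=300, hours=100, grid_kg_co2_per_kwh=0.5)
print(f"Estimated emissions: {example_kg:.1f} kg CO2")
```

Real figures depend heavily on the region and provider, which is why the inputs are left to the caller rather than hard-coded.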
Wayfair's product catalog is massive. They have something for everyone, but that selling point makes it hard to know which items to present to new customers. In this post, Dave Harris and Tom Croonenborghs walk through Wayfair's new Bayesian system that considers a variety of factors to identify products with broad appeal.
Notebooks are wildly popular, but they're far from perfect. In this post, Austin Z. Henley summarizes the findings of a recent paper that explores key pain points, needs, and design opportunities.
277 data scientists across 20 industries confirmed the broad challenges observed in our customer organizations. Labeling training data for ML projects poses a significant obstacle for data science teams as they strive to prove ROI and get their projects into production.
— Resources —
Google's Dataset Search is now officially out of beta. There are currently ~25 million datasets in the index and it's still growing. Here's an overview of what's in the index now, how to access it, and how to make sure your own datasets get included.
This introduction to data science is intended for people with no programming experience. The goal is to present a small subset of Python that allows you to do real work in data science as quickly as possible. This is by Allen Downey, author of Think Python, and will be used to teach introductory data science in courses at Olin and Harvard.
— Data Viz —
Altair is a Python visualization library that's rooted in the "Grammar of Graphics," like ggplot2. It's related to a few other visualization specifications and libraries, such as Vega-Lite, Vega, and D3.js. This is a super useful post that walks through what they each do, how they're connected, and ultimately, how to best use this visualization stack.
In this interview, Stuart A. Thompson gives a behind-the-scenes look at how the “One Nation, Tracked” story came to life at the New York Times.
— Conferences & Events —
Strata Data & AI - London - Data feeds AI; AI makes sense of data. So it also made sense to combine the O’Reilly Strata Data and AI Conferences, which cover two of the most pressing technological trends of the decade, and give you access to the full breadth of both programs. April 20-23. Details and Registration Info
— Career —
Chip Huyen analyzed compensation and levels data for 19k tech workers to answer questions about compensation, career paths, and bias. The data primarily covers software engineers in large U.S. companies, but this is a great analysis with broad insights across tech.
Great post by an industry insider about getting work in sports analytics. Includes insights about university opportunities, where to find datasets for personal projects, competitions, conferences and more.