Notebook pains and opportunities. How to get into sports analytics. Nature of intelligence. Bayesian product ranking. Understanding Altair. How to calculate your model's CO2 impact.


Data Elixir

ISSUE 269 · January 28, 2020

 

Insight

GPT-2 and the Nature of Intelligence

In many ways, GPT-2 works remarkably well, so does it matter that it doesn't know what it's talking about? In this essay, Gary Marcus explores what GPT-2 reveals about the nature of intelligence.
The Gradient

 

Sponsored Link

Data scientists are in demand on Vettery

Vettery is an online hiring marketplace that's changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 
 

Tools and Techniques

ML CO2 Impact

More and more, researchers are acknowledging that training big models requires a non-trivial amount of energy. How much energy? This interactive site estimates the carbon impact of your research based on your specific hardware, runtime, and cloud provider. Also, see the related paper, Quantifying the Carbon Emissions of Machine Learning.
Github | mlco2
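
For a rough sense of the arithmetic behind an estimate like this, here is a minimal sketch. It assumes emissions are approximated as energy used (power draw times runtime) multiplied by the carbon intensity of the local grid. The function name and every number below are hypothetical placeholders, not values or methods taken from the site or the paper.

```python
def estimate_co2_kg(power_watts: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Rough estimate: energy used (kWh) times grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = power_watts / 1000.0 * hours  # convert watts to kilowatts, then to kWh
    return energy_kwh * kg_co2_per_kwh


# Hypothetical run: one 250 W GPU for 100 hours on a grid
# emitting 0.5 kg CO2eq per kWh (all three values are illustrative only).
estimate = estimate_co2_kg(power_watts=250, hours=100, kg_co2_per_kwh=0.5)
print(f"{estimate:.1f} kg CO2eq")
```

A real estimate would also need to account for factors such as data center overhead and any offsets from the provider, which this sketch ignores.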

 
 
 

Bayesian Product Ranking at Wayfair

Wayfair's product catalog is massive. They have something for everyone, but that selling point makes it hard to know which items to present to new customers. In this post, Dave Harris and Tom Croonenborghs walk through Wayfair's new Bayesian system that considers a variety of factors to identify products with broad appeal.
Wayfair Tech Blog

 
 
 

What's wrong with computational notebooks?

Notebooks are wildly popular, but they're far from perfect. In this post, Austin Z. Henley summarizes the findings of a recent paper that explores key pain points, needs, and design opportunities.
Austin Z. Henley's Blog

 
 
 

Survey Shows AI & ML Are Still Nascent

277 data scientists across 20 industries confirmed the broad challenges observed in our customer organizations. Labeling training data for ML projects poses a significant obstacle for data science teams as they strive to prove ROI and get their projects into production.
// sponsored

 

Resources

Discovering millions of datasets on the web

Google's Dataset Search is now officially out of beta. There are currently ~25 million datasets in the index and it's still growing. Here's an overview of what's in the index now, how to access it, and how to make sure your own datasets get included.
Google Blog

 
 
 

Elements of Data Science

This introduction to data science is intended for people with no programming experience. The goal is to present a small subset of Python that allows you to do real work in data science as quickly as possible. This is by Allen Downey, author of Think Python, and will be used to teach introductory data science in courses at Olin and Harvard.
Allen Downey

 

Data Viz

Understanding The Altair Stack

Altair is a Python visualization library that's rooted in the "Grammar of Graphics," like ggplot2. It's related to a few other visualization specifications and libraries, such as Vega-Lite, Vega, and D3.js. This is a super useful post that walks through what they each do, how they're connected, and ultimately, how to best utilize this visualization stack.
Eitan Lees Blog

 
 
 

‘Twelve Million Phones, One Dataset, Zero Privacy’

In this interview, Stuart A. Thompson gives a behind-the-scenes look at how the “One Nation, Tracked” story came to life at the New York Times.
Nightingale

 

Conferences & Events

Strata Data & AI - London - Data feeds AI; AI makes sense of data. So it made sense to combine the O’Reilly Strata Data and AI Conferences, which cover two of the most pressing technological trends of the decade, giving you access to the full breadth of both programs. April 20-23. Details and Registration Info >>

More >>

 

Career

Analysis of compensation, level, and experience details of 19k tech workers

Chip Huyen analyzed compensation and levels data for 19k tech workers to answer questions about compensation, career paths, and bias. The data primarily covers software engineers in large U.S. companies, but this is a great analysis with broad insights across tech.
Chip Huyen's Blog

 
 
 

Getting into Sports Analytics 2.0

Great post by an industry insider about getting work in sports analytics. Includes insights about university opportunities, where to find datasets for personal projects, competitions, conferences and more.
Sam Gregory

 

Job Board

Along with the jobs below, announcements on the Job Board this week include openings for a Head of Data at the NBA, a remote Data Scientist at Life Epigenetics, a Lead for NASA's Scientific Visualization Studio, and a Data Scientist at Apple.

The Job Board is being actively curated. If you know of an interesting opening, get in touch!

  • Analytics Analyst at Horizon Blue Cross Blue Shield - Newark, NJ - Horizon Blue Cross Blue Shield of New Jersey is seeking Analytics Analysts, who will partner with multiple customers across the Enterprise to provide complex resolutions to analytics/data problems...
     
  • Director of Digital Strategy & E-Commerce at SmartyPants Vitamins - Marina del Rey, CA - SmartyPants is seeking a highly motivated and tactically creative individual who will develop and maintain a robust and nimble digital strategy practice...
     
  • Big Data Engineers at Leidos - Reston, VA - The DOMEX Data Discovery Platform (D3P) program is a next generation machine learning pipeline platform providing cutting edge data enrichment, triage, and analytics capabilities to Defense and Intelligence Community members. This senior engineer will...

More >>

 

Data Elixir is curated and maintained by @lonriesberg. For additional finds from around the web, follow Data Elixir on LinkedIn, Twitter or Facebook.

 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308