Data Elixir

ISSUE 327   ·   March 16, 2021

 

In the News

How Facebook got addicted to spreading misinformation

Artificial Intelligence at scale depends on vast resources of data and talent that few organizations are able to pull together. Facebook does it, but an organization as large as Facebook is easily tangled in complex internal dynamics that can hinder efforts for "Responsible AI." This is a well-researched, epic article that explores how Facebook failed.
MIT Technology Review | Karen Hao

 
 
 

Gartner cites ‘glut of innovation’ in DS and ML

A recent Gartner report evaluates DSML platforms and, among other things, cites a “glut of compelling innovations.” Here are the highlights.
VentureBeat | Louis Columbus

 

Sponsored Link

Need high-quality training data for Machine Learning & AI?

Let your data teams focus on core research and business objectives while our in-house, advanced workforce handles the data annotation work for you. iMerit is trusted by Microsoft, Autodesk, and TripAdvisor for their computer vision and natural language processing data annotation needs. Get a free quote today.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 
 

Tutorials, Projects & Opinions

How to Write Better

Eugene Yan consistently writes some of the most useful data science articles you'll find on the web. His posts are well researched and his writing is super clear and organized. In this two-part series, he offers his framework for thinking through and organizing writing projects, especially as they relate to technical and design docs:

  • Part 1: The Why, What, How Framework
  • Part 2: How to Write Design Docs for Machine Learning Systems
  • Design Doc Template

Eugene Yan

 
 
 

We Failed to Set Up a Data Catalog 3x. Here’s Why.

When the data team at Atlan decided to create a data catalog, they thought it would be easy. After three failed attempts over five years, they finally got it right. This post gives a high-level view of each iteration, how they approached it, and lessons learned along the way.
Humans of Data Blog | Prukalpa Sankar

 
 
 

Pattern-based spatial analysis

Pattern-based spatial analysis allows for spatial search, change detection, and clustering of areas with similar patterns. It's commonly used with satellite imagery for environmental monitoring, real estate planning, healthcare analytics, etc. This five-part introduction is easy to follow and includes lots of examples, code snippets, and screenshots.
Jakub Nowosad

 
 
 

Instrumental variables analysis for non-economists

Instrumental variables analysis is a popular way to do causal inference, but it's often explained using concepts that are mostly familiar to economists. In this introduction, Chris Said uses a visual approach to explain concepts in a way that tends to be more intuitive for data scientists and engineers.
Chris Said

 
 
 

Defensible Machine Learning

ML companies have the potential to revolutionize business, but they're hard to scale and they look and feel different from typical software businesses. Here's a simple framework to help founders and investors think through three types of defensible machine learning companies. Especially if you're thinking about ML startups, this is a must-read.
Base Case Capital | Ankur Goyal, Alana Anderson

 

Code & Tools

Luminaire - A hands-off Anomaly Detection Library

Luminaire is a Python package that provides ML-driven solutions for monitoring time series data. Luminaire provides anomaly detection and forecasting capabilities that incorporate correlational and seasonal patterns as well as uncontrollable variations in the data over time.
GitHub | Zillow

 
 
 

image2csv

image2csv is a Python application that converts images of an array of numbers to corresponding CSV files. In other words, take a screenshot of a table and image2csv will create a CSV file using the data in the table.
GitHub | artperrin

 

Resources

Data Pricing: from Economics to Data Science

This survey paper explores ways to systematically assess the value of data. At 82 pages, it's not a light read, but it's well organized and is a fantastic resource for anyone interested in putting a monetary value on data products. For a good introduction, check out Ben Lorica's recent interview with the author on The Data Exchange Podcast.
arXiv | Jian Pei

 

From the Archives

How to send an 'E mail' - from 1984 via Database

This short video, which has been bouncing around the web this week, offers a glimpse into the world of email in 1984. Ultimately, it shows why email wasn't a big thing back then! We've come a long way since 1984, but then again, if you watch closely when he enters his password, you might think that some things haven't changed much at all.
YouTube | ThamesTv

 

Sign up to get Data Elixir's data science newsletter in your Inbox >>

 

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.

 

To find specific content from prior issues or to research topics, check out the catalogued Archives on Data Elixir's Search Page >>

 
 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308