Practical decision-making w/ causality. SQL for DS. Lacking uncertainty. Urban sensing. Efficient computation w/ R. NLP best practices.


Data Elixir

ISSUE 265 · December 31, 2019

 

In the News

One Nation, Tracked

This New York Times investigation is a must-read deep dive into a data industry that's perfectly legal but should alarm us all. Via apps on our phones, dozens of companies log the movements of tens of millions of people and can show where we spend our days, who we spent last night with, where we worship, whether or not we visit a psychiatrist, etc. Follow the links for info about locking down your phone.
New York Times

 
 
 

AI is rushing into patient care - and could raise risks

Medical applications are a hot growth area for AI but in the "fail fast and fix things later" world of entrepreneurial tech, serious mistakes are being made.
Scientific American

 

Sponsored Link

ML Best Practices

ML Best Practices: The Keys To Achieving Quality Data Annotation

To maximize the potential of ML models, high-quality data is key. When measuring the quality of labeled data, consider IoU, accuracy, recall, precision, and F1 score. An enterprise-grade labeling platform that employs a holistic annotation approach can scale quality annotation without diminishing complexity or compromising accuracy.
Learn more.
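
For readers who want a quick refresher on the metrics named above, here is a minimal Python sketch of IOU (intersection over union), accuracy, precision, recall and F1. It is not taken from the sponsor's material. The function names, the box format (x_min, y_min, x_max, y_max) and the toy labels are assumptions chosen for illustration.

```python
# Illustrative only: function names, box format and sample labels are assumptions.

def accuracy(true_labels, predicted_labels):
    """Fraction of items whose predicted label matches the reference label."""
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / len(true_labels) if true_labels else 0.0


def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width and height, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


if __name__ == "__main__":
    reference = ["cat", "cat", "dog", "dog"]   # hypothetical gold labels
    annotated = ["cat", "dog", "dog", "dog"]   # hypothetical annotator labels
    print(accuracy(reference, annotated))
    print(precision_recall_f1(reference, annotated, positive="cat"))
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Label-level metrics such as precision and recall suit classification tasks, while IOU is typically used to compare a drawn bounding box against a reference box.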

 


 
 

Tools and Techniques

The 'largest stock profit or loss' puzzle: efficient computation in R

In his latest post, David Robinson starts with a SQL interview question and shows how to make the solution as computationally efficient as possible using a tidyverse approach. This is a good think-through of different approaches.
Variance Explained

 
 
 

Want to make good business decisions? Learn causality.

Great post on the Stitch Fix tech blog that uses a common marketing question to show how being driven by causal information is more effective than being naively data-driven.
MultiThreaded

 
 
 

Urban sensing: quantifying the sensing power of vehicle fleets

Smart cities are powered by sensors that measure things like air pollution, traffic congestion, temperature, humidity and road quality. Covering a large area is expensive, but what if sensors could be mobile? How many taxis would it take to effectively "scan" an entire city? This is a nice exploration of the problem and how to measure urban sensing.
Urban Data Science

 

Resources

SQL for Data Science Course

This free course by Kristen Kehrer starts with simple database queries and continues through data cleaning and feature engineering. Includes videos, cheat sheets and an interactive SQL browser for following along.
Data Moves Me

 
 
 

NLP Recipes

This collection of Jupyter notebooks showcases best practices and examples for common scenarios involving text and language. The intention is to significantly reduce development time for both researchers and practitioners.
Microsoft GitHub Repo

 

Data Viz

The Curious Absence of Uncertainty from Many (Most?) Visualizations

Consider the last few visualizations you encountered outside of scientific publications. Did they depict uncertainty? Probably not. Here's what Jessica Hullman discovered when she asked why.
Multiple Views

 
 
 

Inquiry into Matplotlib's Figures, Axes and Gridspec

Matplotlib is notoriously difficult to learn, but if you're already familiar with it, this is a great guide to help you level up your skills.
Akash Palrecha via GitHub

 

Career

Becoming an Independent Researcher

This post is a hard reality check on the path to machine learning research. There's a lot of opportunity here, but the competition and stakes are high. This follow-up thread from Julian Togelius is also worthwhile.
Andreas Madsen via Medium

 

Job Board

  • Data Analyst at Social Finance - London, UK
  • Part-time Data Scientist for Applied Healthcare Research at University of Lucerne, Switzerland - REMOTE
  • Research Associate, Center for Data Insights at MDRC - New York, NY or Oakland, CA
  • Data Analytics Course Mentor at Springboard - REMOTE
  • Senior Data Scientist at Intercom - San Francisco, CA, USA
  • Data Engineer at OECD (Organisation for Economic Co-operation and Development) - Paris, France


 

Data Elixir is curated and maintained by @lonriesberg. For additional finds from around the web, follow Data Elixir on LinkedIn, Twitter or Facebook.

 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308