Overfitting: A Guided Tour. Data project checklist. Learning ML. Optimizing sample sizes in A/B testing. Chess w/ GPT-2. On DS consulting.


Data Elixir

ISSUE 267 · January 14, 2020

 

In the News

Google Research: Looking Back at 2019, and Forward to 2020 and Beyond

Google has a massive impact on the tools, applications and research that help steer the data science community. As in prior years, this retrospective by Jeff Dean is remarkable in its scope. It includes useful summaries, screenshots, videos and linked references throughout.
Google AI Blog

 

Insights

Ironies of automation

"... as we look forward to the next decade a few things seem certain: increasing automation, increasing system complexity, faster processing, more inter-connectivity, and an even greater human and societal dependence on technology. What could possibly go wrong?"
The Morning Paper

 

Sponsored Link

Announcing The Metis Corporate Training Courses for 2020

Metis offers immersive, hands-on data science and analytics training that can scale to fit the needs of your organization. Our senior data scientists specialize in teaching business teams in-demand topics including data literacy for technical and non-technical employees, Python for analysts, and Foundations in ML, all through a real-world lens.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 
 

Tools and Techniques

Overfitting: A Guided Tour

Great introduction to overfitting by Alex Hayes that offers insights beyond what's typically covered in introductory machine learning resources. Covers both prediction and inference problems, provides supervised and unsupervised examples of overfitting, and presents a fundamental relationship between train and test error.
aleatoric

 
 
 
 

Data project checklist

This collection of considerations for new data projects is based on Jeremy Howard's decades of experience with projects across a wide variety of industries. This is a post to come back to again and again.
fast.ai

 
 
 

Optimizing sample sizes in A/B testing

In a 3-part series, Chris Said presents a practical way of determining optimal sample sizes that abandons the notion of statistical significance. This first part is a non-technical overview that ends with a section called "Three lessons for practitioners." Follow the links for more.
The File Drawer

 
 
 

A Very Unlikely Chess Game

Clever use of GPT-2. Give it a corpus of chess games, represented as text-based moves, and it learns to play! Includes a link to a Colab Notebook where you can try it yourself.
Slate Star Codex

 
 
 

Check out the latest episode of No BiAS: NeurIPS 2019 Review

No BiAS is a new podcast about the emerging and ever-shifting terrain of artificial intelligence & machine learning. Didn't win the lottery to get into the hottest ML conference of the year? No sweat! Get the full rundown of NeurIPS 2019 in the latest episode of No BiAS 🎧 with Cheryl Martin & Brent Schneeman.
// sponsored

 

Resources

Learning to Teach Machines to Learn

Awesome collection of resources for learning machine learning. This is very well curated and organized. Includes online books, interactives, courses, key articles and other resources.
Alison Hill's Blog

 

Career

Doing Freelance Data Science Consulting in 2019

Consulting work can be lucrative, but even in data science it's not a sure path. This is an insightful post by Ethan Rosenthal about the challenges and opportunities of data science consulting work.
Ethan Rosenthal's Blog

 

Job Board

  • Executive Director at The Carpentries - REMOTE - The Carpentries seeks its next Executive Director to spearhead the organisation's vision of becoming the leading inclusive community teaching data and coding skills.

  • Director of Digital Strategy & E-Commerce at SmartyPants Vitamins - Marina del Rey, CA - SmartyPants is seeking a highly motivated and tactically creative individual who will develop and maintain a robust and nimble digital strategy practice...

  • Big Data Engineers at Leidos - Reston, VA - The DOMEX Data Discovery Platform (D3P) program is a next generation machine learning pipeline platform providing cutting edge data enrichment, triage, and analytics capabilities to Defense and Intelligence Community members. This senior engineer will...

  • Data Scientist - Use Data to Improve Mobile Apps that Treat Disease at Pear Therapeutics - Boston, MA - We are looking for data scientists with a deep product sense, who have an innate curiosity about end-users, and are eager to dive into large, complex datasets and create actionable insights.

  • Data Scientist with a Data Engineering Background at Pear Therapeutics - Boston, MA - We are answering the question, "What if our phones could treat disease?" We are looking for a Data Scientist with experience in data engineering to provide a strong data foundation to build upon for reporting, analysis, and modeling.

  • Data Analyst at Social Finance - London, UK - Social Finance is an ambitious not for profit organisation which seeks to drive social change. We are looking for high performing Data Analysts to join our growing Digital Labs team.

More >>

 

Data Elixir is curated and maintained by @lonriesberg. For additional finds from around the web, follow Data Elixir on LinkedIn, Twitter or Facebook.

 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308