Intro to autoencoders. Move predictions to your db. ML/NLP research summaries. Modeling salary and gender. Why R? Bayesian inference. Sheets-fu.


Data Elixir

ISSUE 266 · January 7, 2020

 

In the News

Precision fisheries: Navigating a sea of troubles with advanced analytics

This new report from McKinsey & Company explores how advanced analytics may help struggling fisheries thrive while simultaneously protecting endangered ocean resources.
McKinsey & Company

 

Sponsored Link

Evaluating Generative Adversarial Networks (GANs)

With deepfakes entering the mainstream, data scientists and researchers are assessing whether to leverage GANs in their own workflows. To help industry with this assessment, the upcoming Domino Data Lab webinar covers an implementation of a basic GAN model and demonstrates how adversarial networks can be used to generate training samples. Register here.

 


 
 

Tools and Techniques

Why I use R. They said the war was over...

This isn't another R versus Python post. Gordon Shotwell offers a smart, thoughtful perspective on his preference for R and on how, ultimately, programming languages are "just bundles of trade-offs."
Gordon Shotwell's Blog

 
 
 

Modeling salary and gender in the tech industry

In her latest post, Julia Silge uses data from the 2019 Stack Overflow Developer Survey to explore how gender affects salary for people who code. The post is insightful, easy to follow, and a great walk-through of her approach to modeling the data.
Julia Silge's Blog

 
 
 

Iterated mark and recapture

Suppose you have a population of wild animals and you want to estimate the population size. It's impractical to catch them all, so what do you do? This post walks through the problem with a nice demonstration of Bayesian inference.
Andrew M. Webb's Blog
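
For readers who want to try the idea before reading the post, here is a minimal sketch of a simplified, single-round version of the problem. It is not the iterated method from the post. It assumes a uniform prior over population sizes up to a hypothetical cap (n_max), a hypergeometric likelihood for the second catch, and made-up counts (20 marked, 30 caught, 6 recaptured). It needs Python 3.8+ for math.comb.

```python
from math import comb

def posterior_population(marked, caught, recaptured, n_max):
    """Grid posterior over population size N, assuming a uniform prior up to n_max."""
    # Smallest population consistent with the animals actually seen
    n_min = marked + caught - recaptured
    sizes = range(n_min, n_max + 1)
    # Hypergeometric likelihood of finding `recaptured` marked animals in the second catch
    weights = [
        comb(marked, recaptured) * comb(n - marked, caught - recaptured) / comb(n, caught)
        for n in sizes
    ]
    total = sum(weights)
    return {n: w / total for n, w in zip(sizes, weights)}

# Hypothetical counts, chosen only for illustration
posterior = posterior_population(marked=20, caught=30, recaptured=6, n_max=500)
map_estimate = max(posterior, key=posterior.get)               # most probable size
mean_estimate = sum(n * p for n, p in posterior.items())       # posterior mean
print(map_estimate, round(mean_estimate, 1))
```

The most probable size lands next to the classic estimate marked * caught / recaptured, while the posterior mean sits higher because the distribution has a long right tail. The result is sensitive to the choice of n_max, which is one reason a more careful prior is worth the effort.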

 
 
 

Sheetfu

Sheetfu is a small library that provides an easy way to interact with Google Sheets from Python. With Sheetfu, you can get or set cell values, background colors, font colors or any other cell attributes that are supported by the Google Apps Script API.
Social Point Labs via Github

 
 
 

tidypredict

tidypredict lets you move your predictions to your database! Fit a model in R and then use tidypredict with dplyr to create a runnable SQL statement. Supports a variety of models including linear regression, GLMs, random forest, XGBoost, tree models and more.
tidymodels via Github
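
To make the idea concrete, here is a rough Python sketch of what "moving predictions to the database" means for a linear model. This is not tidypredict's API, which is an R package. The function name, the coefficients, and the table and column names (cars, wt, cyl) are all hypothetical, and SQLite stands in for a real database.

```python
import sqlite3

def linear_model_to_sql(intercept, coefficients, table):
    """Render a fitted linear model as a SQL SELECT that scores rows in the database."""
    # Column and table names are interpolated directly, so they must come from trusted code
    terms = [repr(float(intercept))]
    terms += [f"({float(weight)!r} * {column})" for column, weight in coefficients.items()]
    return f"SELECT {' + '.join(terms)} AS prediction FROM {table}"

# Hypothetical coefficients, as if taken from a model fitted elsewhere
sql = linear_model_to_sql(37.2, {"wt": -3.9, "cyl": -1.5}, "cars")
print(sql)

# Run the generated statement against an in-memory SQLite table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (wt REAL, cyl REAL)")
conn.execute("INSERT INTO cars VALUES (2.0, 4.0)")
prediction = conn.execute(sql).fetchone()[0]
print(prediction)
conn.close()
```

The appeal of this approach is that the data never leaves the database: only the model's arithmetic is shipped, as plain SQL.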

 
 
 

Comet: Machine Learning Experiment Management

Join tens of thousands of data scientists worldwide who use Comet.ml.
Automatically track, compare, explain and reproduce your ML models and experiments. Sign up for free.
// sponsored

 

Resources

10 ML & NLP Research Highlights of 2019

Great summary of 2019 research highlights by Sebastian Ruder. Each highlight includes a short summary, links and an outlook for the future.
Sebastian Ruder's Blog

 
 
 

FiveThirtyEight Datasets

FiveThirtyEight is an online news outlet that uses statistical analysis to tell stories about elections, politics, sports, science, economics and lifestyle. Many people don't realize that FiveThirtyEight also shares the data behind each of its articles so you can verify the analysis or dig in and find other stories. This guide to the articles and data is a great learning resource.
FiveThirtyEight

 

Data Viz

2019 was hotter than normal - what does that mean?

What does "normal" even mean?
vis4net (Gregor Aisch's Blog)

 
 
 

Anomagram: Interactive Visualization for Autoencoders

This project by Victor Dibia is a gentle introduction to anomaly detection with autoencoders. His introduction to the project on Medium is also worth reading.
Victor Dibia via Github

 

Job Board

  • Data Scientist - Use Data to Improve Mobile Apps that Treat Disease at Pear Therapeutics - Boston, MA
  • Data Scientist with a Data Engineering Background at Pear Therapeutics - Boston, MA
  • Executive Director at The Carpentries - REMOTE
  • Data Analyst at Social Finance - London, UK
  • Part-time Data Scientist for Applied Healthcare Research at University of Lucerne, Switzerland - REMOTE
  • Research Associate, Center for Data Insights at MDRC - New York, NY or Oakland, CA
  • Data Analytics Course Mentor at Springboard - REMOTE
  • Senior Data Scientist at Intercom - San Francisco, CA, USA
  • Data Engineer at OECD (Organisation for Economic Co-operation and Development) - Paris, France


 

Data Elixir is curated and maintained by @lonriesberg. For additional finds from around the web, follow Data Elixir on LinkedIn, Twitter or Facebook.

 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308