Data Elixir

ISSUE 341   ·   June 22, 2021

 

Insight

Experts doubt ethical AI will be broadly adopted anytime soon

This new report shows that a majority of tech innovators, business & policy leaders, and researchers & activists believe that the evolution of AI will continue to be focused on optimizing profits and social control. That's easy to understand, especially considering the range of responses here. Even so, many are optimistic about how AI will ultimately be beneficial. This is a thought-provoking longread.
Pew Research Center

 

Sponsored Link

Using external data in a post-vaccinated world

Join this webinar to discover how AWS Data Exchange provides the data to help you understand consumer behavior for better data insights and business decisions in a post-pandemic economy.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 
 

Tutorials, Projects & Opinions

What is a "data scientist" or "machine learning engineer", really?

Job titles tend to be overloaded and aren't very helpful for understanding the work that someone does. What does a "Data Scientist" actually do? How about a "Data Analyst"? Or an "ML Engineer"? In this project, Paige Bailey uses survey results from Kaggle, Anaconda and Stack Overflow to explore who does what, and how. Based on the responses, she found six distinct clusters and no apparent correlation to job title.
GitHub | Paige Bailey

 

Understanding p-values Through Simulations

P-values are often misinterpreted or misused. This is a fantastic interactive tool for learning about and/or explaining how p-values work.
Kristoffer Magnusson
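
For readers who want to try the idea offline, here is a minimal sketch of a p-value simulation. It is not taken from the linked tool. The function names, sample sizes, and the simplified z-test with a known standard deviation of 1 are all assumptions made for illustration. It repeats a two-group experiment many times and reports how often p falls below 0.05, first with no true effect and then with one.

```python
import math
import random


def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_p_values(n_experiments=2000, n_per_group=30, true_effect=0.0, seed=0):
    """Run many two-group experiments and return one p-value per experiment.

    Assumption: both groups are normal with a known standard deviation of 1,
    so a z-test on the difference in means applies.
    """
    rng = random.Random(seed)
    standard_error = math.sqrt(2.0 / n_per_group)
    p_values = []
    for _ in range(n_experiments):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        p_values.append(two_sided_p_value(diff / standard_error))
    return p_values


def share_below(p_values, alpha=0.05):
    """Fraction of experiments with p < alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


if __name__ == "__main__":
    null_share = share_below(simulate_p_values(true_effect=0.0))
    effect_share = share_below(simulate_p_values(true_effect=0.8))
    print(f"No true effect: share of p < 0.05 = {null_share:.3f}")
    print(f"True effect of 0.8 sd: share of p < 0.05 = {effect_share:.3f}")
```

With no true effect, the share of "significant" results should sit near the chosen threshold, because p-values are uniformly distributed under the null. With a real effect, the share reflects the test's power rather than the probability that the hypothesis is true.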

 

Applied NLP Thinking: How to Translate Problems into Solutions

Applied NLP is different from research NLP. With Applied NLP, you'll never ship anything valuable if you abstract out the application goals and treat the work as an optimization problem. This post is a non-technical introduction to thinking through real-world NLP problems and how to approach them.
Explosion | Ines Montani

 

Using Geospatial Data in R

Great tutorial for working with geospatial data using R. Starts with an overview of how geospatial data corresponds to places in the real world and then introduces a workflow that includes data management, data wrangling with simple features, and ways to visualize geospatial data. This is well-organized and includes lots of code snippets, linked references, and screenshots along the way.
Method Bites

 

 

StreamSets: Build Smart Data Pipelines for Free

Get early access to the new StreamSets DataOps Platform. Quickly build and deploy streaming, batch, CDC, ETL and ML pipelines across hybrid and multi-cloud. Plus, handle data drift automatically so jobs keep running even when schemas and structures change. Get early access now.
// sponsored

 

Code & Tools

Kats - One stop shop for time series analysis in Python

Kats is a lightweight, easy to use, and extendable framework that aims to be a one-stop shop for time series analysis, including detection, forecasting, feature extraction/embedding, multivariate analysis, etc. Kats is from Facebook's Infrastructure Data Science team and is open-sourced with an MIT license.
Facebook Research

 

Exploring Data Lineage with OpenLineage

OpenLineage is an open platform for collecting and analyzing data lineage. It tracks metadata about datasets, jobs, and runs, and gives users the information they need to identify the root cause of complex issues. Here's how to get started.
Hightouch | Pedram Navid

 

Resources

Machine Learning Interviews Book

Regardless of what side of the table you're on, this is an awesome new resource for anyone who's interviewing for machine learning roles. Along with a wide range of sample questions, this includes discussion of typical roles, types of companies, interview processes, how to think through an offer, do's and don'ts, useful resources and much more. This is an open-source book that's being actively developed.
Chip Huyen

 
 

Sign up to get Data Elixir's data science newsletter in your Inbox >>

 
 

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.

 

To find specific content from prior issues or to research topics, check out the catalogued Archives on Data Elixir's Search Page >>

 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308
