Data Elixir logo

ISSUE 339   ·   June 8, 2021

 

In the News

Machine learning is booming in medicine. It’s also facing a credibility crisis.

Experts say fatal flaws in the methodology of many machine learning papers make it hard to have faith in published claims.
STAT | Casey Ross

 

Sponsored Link

Online Data Science Programs from Drexel University

Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career. Learn more.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 
 

Tutorials, Projects & Opinions

How Airbnb Achieved Metric Consistency at Scale - Part 2

This second post in a series walks through the core design principles of Airbnb's in-house metric platform called "Minerva." Specifically, here's how Airbnb defines, versions, and back-fills datasets to effectively manage its data at scale. This is a useful post that includes lots of details and rationale for why things work the way they do.
Airbnb Engineering & Data Science | Amit Pahwa et al.

 
 
 
 

Building Effective Data Science Teams

Whether you're the first data person at your organization or leading a team of hundreds, creating an effective data team depends on a lot more than the tech you're using. In this transcript of a recent panel discussion, data science leaders from a variety of organizations share their insights.
RStudio Blog | Rachael Dempsey

 
 
 

Saturn Cloud: End-to-End Data Science & Machine Learning Platform

Join thousands of data scientists creating faster and more scalable Python projects with Hosted Dask, GPUs, and more. Switch from working on a laptop to a cloud instance with hosted Jupyter, use jobs and deployments to run repeating tasks, host APIs, and deploy dashboards. Join for free here
// sponsored

 

Code & Tools

JupyterLite - browser-based interactive computing

JupyterLite is a WebAssembly (Wasm) powered Jupyter distribution that runs entirely in the browser. This is an early release, built from the ground up using JupyterLab components and extensions. Real-time collaboration is said to be Coming Soon! Follow the links for a live demo.
GitHub | Jeremy Tuloup

 
 
 

Stemma: Helping you trust your data

Amundsen is the leading open-source data catalog. It has the largest and fastest-growing community and the most integrations. Stemma brings you Amundsen and more, with enterprise management and richer metadata.
Mark Grover

 

Resources

Data Science at the Command Line, 2nd edition

This hands-on guide shows how command line tools can help you be more efficient and productive with data. The new second edition will be published in October and is available to read online while it's being finalized. Read the first edition here >>
Jeroen Janssens

 
 
 

R for Public Health

Nice collection of resources for people who are interested in working with R in a public health context. Includes courses, books, and key packages to know.
R Views | Joseph Rickert

 

Data Visualization

Visualizing Distributions with Raincloud Plots

Great tutorial that shows how boxplots can be problematic, how to improve them, and alternative techniques that can be used to show summary statistics, as well as the true distribution of the raw data. Includes lots of examples and sample code using ggplot2.
Cédric Scherer

 
 
 

Scientific visualizations in the browser for Python

Dynamic visualizations for the web can be super effective, but if you're not a web expert, building them can be daunting. In this guide, Patrick Mineault presents a variety of web visualization frameworks, starting with the lowest possible barrier to entry. It's written from the perspective of a Python user but, for the most part, Python isn't required.
Patrick Mineault

 

Sign up to get Data Elixir's data science newsletter in your Inbox >>

 

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.

 

To find specific content from prior issues or to research topics, check out the catalogued Archives on Data Elixir's Search Page >> 

 
 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308