Data Elixir logo

ISSUE 388 · May 24, 2022

 

In the News

Using ML to Help Protect the Great Barrier Reef

In spite of the costs, machine learning has been successfully used in a variety of conservation projects around the world. Here's an inside look at how the Great Barrier Reef Foundation leveraged the latest technologies to survey, monitor and map reefs at scale.
TensorFlow Blog

 

Organizations

Don’t just run your data team like a product team, run it like a company that needs to scale

Data teams are always under-resourced, yet at the same time they can be seen as an already expensive investment. Here are some ideas for getting the support your data team needs.
Emily Thompson

 

Sponsored Link

How to Capture Advantages by Investing in High-Quality Training Data


At the enterprise level, machine learning requires either large amounts of training data or a smaller set of extremely high quality data, as well as the infrastructure to support high data volumes. Consequently, labeling data through robust software or in partnership with an annotation service provider is critical to project success.

 


 

Tutorials, Projects & Opinions

How random forests really work

In this notebook tutorial, Jeremy Howard from fast.ai shows how random forests work by building one from scratch and then using it to submit to a Kaggle competition.
Kaggle | Jeremy Howard 

 

Visualizing multicollinearity in Python

Multicollinearity occurs when two or more features in a dataset are correlated with each other, and it's important to identify and understand it before training predictive models. This post explores three ways to visualize multicollinearity, including the pros and cons of each.
Kenan Ekici
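
For a quick feel for the idea before reading the post, here is a minimal sketch that flags strongly correlated feature pairs using plain Pearson correlation. It is not taken from the linked article. The toy data, the feature names, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical toy data: height_in is height_cm expressed in inches (rounded),
# so the two columns carry almost the same information.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_color_code": [3, 1, 4, 1, 5],
}

pairs = correlated_pairs(features)
print(pairs)
```

Pairwise correlation only catches relationships between two features at a time. A feature that is a combination of several others can slip through, which is one reason to look at more than one diagnostic.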

 

Marginalia

In the world of statistics, “marginal” means “additional,” or what happens to outcome variable y when explanatory variable x changes a little. This isn't short, but it's a gentle introduction to all things marginal and how they work: marginal effects, marginal slopes, average marginal effects, marginal effects at the mean, and more.
Andrew Heiss
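
To make "what happens to y when x changes a little" concrete, here is a small sketch that compares an average marginal effect with a marginal effect at the mean. It is not from the linked post. The logistic model, its coefficients, and the sample x values are hypothetical, and the slope is approximated with a finite difference.

```python
from math import exp


def predict(x, intercept=-2.0, slope=0.8):
    """Hypothetical fitted logistic model: predicted P(y = 1 | x)."""
    return 1.0 / (1.0 + exp(-(intercept + slope * x)))


def marginal_effect(x, h=1e-5):
    """Approximate slope of the prediction at x using a central difference."""
    return (predict(x + h) - predict(x - h)) / (2 * h)


# Hypothetical observed values of the explanatory variable.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Average marginal effect: compute the slope at every observation, then average.
ame = sum(marginal_effect(x) for x in xs) / len(xs)

# Marginal effect at the mean: compute the slope once, at the average x.
mem = marginal_effect(sum(xs) / len(xs))

print(f"average marginal effect:     {ame:.4f}")
print(f"marginal effect at the mean: {mem:.4f}")
```

With these made-up numbers the mean of x sits where the logistic curve is steepest, so the effect at the mean is larger than the average effect. In a nonlinear model the two quantities generally differ.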

 

Unlock Secret Knowledge from Data Experts for $10

Packt's Spring Sale is on and, for a limited period, all eBooks and videos are only $10. Our products are available as PDF, ePub, and MP4 files for you to download and keep forever. All the practical content you need, by developers for developers.
// sponsored

 

Resources

Software Development Resources for Data Scientists

Great collection of resources that will help data teams create reproducible and production-ready code and tools. This is a crowd-sourced collection covering project structure, automated testing, reproducible environments, and version control.
RStudio Blog | Isabella Velásquez

 

Mathematics for Machine Learning

This is a tightly curated collection of free books, videos, and papers for learning mathematics for machine learning. Covers all levels.
GitHub | Elvis Saravia

 

Code & Tools

LineaPy

LineaPy is a Python package for data scientists that makes it easy to go from prototype to production. Just add two lines of code and LineaPy will automatically capture, analyze, and transform messy data science code to production data pipelines. No refactoring or new tools needed.
GitHub | LineaPy

 

NannyML

NannyML is an open-source Python library that estimates real-world model performance (without access to targets), detects data drift, and links data drift alerts to changes in model performance. It's easy to use, model-agnostic, and supports all tabular binary classification use cases.
GitHub | NannyML

 

Obsidian Dataview

Dataview is a data index and query language over Markdown files. It's designed as an Obsidian plugin and will give you superpowers with your Obsidian Vaults. If you're not familiar with it, Obsidian is a free graph knowledge base that works on top of a local folder of Markdown files and is great for things like note taking, book development, ideation, etc.
GitHub | Obsidian Dataview

 
 


 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions, just reply to this email.
