Data Elixir

ISSUE 331   ·   April 13, 2021

 

Insight

Introducing the MAD (ML, AI, Data) Public Co. Index!

It's rare for articles about investing to make it into Data Elixir, but this post by Matt Turck is a nice starting point for data people who want to invest in what they know. The "MAD Index" is a small basket of publicly traded companies that are as close as possible to being "pure" data or ML/AI. The post covers selection criteria, earnings, and key growth metrics.
Matt Turck, VC at FirstMark

The Mathematics of how Connections Become Global

At what point does a meme go viral, a product start to dominate a market or a disease become a pandemic? Here's how a mathematical concept called "percolation theory" can be used to explore network behavior and the effects of randomly adding and/or removing nodes.
Scientific American | Kelsey Houston-Edwards
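
To get a feel for the idea, here is a minimal sketch of percolation on a random network. It is not code from the article: the function name `largest_component_fraction`, the node count, and the seed are all assumptions made for illustration. It adds random links between nodes and reports what share of the nodes ends up in the largest connected cluster. In this kind of random graph, a "giant" cluster is known to appear once the average number of links per node passes 1.

```python
import random


def largest_component_fraction(n, num_edges, seed=0):
    """Add num_edges random links among n nodes and return the
    largest connected cluster's share of all nodes."""
    rng = random.Random(seed)
    parent = list(range(n))  # union-find forest: each node starts alone
    size = [1] * n           # cluster size, valid at each root

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    largest = 1
    for _ in range(num_edges):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a == b:
            continue  # both ends already sit in the same cluster
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a  # merge the smaller cluster into the larger one
        size[a] += size[b]
        largest = max(largest, size[a])
    return largest / n


if __name__ == "__main__":
    n = 10_000  # hypothetical network size
    for avg_links in (0.5, 0.9, 1.0, 1.1, 1.5, 2.0):
        num_edges = int(avg_links * n / 2)
        share = largest_component_fraction(n, num_edges)
        print(f"avg links per node {avg_links}: largest cluster share {share:.3f}")
```

Running the loop shows the largest cluster staying tiny below the threshold and then growing quickly above it, which is the kind of tipping point the article describes.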

 

Sponsored Link

apply(): the ML Data Engineering Conference

April 21-22

apply() is the first community conference for ML data engineering. This is a free, online event where users and thought leaders will share their experiences. Speakers come from Google, Microsoft, LinkedIn, Netflix, DoorDash, Spotify, Pinterest, Stitch Fix, and more. We'd love to have you join us! Register for free.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 
 

Tutorials, Projects & Opinions

Moving beyond “algorithmic bias is a data problem”

A lot of people think that bias is a data problem, but algorithms are not impartial, and some design choices are better than others. Recognizing how model design impacts bias opens up new mitigation techniques that are far less burdensome than comprehensive data collection.
Patterns | Sara Hooker

Process large datasets w/o running out of memory

A great series of articles that shows how to process larger-than-RAM datasets in Python. It covers techniques for code structure, data management, Pandas, NumPy, database querying, and more.
PythonSpeed | Itamar Turner-Trauring
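
One common technique for this kind of problem is chunking: read a fixed number of rows at a time and keep only running totals. The sketch below is an illustration using just the standard library, not code from the series; the function name `chunked_mean`, the `price` column, and the chunk size are assumptions. Pandas offers the same pattern through the `chunksize` argument of `read_csv`.

```python
import csv
import io
from itertools import islice


def chunked_mean(lines, column, chunk_size=1000):
    """Mean of a numeric CSV column, holding at most chunk_size rows
    in memory at any one time."""
    reader = csv.DictReader(lines)
    total, count = 0.0, 0
    while True:
        # Pull the next batch of rows; an empty batch means end of input.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        values = [float(row[column]) for row in chunk]
        total += sum(values)
        count += len(values)
    return total / count if count else float("nan")


if __name__ == "__main__":
    # A tiny in-memory stand-in for a file too large to load at once.
    sample = io.StringIO("price\n1.0\n2.0\n3.0\n4.0\n")
    print(chunked_mean(sample, "price", chunk_size=2))
```

For a real file, pass an open file object instead of the `io.StringIO` stand-in. Peak memory then depends on `chunk_size` rather than on the size of the file.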

 
 
 

10 Things to Know About BERT and the Transformer Architecture That Are Reshaping the AI Landscape

This post is an awesome overview of some of the most important things happening in the fast-paced world of NLP. It's organized into ten sections that show where each technology came from, how it was developed, how it works, and what to expect from it in the near future. The post is fairly high-level, with links to key papers for readers who want to go deeper.
Neptune Blog | Cathal Horan

Monte Carlo’s Ultimate Data Observability Checklist

Building or buying a data observability platform to tackle data quality at your company? Great! But how do you get started? Here are the 26 things you need to know.
// sponsored

 

Code & Tools

Data Connector

Data Connector is a Google Sheets Add-on that lets you import and export data to/from Google Sheets. Other add-ons do that too, but the focus here is privacy, and the code is open source so it can be reviewed. Eventually, it will also support APIs, databases & more.
GitHub | Brent Adamson

MC2: A Platform for Secure Analytics and ML

MC2 is a platform for running secure analytics and machine learning on encrypted data. MC2 enables secure collaboration and lets users outsource their confidential data workloads to the cloud, while ensuring that the data is never exposed unencrypted to the cloud provider.
GitHub | RISELab, UC Berkeley

 

Data Visualization

Exploring Other {ggplot2} Geoms

ggplot2 has built-in geoms (geometric objects) for common visualizations such as bar charts, histograms, and scatterplots, but what if you need something different? This post explores some popular geoms that have been created within the R community, such as streamgraphs, ridgeline plots, Sankey diagrams, bump charts, waffle charts, and more. Includes screenshots and code snippets for each.
Isabella Velásquez

 


Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.

 

To find specific content from prior issues or to research topics, check out the catalogued Archives on Data Elixir's Search Page.

 
 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308