ISSUE 331 · April 13, 2021
It's rare for articles about investing to make it into Data Elixir, but this post by Matt Turck is a nice starting point for data people who want to invest in what they know. The "MAD Index" is a small basket of publicly traded companies that are as close as possible to being “pure” data or ML/AI companies. The post covers selection criteria, earnings, and key growth metrics.
Matt Turck, VC at FirstMark
At what point does a meme go viral, a product start to dominate a market or a disease become a pandemic? Here's how a mathematical concept called "percolation theory" can be used to explore network behavior and the effects of randomly adding and/or removing nodes.
Scientific American | Kelsey Houston-Edwards
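For a feel of the tipping-point behavior, here is a minimal sketch that is not taken from the article. It assumes a simple random-graph model (Erdős-Rényi style, with edges dropped between randomly chosen node pairs) and measures what share of nodes ends up in the largest connected cluster. The function name and parameters are illustrative choices only.

```python
import random


def largest_component_fraction(n, mean_degree, seed=0):
    """Share of nodes in the largest cluster of a random graph.

    Adds about mean_degree * n / 2 random edges to n isolated nodes and
    tracks clusters with a union-find (disjoint-set) structure.
    """
    rng = random.Random(seed)
    parent = list(range(n))  # parent[i] == i means i is a cluster root
    size = [1] * n           # cluster size, valid at root nodes only

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return  # already in the same cluster (or a self-loop)
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    n_edges = int(mean_degree * n / 2)
    for _ in range(n_edges):
        union(rng.randrange(n), rng.randrange(n))

    return max(size[find(i)] for i in range(n)) / n


if __name__ == "__main__":
    # Sweep the average number of connections per node.
    for mean_degree in (0.5, 0.9, 1.1, 1.5, 3.0):
        frac = largest_component_fraction(5000, mean_degree)
        print(f"mean degree {mean_degree}: largest cluster = {frac:.3f}")
```

In this model the largest cluster stays tiny while nodes average fewer than one connection and grows quickly once they average more than one, which is the kind of threshold percolation theory describes.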
apply() is the first community conference for ML data engineering. This is a free, online event where users and thought leaders will share their experiences. Speakers come from Google, Microsoft, LinkedIn, Netflix, DoorDash, Spotify, Pinterest, Stitch Fix, and more. We’d love to have you join us! Register for free.
Tutorials, Projects & Opinions
A lot of people think that bias is a data problem, but algorithms are not impartial, and some design choices are better than others. Recognizing how model design impacts bias opens up new mitigation techniques that are far less burdensome than comprehensive data collection.
Patterns | Sara Hooker
Great series of articles that show how to process larger-than-RAM datasets in Python. Covers techniques for code structure, data management, Pandas, NumPy, database querying, and more.
PythonSpeed | Itamar Turner-Trauring
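One general idea behind larger-than-RAM processing is to stream the data and keep only a running summary in memory. The sketch below is not taken from the series. It assumes a CSV source with a header row, and the function and column names are hypothetical. It uses only the standard library's csv module.

```python
import csv
import io


def streaming_mean(lines, column):
    """Mean of one numeric CSV column, reading one row at a time.

    `lines` is any iterable of text lines, such as an open file object.
    Only a running total and a count are held in memory, so the input
    can be much larger than RAM.
    """
    reader = csv.DictReader(lines)
    total = 0.0
    count = 0
    for row in reader:
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")


if __name__ == "__main__":
    # A small in-memory stand-in for a large file on disk.
    sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
    print(streaming_mean(sample, "value"))
```

With a real file, the same function can be called as `streaming_mean(open(path, newline=""), "value")`, where `path` is a placeholder for your own file.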
This post is an awesome overview of some of the most important things that are happening in the fast-paced world of NLP. It's organized in ten sections that show where each technology came from, how it was developed, how it works, and what to expect from it in the near future. The post is fairly high-level with links to key papers to go deeper.
Neptune Blog | Cathal Horan
Building or buying a data observability platform to tackle data quality at your company? Great! But how do you get started? Here are the 26 things you need to know.
// sponsored
Data Connector is a Google Sheets Add-on that lets you import and export data to/from Google Sheets. Other add-ons do this too, but the focus here is privacy, and the code is open source so it can be reviewed. Eventually, it will also support APIs, databases & more.
GitHub | Brent Adamson
MC2 is a platform for running secure analytics and machine learning on encrypted data. MC2 enables secure collaboration and lets users outsource their confidential data workloads to the cloud, while ensuring that the data is never exposed unencrypted to the cloud provider.
GitHub | RISELab, UC Berkeley
ggplot2 has built-in geoms (geometric objects) for common visualizations such as bar charts, histograms, and scatterplots, but what if you need something different? This post explores some popular geoms that have been created within the R community, such as streamgraphs, ridgeline plots, sankey diagrams, bump charts, waffle charts, and more. Includes screenshots and code snippets for each.
Isabella Velásquez
Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.
To find specific content from prior issues or to research topics, check out the catalogued Archives on Data Elixir's Search Page >>