
Data Elixir

ISSUE 308 · October 20, 2020

 

Insight

Emerging Architectures - Modern Data Infrastructure

20+ experts, 1 architecture, 3 blueprints. This is a great article from Andreessen Horowitz that explores how data architectures are evolving as systems become increasingly based on data, rather than code. The article came out of discussions with dozens of practitioners.
a16z | Matt Bornstein, Martin Casado, and Jennifer Li

 
 
 

When do we trust recommenders more than people?

Here's a twist on algorithmic bias. Recent research shows that it's not just the machines that are biased. People hold biases towards machines. Sometimes they trust the machines. Sometimes not. Understanding why is essential for developing effective recommenders.
Harvard Business Review

 
 

Sponsored Link

Global Master of Management Analytics from Smith School of Business at Queen’s University

Future-Proof Your Career, While You Work

The Global Master of Management Analytics from Smith School of Business at Queen’s University is a 12-month program that can be taken from anywhere in the world. Master the essential strategies for applying analytics to business needs in this ground-breaking program.

 


 
 

Tutorials, Projects & Opinions

An Illustrated Guide to Superlearning in R

Why use one machine learning algorithm when you could use all of them?! This post contains a step-by-step walk-through of how to build a superlearner prediction algorithm in R.
KHstats | Kat Hoffman
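
For readers who want the core idea before opening the post, here is a minimal sketch of superlearning in plain Python rather than R. It is not the code from the linked walk-through. The two toy base learners (`fit_mean`, `fit_line`), the grid search over a single convex weight, and the synthetic data are all assumptions chosen to keep the example short. The essential step is that the ensemble weights are chosen using out-of-fold predictions, so no base learner is scored on data it was trained on.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Toy base learner 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Toy base learner 2: least-squares line on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def out_of_fold_predictions(fitters, xs, ys, k=5):
    """Predict each point with models trained only on the other folds."""
    n = len(xs)
    preds = [None] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        models = [f([xs[i] for i in train], [ys[i] for i in train])
                  for f in fitters]
        for i in test:
            preds[i] = [m(xs[i]) for m in models]
    return preds

def fit_super_learner(fitters, xs, ys, k=5, steps=20):
    """Blend exactly two learners with the weight that minimises CV error."""
    oof = out_of_fold_predictions(fitters, xs, ys, k)

    def cv_mse(w):
        return mean((w * p[0] + (1 - w) * p[1] - y) ** 2
                    for p, y in zip(oof, ys))

    # Grid search over convex weights w in [0, 1] for the first learner.
    best_w = min((s / steps for s in range(steps + 1)), key=cv_mse)
    # Refit both base learners on all the data for the final ensemble.
    models = [f(xs, ys) for f in fitters]

    def predict(x):
        return best_w * models[0](x) + (1 - best_w) * models[1](x)

    return best_w, predict

# Synthetic, perfectly linear data (an assumption for illustration).
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
weight_on_mean, predict = fit_super_learner([fit_mean, fit_line], xs, ys)
```

On this data the line learner should receive essentially all of the weight. Real superlearners typically combine many more learners and fit the weights with a proper meta-learner instead of a grid search.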

 
 
 

Structural Time Series

Structural time series display common characteristics, such as a general upwards or downwards trend; repeating, and potentially nested, patterns; or sudden spikes or drops. This applied research report shows how to work with and model structural time series.
Cloudera Fast Forward
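
To make the "trend plus repeating pattern" idea concrete, here is a small sketch of a classical additive decomposition in plain Python. It is not the modeling approach used in the report. The `decompose` helper, the odd-length period, and the synthetic weekly series are assumptions for illustration only.

```python
from statistics import mean

def decompose(series, period):
    """Additive split into trend, seasonal, and residual parts.

    Assumes an odd period so the moving-average window is centered.
    """
    n = len(series)
    half = period // 2

    # Trend: centered moving average spanning one full period.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = mean(series[t - half : t + half + 1])

    # Seasonal: average detrended value at each position in the cycle,
    # shifted so the seasonal effects sum to zero over one period.
    detrended = [(t, series[t] - trend[t])
                 for t in range(n) if trend[t] is not None]
    raw = [mean(v for t, v in detrended if t % period == pos)
           for pos in range(period)]
    centre = mean(raw)
    seasonal = [raw[t % period] - centre for t in range(n)]

    # Residual: whatever the trend and seasonal parts do not explain.
    residual = [series[t] - trend[t] - seasonal[t]
                if trend[t] is not None else None
                for t in range(n)]
    return trend, seasonal, residual

# Synthetic series: linear upward trend plus a 7-step repeating pattern.
pattern = [3, 1, -2, 0, -1, -3, 2]
series = [0.5 * t + pattern[t % 7] for t in range(28)]
trend, seasonal, residual = decompose(series, period=7)
```

The first and last few trend values are `None` because the centered window does not fit there. Sudden spikes or drops, which the blurb also mentions, would show up in the residual.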

 
 
 

Dockerfile Security Best Practices

A good starting point for mitigating security risks with Dockerfiles. For more, see the Hacker News discussion of this article.
Cloudberry Engineering

 

Code & Tools

• VisiData 2.0 is a major new release of this popular multi-tool for tabular data. "Combines the clarity of a spreadsheet, the efficiency of the terminal, and the power of Python, into a lightweight utility."

• tldrstory is a Python framework for AI-powered understanding of headlines and text content related to stories.

• stacks is an R package for model stacking that aligns with tidymodels. Model stacking is an ensembling method that takes the outputs of many models and combines them to generate a new model.

• waldo is designed to find and concisely describe the difference between a pair of R objects.

 

Resources

Awesome Data Engineering

A very well organized, choose-your-own learning path for data engineering topics. Covers things like SQL, database design, NoSQL, data warehouses, data processing, pipelines, security, and more.
Snir David

 
 
 

Stanford MLSys Seminar Series

This new seminar series explores the frontier of machine learning systems and how machine learning changes the modern programming stack. The goal is to help curate a curriculum of awesome work in ML systems. Lectures start in the Fall and will be streamed for free online.
Stanford University

 

Conferences

VIS 2020

The IEEE VIS conference is one of the best data visualization conferences in the world, and this year it will be online and free. This technical conference explores advances in visualization and visual analytics, and there's a LOT to explore here.

In particular, check these out:

  • Visualization in Data Science
  • Conference overview, history, and coming changes
  • @ieeevis on Twitter for links to key papers and project sites

VIS 2020

 

And Finally...

Covid-19: The global crisis — in data

As part of a major new series, the Financial Times has compiled chronological chapters of the Covid-19 crisis using data from around the world. It's free to access and includes lots of data viz for key trends.
FT Visual & Data Journalism team

 

Data Elixir is curated and maintained by Lon Riesberg. If you need help on a data project or have a suggestion for the newsletter, reply to this email or grab a spot on my calendar.

 
 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308