Clean code. Bloom filters. Project Silica. Understanding UMAP. Structuring data teams. Data map challenge. U.S. vs privacy.


Data Elixir

ISSUE 258 · November 5, 2019

 

In the News

The Government Protects Our Food and Cars. Why Not Our Data?

The United States is virtually the only developed nation without a comprehensive consumer data protection law and an independent agency to enforce it. That may be changing.
New York Times

 
 

Project Silica proof of concept stores Warner Bros. ‘Superman’ movie on quartz glass

Imagine a coaster-sized piece of glass that's encoded with 80 GB of data. It can be burned, boiled, dropped, scoured, and microwaved, and the data will still persist, unscathed. This is the future envisioned by Microsoft Research, and it's coming fast.
Microsoft Innovation

 

Insight

How should I structure my data team? A look inside HubSpot, Away, M.M. LaFleur, and more

For a lot of organizations, the data team is a brand new thing: it’s not “IT”, it’s not finance, it’s not any of the typical business functions within an operating business. So... who does it report to? How does it interact with the rest of the organization? How big is it? Useful insights here.
dbt Blog

 
 

Profiles

Prepping for a Flood of Heavenly Bodies

Just as mathematics transformed physics from a philosophy into a science, data and computation are transforming science today. In this interview, Mario Jurić describes the push to get astronomy ready for the torrents of data that are about to flow.
Quanta Magazine

 

Sponsored Link

Become a Data Scientist in 12 Weeks with Metis

Learn from Industry Experts in the Only Accredited Data Science Bootcamp

Metis's rigorous, immersive program provides the high-caliber curriculum, instructors, career support, and network you need to start your career in data science. Now available Live Online. Apply early by November 11!

 


 
 

Tools and Techniques

Understanding UMAP

UMAP is a new dimensionality reduction technique that is faster than t-SNE and preserves more of the data's global structure. This interactive article by Andy Coenen and Adam Pearce shows how it works and when to use it instead of t-SNE.
Google PAIR

 
 
 

Coding habits for data scientists

Clean code is less error-prone, easier to share, and faster to work with while iterating on solutions. In this post on the Thoughtworks Blog, David Tan offers a well-organized and actionable set of guidelines that will help you identify bad habits and write cleaner, simpler code.
Thoughtworks Blog

 
 
 

nbcommands

Unix commands for Jupyter notebooks!
github/vinayak-mehta/

 
 
 
 

Search Optimization for Large Data Sets for GDPR

In this post, Jaemi Bremner walks through Adobe's approach to using Bloom filters as a fast, cost-effective way to complete GDPR requests at scale. Includes a discussion of the problems, alternative solutions, and a look at Bloom filter performance.
Adobe Tech Blog
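
For readers new to the data structure, here is a minimal Python sketch of a Bloom filter. It is not Adobe's implementation: the class name, the bit-array size, the number of hashes, and the example user IDs are all assumptions chosen for illustration. The idea is that a small bit array can answer "is this ID definitely absent from this file?" without scanning the file itself.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        # Sizes are arbitrary assumptions; real deployments tune them
        # to the expected item count and target false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical usage: one filter per data file, filled with the IDs it holds.
file_filter = BloomFilter()
for user_id in ["user-17", "user-42"]:
    file_filter.add(user_id)

# Only files whose filter answers True need to be opened for a request.
needs_scan = file_filter.might_contain("user-42")
```

Because a Bloom filter never returns a false negative, any file whose filter answers False can be skipped safely. A True answer may occasionally be a false positive, so it still has to be confirmed by reading the file.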

 
 
 

Data scientists are in demand on Vettery

Vettery is an online hiring marketplace that's changing the way people hire and get hired. Ready for a bold career move? Make a free profile, name your salary, and connect with hiring managers from top employers today.
// sponsored

 

Resources

Causal Inference: What If

The initial version of the Causal Inference book by Miguel Hernán and Jamie Robins is now freely available online. The book is organized in three parts of increasing difficulty, from counterfactuals and causal diagrams to treatment-confounder feedback and g-methods. A print version of the book will be released in 2020.
Harvard School of Public Health

 
 

Online R Books

Awesome collection of R resources that are free to read online. Covers a wide variety of topics, including programming, text mining, data visualization, data-driven journalism, and spatial data science.
Committed To Tape Blog

 
 

Job Board

Recent Listings:

  • Postdoctoral Fellowships - Stanford Center on Philanthropy and Civil Society - Stanford, CA
  • Manager, Advanced Analytics at MIB Group - Braintree, MA
  • Senior Data Scientist at smava GmbH - Berlin, Germany
  • Data Analyst at Zalando SE - Berlin, Germany


 

Data Viz

#30DayMapChallenge

Anyone interested in using data to create interesting maps should definitely pay attention to the #30DayMapChallenge on Twitter this month. At this point, it's only been going for a few days and it's already an amazing stream of ideas, interactives and resources.
Twitter

 

Data Elixir is curated and maintained by @lonriesberg. For additional finds from around the web, follow Data Elixir on LinkedIn, Twitter or Facebook.

 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308