— In the News —
The United States is virtually the only developed nation without a comprehensive consumer data protection law and an independent agency to enforce it. That may be changing.
Imagine a coaster-sized piece of glass encoded with 80GB of data. It can be burned, boiled, dropped, scoured, and microwaved, and the data will still persist, unscathed. This is the future Microsoft Research is building, and it's coming fast.
— Insight —
For a lot of organizations, the data team is a brand-new thing: it’s not “IT”, it’s not finance, and it’s not any of the other typical functions of an operating business. So... who does it report to? How does it interact with the rest of the organization? How big is it? Useful insights here.
— Profiles —
Just as mathematics transformed physics from a philosophy into a science, data and computation are transforming science today. In this interview, Mario Jurić describes the push to get astronomy ready for the torrents of data that are about to flow.
— Tools and Techniques —
UMAP is a new dimensionality reduction technique that is faster than t-SNE and better preserves global structure. This interactive article by Andy Coenen and Adam Pearce explains how it works and when to use it instead of t-SNE.
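To make the "how it works" part a little more concrete, here is a simplified Python sketch of one ingredient described in the UMAP paper: for each point, rho is the distance to its nearest neighbor, and a bandwidth sigma is found by binary search so that the neighbor memberships exp(-max(0, d - rho) / sigma) sum to log2(k). The function names and parameters are hypothetical, this is not the umap-learn implementation, and it omits refinements a production library would include.

```python
import math
from typing import List, Tuple


def calibrate_point(knn_dists: List[float], tol: float = 1e-6,
                    max_iter: int = 64) -> Tuple[float, float]:
    """Return (rho, sigma) for one point from its k nearest-neighbor distances.

    Simplified from the UMAP paper: rho is the distance to the nearest neighbor,
    and sigma is found by binary search so that
    sum_j exp(-max(0, d_j - rho) / sigma) == log2(k).
    Assumes k >= 3, the point itself is excluded from knn_dists, and only a few
    neighbors tie at the minimum distance (otherwise no exact solution exists).
    """
    k = len(knn_dists)
    target = math.log2(k)
    rho = min(knn_dists)
    lo, hi, sigma = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        total = sum(math.exp(-max(0.0, d - rho) / sigma) for d in knn_dists)
        if abs(total - target) < tol:
            break
        if total > target:
            # Sum too high: a smaller sigma makes memberships decay faster.
            hi = sigma
            sigma = (lo + hi) / 2
        else:
            # Sum too low: grow sigma (double until an upper bound is known).
            lo = sigma
            sigma = sigma * 2 if math.isinf(hi) else (lo + hi) / 2
    return rho, sigma


def memberships(knn_dists: List[float]) -> List[float]:
    """Directed fuzzy membership weights from this point to each neighbor."""
    rho, sigma = calibrate_point(knn_dists)
    return [math.exp(-max(0.0, d - rho) / sigma) for d in knn_dists]
```

The nearest neighbor always gets weight 1.0 and farther neighbors decay smoothly; UMAP then symmetrizes these per-point weights into a single graph and lays it out in low dimensions.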
Clean code is less error-prone, easier to share, and faster to work with as you iterate on solutions. In this post on the Thoughtworks Blog, David Tan offers a well-organized, actionable set of guidelines that will help you identify bad habits and write cleaner, simpler code.
Unix commands for Jupyter notebooks!
In this post, Jaemi Bremner walks through Adobe's approach to using Bloom filters as a fast, cost-effective way to fulfill GDPR requests at scale. The post covers the problem, alternative solutions, and Bloom filter performance.
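The core idea behind a Bloom filter fits in a short, self-contained Python sketch. This is not Adobe's implementation: the class, the SHA-256 double-hashing scheme, and the per-partition data layout are assumptions for illustration. A filter answers "definitely not present" or "possibly present", so a deletion request only has to scan the partitions whose filter reports a possible match.

```python
import hashlib
import math
from typing import Dict, Iterator, List


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k derived hash positions.

    might_contain() can return false positives but never false negatives.
    """

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        ln2 = math.log(2)
        self.size = math.ceil(-expected_items * math.log(false_positive_rate) / ln2 ** 2)
        self.num_hashes = max(1, round(self.size / expected_items * ln2))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: split one SHA-256 digest into h1, h2 and use h1 + i * h2.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # make h2 odd (hence nonzero)
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical data layout: each partition lists the user IDs it stores.
partitions: Dict[str, List[str]] = {
    "2019-10": ["user-1", "user-2", "user-3"],
    "2019-11": ["user-4", "user-5"],
}

# Build one filter per partition, ahead of any deletion requests.
filters: Dict[str, BloomFilter] = {}
for name, user_ids in partitions.items():
    bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)
    for uid in user_ids:
        bf.add(uid)
    filters[name] = bf


def partitions_to_scan(user_id: str) -> List[str]:
    # Skip partitions whose filter rules the user out; scan only possible matches.
    return [name for name, f in filters.items() if f.might_contain(user_id)]
```

Because a filter can only say "possibly present", every partition it flags still has to be scanned; the savings come from the partitions it rules out without touching the underlying data.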
— Resources —
The initial version of the Causal Inference book by Miguel Hernán and Jamie Robins is now freely available online. The book is organized into three parts of increasing difficulty, progressing from counterfactuals and causal diagrams to treatment-confounder feedback and g-methods. A print version of the book will be released in 2020.
An awesome collection of R resources that are free to read online. It covers a wide variety of topics, including programming, text mining, data visualization, data-driven journalism, and spatial data science.
— Data Viz —
Anyone interested in using data to create interesting maps should definitely follow the #30DayMapChallenge on Twitter this month. It's only been going for a few days, and it has already produced an amazing stream of ideas, interactives, and resources.