— In the News —
This is an important and very well-articulated essay about the work that's needed to ensure that emerging technologies are beneficial. AI is part of it, but the ideas that Michael Jordan presents here extend broadly to data science, engineering, and society.
Big news from Wes McKinney this week. The creator of pandas has founded an independent development lab with the mission of "innovation in data science tooling." This post describes what it's all about, including some discussion about the challenges and opportunities that open source software creates for the data science community.
— Sponsored Link —
JupyterCon will bring together data scientists, business analysts, researchers, educators, developers, and core Project contributors and tool creators in NYC for in-depth training, insightful keynotes, networking events, and practical talks exploring the Project Jupyter platform. Presented by O'Reilly Media and NumFOCUS Foundation. Save 20% on most passes when you use the discount code ELIXIR20
— Tools and Techniques —
Sweet collection of command-line tricks for wrangling data on *nix operating systems (e.g., Linux, Unix, macOS).
Tom Augspurger's Modern Pandas series was super popular when he started releasing it nearly two years ago. In this recent installment, Tom shows how to use Dask with pandas to work with datasets that are too large to fit into RAM.
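For a feel of the core idea behind working with larger-than-RAM data, here is a minimal sketch using only the Python standard library. It is not Dask and not the code from Tom's post. It only illustrates the general pattern of reducing a dataset one chunk at a time and combining the partial results. The function names, the column names, and the sample data are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any iterable of rows."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def chunked_group_sum(csv_file, key, value, chunk_size=2):
    """Sum `value` per `key`, holding only one chunk of rows in memory at a time."""
    totals = defaultdict(float)
    for chunk in iter_chunks(csv.DictReader(csv_file), chunk_size):
        # Reduce each chunk to partial sums, then fold them into the running totals.
        partial = defaultdict(float)
        for row in chunk:
            partial[row[key]] += float(row[value])
        for group, subtotal in partial.items():
            totals[group] += subtotal
    return dict(totals)


# Hypothetical sample data standing in for a file too large to load at once.
sample = io.StringIO("city,sales\nNYC,10\nSF,5\nNYC,2.5\nSF,1\nLA,4\n")
result = chunked_group_sum(sample, key="city", value="sales")
print(result)
```

In practice you would pass an open file handle instead of the in-memory sample and pick a chunk size that comfortably fits in RAM. Libraries like Dask automate this kind of split, reduce, and combine work and can run the pieces in parallel.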
Pipeline debt is a type of technical debt that specifically infects data systems. It creates inefficiencies and, more importantly, can put analytic integrity at risk. This article explores the issues that lead to pipeline debt and introduces an open-source project that enables automated pipeline testing. The project is young but under active development.
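To make "automated pipeline testing" concrete, here is a rough sketch of the general idea in plain Python: declare what you expect of each batch of data, then check every batch against those declarations before it moves downstream. This is not the API of the project described in the article. The function names, the result format, and the sample batch are all made up for illustration.

```python
from functools import partial


def expect_not_null(rows, column):
    """Flag the index of every row where `column` is missing or empty."""
    bad = [i for i, row in enumerate(rows) if row.get(column) in (None, "")]
    return {"check": f"{column} is not null", "success": not bad, "bad_rows": bad}


def expect_between(rows, column, low, high):
    """Flag the index of every row whose `column` value falls outside [low, high]."""
    bad = [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return {
        "check": f"{column} between {low} and {high}",
        "success": not bad,
        "bad_rows": bad,
    }


def validate(rows, checks):
    """Run every check against the batch and collect the results."""
    return [check(rows) for check in checks]


# Hypothetical batch arriving at one stage of a pipeline.
batch = [
    {"user_id": 1, "age": 34},
    {"user_id": None, "age": 29},
    {"user_id": 3, "age": 212},
]

checks = [
    partial(expect_not_null, column="user_id"),
    partial(expect_between, column="age", low=0, high=120),
]

results = validate(batch, checks)
failed = [r for r in results if not r["success"]]
for r in failed:
    print(f"FAILED: {r['check']} (rows {r['bad_rows']})")
```

Running checks like these at each stage surfaces bad data where it enters the pipeline rather than in a downstream report.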
Making Sense of Satellite Data, An Open Source Workflow
Here's everything you need to get started with satellite data. This multi-part series by Robert Simmon demonstrates a complete open-source workflow, including ways to access free data and open-source tools for processing and understanding the results.
You already know the longer it takes to detect a problem, the more expensive it is to resolve. Your testing needs to happen earlier in the development pipeline while taking into account all aspects of privacy, security and monitoring. Read this 4-part eBook to learn how to detect problems earlier in your DevOps testing processes.
— Blogs —
I spoke with Data Elixir reader Nate Cheever a few months ago about starting a data science blog. Since then, Nate has settled on a writing platform and put together a short list of useful posts. There are worthwhile insights and tools here, and it's a nice example of a data science blog that's clean and easy to get started with.
— Data Viz —
Xeno.graphics is a collection of innovative and experimental visualizations. The collection is being actively curated by Maarten Lambrechts, and it's worth checking out and coming back to when you're looking for ideas. Each example includes screenshots and linked references.
How do you visualize skewed data over hundreds of categories? Xan Gregg created a new chart type to address this question and built this website to showcase how it works, along with use cases, examples, variations, and references.