ISSUE 362 · November 16, 2021

Trends

Data Librarian: Next Generation Of Data Engineering
Supporting an experimental data science process requires the role of data engineering to evolve. At the least, data engineers need to be data science literate because curating data requires an understanding of what it takes for data to be useful. The role described here is much more than building pipelines and collecting metadata...

Sponsored Link

Apply today: Master of Data Science for Public Policy
Passionate about data and its role in public affairs? Join the Master of Data Science for Public Policy (MDS) at the Hertie School in Berlin, Germany! This two-year master’s taught entirely in English welcomes students from all academic disciplines who want to use data to solve real-world challenges. Explore the MDS and tune in to our Open Week from 6-10 December to learn more.

Tutorials, Projects & Opinions

Building confidence in a decision
Decision making under uncertainty can be difficult, and with A/B testing, results aren't necessarily a reflection of the underlying truth. Here's how Netflix looks beyond just the numbers to decide if the results of an A/B test are compelling.

Why machine learning hates vegetables
In this personal account of automated "intelligence" going awry, Emily Riederer explores why an ML product would recommend that she stop eating vegetables. Ultimately, it's a cautionary tale of what can happen when out-of-the-box algorithms are applied to real-world problems.

Root Causing Data Failures
Finding the root cause of a data quality issue can take a lot of time, and once you understand where a problem is, many things need to happen to resolve the issue and get the data back in good shape. But what if you could find and resolve issues automatically?

How to think about the ROI of data work
Great post that explores how each role on a data team can have the highest possible impact.
It turns out that graph theory is a surprisingly useful concept for estimating the ROI of data work.

How to Get Strategic Value from Data Analytics
A significant proportion of organizations are neither making the...

Code & Tools

doubtlab
Doubt your data, find bad labels. This repository contains general tricks that may help you find bad, or noisy, labels in your dataset.

Data Visualization

Scientific Visualization: Python + Matplotlib
Awesome new open-access book on scientific visualization using Python and Matplotlib. Covers Matplotlib fundamentals, visualization design, advanced topics such as optimization & animation, and concludes with a showcase of examples. Free to download.

Clarity and Aesthetics in Data Visualization
Great set of guidelines to help make your visualizations visually appealing and easy to understand. Includes lots of examples.