ISSUE 349 · August 17, 2021

Trends

The one data platform to rule them all... but according to whom?
Great post that breaks down many of the recent startups and organizes them by their typical end users (e.g., data scientist, analyst, ML engineer). Since many data tooling startups eventually want to become "data platforms," this post makes it easy to understand where they all fit.

Data Quality Unpacked
Here's a broad look at the trends, jobs, and players in the increasingly important and complex space of data quality solutions.

Sponsored Link

Accelerate development with healthcare data
Join a panel discussion with thought leaders from AWS and healthcare powerhouses like QIAGEN and IBM Watson Health. Learn how healthcare datasets can accelerate the bench-to-bedside cycle through enhanced visualizations, machine learning algorithms, and curated genomics data.

Tutorials, Projects & Opinions

Bootstrapping Labels
Most machine learning tutorials and papers assume the availability of training labels. But what if labeled datasets aren't available? In his latest post, Eugene Yan explores ways to generate labels from scratch using semi-supervised, active, and weakly supervised learning. There's a lot here, including examples from DoorDash, Facebook, Google, and Apple.

Making better decisions with statistics
To make better decisions with statistics, here are three key principles to keep in mind.

Filter data before reading with awk and R
Unix command line tools might be "old school," but you sure can do a lot with them. This tutorial shows how to use the awk utility to quickly pre-process and read lots of bigger-than-RAM CSV files without crashing your computer.

Invitation - how to make data consumable w/ AWS, Wawa, and Facebook
Join this webinar with data & analytics leaders from Wawa, AWS, and Facebook on how to make data consumable for teams, at scale.
Each panelist will share strategies to help you transform cloud data into valuable insights, reduce operational complexity, and scale data access and governance across BI tools. Sign up here.

Resources

Modern Statistics with R
From wrangling and exploring data to inference and predictive modeling, this new text introduces the key parts of the modern statistical toolkit. Free to download.

Causal Inference for The Brave and True
This open-source book takes a light-hearted yet rigorous approach to learning impact estimation and sensitivity analysis. Part 1 contains core concepts and models for causal inference. Part 2 contains modern developments and applications of causal inference in industry.

Career

Changes to the data science role and tools
This discussion with Sean Taylor covers a lot of ground, and his insights into data science roles and how the field has evolved over the past few years make it a worthwhile listen. Sean is a Data Science Manager at Lyft and was previously a research scientist at Facebook, where he was key to the development of the open source forecasting library, Prophet.

Data Visualization

An Overview of Dataviz for Categorical Data
Categorical data is data that's bucketed into categories. Although it can be plotted using traditional chart types like line plots and scatter plots, other techniques are easier to read and are better at showing the relationships and flows between categorical data points. In this post, Jonathan Dunne explores four chart types to consider.

Data-Driven Album Covers
There are a few well-known album covers that feature data visualization but not many that visualize the music itself. In this post, three designers explore the mystery of finding the perfect marriage of music and image, with lots of examples.