— Insight —
In this essay, Catherine D'Ignazio explores the importance of discovering context when working with a new dataset. It is written with data journalists in mind, but the insights apply to almost anyone working with data.
— How-to —
Jupyter notebooks integrate metadata, source code, formatted text, and rich media into a single document, which makes them poor candidates for conventional version control systems. This article explores a variety of ways to version control your notebooks, including built-in solutions and external tools. The article is well organized and includes useful links and examples.
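To make the version control problem concrete, here is a minimal sketch of one common workaround: clearing cell outputs and execution counts before committing, so diffs show only source changes. It is not necessarily the approach the article recommends. The `strip_outputs` helper and the `example` notebook are hypothetical; the sketch assumes only the standard nbformat 4 layout, in which a notebook is JSON with a `cells` list and each code cell carries `outputs` and `execution_count`.

```python
import copy
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed.

    Hypothetical helper: assumes the nbformat 4 layout ("cells" list,
    code cells with "outputs" and "execution_count" keys).
    """
    cleaned = copy.deepcopy(notebook)  # leave the caller's dict untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results and media
            cell["execution_count"] = None  # drop run-order noise
    return cleaned


# A tiny made-up notebook: one markdown cell and one executed code cell.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 7,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

stripped = strip_outputs(example)
print(json.dumps(stripped, indent=1, sort_keys=True))
```

Running the same function over a real `.ipynb` file would mean loading it with `json.load` and writing the result back with `json.dump`; the trade-off is that stored results disappear from the committed copy.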
Active learning makes it possible to build applications using a small set of labeled data, and enables enterprises to leverage their large pools of unlabeled data. This post explores how active learning works.
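As a rough illustration of the idea, the sketch below shows uncertainty sampling, one common active learning query strategy; the post itself may cover different strategies. Given a model's predicted class probabilities for the unlabeled pool, it picks the examples the model is least confident about, which are the ones sent for labeling next. The `least_confident` function and the `pool_probs` values are made up for the example.

```python
def least_confident(probabilities, k):
    """Return indices of the k pool examples with the lowest top-class probability.

    Hypothetical helper: `probabilities` is a list of per-example class
    probability lists, as produced by any probabilistic classifier.
    """
    # Confidence of a prediction = probability assigned to the most likely class.
    ranked = sorted(range(len(probabilities)), key=lambda i: max(probabilities[i]))
    return ranked[:k]


# Made-up predictions for four unlabeled examples (two classes).
pool_probs = [
    [0.95, 0.05],  # confident
    [0.55, 0.45],  # uncertain
    [0.20, 0.80],  # fairly confident
    [0.51, 0.49],  # most uncertain
]

# Ask a human to label the two examples the model is least sure about.
query = least_confident(pool_probs, 2)
print(query)
```

In a full loop, the newly labeled examples would be added to the training set, the model retrained, and the selection repeated until the labeling budget runs out.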
This tutorial walks through key concepts for working with data in the tidyverse, including the new pivoting functions in tidyr.
Here's how to turn a collection of small building blocks into a versatile tool for solving regression problems.
Mode Studio combines a SQL editor, Python and R notebooks, and a visualization builder in one platform. Connect data from anywhere and analyze it in your preferred language. Build custom visualizations (D3.js, HTML/CSS) or use out-of-the-box charts.
— Data Viz —
This free online book shows how to create interactive visualizations for data analysis using R. This book is more than just a how-to guide for building chart elements. It's written with a data science workflow in mind and you'll also get insights into best practices for a variety of visualization types.