Issue 310
— Insight —
In the Age of AI
Looking for something good to watch? This Frontline documentary explores how artificial intelligence is changing life as we know it — from jobs to privacy to a growing rivalry between the U.S. and China.
— Tools and Techniques —
An Overview of Data Discovery Platforms
Great post by Eugene Yan about all the data discovery platforms out there. Here's an overview of the questions they each help answer, key features and how they compare. In addition to well-known proprietary platforms, he covers open-source solutions such as LinkedIn's DataHub, Lyft's Amundsen, Netflix's Metacat and Apache Atlas.
To apply AI for good, think form extraction
When people consider doing AI/ML-for-Good projects, they often think about use-cases for predictive models. But on a practical level, smart methods for extracting data from forms would do more for journalism, climate science, medicine, democracy etc. than almost any other application. Here's a great overview of the issues and opportunities.
Mixed Models with R
This introduction to using mixed models in R covers the most common techniques with demonstration primarily via the lme4 package. Discussion includes extensions into generalized mixed models, Bayesian approaches, and more.
Interpreting Correlations
Awesome interactive tool for learning how correlations work. Be sure to click the ⚙️ icon for options.
Orbit - Bayesian time series modeling and inference
Orbit is a Python package for time series modeling and inference using Bayesian sampling methods for model estimation. It provides a familiar and intuitive initialize-fit-predict interface for working with time series tasks, while utilizing probabilistic modeling under the hood via PyStan.
TimescaleDB 2.0
TimescaleDB is an open-source database for time series data that looks like Postgres but is optimized for fast ingest and complex queries. With this major new release, TimescaleDB introduces the first multi-node, petabyte-scale relational database for time-series. This 2.0 post is a nice overview of the project and what you can do with it.
Stories for Visual Studio Code
For VS Code users, Stories is a simple way of sharing code snippets. This is a free extension that's gaining traction fast.
— Resources —
American Political Data and R
This guide to publicly available resources is a good place to start for accessing and exploring U.S. political data. Includes election returns for presidential and congressional races, political ideology scores for U.S. lawmakers, and characterizations of U.S. congressional districts.
— Data Viz —
Election Forecast Correlations
Awesome interactive that shows how correlations between states differ between the FiveThirtyEight and Economist election models. This is an engaging and well-crafted visualization.