— In the News —
You can do awesome and meaningful work with data science, but you can also do a lot of harm. This essay by Rachel Thomas from fast.ai explores the ethics and responsibilities of being a data scientist.
— Sponsored Link —
Join data experts from Daimler, Goldman Sachs, Slack, Pfizer, Ubisoft, Dataiku & more in NYC on November 30th at EGG - the non-conforming conference. Share stories of scaling data initiatives & teams and enjoy a great event. Register here and save 20% on tickets.
— Tools and Techniques —
SQLite is a self-contained, serverless database that's used to provide local data storage for applications and devices. A SQLite database can be stored in memory or as a standalone file and is super useful for a variety of things that aren't well suited for client/server databases. This guide by Mark Litwintschik explores SQLite's key features and how to use them.
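As a quick illustration of the in-memory versus standalone-file distinction, here is a minimal sketch using Python's built-in sqlite3 module. It is not taken from the guide; the table name, sample rows, and file name are made up for the example.

```python
import os
import sqlite3
import tempfile


def demo(conn):
    """Create a small table, insert rows, and return per-sensor averages."""
    # Hypothetical table and data, used only for illustration.
    conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("a", 1.5), ("a", 2.5), ("b", 10.0)],
    )
    conn.commit()
    return conn.execute(
        "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
    ).fetchall()


# In-memory database: exists only for the lifetime of this connection.
mem = sqlite3.connect(":memory:")
mem_rows = demo(mem)
mem.close()

# File-backed database: everything lives in one standalone file on disk.
path = os.path.join(tempfile.mkdtemp(), "example.db")
disk = sqlite3.connect(path)
disk_rows = demo(disk)
disk.close()

# Reopen the file to confirm the rows persisted without any server process.
reopened = sqlite3.connect(path)
count = reopened.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
reopened.close()
```

The same SQL runs against both connections; only the string passed to sqlite3.connect changes.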
Chris Moody recently wrote about calculating word vectors using word counts and matrix factorization rather than deep learning. It's a practical approach that uses techniques that are familiar to most data scientists and is easy to understand. In this post, Julia Silge shows how to implement the technique using tidy data principles and sparse matrices.
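To give a rough feel for the idea, here is a minimal sketch in Python with NumPy. It is not Chris Moody's or Julia Silge's code (her post uses R): the toy corpus, window size, clipping to positive PMI, the dense matrix, and the square-root scaling of singular values are all assumptions made to keep the example short.

```python
from collections import Counter

import numpy as np

# Toy corpus, invented for this example.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog played",
]


def cooccurrence_counts(sentences, window=2):
    """Count single words and (word, context) pairs within a sliding window."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        word_counts.update(tokens)
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    return word_counts, pair_counts


def pmi_vectors(sentences, window=2, dim=3):
    """Build a positive-PMI matrix from counts and factor it with SVD."""
    word_counts, pair_counts = cooccurrence_counts(sentences, window)
    vocab = sorted(word_counts)
    index = {w: i for i, w in enumerate(vocab)}
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    # Dense matrix for readability; a real corpus would need a sparse one.
    pmi = np.zeros((len(vocab), len(vocab)))
    for (word, context), n in pair_counts.items():
        p_pair = n / total_pairs
        p_word = word_counts[word] / total_words
        p_context = word_counts[context] / total_words
        # Clip negative values to zero (positive PMI), an assumption here.
        pmi[index[word], index[context]] = max(
            0.0, np.log(p_pair / (p_word * p_context))
        )

    # Keep the top `dim` singular directions as the word vectors.
    u, s, _ = np.linalg.svd(pmi)
    return vocab, u[:, :dim] * np.sqrt(s[:dim])


vocab, vectors = pmi_vectors(corpus)
```

Each row of vectors is the embedding for the word at the same position in vocab. On a corpus this small the vectors are not meaningful; the point is only the count, PMI, factorize pipeline.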
When data science projects move from being research endeavors to production endeavors, there are some key questions to think through when designing the infrastructure. This post by Adam Fletcher of Gyroscope Software offers good insights for making sure your production data science process is reproducible and predictable.
— Deep Learning —
Last month, Quanta Magazine published a puzzle that explored why deep learning networks are so powerful. The puzzle was super popular and helped readers understand how neural nets really work. This month, they reposted the puzzle, along with the best solutions.
Part 3 of the wildly popular video series by Grant Sanderson. This episode explores backpropagation and shows what's actually happening in a neural network as it learns.
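For readers who like to see the mechanics in code, here is a minimal sketch of backpropagation on a tiny network with one hidden unit, written in plain Python. It is not from the video: the network shape, starting weights, learning rate, and the single training example are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """One sigmoid hidden unit followed by a linear output unit."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y = w2 * h + b2
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backward(params, x, target):
    """Apply the chain rule from the loss back to each parameter."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target            # d(loss)/dy
    dw2 = dy * h               # output weight
    db2 = dy                   # output bias
    dh = dy * w2               # error passed back to the hidden unit
    dz = dh * h * (1.0 - h)    # through the sigmoid
    return [dz * x, dz, dw2, db2]


def numerical_gradient(params, x, target, eps=1e-6):
    """Central-difference check that the analytic gradients are right."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads


# Hypothetical starting weights and a single made-up training example.
params = [0.5, -0.3, 0.8, 0.1]
x, target = 1.5, 1.0

analytic = backward(params, x, target)
numeric = numerical_gradient(params, x, target)

# A few steps of plain gradient descent using the backpropagated gradients.
initial_loss = loss(params, x, target)
for _ in range(200):
    grads = backward(params, x, target)
    params = [p - 0.1 * g for p, g in zip(params, grads)]
final_loss = loss(params, x, target)
```

Comparing analytic with numeric is a common sanity check: if the chain-rule derivation is right, the two lists agree to several decimal places.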
— Datasets —
If you have a use for geospatial data, you'll definitely want to check out this collection of datasets that are hosted on AWS. These are all publicly available elsewhere, but having them on AWS makes access much easier. Includes SpaceNet machine learning imagery, DigitalGlobe Open Data, Next Generation Weather Radar data, Landsat data, terrain tiles, etc.
— Data Viz —
Turn geospatial polygons like states, counties or local authorities into regular or hexagonal grids automatically. This is very well documented, including links to examples and a great overview of why this is worth having in your toolkit.
Vega-Lite is a high-level language for rapidly creating interactive visualizations. This article explores the latest release, which adds support for flexible combinations of charts and interactions such as panning, zooming, interactive filtering, and linked selection. After reading the post, check out the examples here and here.
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...