Issue 316
— Insight —
You Can’t Escape Hyperparameters and Latent Variables: ML as a Software Engineering Enterprise
Great "keynote" from NeurIPS 2020 about the past, present, and future of ML research, especially as it pertains to algorithmic bias. It's 45 minutes long, but don't let that deter you. This is a well-paced presentation full of insightful conversations with researchers, engineers, and data scientists. It starts at 29:04.
— Tools and Techniques —
Easier Code Reviews For Jupyter Notebooks
The raw content of a Jupyter notebook is JSON that mixes source code, Markdown, and HTML, which makes notebooks notoriously hard to code review. This tutorial shows how to use an open-source utility called nbautoexport to simplify the process.
Why Some Models Leak Data
Machine learning models are trained on large amounts of data, some of which can be sensitive. If a model isn't trained carefully, it can inadvertently reveal some of that data. This interactive essay shows how.
Data Catalogs Are Dead; Long Live Data Discovery
Data catalogs aren't cutting it anymore when it comes to metadata management and data governance. Here's how data discovery can help.
Financial Times Data Platform: From zero to hero
A great walk-through of the evolution of the Financial Times data platform, including detailed timelines and the decisions made along the way.
End-to-end Production ML Monitoring
This post bills itself as a "deep dive" into production machine learning monitoring, but it's organized in a way that also makes it easy to get a high-level view of what ML monitoring is all about. It covers outlier detectors, drift detectors, metrics servers, and explainers.
AMAX: Revolutionary GPU solutions for data insights at deskside
The fastest time-to-results for deep learning training and visualization: AMAX GPU workstations with the latest AMD EPYC™ processors provide unprecedented data-transfer speed and performance for data- and graphics-intensive applications. Special pricing for academia/startups. // sponsored
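To see why exporting notebooks to plain scripts helps with the code reviews discussed in the Jupyter tutorial, here is a minimal sketch that pulls only the code cells out of a notebook's JSON. It assumes just the standard .ipynb layout: a top-level `cells` list whose entries carry `cell_type` and `source`. This is not nbautoexport's implementation, and the helper name `notebook_to_script` is hypothetical.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Return the code cells of an .ipynb document as one plain-text script.

    Hypothetical helper: outputs, metadata, and Markdown cells are dropped,
    so a diff of the result shows only changes to the code itself.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip Markdown and raw cells
        source = cell.get("source", "")
        # In notebook JSON, "source" may be a list of lines or a single string.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    example = {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title\n"]},
            {
                "cell_type": "code",
                "source": ["import math\n", "x = math.sqrt(16)"],
                # HTML output like this is what makes raw notebook diffs noisy
                "outputs": [{"output_type": "execute_result",
                             "data": {"text/html": "<b>4.0</b>"}}],
            },
            {"cell_type": "code", "source": "print(x)\n", "outputs": []},
        ]
    }
    print(notebook_to_script(json.dumps(example)))
```

Stripping outputs and metadata is what turns a noisy JSON diff into an ordinary code diff. Tools like the one in the tutorial automate that export step so the script lives alongside the notebook.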
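The production ML monitoring post lists drift detectors among its building blocks. As a rough illustration of the idea, not the post's own code, the sketch below uses only the standard library to compute the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs of a reference sample and a live sample. The function names and the 0.3 alert threshold are hypothetical. A real detector would usually derive its threshold from a significance test or from historical data.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], live: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |F_ref(x) - F_live(x)|."""
    ref = sorted(reference)
    cur = sorted(live)
    n, m = len(ref), len(cur)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # Empirical CDFs only jump at observed values, so the largest gap
    # is found by checking every point from either sample.
    max_gap = 0.0
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / n  # fraction of reference values <= x
        f_cur = bisect_right(cur, x) / m  # fraction of live values <= x
        max_gap = max(max_gap, abs(f_ref - f_cur))
    return max_gap


def drift_detected(reference: Sequence[float], live: Sequence[float],
                   threshold: float = 0.3) -> bool:
    """Hypothetical alert rule: flag drift when the KS statistic exceeds threshold."""
    return ks_statistic(reference, live) > threshold


if __name__ == "__main__":
    baseline = [0.1, 0.4, 0.35, 0.8, 0.55, 0.2]
    shifted = [v + 1.0 for v in baseline]
    print(ks_statistic(baseline, baseline))    # 0.0
    print(ks_statistic(baseline, shifted))     # 1.0
    print(drift_detected(baseline, shifted))   # True
```

A statistic near 0 means the live feature still looks like the training data. A value near 1 means the two distributions barely overlap, which is the kind of signal a drift detector would raise.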
— Resources —
Advanced Data Science 2020
This GitBook accompanies a semester-long course that focuses on the "hard" part of data science. It assumes you already have a background in statistics, programming, and the basics of reproducible research. The course focuses on synthesizing those tools into a data analysis and then communicating that analysis to an audience.
Papers with Code is Expanding
Papers with Code recently launched new sites for statistics, math, computer science, and more. Use these sites to discover new projects and/or sync your code so it shows up on arXiv. See the post for details.
— Inspiration —
Blob Opera
Blob Opera is a surprisingly addictive machine learning experiment by David Li in collaboration with Google Arts and Culture. Nothing to learn here, just a fun machine learning toy to brighten up your holidays! ✨