— In the News —
A dataset of police department email addresses and passwords has been making the rounds on the Internet this week. In this post, Troy Hunt of Have I Been Pwned walks through his approach for determining the credibility of the dataset and ultimately defusing it.
With all the recent turmoil in cities around the U.S., you might be fooled into forgetting that there's a global pandemic going on! This interactive post lets you play with the models that were making experts nervous two weeks ago. Given the scale of the protests, it's probably safe to say that the epidemiologists are even more nervous now.
A new study published in Nature shows that highly flexible workflows are contributing to a reproducibility crisis in the sciences. In other words, with so many options for processing data, different teams of researchers can arrive at completely different conclusions, even when using the same data.
— Tools and Techniques —
This multi-part series by Luigi Patruno is a great resource for learning about model deployment. It covers a variety of topics, including common pitfalls, software interfaces, inference, model registries, test-driven ML development, A/B testing and more.
This how-to guide for posting your own interactive tutorials online is very well organized and includes lots of links to resources and examples. It's geared towards R, but there's a Python path here too.
Run your notebooks automatically, catch errors faster and serve outputs. Reproducibility for your team without Dockerfiles or scripts. Integrates with GitHub, free for open source.
— Resources —
This popular course by David Beazley offers a broad introduction to working with Python, including best practices and lots of tips. It's a practical, learn-by-doing course intended for professionals who already have experience coding in at least one other language.
Learn how to apply R to advanced work in data science and statistics via this public-facing course from Cal Poly. It covers version control systems, tools for reproducibility, randomization & bootstrapping, functional programming, dynamic data visualizations and more.
— Data Viz —
Nathan Yau's visualization tutorials are some of the best on the web. In this post, he pulls together a collection of tutorials for visualizing things like uncertainty, incomplete data, outliers, differences, lies and more. If you like these, his membership program is definitely worthwhile.