— In the News —
The nerd in me is intrigued by the Google/Apple plan to enable contact tracing via smartphones. Global pandemic? Tech to the rescue! But is it really a good idea? Here's a reasonable look at how well this anonymous data-sharing scheme might actually work and what the risks are.
The most important position in football is also one of the most difficult to analyze. But thanks to new models and data-inclined front offices, teams are closer to predicting QB prospects’ success than ever before.
— Tools and Techniques —
Awesome introduction to Rt, a key measure of how fast a virus is reproducing. This notebook is also the basis for a new site that shows the progression of Rt for each U.S. state over time: https://rt.live
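To get a feel for why Rt matters, here is a minimal sketch under a deliberately simplified assumption: each generation of infections is the previous one multiplied by a constant Rt. This is not the estimation method used in the notebook or on rt.live, and the function name and starting numbers are made up for illustration.

```python
def cases_by_generation(initial_cases, rt, generations):
    """New cases per infection generation, assuming a constant Rt.

    Simplifying assumption: every generation is the previous one times Rt.
    """
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * rt)
    return cases


# Hypothetical starting point of 100 cases, followed for 5 generations.
for rt in (0.8, 1.0, 1.5):
    trajectory = cases_by_generation(100, rt, 5)
    print(f"Rt={rt}: {[round(c) for c in trajectory]}")
```

Under this toy model, Rt below 1 shrinks the outbreak, Rt equal to 1 holds it steady, and Rt above 1 grows it.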
Many processes that look exponential actually follow an s-curve. COVID-19 cases, for instance, must eventually flatten into an s-curve because exponential growth cannot continue indefinitely in a finite population. This short, clear post shows why s-curves are so hard to model.
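One way to see the difficulty is to compare an exponential with a logistic s-curve that shares the same starting value and growth rate. The sketch below is not taken from the linked post; the growth rate `r` and ceiling `k` are arbitrary assumptions chosen for illustration.

```python
import math


def exponential(t, n0=1.0, r=0.3):
    """Unbounded exponential growth starting at n0."""
    return n0 * math.exp(r * t)


def logistic(t, n0=1.0, r=0.3, k=10_000.0):
    """Logistic (s-curve) growth starting at n0 with ceiling k (assumed value)."""
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))


for t in (0, 5, 10, 20, 30, 40, 60):
    e, s = exponential(t), logistic(t)
    print(f"t={t:>2}  exponential={e:14.1f}  logistic={s:10.1f}  ratio={s / e:.3f}")
```

Early on the two curves are nearly identical, so early data says very little about where the ceiling is. The curves only separate once growth is already slowing.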
Here's an analogy that makes it easy to understand backpropagation. "Imagine you're a project manager, somewhere deep inside a vast company..."
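For readers who prefer code to analogies, here is a minimal sketch of backpropagation on a tiny two-weight network. It is not the linked post's example: the network shape, names, and input values are assumptions. Blame for the error is passed backward one step at a time with the chain rule, then checked against a numerical gradient.

```python
import math


def forward(x, w1, w2):
    """Tiny network: h = tanh(w1 * x), y = w2 * h."""
    h = math.tanh(w1 * x)
    return h, w2 * h


def loss(x, target, w1, w2):
    _, y = forward(x, w1, w2)
    return 0.5 * (y - target) ** 2


def backward(x, target, w1, w2):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    h, y = forward(x, w1, w2)
    dL_dy = y - target                  # error at the output
    dL_dw2 = dL_dy * h                  # share of the error owed to w2
    dL_dh = dL_dy * w2                  # error passed back to the hidden unit
    dL_dw1 = dL_dh * (1.0 - h ** 2) * x  # through tanh, then to w1
    return dL_dw1, dL_dw2


def numeric_grad(x, target, w1, w2, eps=1e-6):
    """Central-difference gradients, used only as a sanity check."""
    g1 = (loss(x, target, w1 + eps, w2) - loss(x, target, w1 - eps, w2)) / (2 * eps)
    g2 = (loss(x, target, w1, w2 + eps) - loss(x, target, w1, w2 - eps)) / (2 * eps)
    return g1, g2


# Arbitrary illustrative values.
print(backward(0.5, 1.0, 0.8, -0.3))
print(numeric_grad(0.5, 1.0, 0.8, -0.3))
```

The two printed pairs should agree to several decimal places, which is a common way to check a hand-written backward pass.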
PyCaret is an open source, low-code machine learning library that aims to make it faster and easier to develop models. Essentially, it's a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, Microsoft LightGBM, spaCy, and more.
Tidymodels provides a comprehensive framework for modeling and machine learning in R. Whether you are just starting out or have years of modeling experience, tidymodels offers a consistent, flexible framework for your work. Here's how to get started.
Clean data, analyse, visualise, make shareable templates, and create beautiful reports. No code needed. Become a citizen data scientist in a few clicks. Made by the clever & funny folks at Gyana.
— Resources —
This newly released research report from Cloudera Fast Forward is a nice introduction to machine learning interpretability. Includes an overview of interpretability, interviews, a working prototype that demonstrates LIME, and discussion of challenges and what's to come.
The PDF version of this best-selling text by Sheldon Axler is free to download through July.
— Data Viz —
Venn diagrams are good for showing where a small number of sets intersect, but they don't work well with complex intersections of many sets. Enter UpSet plots. This is a great introduction to the technique, including examples and links to related resources.
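The bars in an UpSet plot are counts of items that belong to exactly one combination of sets. As a rough sketch of that bookkeeping (plain Python, not a plotting library; the function name and sample sets are made up for illustration):

```python
from collections import Counter


def exclusive_intersections(sets):
    """Count items by the exact combination of named sets they belong to."""
    counts = Counter()
    all_items = set().union(*sets.values())
    for item in all_items:
        combo = frozenset(name for name, members in sets.items() if item in members)
        counts[combo] += 1
    return counts


# Hypothetical sample data.
sets = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 5, 6, 7}}
counts = exclusive_intersections(sets)

# Largest combinations first, as UpSet plots are usually sorted.
for combo, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(" & ".join(sorted(combo)), n)
```

Each item is counted once, under the full combination it belongs to, which is what keeps the chart readable when many sets overlap.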
Real-world data is always messy, but COVID-19 data takes that up a notch and has surfaced some issues that have been floating around the data visualization community for a while. Uncertainty viz, in particular, is tricky. This is a nice overview of some of the challenges and approaches.
— Career —
Edouard Harris is a co-founder of SharpestMinds, which mentors data scientists while they're looking for jobs. That gives him a unique perspective on the job market, which is far from typical these days. Here's what he's seeing.