— In the News —
This new open-source tool from Mozilla is designed to enable data scientists and engineers to create and share interactive documents in a browser. It's currently in alpha, but there are already a number of examples to play with.
The latest AI algorithms are probing the evolution of galaxies, calculating quantum wave functions, discovering new chemical compounds and more. But they're not perfect, and we're far from having artificial Einsteins. This article in Quanta Magazine explores what AI can and can't do to help make sense of the enormous amounts of data being collected in the sciences.
This year's interactive March Madness bracket from FiveThirtyEight includes conditional probabilities! If you're not in the U.S. and not familiar with it, March Madness is a 68-team, single-elimination 🏀 tournament and is one of the premier events in all of sports.
— Sponsored Link —
Vettery specializes in tech roles and is completely free for job seekers. Interested? Submit your profile, and if accepted onto the platform, you can receive interview requests directly from top companies growing their data science teams.
— Tools and Techniques —
This article by Paco Nathan introduces ~50 of the most popular Python libraries and frameworks that are used in data science. In addition to short descriptions that include useful links, each library and package is presented in a landscape diagram that shows how they all fit together.
This retrospective describes the challenges and successes of a large machine learning project that involved two data scientists, two back-end engineers and a data engineer. It's not a long post but it offers practical lessons for developing complicated models with a team.
Here's a creative approach for labeling training data using organizational knowledge as weak supervision. The project is called Snorkel DryBell and is a collaboration between Google, Stanford and Brown University.
If you're not a regular R user, you may not realize that it does a lot more than data analysis. This short post on the Simply Stats blog highlights 10 awesome things you can do with R that you may not know about.
Nice walk-through that shows how to approach a dataset you haven't seen before.
It takes a lot of data to train a production-ready algorithm: data that is labeled and annotated with near-flawless quality. If you’re not sure what it takes to produce enormous volumes of quality training data, you need our Blueprint.
— Resources —
Want to become a better Pythonista? PyCoder’s is a free, weekly email newsletter for those interested in Python development and various topics around Python and the community.
— Data Viz —
In this interview on the Multiple Views blog, Ben Shneiderman shares his perspectives on the history and practice of data visualization. Among the gems: "visualization is such a powerful amplifier of human abilities that it should be illegal, unprofessional, and unethical to do data analysis using only statistical and algorithmic processes." Includes advice for newcomers to the field and linked references.