— In the News —
Inspired by the Google Books Project, the new Caselaw Access Project from the Library Innovation Lab at Harvard puts the entire corpus of published U.S. case law online for anyone to access for free. The project involved scanning and digitizing 100,000 pages per day over two years. This is a big deal that will enable new analytical insights, research, and applications.
They said it could never be done. The game was too fluid, too chaotic. The players’ movements too difficult to track reliably. But, decades after sports like baseball first embraced statistics, football (known as soccer in the U.S.) is starting to play the data game.
— Sponsored Link —
Online experimentation, or A/B testing, is the gold standard for measuring the effectiveness of changes to a website. But while A/B testing can appear simple, there are many issues that can complicate an analysis. In this presentation, Emily Robinson, data scientist at DataCamp, will cover 10 best practices that will help you avoid common pitfalls.
— Tools and Techniques —
Great discussion about one of the most important aspects of analyzing data - being skeptical of the results. Includes lots of useful examples.
In 2015, machine learning was not widely used at Uber. Just three years later, Uber has advanced capabilities and infrastructure, and hundreds of production machine learning use cases. This post describes the wide variety of ways that Uber uses machine learning and how they've managed to scale their systems so quickly and effectively.
Andrew Trask's new book, Grokking Deep Learning, aims to be the easiest possible introduction to deep learning. Each section teaches how to build a neural component from scratch in NumPy. This repo contains the code examples for each lesson.
Rasmus Rothe, Founder at Merantix, explores the differences between academia and industry when applying deep learning to real-world problems. This article goes into detail about differences regarding workflow, expectations, performance, model design and data requirements.
Vettery specializes in tech roles and is completely free for job seekers. Interested? Submit your profile, and if accepted onto the platform, you can receive interview requests directly from top companies growing their data science teams.
— Resources —
Nice map of R cheatsheets that's organized around common workflows. It's like a cheatsheet for cheatsheets.
Rachel Thomas from fast.ai asked for Python learning recommendations on Twitter and the resulting thread was amazing. These two recommendations, in particular, stand out:
— Career —
Identifying your weaknesses is one of the most important things you can do to become effective in your career. In this post, William Koehrsen explores his particular weaknesses as a data scientist and the steps he's taking to overcome them. The approach he models here may be uncomfortable for some, but it's a super effective strategy.