Tools and Techniques
Great data detective story. Using natural language processing techniques, Jeff Kao, a student at Metis, analyzed comments recently submitted to the FCC and found that more than a million anti-net-neutrality comments were likely faked. His overview of how he did it has been widely shared around the web this week.
Fantastic post on dimensionality reduction that explores an audio dataset in two dimensions. Covers NSynth, UMAP, t-SNE, MFCCs and PCA. This is very well written and includes an interactive web app, lots of code snippets, and visualizations.
Regular expressions are a key tool for exploring large text corpora, but they can be tricky to master. This tutorial introduces Python's regular expression module and shows how it works for common use cases. It then combines regex with Pandas to create flexible and powerful workflows.
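As a taste of the kind of pattern matching involved, here is a minimal sketch using the standard library's re module. The pattern, the helper name extract_emails, and the sample lines are all made up for illustration and are not taken from the tutorial; the email pattern is deliberately simplified.

```python
import re

# Hypothetical pattern: a simplified email matcher, not a full validator.
# Named groups make the captured pieces easy to refer to later.
EMAIL = re.compile(r"(?P<user>[\w.+-]+)@(?P<domain>[\w-]+(?:\.[\w-]+)+)")


def extract_emails(lines):
    """Return one dict per email address found, with 'user' and 'domain' keys."""
    found = []
    for line in lines:
        for match in EMAIL.finditer(line):
            found.append(match.groupdict())
    return found


# Made-up sample text for illustration only.
sample_lines = [
    "Contact alice.smith@example.com for details.",
    "No address on this line.",
    "cc: bob@mail.example.org, carol@example.net",
]

emails = extract_emails(sample_lines)
for email in emails:
    print(email["user"], email["domain"])
```

If the text lives in a Pandas column instead of a list, the same pattern string can be passed to Series.str.extract, which turns named groups into columns.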
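Here's a useful stats idea: use the trimmed mean as a robust estimator of the mean. A trimmed mean discards a fixed fraction of the smallest and largest values before averaging, so a few extreme outliers can't drag the result around. Does the information that's lost in the process matter? Well, that depends...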
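To make the trimmed mean concrete, here is a minimal sketch in plain Python. The function name trimmed_mean, the proportion parameter, and the sample data are assumptions chosen for illustration, not something taken from the linked post.

```python
from statistics import mean


def trimmed_mean(values, proportion=0.1):
    """Average the values after dropping `proportion` of them from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each tail
    return mean(data[k:len(data) - k])


# Made-up sample with one obvious outlier.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 6, 100]

print(mean(sample))               # ordinary mean, pulled up by the outlier
print(trimmed_mean(sample, 0.1))  # drops the single lowest and highest value
```

With this sample, the ordinary mean is 13.8 while the 10% trimmed mean is 4.5. What gets lost is whatever the trimmed tails were telling you, which is exactly the trade-off the post asks about.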
Part 2 in a series on using machine learning to predict the weather. This part shows how to build a linear regression model in Python using the data that was collected and processed in Part 1.
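For readers who want a feel for the technique before clicking through, here is a minimal sketch of a one-feature least-squares fit in plain Python. The series itself may well use a library and many more features; the function name fit_line, the idea of predicting tomorrow's temperature from today's, and the numbers below are all made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: today's mean temperature and the next day's, in degrees C.
today = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]
tomorrow = [11.0, 13.5, 14.0, 10.5, 10.0, 15.0]

slope, intercept = fit_line(today, tomorrow)
prediction = slope * 13.0 + intercept  # forecast for a day after a 13-degree day
print(slope, intercept, prediction)
```

A real weather model would add more predictors and check the fit on held-out data, but the core idea is the same: choose the coefficients that minimize the squared prediction error.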
Your code isn't just between you and your computer. People need to understand it too. This post is an insightful and actionable guide for making your code easier to work with.
If you're interested in sports analytics, you'll definitely want to check this out. Throne hosts live sports prediction contests for data scientists. It provides users with live competitions, data, features, and backtesting modules to make it easy to use quantitative methods in sports. This post describes the platform and how to get started.