— Tools and Techniques —
Great data detective story. Using natural language processing techniques, Jeff Kao, a student at Metis, analyzed comments recently submitted to the FCC and found that more than a million anti-net-neutrality comments were likely faked. His overview of how he did it has been widely shared around the web this week.
Fantastic post on dimensionality reduction that explores an audio dataset in two dimensions. Covers NSynth, UMAP, t-SNE, MFCCs and PCA. This is very well written and includes an interactive web app, lots of code snippets, and visualizations.
Regular expressions are a key tool for exploring large text corpora, but they can be tricky to master. This tutorial introduces Python’s regex module and shows how it works for common use cases. It then combines regex with pandas to create flexible and powerful workflows.
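To give a flavor of the kind of use case involved, here is a minimal sketch using only the standard-library `re` module. The log lines, patterns, and field names are made up for illustration and are not taken from the tutorial.

```python
import re

# Made-up log lines standing in for a larger text corpus.
lines = [
    "2017-11-27 ERROR disk full on node-3",
    "2017-11-28 INFO backup finished",
    "2017-11-29 ERROR timeout on node-12",
]

# Named groups capture the date, the log level, and the rest of the line.
line_pattern = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.*)$"
)
# A second, simpler pattern pulls an optional node number out of the message.
node_pattern = re.compile(r"node-(\d+)")

records = []
for line in lines:
    match = line_pattern.match(line)
    if match is None:
        continue  # skip lines that do not follow the expected layout
    record = match.groupdict()
    node = node_pattern.search(record["message"])
    record["node"] = int(node.group(1)) if node else None
    records.append(record)

# Keep only the error records and list the nodes they mention.
errors = [r for r in records if r["level"] == "ERROR"]
print([r["node"] for r in errors])
```

In pandas, the same patterns can be passed to string methods such as `Series.str.extract` and `Series.str.contains`, which apply a regular expression to a whole column at once.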
Here's a useful stats idea: use the trimmed mean as a robust estimate of the mean. Does the information that's lost in the process matter? Well, that depends...
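For readers who haven't met the trimmed mean before, here is a small sketch of the idea: sort the values, drop a fixed share from each end, and average what remains. The function name `trimmed_mean`, the 10% default, and the sample readings are assumptions chosen for illustration, not taken from the linked post.

```python
from statistics import mean


def trimmed_mean(values, proportion=0.1):
    """Average the values after dropping `proportion` of them from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # how many values to drop per end
    kept = ordered[k:len(ordered) - k]
    return mean(kept)


# Invented readings with one obvious outlier (55.0).
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 55.0]

print(mean(readings))               # pulled well above 10 by the outlier
print(trimmed_mean(readings, 0.1))  # drops 9.7 and 55.0, stays close to 10
```

The trade-off is visible here: trimming discards the legitimate low reading (9.7) along with the outlier, which is the lost information the post asks about.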
Part 2 in a series on using machine learning to predict the weather. This part shows how to build a linear regression model in Python using the data that was collected and processed in Part 1.
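As a rough illustration of the underlying technique only, the sketch below fits a one-variable least-squares line in plain Python. It is not the article's model: the temperatures are invented and `fit_line` is a hypothetical helper, and a real weather model would use several features and a library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Invented data: yesterday's mean temperature vs. today's, in degrees Celsius.
yesterday = [10.0, 12.0, 14.0, 16.0, 18.0]
today = [11.0, 12.5, 14.5, 15.5, 18.0]

slope, intercept = fit_line(yesterday, today)

# Predict today's mean temperature after a 20-degree day.
prediction = slope * 20.0 + intercept
print(slope, intercept, prediction)
```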
Your code isn't just between you and your computer. People need to understand it too. This post is an insightful and actionable guide for making your code easier to work with.
If you're interested in sports analytics, you'll definitely want to check this out. Throne hosts live sports prediction contests for data scientists. It provides users with live competitions, data, features, and backtesting modules to make it easy to use quantitative methods in sports. This post describes the platform and how to get started.
— Deep Learning —
Nice post about using deep learning with structured data. This isn't a typical way to approach structured data, but it's a very practical idea and it looks like it works well.
— Data Viz —
Here's part 3 of a series that demonstrates the variety of ways you can work with spatial river and stream data in R. This series is thorough and very well done. If you're interested in this, definitely check out the first two parts too.
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...