— In the News —
By now, everyone in tech knows that machine learning is a Next Big Thing, but what does that mean, exactly? This post by Benedict Evans draws useful parallels between the machine learning landscape of today and breakout technologies of the past to derive practical expectations for machine learning applications.
Baseball has a reputation for its effective use of analytics, but that didn't just happen when the data pros showed up. As Jeff Luhnow shares in this interview, the human side of analytics can be more important than the numbers, and it took a few years of building trust before the Houston Astros were able to win with analytics. There are useful insights here for organizations of all types.
— Sponsored Link —
In just a few days at JupyterCon in NY (Aug 21-24), you'll learn how to build a flexible, future-proof, and highly scalable shared data infrastructure; create reproducible and iterative analysis; and leverage these powerful tools for sharing and communicating data analysis. Discover how the world’s most successful, data-driven organizations are implementing Jupyter tools across their enterprises to accomplish mission-critical data projects. Early Price ends June 29. Save an additional 20% on most passes with the code: ELIXIR20
— Tools and Techniques —
Here's the third and final part of Robert Chang's popular series about data engineering from the perspective of a data scientist. This part introduces data engineering frameworks, including design patterns and specific examples that are commonly used at Airbnb. This series is a great introduction to the engineering concepts that data scientists should be familiar with. In case you missed them, here are the previous parts:
The next time someone asks you for a definition of data science, point them to this review paper.
A fun and very popular tutorial that explores the world of Mario Kart. Ultimately, it seeks to determine who the "best" character is and, along the way, does a very nice job of exploring a fan-generated dataset. Who's the best? Well, that depends...
Anyone interested in Natural Language Processing should definitely check this out. Sebastian Ruder is maintaining a repository to keep track of datasets and the latest state-of-the-art results for common NLP tasks. It's only a few days old, and the GitHub repo for this project has already been starred over 2,000 times.
This post identifies key features of PostgreSQL that are super helpful but not commonly known.
Discovering, responding to, and resolving incidents is a complex endeavor. Read this narrative to learn how you can do it quickly and effectively by connecting AppDynamics, Moogsoft and xMatters to create a monitoring toolchain.
— Data Viz —
Uncertainty information makes visualizations more complex, but if you don't integrate it correctly, end users may ignore the variation in the data and draw incorrect conclusions. Value-suppressing uncertainty palettes are a technique from the UW Interactive Data Lab that's straightforward, both for designers to implement and for end users to understand.
DIVE integrates semi-automated visualization and statistical analysis features into a unified workflow. This new data exploration tool is an open-source project from the MIT Media Lab.