— Insight —
In this Fireside Chat, Pedro Domingos and Matt Turck cover a variety of topics, including Domingos' book, The Master Algorithm, scary things about AI, and why finance is a killer app for machine learning. This post on Matt's blog includes a video and notes of the conversation.
— Tools and Techniques —
Polynote is a newly released notebook environment from Netflix that offers big improvements for data science workflows. Polynote provides first-class Scala support, Apache Spark integration, multi-language interoperability, IDE-like editing, native data visualization, kernel visibility, and reproducibility baked into the design. This announcement walks through the details and it’s definitely compelling.
Nice stats puzzle: drop marbles into a modified Galton board and watch them collect in the bins below. What does the distribution of marbles say about whether nature (or coin flips) is truly random?
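The intuition behind the puzzle is easy to simulate: if each peg is a fair coin flip, a marble's bin is just the number of "rights" it took, so the bin counts should follow a binomial distribution. A minimal sketch in Python (the row count, marble count, and seed here are arbitrary choices, not from the puzzle itself):

```python
import numpy as np

rng = np.random.default_rng(42)

def galton(n_marbles=10_000, n_rows=12):
    """Simulate a Galton board: each marble makes n_rows fair
    left/right choices; its bin is the number of 'right' steps."""
    flips = rng.integers(0, 2, size=(n_marbles, n_rows))
    return flips.sum(axis=1)  # bin index per marble, in 0..n_rows

bins = galton()
counts = np.bincount(bins, minlength=13)  # marbles per bin
```

With truly random pegs, `counts` peaks near the middle bin (6 of 12) and falls off symmetrically; a systematic deviation from that shape is the kind of evidence the puzzle asks you to reason about.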
This tutorial on the FloydHub Blog is a great introduction to current approaches for training and defending against adversarial attacks.
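One standard attack covered in introductions like this is the Fast Gradient Sign Method (FGSM): perturb the input in the direction that increases the model's loss. A toy sketch with NumPy and a fixed logistic-regression "model" (the weights, input, and epsilon below are made-up illustration values, not from the tutorial):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "trained model": fixed logistic-regression weights (hypothetical).
w = np.array([2.0, -1.0, 0.5])
b = 0.1

def loss_grad_x(x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x.
    For logistic regression this is simply (p - y) * w."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

def fgsm(x, y, eps=0.3):
    """FGSM attack: step x in the sign of the loss gradient,
    bounded by an L-infinity budget of eps per feature."""
    return x + eps * np.sign(loss_grad_x(x, y))

x = np.array([0.5, 0.2, -0.1])
y = 1.0                      # true label
x_adv = fgsm(x, y)           # adversarial example
```

Here the clean input is classified correctly (probability above 0.5 for class 1), while the perturbed `x_adv` flips the prediction; the defensive side of the tutorial, adversarial training, amounts to generating such examples and including them in the training set.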
— Resources —
This year's edition of Figshare's State of Open Data report summarizes survey results from more than 8,500 researchers on topics like data sharing and funding. Surprisingly, many researchers still consider papers to be more important than data, but there's a lot of pressure to change that. The report includes the full survey results, alongside a collection of articles from industry experts around the world.
This new book is intended to be a useful resource for a wide range of users, from beginners who are interested in learning Apache Spark to experienced readers who are seeking to understand why and how to use Apache Spark from R. It's free to read online or, if you prefer print, you can order it here on Amazon.
— Data Viz —
In his summary of the paper, "Task-Based Effectiveness of Basic Visualizations," Adrian Colyer breaks down how to choose the right visualization type for your data. Ten basic analysis tasks are considered for a variety of common chart types. The paper includes an extensive decision table and, for certain tasks, even pie charts made the cut.
— Career —
After having candidates work with some data and build a simple model, Alex Gude asks them to come up with a business case for the model. And then the key question: "How would you measure the success of this model in production?" In this post, Alex offers great insights for thinking through an answer.
"Glue work" is work that needs to be done but doesn't necessarily help anyone move along a career path. It's important work, but taking on too many glue tasks can limit your career options. It's especially important to be aware of this in analytics roles because, in many places, analytics is glue work. This post is a good starting point for understanding these types of tasks, including ways to make sure that valuable work is properly valued.