— In the News —
One of the most influential ideas to emerge from Cathy O'Neil's 2016 book, Weapons of Math Destruction, was the proposal of a 'Hippocratic oath' – an ethical code of conduct – for data scientists to follow. It's a popular idea but hasn't gained much traction in practice. In this interview, she explores how to finally give a Data Science Hippocratic oath some teeth.
A short, comical read about how machine learning algorithms sometimes bend the rules to win. From the aptly named blog AI Weirdness.
— Profiles —
Donald Richards is a statistician who believes that "in most ventures in life, the odds are against you." Even so, his enthusiasm for statistics is infectious. This is a great profile of the man who lives to uncover subtle patterns that hide in real-world data.
— Sponsored Link —
Mode Studio combines a SQL editor, Python & R notebooks, and a visualization builder in one platform. And it's free forever. Connect data from anywhere and analyze with the best language for the job, without having to jump between tools. Build custom visualizations or use our out-of-the-box charts. Share your analysis with a click—every report lives at a URL.
— Tools and Techniques —
The ability to predict psychological traits from Facebook Likes seems to be constantly in the news these days, but how much can you really learn about a person from their Likes? This article by Thomas Ebermann explores the issues and shows how to build a machine learning model to find out.
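The article's exact modeling approach isn't described here. As a minimal sketch under stated assumptions, each user can be represented as a 0/1 vector recording which pages they Liked, and a yes/no trait can then be predicted with a plain logistic regression fit by gradient descent. The page names, users, and labels below are hypothetical toy data. Real studies typically work with far larger and sparser matrices and often reduce dimensionality before fitting.

```python
import math
from typing import List, Sequence, Set


def likes_to_matrix(user_likes: Sequence[Set[str]], pages: Sequence[str]) -> List[List[int]]:
    """Encode each user's set of Liked pages as a 0/1 vector over `pages`."""
    return [[1 if page in likes else 0 for page in pages] for likes in user_likes]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], row: Sequence[int]) -> float:
    """Probability of the trait; w[0] is the bias, w[1:] are per-page weights."""
    return sigmoid(w[0] + sum(wj * x for wj, x in zip(w[1:], row)))


def train_logistic(X: List[List[int]], y: List[int], lr: float = 1.0, epochs: int = 1000) -> List[float]:
    """Fit logistic regression by batch gradient descent on the mean log loss."""
    n = len(X)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = predict_proba(w, row) - label  # derivative of log loss w.r.t. the logit
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


# Hypothetical toy data: in this made-up sample, users who Liked "page_a" have the trait.
pages = ["page_a", "page_b", "page_c"]
user_likes = [
    {"page_a", "page_c"}, {"page_a", "page_b"}, {"page_a"},
    {"page_b", "page_c"}, {"page_b"}, {"page_c"},
]
labels = [1, 1, 1, 0, 0, 0]

X = likes_to_matrix(user_likes, pages)
weights = train_logistic(X, labels)
predictions = [1 if predict_proba(weights, row) >= 0.5 else 0 for row in X]
for page, wj in zip(pages, weights[1:]):
    print(f"{page}: {wj:+.2f}")
```

The learned per-page weights show which Likes push a prediction up or down. On a toy sample this small they illustrate only the mechanics and say nothing about real users.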
Misunderstanding the differences between data science and data engineering positions can create a lot of problems for teams and the organizations they work for. This post explores these two key roles, including the skills they each require, how to decide how many of each you need, and common issues that result from not fully appreciating how these roles differ.
This is a great, and growing, list of recommender systems. It includes SaaS products, open source recommenders, academic research projects, benchmarking tools, media applications, and recommended reading.
Wikidata serves as the central data storage for the Wikimedia projects, which include Wikipedia. It contains more than 46 million data items and it's free to use. The one catch is that it's organized as an RDF database that is queried with the SPARQL query language. This post offers a gentle introduction to what that means, why it's useful, and how to get started with it.
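To get a feel for the workflow, here is a minimal sketch using only Python's standard library. It assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format. The example query asks for items that are an instance of (P31) house cat (Q146), with English labels. The helper names and the User-Agent string are illustrative placeholders, not part of any official client.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Items that are an instance of (P31) house cat (Q146), with English labels.
CAT_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for SPARQL JSON results."""
    url = WIKIDATA_SPARQL_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        # Placeholder agent string; identify your own client and contact info here.
        "User-Agent": "example-sparql-sketch/0.1 (replace with your contact info)",
    })


def parse_bindings(results: dict) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(query: str) -> List[Dict[str, str]]:
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query), timeout=30) as resp:
        return parse_bindings(json.load(resp))
```

Calling `run_query(CAT_QUERY)` returns a list of dicts keyed by the query's variables (`item` holds the entity URI and `itemLabel` holds the English label). Keeping request building and result parsing separate makes each piece easy to test without hitting the network.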
— Data Viz —
Data Illustrator is a new tool that lets you bind data to a vector design and create visualizations without programming. It's like Adobe Illustrator, but for data. It's a research collaboration between Adobe Systems and the Georgia Institute of Technology and is worth getting familiar with. Start with the short video on the home page and then check out the Gallery.
The Data Visualization Checklist is a compilation of 24 guidelines on how graphs should be formatted to best show the story in your data. The 24 guidelines are broken down into 5 sections: Text, Arrangement, Color, Lines, and Overall. In the new interactive version, upload a graphic and you'll be guided through the guidelines to determine an overall score.