— In the News —
Aleksandr Kogan is a researcher at Cambridge University and one of the creators of the app that harvested profile data from millions of Facebook users. He's one of only a few people who would know the details behind what Cambridge Analytica did with its massive trove of Facebook data. When asked in an email how the model worked, he answered! This article by Matthew Hindman is an awesome exploration of his response, including background info and linked references.
The media is great at portraying AI as a self-aware menace to be reckoned with. In this post, François Chollet takes a different tack. In his view, the real concern is the "highly effective, highly scalable manipulation of human behavior that AI enables, and its malicious use by corporations and governments." This is a great read.
With 14.5 million active users ordering from 80,000 restaurants, you might think that building a food recommendation engine would be fairly straightforward. But when Grubhub's data team set out to build their own taxonomy of food, they discovered that the only thing that millions of menu items had in common was that sometimes people ate them. This article in Wired tells their story of innovation, failure, a cookbook author, and eventual success.
— Sponsored Link —
Data is the key to solving some of the world's most challenging problems, and the need for professionals who can understand and manage that data is growing every day. The UC Berkeley School of Information is meeting that need with datascience@berkeley, a Master of Information and Data Science degree program delivered online.
— Tools and Techniques —
The reticulate package embeds a Python session within an R session and provides a comprehensive set of tools for interoperability between Python and R. Reticulate lets you import Python modules, source Python scripts, and translate between R and Python objects, including R data frames and pandas DataFrames, and R matrices and NumPy arrays.
Google Sheets can be super useful, especially if you strategize a bit beforehand. This article describes four key considerations and why, ultimately, 80% of sheet design should be for editors and only 20% for data scientists.
Data Wrangling with dplyr
dplyr is an R package for data manipulation and an important part of the tidyverse. To create this series of posts, Suzan Baert went through the entire dplyr documentation and organized key functions into this cookbook-like reference.
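If you know pandas but not dplyr, the core verbs the series covers (filter, mutate, group_by, summarize) map closely onto pandas method chains. A minimal sketch of that mapping, using made-up example data (dplyr itself is R; this is only a Python analogy, not code from the series):

```python
import pandas as pd

# Hypothetical data for illustration.
orders = pd.DataFrame({
    "restaurant": ["A", "A", "B", "B", "B"],
    "price": [10.0, 12.0, 8.0, 9.0, 11.0],
})

result = (
    orders[orders["price"] > 8]                # dplyr: filter(price > 8)
    .assign(tip=lambda d: d["price"] * 0.2)    # dplyr: mutate(tip = price * 0.2)
    .groupby("restaurant", as_index=False)     # dplyr: group_by(restaurant)
    .agg(mean_price=("price", "mean"))         # dplyr: summarize(mean_price = mean(price))
)
print(result)
```

Each chained step corresponds to one dplyr verb joined by `%>%`, which is part of why the two libraries feel so similar to work with.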
TensorFlow.js lets you run Keras and TensorFlow models in a browser! It's compatible with the Keras API, with support for importing pre-trained models, client-side model creation, and GPU acceleration. Data stays on users' devices, making TensorFlow.js useful for low-latency inference as well as for preserving privacy. This post introduces the new library with links to live examples.
— Resources —
Hadley Wickham is teaching a new class at Stanford this spring that will consider key readings in applied data science. Topics include Data Collection & Collaboration, Ethics, Workflows, Reproducibility, Industry, and Career. Participation in the class is limited to on-campus students, but this syllabus includes a complete list of links to the readings.
— Career —
Along with contributing to causes you care about, volunteering for a non-profit organization (NPO) can be a great way to develop your skills. In this post, Jesse Maegan explores what it's like to do data science for NPOs and how to find opportunities. Includes lots of links to useful resources and organizations.
Kristen Kehrer looks back at the first several years of her career in data science. This is a candid look at her work life, including details about her day-to-day work, tools she uses, things she wasn't prepared for, opportunities, and her career trajectory.