— In the News —
A few months ago, a federal judge in California issued a preliminary injunction in favor of a startup that was scraping data from LinkedIn to create its own products. The case is still developing and, for a lot of reasons, it's a big deal. This is a great exploration of the players involved and the issues surrounding your online identity and who gets to own it.
Getting AIs to explain their decisions is becoming increasingly important, but figuring out how to do that isn't straightforward. Along with technical issues, there are concerns about revealing trade secrets and stifling innovation. This article in MIT Technology Review takes a look at the issues and possible solutions.
— Profiles —
Nice insights here about working as a data scientist at Booking.com. To give you an idea of scale, Booking.com handles 1.5 million transactions per day and has 120+ data scientists, so it's not a small operation. This post starts with the interview process and includes details about specific projects, team structure, the people, and some current challenges.
— Tools and Techniques —
David Robinson from Stack Overflow shares some great reasons for starting a data science blog. Check it out and if you decide to create one, let me know and I'll include a list of new blogs in an upcoming issue of Data Elixir.
Data comes in a variety of states. Some datasets are organized and well labeled, while others are messy and come with lots of caveats. Here's a worthwhile scheme for thinking through the state of your data and communicating that state to stakeholders in a way that's easy to understand and useful.
Pandas is one of the most important Python libraries for data science. Here's a step-by-step approach for mastering it.
Here's a great introduction to the "tidyverse." It includes an overview of "tidy" data principles, examples, key things to know, and worthwhile references.
— Deep Learning —
If you're just getting started with deep learning, here's what you need in terms of hardware, software, skills, and data. This is well-organized and includes lots of useful links.
This tutorial might not make you rich, but it is a nice introduction to the concepts behind building TensorFlow models.
— Resources —
Here's a must-watch video series about topics in linear algebra. This series is by Grant Sanderson, who runs the popular animated math channel 3Blue1Brown.
— Data Viz —
There's a lot involved in creating a data visualization, and the process can be messy. In this post, Lisa Charlotte Rost from Datawrapper describes her approach to a data visualization pipeline and shows how it's effective.
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...