— In the News —
Algorithms have gotten a lot of bad press in the past few years. Recent books include titles such as Weapons of Math Destruction, Automating Inequality, and The Black Box Society. Longform articles have followed a similar path, and by now it should be clear that algorithms can't really be trusted. Humans can't really be trusted either though, right? What does the research say?
This cover story from the Spring/Summer issue of Columbia Magazine explores a wide variety of data science applications in medicine. It's a long read that specifically highlights work at Columbia University but, as you'll see, it covers a lot of ground.
What would happen if people controlled their own data and businesses were required to pay for access? There are some interesting ideas in this recent article from The Economist, and some businesses are actually trying to broker such schemes. But even if it works, you're not likely to get rich anytime soon.
— Insight —
Organizations often look for "data scientists" but what do they really need? In this post, Elena Grewal, the Head of Data Science at Airbnb, suggests that organizations consider three distinct data science tracks — Analytics, Inference, and Algorithms. This is a super useful post for organizations trying to keep up in this rapidly evolving space.
— Sponsored Link —
Interested in the future of the data science profession, including whether it will even exist in 10 years? Dr. Hugo Bowne-Anderson, host of DataFramed, a DataCamp podcast, recently interviewed 30 thought leaders. He discovered common themes and lessons about the past, present, and future of data science. Sign up now for the live webinar.
— Tools and Techniques —
Here's a great story about the history of R, its culture, where it's going and why it matters. This is a well-timed article since August marks the 25th anniversary of the creation of R.
I missed this when it came out a few months ago but it's worth spending some time with. This step-by-step tutorial uses a text classification problem to show how to use the Data Version Control (DVC) framework. DVC is an open-source framework that helps to manage the reproducibility of machine learning projects.
This has become a popular introductory tutorial for Natural Language Processing (NLP). It starts by describing 8 key steps in an NLP pipeline and then continues with a step-by-step guide for building that pipeline in Python.
A great overview of ways to gauge how healthy an open source project is. Skim it now and then bookmark it for the next time you're thinking about either using or getting involved with an open source project.
— Data Viz —
Matthew Kay's presentation from this year's OpenVis Conf explores uncertainty visualization and shows how to make it an easier, and more fun, design space. This is a great talk. For more from this conference, check out the OpenVis Conf Playlist on YouTube.
deckard is a new R package that provides an interface to Uber's deck.gl data visualization framework. It enables users to render millions of points interactively from R and visualize that data in the Viewer, in Markdown docs, or via Shiny. deckard is well-suited for geospatial data and is also being used to visualize modeling output.