— In the News —
American health care is undergoing a data-driven transformation, and Intermountain Healthcare is leading the way. This case study by MIT Sloan Management Review examines the data and analytics culture at Intermountain, a Utah-based company that runs 22 hospitals and 185 clinics. This is a highly recommended deep dive.
Algorithmic Hiring: Point / Counterpoint
The ubiquity of data has helped fuel several startups and lots of discussion about the value of algorithmic hiring. These two thought-provoking articles discuss opposite approaches and have implications beyond just hiring. Data vs gut... what works, when?
Two years after its launch, Crisis Text Line is swimming in data. But data for data's sake is meaningless. This is a great story about finding value in large datasets - both for the organization involved and for the people doing the analysis work.
Personalization’s image of us is like looking at yourself in the funfair’s house of mirrors. Personalization caricaturizes us and creates a striking gap between our real interests and their digital reflection...
Interesting read about why that is and how to fix it.
— Tools and Techniques —
Worthwhile follow-up article to last month's Top 10 Data Mining Algorithms In Plain English by Ray Li. This article includes suggested R packages for each of the top 10 algorithms, linked references, and sample code.
Great tutorial showing how to cluster a set of documents using Python. Includes a GitHub repo with an interactive notebook.
Here's a good look into the thought processes and challenges involved in building analytics from the ground up at a mid-sized startup.
— Resources —
The University of California at Irvine maintains these 300+ datasets that are curated specifically to support machine learning research. The repository was established in 1987 and has since been cited over 1,000 times, making it one of the top 100 most cited "papers" in all of computer science. Each dataset is well documented, including a description, attribute information, and links to relevant papers.
— Data Viz —
This map of historic tornado hot spots is really just an Excel pivot table! Read on to see how to spatial-hack Excel in this devious way.
Nice rundown of some great data visualization books. Along with useful descriptions, this post contains page scans and lots of linked references. This is a great reading list!
— Archive Pick —
This introduction to Bayesian methods and probabilistic programming takes a computation/understanding-first, mathematics-second point of view. It's organized as a "book" of IPython chapters and its GitHub repo has been starred over 8000 times. This is incredibly well done.