— In the News —
Tim Berners-Lee announced a new platform that gives users complete control over their data. The platform works on the existing web, but within its ecosystem you decide where your data is stored. Partly because of the people involved, this project is worth paying attention to.
ICUs are one of the most expensive parts of the medical system. The sickest patients often receive round-the-clock care and are typically attached to multiple machines and monitors. There are also vast amounts of data, which makes ICUs a prime target for AI disruption. In this article, the founders of Autonomous Healthcare explore the problems and opportunities.
— Sponsored Link —
Mode Studio combines a SQL editor, Python & R notebooks, and a visualization builder in one platform. And it's free forever. Connect data from anywhere and analyze with the best language for the job, without having to jump between tools. Build custom visualizations or use our out-of-the-box charts. Share your analysis with a click—every report lives at a URL.
— Tools and Techniques —
In this post, Robert Chang shares what he's learned about machine learning that you won't get from school or online competitions. Topics include problem definition, feature engineering, model debugging, productionization, and dealing with feedback loops.
Here's a great overview of what Julia is and why it's worth learning.
Gradient boosting is widely used in industry and there are many online tutorials that show how it works. But off-the-shelf loss functions are not always well-tuned to the problems being solved. This post explores the importance of custom loss functions in real-world problems and how to implement them using the LightGBM gradient boosting package. This is easy to follow and includes useful linked references.
Search Engine Optimization (SEO) is an important activity for online growth businesses, but measuring the effectiveness of SEO efforts is tricky. In this post, Brian de Luna from Airbnb explores the issues and introduces a "market-level approach" to measure landing page effectiveness.
Andrew Ng's Machine Learning course is one of the best and most popular online introductions to machine learning. Unfortunately, the official programming assignments require Octave or MATLAB. This repository contains Python versions of the assignments using Jupyter Notebook. The notebooks contain embedded assignment instructions, Python starter code, and a script that submits them for grading.
Easily Build Scalable Web Scrapers with Scraper API. Handles proxies, browsers, and CAPTCHAs, so you can get the HTML from any web page with a simple API call. Get started with 1000 free API calls.
— Resources —
This new set of online data science courses from the Johns Hopkins Data Science Lab aims to make it easy for anyone with a browser to learn data science. Courses include project organization, version control, data tidying, data visualization, data access, data analysis, and communication in data science. The courses are self-paced and you can take them for free.
— Data Viz —
This deep dive shows how to use decision trees and introduces a new package for scikit-learn decision tree visualization and model interpretation. This is how to do decision trees.
Ian Johnson, a DataVis UX engineer at Google, explores how a relatively new set of tools can change the way we explore large datasets.