— In the News —
Kaggle recently conducted an industry-wide survey to capture the state of data science and machine learning. They received over 16,000 responses and learned a ton about who is working with data, what’s happening at the cutting edge of machine learning across industries, and how new data scientists can best break into the field. Here are the highlights.
Amanda Richardson is the Chief Data and Strategy Officer for HotelTonight and a keen observer of how other companies are increasingly leveraging data to make decisions. She sees a number of trends and approaches to data that aren’t just lazy, but also ineffective, misleading, and costing companies a ton of opportunity.
— Sponsored Link —
MLconf San Francisco is next Friday. The event will host talks from Slack, Tinder, AWS, Google Magenta, OpenAI, Facebook, Uber and some cool startups. Save $75 with promo code "DE75" when registering.
— Tools and Techniques —
Unit tests can make or break your algorithm and can save you weeks of debugging and training time. Here are some practical considerations (with code examples) for writing unit tests for your machine learning code.
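To give a flavor of what such tests can look like, here is a minimal sketch using only the Python standard library. It is not taken from the linked post: the toy model (a two-parameter linear regression) and the names `train_step`, `mse`, and `TrainStepTest` are hypothetical, chosen purely for illustration. The two tests check properties that tend to catch silent training bugs: that a training step actually changes the parameters, and that the loss goes down on data the model can fit.

```python
import random
import unittest


def train_step(weights, xs, ys, lr=0.01):
    """One gradient-descent step for least-squares fitting of y = w0 + w1 * x."""
    n = len(xs)
    errors = [weights[0] + weights[1] * x - y for x, y in zip(xs, ys)]
    grad0 = 2 * sum(errors) / n
    grad1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    return [weights[0] - lr * grad0, weights[1] - lr * grad1]


def mse(weights, xs, ys):
    """Mean squared error of the linear model on the given data."""
    return sum((weights[0] + weights[1] * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


class TrainStepTest(unittest.TestCase):
    def setUp(self):
        # Fixed seed so the test is deterministic; data follows y = 3x + 1 exactly.
        rng = random.Random(0)
        self.xs = [rng.uniform(-1, 1) for _ in range(50)]
        self.ys = [3.0 * x + 1.0 for x in self.xs]

    def test_step_changes_weights(self):
        # A step that leaves the weights untouched usually means a wiring bug.
        before = [0.0, 0.0]
        after = train_step(before, self.xs, self.ys)
        self.assertNotEqual(before, after)

    def test_loss_decreases(self):
        # On noiseless, learnable data the loss should drop after training.
        weights = [0.0, 0.0]
        start = mse(weights, self.xs, self.ys)
        for _ in range(100):
            weights = train_step(weights, self.xs, self.ys, lr=0.1)
        self.assertLess(mse(weights, self.xs, self.ys), start)
```

Saved in a test module, these run with `python -m unittest`. The same two checks carry over to real frameworks, where the "weights" would be the model's trainable variables.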
Google opened up an internal data analysis tool this week that combines text & code and sends results to a single collaborative document. It works a lot like Google Docs but it's a Jupyter Notebook environment that requires no setup to use. Definitely check this out.
Great visual introduction to the statistical concept of Hierarchical Modeling.
A/B testing is an essential part of decision-making at Stack Overflow, and it's used for practically every new feature, including design changes and machine learning models. In this post, Julia Silge explores how their A/B tests are designed and how they assess the inevitable uncertainties.
Bounter is a Python library that enables extremely fast probabilistic counting of item frequencies in massive datasets, using only a small fixed memory footprint. Unlike dict or Counter, Bounter can process huge collections where the items would not fit in RAM.
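For intuition about how counting in fixed memory can work at all, here is a minimal count-min sketch in plain Python. This is a standard technique for approximate frequency counting, shown here as an illustration only; it is not Bounter's API or its implementation, and the class name `CountMinSketch` and the `width` and `depth` parameters are assumptions made for this sketch. Memory is fixed by `width * depth` no matter how many distinct items are seen. The trade-off is that hash collisions can make counts too high, though never too low.

```python
import hashlib


class CountMinSketch:
    """Approximate item counter with a fixed-size table (illustrative only)."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        # depth rows of width counters; this never grows.
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent-looking hash per row, derived by salting BLAKE2b.
        data = item.encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=row.to_bytes(2, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def update(self, items):
        # Increment one counter per row for every item.
        for item in items:
            for row, col in self._cells(item):
                self.table[row][col] += 1

    def __getitem__(self, item):
        # Collisions only inflate counters, so the minimum is the best estimate.
        return min(self.table[row][col] for row, col in self._cells(item))


sketch = CountMinSketch()
sketch.update(["a", "b", "c", "a", "b"])
estimate_a = sketch["a"]  # at least 2; exact unless every row had a collision
```

A wider table lowers the chance of collisions and a deeper one lowers the chance that every row collides at once, which is how the accuracy-versus-memory dial is usually set in this family of structures.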
— Resources —
From the Awesome Series, here's a curated list of tools and resources that are related to the use of machine learning for cyber security.
— Deep Learning —
Here's an open-source Docker image with a fully reproducible deep learning research environment. It contains the most popular deep learning frameworks: theano, tensorflow, sonnet, pytorch, keras, lasagne, mxnet, cntk, chainer, caffe, torch.
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...
— About —
Data Elixir is curated and maintained by @lonriesberg. If some awesome person forwarded this issue to you, subscribe for free at dataelixir.com and get it delivered every week.