Issue 155
— In the News —
The 2017 State of Data Science & Machine Learning
Kaggle recently conducted an industry-wide survey to capture the state of data science and machine learning. They received over 16,000 responses and learned a ton about who is working with data, what’s happening at the cutting edge of machine learning across industries, and how new data scientists can best break into the field. Here are the highlights.
The Four Cringe-Worthy Mistakes Too Many Startups Make with Data
Amanda Richardson is the Chief Data and Strategy Officer for HotelTonight and is a keen observer of how other companies are increasingly leveraging data to make decisions. What she sees are a number of trends and approaches to data that aren’t just lazy, but also ineffective, misleading, and costing companies a ton of opportunity.
— Sponsored Link —
MLconf - The Machine Learning Conference - November 10th
MLconf San Francisco is next Friday. The event will host talks from Slack, Tinder, AWS, Google Magenta, OpenAI, Facebook, Uber and some cool startups. Save $75 with promo code "DE75" when registering.
— Tools and Techniques —
How to unit test machine learning code
Unit tests can make or break your algorithm and can save you weeks of debugging and training time. Here are some practical considerations (with code examples) for writing unit tests for your machine learning code.
Colaboratory - shared Jupyter notebooks via Google
Google opened up an internal data analysis tool this week that combines text and code and sends results to a single collaborative document. It works a lot like Google Docs, but it's a Jupyter Notebook environment that requires no setup to use. Definitely check this out.
An Introduction to Hierarchical Modeling
A great visual introduction to the statistical concept of hierarchical modeling.
From Power Calculations to P-Values: A/B Testing at Stack Overflow
A/B testing is an essential part of decision-making at Stack Overflow, and it's used for practically every new feature, including design changes and machine learning models. In this post, Julia Silge explores how their A/B tests are designed and how they assess the inevitable uncertainties.
Bounter - Counter for large datasets
Bounter is a Python library that enables extremely fast probabilistic counting of item frequencies in massive datasets, using only a small fixed memory footprint. Unlike dict or Counter, Bounter can process huge collections where the items would not fit in RAM.
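To give a feel for how counting in a small fixed memory footprint can work, here is a minimal count-min sketch in plain Python. It is only an illustration of one common probabilistic counting technique, not Bounter's own code or API; the class name `CountMinSketch` and its `width` and `depth` parameters are made up for this sketch.

```python
import hashlib


class CountMinSketch:
    """Approximate counter with fixed memory (illustrative, not Bounter's code)."""

    def __init__(self, width=2048, depth=4):
        # Memory use is width * depth integers, no matter how many items arrive.
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # Pick one bucket per row, using a differently salted hash for each row.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.rows[row][col] += count

    def estimate(self, item):
        # Hash collisions can only inflate a cell, so the smallest cell
        # is the closest available upper bound on the true count.
        return min(self.rows[row][col] for row, col in self._buckets(item))


sketch = CountMinSketch()
for word in ["a", "b", "c", "a", "b", "a"]:
    sketch.add(word)

# The estimate is never below the true count (3 for "a"), but may exceed it.
print(sketch.estimate("a"))
```

The trade-off is that estimates can overshoot when different items collide in every row; a wider table makes that less likely at the cost of more memory.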
— Resources —
Awesome Machine Learning for Cyber Security
From the Awesome Series, here's a curated list of tools and resources that are related to the use of machine learning for cyber security.
— Deep Learning —
Deepo
Here's an open-source Docker image with a full, reproducible deep learning research environment. It contains the most popular deep learning frameworks: theano, tensorflow, sonnet, pytorch, keras, lasagne, mxnet, cntk, chainer, caffe, torch.
— In Case You Missed It —
Be sure to catch the most popular links from last week's issue...
— About —
Data Elixir is curated and maintained by @lonriesberg. If some awesome person forwarded this issue to you, subscribe for free at dataelixir.com and get it delivered every week.
Sign up for Free
and join the thousands of data lovers who already start their week with us.
No spam, ever. We'll never share your email address and you can opt out at any time.