ISSUE 413 · November 15, 2022
Data’s day of reckoning
Data is sometimes called the "new oil," but for most businesses that analogy doesn't hold. In the most successful data organizations, data might burn bright, but, as Benn Stancil puts it, in most businesses data burns more like peat moss. He explains why, and offers some ideas for the median business.
Web scraping datasets made easy - ScrapFly.io
The web is full of quality data, though scraping it can be difficult. The ScrapFly API can retrieve any web page or simplify the scraping process through cloud web browsers that click buttons, fill in forms, and retrieve the data. ScrapFly comes with a Python SDK, making scraping in notebooks a breeze. Try ScrapFly for free!
Tutorials, Projects & Opinions
How Federated Learning Protects Privacy
Most machine learning models are trained by collecting vast amounts of data on a central server. This is a great visual explainer that shows how federated learning makes it possible to train models without any user's raw data leaving their device.
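To make the idea concrete, here is a minimal sketch of federated averaging (FedAvg), one common federated learning scheme, assuming a simple linear model and simulated clients. The function names `local_update` and `federated_average` are hypothetical, and the explainer may cover other variants. The key point is that only model weights, never raw data, leave each client.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    w = weights.copy()
    for _ in range(epochs):
        # Gradient of mean squared error for a linear model
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(global_w, clients, rounds=20):
    """Each round, clients train locally and send back only their weights."""
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in clients:
            updates.append(local_update(global_w, X, y))
            sizes.append(len(y))
        # Server averages client weights, weighted by local dataset size
        global_w = np.average(updates, axis=0, weights=sizes)
    return global_w

# Simulated clients (hypothetical data); the server never sees X or y
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (30, 50, 80):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

w = federated_average(np.zeros(2), clients)
print(w)  # the estimate approaches true_w
```

Real deployments add pieces this sketch omits, such as client sampling, secure aggregation, and differential privacy.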
Forecasting with Structural AR Timeseries
The strength of a Bayesian model lies largely in the flexibility it offers for different modeling tasks. In this tutorial, Nathaniel Forde shows how to fit a range of autoregressive structural time-series models and how to use them to forecast future observations.
Method Chaining in Pandas: Bad Form Or a Recipe For Success?
Matt Harrison has written books on pandas and Python and regularly trains data science teams at top companies. And yet, his code is sometimes met with derision online. In this interview, he explores his approach to code, how to think about method chaining, and what separates naive code from good code.
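For readers new to the style, here is a small sketch contrasting step-by-step mutation with a method chain in pandas. The `sales` data and its column names are made up for illustration; this is not an excerpt from Harrison's code.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south"],
    "units": [10, 3, 7, 0, 12],
    "price": [2.5, 4.0, 2.5, 3.0, 4.0],
})

# Step-by-step style: intermediate variables and in-place column assignment
df = sales[sales["units"] > 0].copy()  # .copy() avoids SettingWithCopyWarning
df["revenue"] = df["units"] * df["price"]
step_result = df.groupby("region")["revenue"].sum().sort_values(ascending=False)

# Chained style: one expression, each step returns a new object
chained_result = (
    sales
    .query("units > 0")
    .assign(revenue=lambda d: d["units"] * d["price"])
    .groupby("region")["revenue"]
    .sum()
    .sort_values(ascending=False)
)

print(chained_result)
```

Both versions produce the same Series (south 60.0, north 42.5). The chain reads top to bottom like a recipe and leaves no half-modified intermediates around; the usual trade-off is that debugging a long chain means commenting out steps or inspecting them with `.pipe`.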
Using Functional Analysis to Model Air Pollution Data
Functional analysis is one approach to understanding how your data changes within a given timeframe, such as a day, or between timeframes, such as across many days. This easy-to-follow tutorial shows how to apply functional analysis to some messy air pollution data using R.
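A minimal sketch of one building block of functional data analysis: each day's noisy, gappy hourly readings are turned into a smooth curve by fitting a small Fourier basis with least squares, and the curves are then averaged into a mean daily profile. The tutorial itself uses R; this Python version, the synthetic data, and the choice of five basis functions are assumptions for illustration.

```python
import numpy as np

def fourier_basis(t, n_basis=5, period=24.0):
    """Constant term plus sine/cosine pairs evaluated at times t (hours)."""
    cols = [np.ones_like(t)]
    for k in range(1, (n_basis - 1) // 2 + 1):
        w = 2 * np.pi * k / period
        cols += [np.sin(w * t), np.cos(w * t)]
    return np.column_stack(cols)

def smooth_days(readings, n_basis=5):
    """Fit one smooth curve per day (row), skipping missing (NaN) readings."""
    hours = np.arange(readings.shape[1], dtype=float)
    B = fourier_basis(hours, n_basis)
    coefs = []
    for day in readings:
        ok = ~np.isnan(day)  # use only the hours that were observed
        c, *_ = np.linalg.lstsq(B[ok], day[ok], rcond=None)
        coefs.append(c)
    return np.array(coefs), B

# Synthetic "pollution" data: 30 days x 24 hours with noise and gaps
rng = np.random.default_rng(2)
hours = np.arange(24)
base = 40 + 15 * np.sin(2 * np.pi * (hours - 8) / 24)
readings = base + rng.normal(scale=5, size=(30, 24))
readings[rng.random((30, 24)) < 0.15] = np.nan  # roughly 15% missing

coefs, B = smooth_days(readings)
curves = coefs @ B.T              # smoothed curve for each day (30 x 24)
mean_curve = curves.mean(axis=0)  # mean daily profile
```

Once every day is represented as a curve (here, five coefficients), days can be compared as whole functions, for example by computing a mean function or running functional PCA on the coefficients.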
How I learn machine learning
In a rapidly evolving field like machine learning, you need to figure out what works for you in the never-ending task of staying up to date. In her latest post, Vicki Boykis shares her own process, with lots of links and resources along the way.
Tools & Code
This is a great tool if you're looking for Mastodon accounts to follow and want something more nuanced than a haphazard list of user handles. Debirdify searches a specific Twitter user's Lists and/or Followed Accounts for associated Mastodon handles and returns a Mastodon-friendly CSV file.
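A rough sketch of the core idea, assuming handles appear in profile text in the common `@user@instance.domain` form. The profile strings, regular expression, and function names are hypothetical; Debirdify's actual matching rules and output format may differ, and this sketch does not call the Twitter API.

```python
import csv
import io
import re

# Simplified pattern for @user@instance.tld, e.g. "@alice@mastodon.social"
HANDLE_RE = re.compile(r"@([A-Za-z0-9_]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")

def extract_handles(profile_texts):
    """Return unique Mastodon-style handles, in first-seen order."""
    seen = []
    for text in profile_texts:
        for user, domain in HANDLE_RE.findall(text):
            handle = f"{user}@{domain}"
            if handle not in seen:
                seen.append(handle)
    return seen

def to_csv(handles):
    """Write one handle per row (a plain one-column CSV, no header)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows([h] for h in handles)
    return buf.getvalue()

bios = [
    "Data viz nerd. Now at @alice@mastodon.social",
    "Stats prof | @Bob@fosstodon.org | opinions my own",
    "No Mastodon here, just @carol on Twitter",
    "Also @alice@mastodon.social, see above",
]
handles = extract_handles(bios)
print(handles)  # ['alice@mastodon.social', 'Bob@fosstodon.org']
print(to_csv(handles))
```

A Twitter-only mention like `@carol` has no second `@domain` part, so the pattern skips it.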
Advanced NLP - Carnegie Mellon 2022
Graham Neubig's "Advanced NLP" is one of the best resources you'll find for current state-of-the-art techniques and algorithms in modern NLP. Follow the links for the slides and an awesome collection of readings and resources. Go here for the lecture videos 👉
Looking for Ambitious Machine Learning Engineers
Ratio is a revenue-generating startup that's looking for ambitious machine learning engineers to help automate a big part of the advertising space. The product is "like a self-driving car of marketing" and largely uses existing models from OpenAI. Remote OK.
In addition to office-based positions around the world, Data Elixir's Job Board currently has 35+ listings for remote positions, including roles for data scientists, data analysts, researchers, data architects, machine learning engineers, and more. The roles cover a variety of job levels, from Junior to Senior.
If you're HIRING, join the Data Elixir Talent Collective and get regular drops of outstanding data practitioners and leaders who are open to new opportunities 👉
Images by Daniel Coe / CC BY-NC-ND 2.0
Visualizing Rivers and Floodplains with USGS Data
Awesome tutorial that shows how to create visualizations of the flow of water through rivers and floodplains using publicly available USGS data and open source tools. Includes links to tools, data, key resources, and a gallery of stunning visualizations.
Galileo’s Telescopic Discoveries: