ISSUE 376 · March 1, 2022

In the News

If you've been watching the news and wondering how a data or tech person can best help the situation in Ukraine, this Twitter discussion started by Andrew Therriault is a good place to begin. Andrew is a data scientist and the founder of Civin, a data-for-good organization.

Insight

Why becoming a data-driven organization is so hard
A new survey of executives explores the challenges many organizations face when trying to become "data-driven." In this article, Randy Bean considers the survey results and distills key steps organizations can take to succeed. For the detailed report, follow the link to the "NewVantage Partners annual survey."

Sponsored Link

30-Day Trial of Innodata’s New Data Annotation Platform
This new web-based annotation platform reduces the cost of AI/ML projects while helping users develop more accurate models. With easy-to-use workflows, customizable workbenches, real-time KPIs, and auto-annotation capabilities, your team can create high-quality training data on-premises or in the cloud. Sign up for a free 30-day trial today!

Tutorials, Projects & Opinions

Notebooks In Production With Metaflow
Notebooks in production?! Here's how, and why, new Metaflow features let you orchestrate notebook execution in your DAGs and visualize and debug production workflows.

Learning with limited data - part 2: Active Learning
Part 2 of a series on what to do when you have only a limited amount of labeled data for supervised learning tasks. When the labeling budget is small or labeling is expensive, active learning helps you select the most valuable samples to label next.

Web Scraping with R
This step-by-step tutorial shows how to scrape web pages with R, starting with simple scraping tasks and then moving on to scraping multiple pages.
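The active learning item in Tutorials, Projects & Opinions describes selecting the most valuable samples to label next. One common selection rule is uncertainty (least-confidence) sampling: label the points the current model is least sure about. Below is a minimal Python sketch under our own assumptions, not the method from the linked article: a NumPy-only logistic regression stands in for the model, a ground-truth array plays the role of the human labeler, and all function names (train_logreg, least_confident, active_learning_loop) are hypothetical.

```python
import numpy as np


def _add_bias(X):
    # Append a constant column so the model learns an intercept.
    return np.hstack([X, np.ones((len(X), 1))])


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit binary logistic regression with plain batch gradient descent."""
    Xb = _add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of mean log loss
    return w


def predict_proba(w, X):
    """Probability of class 1 for each row of X."""
    return 1.0 / (1.0 + np.exp(-_add_bias(X) @ w))


def least_confident(w, X_pool, k):
    """Indices (into X_pool) of the k points with the lowest top-class probability."""
    p = predict_proba(w, X_pool)
    uncertainty = 1.0 - np.maximum(p, 1.0 - p)  # 0.5 at the decision boundary
    return np.argsort(-uncertainty)[:k]


def active_learning_loop(X, y_oracle, n_init=10, batch=5, rounds=10, seed=0):
    """Start from a random labeled set, then repeatedly query the most uncertain points."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    for _ in range(rounds):
        w = train_logreg(X[labeled], y_oracle[labeled])
        pool = np.setdiff1d(np.arange(len(X)), labeled)  # still-unlabeled indices
        picks = pool[least_confident(w, X[pool], batch)]
        labeled.extend(int(i) for i in picks)  # "ask the oracle" for these labels
    w = train_logreg(X[labeled], y_oracle[labeled])
    return w, labeled


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)  # synthetic, linearly separable labels
    w, labeled = active_learning_loop(X, y)
    acc = np.mean((predict_proba(w, X) > 0.5) == (y == 1))
    print(f"labeled {len(labeled)} of {len(X)} points, accuracy {acc:.2f}")
```

Only the selection step is specific to this rule: swapping least_confident for a margin or entropy score leaves the rest of the loop unchanged.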
It's Time to Rethink Your Media Diet
Founded by investment bankers, The Daily Upside is our favorite source for premium business news that’s actually worth reading. It delivers crisp insights on market-moving stories and teases out nuances you won't read elsewhere. It's completely free, and its sole purpose is to make you a sharp, well-informed investor.

Code & Tools

Statistical ⚡️ Forecast
This new package is billed as the fastest AutoARIMA implementation for Python yet. It is claimed to be 20x faster than pmdarima and 500x faster than Meta's Prophet. It includes a collection of widely used univariate time series forecasting models, such as exponential smoothing and automatic ARIMA modeling.

imodels
Interpretability is important in high-stakes fields like medicine and political science, but complicated models are increasingly difficult to interpret. imodels is a new Python package for concise, transparent, and accurate predictive modeling.

Resources

Probabilistic Machine Learning: Advanced Topics
Kevin Murphy just released the Advanced Topics volume of his popular Probabilistic Machine Learning series. It's not quite finished, but it's close and is free to download. The text covers Bayesian inference, causality, reinforcement learning, distribution shift, deep generative models, and more. For the other books in the series, see the pml-book website.

Data Visualization

Colors for all!
This new R package, cols4all, makes it easy to find the color palettes that work best for your particular data and use case. Select a few parameters that describe your data, and cols4all scores a variety of palettes on aspects that make your visualizations more effective.