ISSUE 432 · April 18, 2023

Trends

The Data Delusion
From the early days of encyclopedias to the influence of data science in modern society, data and information storage have had a staggering impact on the world. This article explores the rise of numbers as an instrument of state power, the Technocracy movement, and the shift from a culture of numbers to a culture of data.

Sponsored Link

You're invited! Webinar: How to expedite data analytics insights and reduce time-to-value with AWS & Rearc
Leaders from AWS and Rearc showcase how incorporating third-party data into your analytics strategy streamlines data ingestion and saves time. Join our webinar to explore how data enrichment helps define business problems, find data dependencies, and accelerate insights.
Date: April 19th

Tutorials & Opinions

A Framework for Making Decisions with Data
Great post that walks through a five-step framework for making better, faster, and more data-informed decisions.

Detecting heart murmurs from time series data in R
Nice tutorial that shows how to use machine learning algorithms to make predictions from time series data. It covers how to calculate features, compare features between subjects, and fit a model with {tidymodels}. The tutorial is well organized and includes lots of code samples and visualizations along the way.

Equality Of Odds
This new post from the popular MLU-Explain blog explores the concept of "Equalized Odds" (EO). EO is a fairness criterion that takes into account the underlying ground truth and aims to ensure that models make errors at similar rates within different groups of people. Using EO, you can measure and mitigate bias in a model's predictions.

Building LLM applications for production
Applications based on large language models (LLMs) have suddenly become super popular, but the ambiguity and flexibility of LLMs present many challenges for making these applications "production-ready."
This post explores strategies for mitigating these challenges and provides examples of use cases worth exploring.

Awesome Twitter Algo
For anyone interested in recommendation systems or, more specifically, how Twitter works under the hood, this is an awesome exploration of the recently released Twitter code. It discusses the key components of Twitter's recommender system and includes links to key papers and related resources along the way.

Sharpen your math, CS and data skills in 15 minutes a day
For professionals and lifelong learners alike, Brilliant is one of the best ways to learn. The deets: bite-sized interactive lessons make it easy to level up in everything from math and data science to AI and beyond. Join 10+ million people building skills every day. Start your 30-day free trial today!

Tools & Code

Introduction to rpolars
Polars is one of the fastest data table query libraries available. It optimizes queries for memory consumption and speed, and it is easy to use. rpolars is a package that lets R users enjoy Polars' benefits, potentially speeding up data processing tasks by 5-10x compared to dplyr. This post walks through its features and how to use them.

AutoProfiler
AutoProfiler is a tool that automatically updates and visualizes Pandas dataframes in Jupyter notebooks, providing useful profiling information like column distributions, sample values, and summary statistics.

Resources

Building reproducible analytical pipelines with R
This book teaches data scientists, analysts, and researchers how to apply best practices from software engineering and DevOps to make projects robust, reliable, and reproducible. Ultimately, the ideas presented here will improve the quality of your work. Free to read.
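The "Equality Of Odds" item in Tutorials & Opinions describes a criterion that asks a model to make errors at similar rates across groups. For a binary classifier this is commonly stated as requiring equal true-positive rates and equal false-positive rates in every group. Below is a minimal, dependency-free Python sketch of that check, assuming 0/1 labels and a single group label per example. The function names `group_rates` and `equalized_odds_gap`, and the choice to summarize fairness as the largest TPR or FPR gap, are illustrative assumptions, not code from the MLU-Explain post.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: (true_positive_rate, false_positive_rate)} for 0/1 labels.

    Hypothetical helper for illustration; not taken from the MLU-Explain post.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if yt == 1:
            # Actual positive: either caught (TP) or missed (FN).
            c["tp" if yp == 1 else "fn"] += 1
        else:
            # Actual negative: either wrongly flagged (FP) or correctly rejected (TN).
            c["fp" if yp == 1 else "tn"] += 1

    rates = {}
    for g, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        if positives == 0 or negatives == 0:
            # Both rates must be defined for every group to compare them.
            raise ValueError(f"group {g!r} needs both positive and negative examples")
        rates[g] = (c["tp"] / positives, c["fp"] / negatives)
    return rates


def equalized_odds_gap(y_true, y_pred, groups):
    """Largest between-group difference in TPR or FPR (0.0 means equalized odds holds)."""
    rates = group_rates(y_true, y_pred, groups)
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(group_rates(y_true, y_pred, groups))         # {'a': (0.5, 0.5), 'b': (1.0, 0.0)}
    print(equalized_odds_gap(y_true, y_pred, groups))  # 0.5
```

In this toy data, group "a" gets a true-positive rate of 0.5 and a false-positive rate of 0.5, while group "b" gets 1.0 and 0.0, so the model violates equalized odds with a gap of 0.5. Reporting a single maximum gap is only one possible summary; you could instead report the TPR and FPR differences separately.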