ISSUE 426 · February 28, 2023
Nice selection of Polars posts, tutorials, talks, tools, a cheatsheet, and examples from around the web. This is an evolving collection and contributions are welcome.
Introduction to Data-Centric AI
Data-Centric AI is an emerging discipline that studies techniques to improve datasets for ML applications. The link goes to a new online MIT course that introduces the field and explores algorithms to find and fix common issues in ML data. The course is intended to be highly practical and is focused on impactful aspects of real-world ML applications. Free.
Webinar: Small Team, Big Impact
Join us for a panel webinar on unlocking the power of data for startups, with insights on growth from early-stage leaders. Register now.
Have a product, service, job, or event you'd like to share with over 55,000 subscribers?
Sponsor an Issue | Job Board | Talent Collective
Tutorials, Projects & Opinions
Setting up a new machine for data science
New machine? This guide walks through a typical data science setup and how to get everything installed correctly. It's written with macOS Ventura in mind but is, for the most part, OS-agnostic. Covers R, Julia, Python, related IDEs, terminal settings, command line tools, shortcuts, Git, Docker, Postgres and more.
Content Moderation - Patterns in Industry
Great post that explores techniques used in industry to learn and infer the quality of human-generated content such as product reviews, social media posts, and ads. It draws on industry papers and tech blogs from a wide variety of businesses that rely on content moderation. The post covers a lot of ground but is easy to follow, and it works as a nice survey of content moderation techniques in the real world.
pandas 2.0 and the Arrow revolution
pandas 2.0 will be released soon. This first post in a series describes one of the most important changes to look forward to.
If you have 3+ years of data science experience, join the Data Elixir Talent Collective where top companies apply to you. For details, check out the Collective 👉
Data Visualization Fundamentals and Best Practices
Learn the fundamentals of data visualization in this online course by Robert Kosara. This course will walk through basic chart types and show how to decide which to use; how to use aggregations, such as binning and smoothing; the difference between using charts for exploration and analysis vs. presentations; and more. Starts on March 7.
This step-by-step tutorial shows how to create orthographic weather maps using model data from the Global Forecast System (GFS) and ggplot2. There's a lot of detail here, making it easy to follow and modify for your own specific interests.