Data Elixir logo

ISSUE 421  ·   January 24, 2023

 

Trends

Google Research, 2022 & Beyond

In this annual retrospective, Jeff Dean highlights key trends in language, computer vision, multi-modal models, and generative machine learning models. This is a great post with lots of links to projects, workshops, papers, and repos along the way.
Google Research community via Jeff Dean

 

Sponsored Link

Crawl, Walk, Run: Becoming a Privacy-Centric Marketing Organization Available Now

InfoTrust is thrilled to announce the publication of its newest book, Crawl, Walk, Run: Becoming a Privacy-Centric Marketing Organization, which is available now on Amazon!

It’s not enough to be legally compliant. Readers will learn how to renovate their data approach with a formula for creating stronger customer relationships that push their company forward.

Purchase on Amazon

 

Reach Data Elixir readers by sponsoring an issue.

 

Tutorials, Projects & Opinions

Mechanisms for Effective Machine Learning Projects

Here's a useful framework for improving a project's chance of success. This is written with machine learning projects in mind but the ideas here could be easily modified for other types of projects.
Eugene Yan

 

Why Polars uses less memory than Pandas

While Polars is mostly known for running faster than Pandas, if you use it right it can also use significantly less memory. This post shows how to optimize memory use in Pandas, how those optimizations compare to the ones Polars provides automatically, and when to intervene manually.
Itamar Turner-Trauring
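
To give a flavor of the manual side of that comparison, here is a minimal sketch of two common Pandas memory optimizations: downcasting integers and converting repetitive strings to categoricals. It is not taken from the linked post. The DataFrame, its column names, and its values are invented for illustration, and the actual savings depend on your data and Pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "user_id": np.arange(n, dtype="int64"),
    "country": rng.choice(["US", "DE", "IN", "BR"], size=n),
})

# deep=True also counts the memory held by string values.
before = df.memory_usage(deep=True).sum()

optimized = df.copy()
# Downcast to the smallest unsigned integer type that fits the values.
optimized["user_id"] = pd.to_numeric(optimized["user_id"], downcast="unsigned")
# Store each distinct string once and keep small integer codes per row.
optimized["country"] = optimized["country"].astype("category")

after = optimized.memory_usage(deep=True).sum()
print(f"before: {before:,} bytes, after: {after:,} bytes")
```

Categoricals pay off when a column has few distinct values relative to its length. For mostly unique strings, the conversion can cost more memory than it saves.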

 

Tools & Code

sketch

Sketch is an AI code-writing assistant that understands the context of your data. It installs as a pandas extension and offers utility functions that operate via natural language prompts. You can ask it questions about your data, ask it to create visualizations, and get Python and pandas code directly. To see it in action, check out this 90-second demo video.
GitHub | Approximate Labs

 

Introducing anywidget

anywidget is a new Python library that makes it easy to create and share custom Jupyter Widgets. No fiddling with messy build configs, packaging, or bundlers. Just start coding!
Trevor Manz

 

Resources

Data Science Cheatsheets

This repo of graduate-level cheatsheets covers data science, machine learning, computer science, and more. This is an evolving collection and pull requests are encouraged.
GitHub | Merve Noyan

 

Career

Crafting a successful career framework

When you're planning for a career, don't just think about the next job. Work backwards from where you eventually want to be and think about the job after the next job. This is from Nikhyl Singhal's newsletter, The Skip, which has consistently insightful career advice, especially for people in tech.
The Skip | Nikhyl Singhal

 

Data Visualization

6 easy ways to map population density in R

In this step-by-step tutorial, Milos Popovic shows six different ways to create population maps using R. Along with code, it includes screenshots, the rationale for using each map type, and links to related tutorials. It covers choropleth maps, bubble maps, cartograms, dot density maps, grid maps, and spike maps.
Milos Popovic

 

Distance and proximity analysis in Python

A common task in spatial data analysis is calculating proximity to some resource. In this tutorial, Kyle Walker shows how to calculate proximity in Python using two methods: (1) straight-line (Euclidean) distances in a projected coordinate system and (2) driving times derived from a hosted navigation service.
Kyle Walker
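
As a rough illustration of the first method only, here is a minimal standard-library sketch of straight-line proximity. It is not the tutorial's code. It assumes the points are already in a projected coordinate system with units of meters, and the facility names, coordinates, and the `nearest_facility` helper are all hypothetical.

```python
import math

# Hypothetical projected coordinates (x, y) in meters, invented for illustration.
facilities = {
    "clinic_a": (500_000.0, 4_100_000.0),
    "clinic_b": (503_000.0, 4_104_000.0),
}

def nearest_facility(point, facilities):
    """Return (name, distance_m) for the facility closest to `point`.

    Uses plain Euclidean distance, which is only meaningful when the
    coordinates are projected (not raw longitude/latitude degrees).
    """
    return min(
        ((name, math.dist(point, xy)) for name, xy in facilities.items()),
        key=lambda pair: pair[1],
    )

home = (503_000.0, 4_100_000.0)  # hypothetical location
name, distance_m = nearest_facility(home, facilities)
print(f"nearest: {name} at {distance_m:.0f} m")
```

Driving times, the second method, need a routing service and cannot be derived from coordinates alone, so they are left out of this sketch.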

 
 


Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!