Data Elixir

ISSUE 425  ·   February 21, 2023

 

Trends

The 2023 MAD (ML, AI & Data) Landscape

Matt Turck's latest MAD Landscape post is hot off the press. This year's edition covers market trends, data infrastructure trends, and ML/AI trends, and it includes an interactive map that makes it easy to explore the landscape and find all the companies and projects on the web. It's a detailed look at the data ecosystem in 2023 and is highly recommended.
Matt Turck

 

Sponsored Link

Explore the exciting world of data with DataTalks.Club!

Our supportive community is the perfect place to enhance your skills, deepen your knowledge, and connect with peers who share your passion. Join us now and find everything you need to succeed in data!

 
 
 

Have a product, service, job, or event you'd like to share with over 55,000 subscribers?

Sponsor an Issue | Job Board | Talent Collective

 

Tutorials, Projects & Opinions

What Is ChatGPT Doing … and Why Does It Work?

Awesome introduction to the inner workings of large language models. This is a long read but it's easy to follow and worthwhile.
Stephen Wolfram

 

Creating a data cleaning workflow

Great three-part tutorial that walks through how to create a data cleaning workflow. Part 1 discusses what makes a clean dataset and what alterations to think about. Part 2 describes workflow steps and documentation to consider. Part 3 walks through a real-world example. There's a lot of insight and detail here.
Crystal Lewis

 

Sign Up To Pointer.io, A Reading Club For Software Developers

Pointer.io is a reading club and newsletter for software developers. It's a window into what other current and future CTOs are reading and thinking about. Super high-quality engineering-related content, not just trendy topics or link bait. Subscribe for free.
// sponsored

 

Tools & Code

rang: Make Ancient R Code Run Again

Reproducibility is a big focus for the R community, but that hasn't always been the case. Old code, especially, wasn't necessarily future-proof when it was written. Enter rang, a new R package that helps make old code run again. It supports code all the way back to R 2.1.0 from 2005! Here's what it does and how to use it.
Schochastics | David Schoch

 

🧬dstack

dstack is an open-source tool that lets you run reproducible ML workflows independently of the environment. Workflows can run locally or in the cloud, and dstack also facilitates versioning and reuse of data and models across teams.
GitHub | dstack

 

Career

Europe data salary benchmark 2023

This pay benchmark for jobs in Europe covers data analysts, data scientists, analytics engineers, and data engineers across hundreds of companies. The post breaks out compensation by experience level, location, and a few Big Tech companies. Compensation in Europe tends to be lower than in the U.S., but even so, there are respectable numbers here.
Synq | Mikkel Dengsøe 

 
 

If you have 3+ years of data science experience, join the Data Elixir Talent Collective where top companies apply to you. For details, check out the Collective 👉

 

Data Visualization

Great Twitter thread with lots of resources and ideas for making charts with Matplotlib. Here are a few picks from the thread:

  • SciencePlots - Matplotlib styles for scientific plotting
  • plotnine - A Grammar of Graphics for Python
  • matplotx - Styles and useful extensions for Matplotlib
  • Seaborn - A library for making statistical graphics in Python
  • Aquarel - Styling Matplotlib made easy
  • TUEplots - Extend Matplotlib for scientific publications

See the thread for more 👉

 

ggplot tricks

Nice collection of tips and tricks for working with ggplot2. Organized into sections for Starting Up, Splicing Aesthetics, Half-geoms, Midpoints in Diverging Scales, Facetted Tags, and tips for Reusing Plots.
GitHub | Teun van den Brand

 

PyGWalker

PyGWalker is a Python library that enables exploratory data analysis in your notebooks. Essentially, it lets you turn a pandas dataframe into a Tableau-style interface for visual exploration. Supports Jupyter, Google Colab, and Kaggle Notebooks.
GitHub | Kanaries

 
 


Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!