ISSUE 425 · February 21, 2023

Trends

The 2023 MAD (ML, AI & Data) Landscape
Matt Turck's latest MAD Landscape post is hot off the press. This year's edition covers market trends, data infrastructure trends, ML/AI trends, and an interactive map that makes it easy to explore the landscape and find all the companies and projects on the web. This is a detailed look at the data ecosystem in 2023 and is highly recommended.

Sponsored Link

Explore the world of data with DataTalks.Club!
Our supportive community is the perfect place to enhance your skills, deepen your knowledge, and connect with peers who share your passion. Join us now and find everything you need to succeed in data!

Have a product, service, job, or event you'd like to share with over 55,000 subscribers?
Sponsor an Issue | Job Board | Talent Collective

Tutorials, Projects & Opinions

What Is ChatGPT Doing … and Why Does It Work?
Awesome introduction to the inner workings of large language models. This is a long read but it's easy to follow and worthwhile.

Creating a data cleaning workflow
Great three-part tutorial that walks through how to create a data cleaning workflow. Part 1 discusses what makes a dataset clean and which alterations to consider. Part 2 describes workflow steps and documentation to consider. Part 3 walks through a real-world example. There's a lot of insight and detail here.

Sign Up To Pointer.io, A Reading Club For Software Developers
Pointer.io is a reading club and newsletter for software developers. It's a window into what other current and future CTOs are reading and thinking about. Super high-quality engineering-related content, not just trendy topics or link bait. Subscribe For Free

Tools & Code

rang: Make Ancient R Code Run Again
Reproducibility is a big focus for the R community but that hasn't always been the case. Old code, especially, wasn't necessarily future-proof when it was written. Enter rang.
rang is a new R package that helps make old code run again, and it supports code all the way back to R 2.1.0 from 2005! Here's what it does and how to use it.

🧬dstack
dstack is an open-source tool that lets you run reproducible ML workflows independently of the environment. It runs ML workflows locally or in the cloud, and it also facilitates versioning and reuse of data and models across teams.

Career

Europe data salary benchmark 2023
This pay benchmark for jobs in Europe covers data analysts, data scientists, analytics engineers, and data engineers across hundreds of companies. The post breaks out compensation by experience level, location, and a few Big Tech companies. Compensation in Europe tends to be lower than in the U.S. but even so, there are respectable numbers here.

If you have 3+ years of data science experience, join the Data Elixir Talent Collective where top companies apply to you. For details, check out the Collective 👉

Data Visualization

Great Twitter thread with lots of resources and ideas for making charts with Matplotlib. Here are a few picks from the thread:

ggplot tricks
Nice collection of tips and tricks for working with ggplot2. Organized into sections for Starting Up, Splicing Aesthetics, Half-geoms, Midpoints in Diverging Scales, Facetted Tags, and tips for Reusing Plots.

PyGWalker
PyGWalker is a Python library that enables exploratory data analysis in your notebooks. Essentially, it lets you turn a pandas dataframe into a Tableau-style interface for visual exploration. Supports Jupyter, Google Colab, and Kaggle Notebooks.