ISSUE 445 · July 25, 2023

Podcasts

Test-Driven Data Analysis, with Nick Radcliffe
Test-Driven Data Analysis (TDDA) is an approach to quality for data and for data analysis pipelines. In this podcast interview, Nick Radcliffe introduces the philosophy behind it and the associated Python library, TDDA, which he has been developing since 2016.

Sponsored Link

Complete customer profiles in your data warehouse
RudderStack Profiles takes the SQL grunt work out of building customer profiles. You specify the customer traits, and Profiles runs the joins and computations to create complete profiles, so you can build better models faster. And that's just one use case. Get all the details.

Posts & Tutorials

Getting started with Vector DBs in Python
With the popularity of LLMs, vector databases are all the rage these days. Which one should you choose? This is a great overview of nine popular options for Python, covering the strengths of each, with sample code and useful links along the way.

Emphasize what you want readers to see with color
One of visualization's biggest superpowers is leading the reader's eye to the data you want to emphasize. By using color to create a visual hierarchy, you decide what readers see first, second, third, and last. A great post with lots of examples.

From zero to hero: end-to-end data applications with SQL and Jupyter
In this online course, you'll learn how to develop and deploy an end-to-end data application with SQL, Python, and Jupyter notebooks. It covers exploratory data analysis, SQL basics, workflow reproducibility, data pipelines, deployment, and more.

Introduction to Cloud-Based Geospatial Analysis
A nice introduction to cloud-based geospatial analysis using Google Earth Engine and the geemap Python package. It covers the basics of Earth Engine data types and how to visualize, analyze, and export Earth Engine data in a Jupyter environment using geemap.
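The vector database overview in this issue compares real products. As a rough mental model of the core operation they all serve, here is a minimal sketch of similarity search in plain Python. `TinyVectorStore`, `add`, and `query` are hypothetical names for illustration, not the API of any database in the article. It does exact, brute-force search by cosine similarity; production vector databases typically rely on approximate nearest-neighbor indexes (HNSW, for example) instead of scanning every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store with exact (brute-force) nearest-neighbor search."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, item_id, vector):
        self._ids.append(item_id)
        self._vectors.append(list(vector))

    def query(self, vector, k=3):
        """Return the k (id, score) pairs most similar to `vector`, best first."""
        scored = [
            (item_id, cosine_similarity(vector, stored))
            for item_id, stored in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy 3-dimensional "embeddings"; real ones come from an embedding model.
store = TinyVectorStore()
store.add("cat", [0.9, 0.1, 0.0])
store.add("dog", [0.8, 0.2, 0.1])
store.add("car", [0.0, 0.1, 0.9])
print(store.query([1.0, 0.0, 0.0], k=2))  # "cat" ranks first, then "dog"
```

Scanning every vector costs time proportional to the collection size per query, which is exactly the cost that the indexing strategies of dedicated vector databases are designed to avoid.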
Career

Salary Calculator
Awesome salary calculator, based on a compensation prediction model built for a Kaggle competition in 2022. It was made specifically for people who work with data, and it has many options, including a variety of positions, locations, experience levels, and more.

Resources

Cookbook Polars for R
This cookbook for R users offers a side-by-side comparison of polars, base R, dplyr, tidyr, and data.table for common tasks and problems.

Comprehensive Python Cheatsheet
Exhaustive and concise: a truly Pythonic cheat sheet for the Python programming language.

Outlier

LLM Training Puzzles
A collection of 8 challenging puzzles about training large language models (or really any neural network) on many, many GPUs. The goal of these puzzles is to get hands-on experience with the key primitives and to understand the goals of memory efficiency and compute pipelining.
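The LLM Training Puzzles item mentions the key primitives of multi-GPU training. One widely used primitive is all-reduce, which data-parallel training uses to average gradients across devices. The sketch below simulates it in plain Python on a one-parameter linear model. It is not taken from the puzzles, involves no real GPUs or communication library, and the function names (`local_gradient`, `all_reduce_mean`, `data_parallel_step`) are hypothetical.

```python
def local_gradient(w, xs, ys):
    """Gradient of mean squared error for the model y_hat = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(values):
    """Simulated all-reduce: every device receives the mean of all devices' values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def data_parallel_step(w, xs, ys, num_devices, lr=0.01):
    """One synchronous data-parallel SGD step across `num_devices` simulated GPUs.

    Each device holds a replica of w and sees an equal-sized shard of the batch.
    Returns the updated replicas, one per device.
    """
    if len(xs) % num_devices != 0:
        raise ValueError("batch size must divide evenly across devices")
    shard = len(xs) // num_devices
    local_grads = [
        local_gradient(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_devices)
    ]
    synced = all_reduce_mean(local_grads)
    # Every replica applies the same averaged gradient, so the weights stay in sync.
    return [w - lr * g for g in synced]


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated by y = 2x
replicas = data_parallel_step(0.0, xs, ys, num_devices=2)
print(replicas)
```

Because the shards are equal in size, the average of the per-shard mean gradients equals the full-batch gradient, so each replica ends up with the same weight a single device would compute from the whole batch.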