ISSUE 434 · May 2, 2023
Tutorials & Opinions
Synthetic data could be better than real data
Getting computers to manufacture realistic data could help address privacy concerns in domains like medicine and education. And if researchers can find the right balance between accuracy and fakery, synthetic data could also help create better datasets to begin with.
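To make the "accuracy vs. fakery" trade-off concrete, here is a deliberately crude sketch in Python (standard library only). It is an illustration under stated assumptions, not the method from the linked article. It fits a mean and standard deviation to each numeric column of a small, hypothetical table and then samples new rows from independent normal distributions. The function names `synthesize_column` and `synthesize_table` are made up for this example.

```python
import random
import statistics

def synthesize_column(values: list[float], n: int, rng: random.Random) -> list[float]:
    """Draw n fake values from a normal fitted to one real column (hypothetical helper)."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation; needs >= 2 values
    return [rng.gauss(mu, sigma) for _ in range(n)]

def synthesize_table(table: dict[str, list[float]], n: int, seed: int = 0) -> dict[str, list[float]]:
    """Build a synthetic table column by column.

    Each column keeps roughly its original mean and spread, but columns are
    sampled independently, so correlations between them are lost.
    """
    rng = random.Random(seed)  # seeded so the synthetic dataset is reproducible
    return {col: synthesize_column(vals, n, rng) for col, vals in table.items()}

# Hypothetical "real" data, invented for this example
real = {"age": [34, 45, 29, 61, 50], "income_k": [52.0, 61.5, 48.0, 75.0, 66.0]}
fake = synthesize_table(real, n=1000, seed=1)
```

Sampling columns independently is one crude point on that trade-off: no synthetic row is copied from a real one, but relationships such as age vs. income are not preserved. On its own, this is not a formal privacy guarantee.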
p Values Are Useful for A/B Tests, Sometimes
When you're running A/B tests, does it make sense to use p-values? According to the American Statistical Association, the answer is "no," but don't take that as a hard rule. In this post, Harlan Harris explores when p-values do and don't make sense in A/B tests.
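For readers who want to see where an A/B-test p-value comes from, below is a minimal sketch of one common frequentist choice, the two-sided two-proportion z-test on conversion counts. It is not necessarily the approach Harris recommends. The function name and the conversion numbers are hypothetical.

```python
import math

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: variants A and B have the same conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate, assuming the null hypothesis (no difference) is true
    pooled = (conv_a + conv_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        raise ValueError("all-or-nothing conversions: the z-test is undefined")
    z = (p_b - p_a) / math.sqrt(variance)
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical test: 200 of 5,000 users converted on A, 250 of 5,000 on B
p = two_proportion_p_value(200, 5000, 250, 5000)
```

A small p-value here only says the observed difference would be unusual if A and B truly converted equally. It says nothing about how large or valuable the difference is, which is part of why the question of when to use p-values is debated.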
Use this to do data planning with your business teams
Arrive at clear definitions for business metrics with your stakeholders. Record how metrics like leads, users, and customers are defined, and document how other North Star metrics, like ARR, are tracked.
Tools & Code
Inspired by Python's faker package, charlatan makes it easy to create fake data using R. Use it to create dummy data for names, phone numbers, emails, DOI numbers, genes, and more!
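The core idea behind fake-data packages like charlatan and faker can be sketched in a few lines of standard-library Python. The sketch is hypothetical and does not reproduce either package's API. It draws names from small made-up word lists, builds emails on reserved example domains, and uses 555-01xx phone numbers, which are reserved for fictional use in North America. A fixed seed makes the dummy dataset reproducible.

```python
import random

# Tiny hypothetical word lists; real packages ship much larger, locale-aware ones
FIRST_NAMES = ["Ada", "Alan", "Grace", "Katherine", "Edsger"]
LAST_NAMES = ["Lovelace", "Turing", "Hopper", "Johnson", "Dijkstra"]
DOMAINS = ["example.com", "example.org"]  # reserved domains, safe for dummy emails

def fake_person(rng: random.Random) -> dict:
    """Generate one dummy record (hypothetical helper, not charlatan's or faker's API)."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    email = f"{first}.{last}@{rng.choice(DOMAINS)}".lower()
    phone = f"555-01{rng.randint(0, 99):02d}"  # 555-0100 to 555-0199 are fictional numbers
    return {"name": f"{first} {last}", "email": email, "phone": phone}

def fake_people(n: int, seed: int = 42) -> list[dict]:
    """Generate n dummy records; the same seed always yields the same records."""
    rng = random.Random(seed)
    return [fake_person(rng) for _ in range(n)]

people = fake_people(3)
```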
Understanding Large Language Models
Great reading list for getting up to speed with LLMs. This list explores some of the most influential papers for understanding the design, constraints, and evolution of LLMs — starting at the beginning. For each paper there's a short summary, key diagrams, and a link to the paper.
The Little Book of Deep Learning
Nice introduction to deep learning, starting with the basics of machine learning and efficient computation. From there, it covers a variety of topics including model components, architectures, and applications.
Sharpen your math, CS and data skills in 15 minutes a day
For professionals and lifelong learners alike, Brilliant is one of the best ways to learn. The deets: Bite-sized interactive lessons make it easy to level up in everything from math and data science to AI and beyond. Join 10+ million people building skills every day. Start your 30-day free trial today!
How Academic Bullying Led This Data Scientist to Open Science
Unless you've been there, it's easy to be idealistic about jobs in academia. In this post, Paola Chiara Masuzzo shares her 12-year journey from an excited PhD student to someone disillusioned with academia, and the path she took to rediscover her love for science.
The Data Elixir Job Board currently lists 55 openings for a variety of roles, including data scientists, data analysts, machine learning engineers, researchers, and more. The roles span levels from entry-level to Director, and most of the jobs are remote.
Making Middle Earth maps with R
Middle Earth might not be the most obvious place for a data visualization tutorial, but it works surprisingly well. Starting with a set of open-source shapefiles of J. R. R. Tolkien's Middle Earth, the tutorial walks through a variety of spatial data visualization techniques, including projections, map layers, distances, scaling, and more!
Awesome ggplot2 🕶️
Great collection of curated ggplot2 resources including tutorials, packages, books, courses, galleries, related repos, and people to follow.