Data Elixir

ISSUE 371  ·   January 25, 2022

 

Insight

Good Data Citizenship Doesn’t Work

More data isn't necessarily better. The key is managing it in a way that's useful and doesn't distract you from reaching your goals. This post uses lots of examples to explore the issues and the lessons learned along the way.
Benn Stancil and Mark Grover 

 

The Rise of A.I. Fighter Pilots

Artificial intelligence is being taught to fly warplanes. That's not really surprising, but how can anyone trust it?
New Yorker | Sue Halpern

 

Sponsored Link

Free Course: Natural Language Processing (NLP) for Semantic Search

Learn how to build semantic search applications! This free course from Pinecone covers everything you need to build state-of-the-art language models for semantic search.

 

Reach Data Elixir readers by sponsoring an issue. Click here for details.

 
 

Tutorials, Projects & Opinions

ML and NLP Research Highlights of 2021

There were a lot of advances in machine learning and natural language processing last year. This post doesn't cover everything but it's a great collection of interesting highlights across multiple impactful areas. Each section includes a summary of the highlight, why it's important, links to key papers, diagrams and thoughts about what's next.
Sebastian Ruder

 

My Machine Learning Process (Mistakes Included)

Writers generally clean up any mistakes in their posts, which can make it seem that everything always works out perfectly. Not this one. This tutorial shows how to train a machine learning model in the real world, "with all the mistakes and fruitless efforts included."
David Neuzerling

 

How vectorization speeds up your Python code

Python has a lot going for it, but it's not the fastest language. This is a great post that shows how to process a large amount of homogeneous data quickly in Python using vectorization. Learn what that means, when it applies, and how to do it.
Python⇒Speed | Itamar Turner-Trauring
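
For a quick taste of the idea, here is a minimal sketch that assumes NumPy is installed. It is not taken from the linked post, and the function names are made up for illustration. Both functions compute a sum of squares: the first loops in the Python interpreter one element at a time, while the second hands the whole array to a single NumPy call that runs in compiled code.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Plain-Python loop: the interpreter processes one element at a time."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    """Vectorized version: a single NumPy call operates on the whole array."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


# Hypothetical sample data: a homogeneous array of 64-bit floats.
data = np.arange(1_000, dtype=np.float64)

loop_result = sum_of_squares_loop(data)
vec_result = sum_of_squares_vectorized(data)

# Both approaches should agree; only the execution strategy differs.
print(loop_result, vec_result)
```

On large arrays the vectorized version is typically much faster, though the exact speedup depends on the data size and the machine.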

 

 

Binding Apache Arrow to R

Apache Arrow is an open-source library that simplifies working with flat and hierarchical data. This is a fun and gentle introduction to the inner workings of Apache Arrow and how to use it from within R. 
Notes from a data witch | Danielle Navarro

 

How Hasura improved conversion by 20% with PostHog 

In 2021 Hasura’s Engineering and UX teams started self-hosting PostHog to collect product insights without needing to share user data with third parties. Using PostHog’s funnel analysis and session recording tools, Hasura’s team was able to improve conversion by 20% overnight. 
// sponsored

 

Code & Tools

PRQL - Pipelined Relational Query Language

PRQL is a SQL alternative that compiles to SQL and works anywhere SQL does. Like SQL, it's readable, explicit, and declarative. Unlike SQL, it forms a logical pipeline of transformations and supports abstractions such as variables and functions. For a lively discussion of the project, check out this post on Hacker News.
GitHub | Maximilian Roos

 

 

tinygp - the tiniest of Gaussian Process libraries

tinygp is an extremely lightweight library for building Gaussian Process (GP) models in Python, built on top of JAX. It has a nice interface, it's pretty fast and, thanks to JAX, it supports things like GPU acceleration and automatic differentiation.
tinygp

 

Resources

Python for Data Analysis: Data Wrangling with pandas, NumPy & Jupyter

The 3rd Edition of this popular book by Wes McKinney is coming out later this year. In the meantime, there's an open-access version on the web. The first 6 chapters are available now and the rest are coming.
Wes McKinney

 

Data Visualization

Plot Cheatsheets

These cheatsheets provide a nice reference for the Observable Plot visualization library. Available as online interactives or PDF downloads.
Observable | Mike Freeman

 
 

 
 
 
Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.
