Data Elixir logo

ISSUE 415  ·   December 6, 2022

 

Trends

Extreme numbers get new names

Prolific generation of data has led to the creation of new prefixes. Guess how many yottabytes are in a quettabyte...
Nature

 

Python and the Future of Programming

Guido van Rossum is the creator of the Python programming language. In this interview with Lex Fridman, he discusses the current state of Python, fads, IDEs, machine learning, and what to expect in Python 4.0.
Lex Fridman Podcast

 

Sponsored Link

Pinecone - Long-term Memory for AI

Combine AI models with the Pinecone vector database to make your applications understand and act on what your users want… without making them spell it out. Pinecone provides the cloud infrastructure that makes vector search easy, fast, and scalable. Understand more about Pinecone and try it →

 

Reach Data Elixir readers by sponsoring an issue.

 

Tutorials, Projects & Opinions

Building Airbnb Categories with Machine Learning and Human-in-the-Loop

Airbnb recently flipped the travel search experience on its head by having its inventory dictate user destinations. In the new scheme, users aren't asked where they want to go. Instead, they're presented with categories of destination types. It may sound simple, but there's a lot going on behind the scenes to make it work. This first post in a series explores how ML is used to build the listings and solve related tasks.
The Airbnb Tech Blog | Mihajlo Grbovic et al.

 

The cloudy layers of modern-day programming

If you're building in the cloud these days, your work is hindered by millions of dependencies, in fragmented and proprietary environments that you can't introspect. Even smallish side projects aren't immune. This is a great post that peels back the layers to show how development has become so frustrating and how, ultimately, it's not all bad.
Vicki Boykis

 
 

How to Lead from The Center: A Guide for Data Leaders

As an industry, we’ve published countless articles, papers, videos, and diagrams about the modern data stack—describing what it is, what it isn’t, where it’s going, and how it’s evolving. Download this guide to learn how to turn your data into your true competitive advantage in a new era of BI transformation.
// sponsored

 

Tools & Code

mvSQLite

mvSQLite is an open-source distributed database with full SQLite compatibility. There are already many nice "multi-machine" SQLite flavors: rqlite, dqlite, and Litestream. What mvSQLite offers is unique: it is not just replicated but truly distributed, it scales writes as well as reads, and it provides the strictest consistency.
Heyang Zhou

 

Kùzu

Kùzu is an in-process property graph database management system (GDBMS) built for query speed and scalability. Kùzu is optimized for handling complex join-heavy analytical workloads on very large graph databases. Kùzu is being actively developed at the University of Waterloo and is available under a permissive license.
KùzuDB

 

Career

Becoming An Analytics Engineer in 2023

Nice overview of the Analytics Engineer role, especially for data people who are thinking about a career move. Includes discussion of the specific skills and tools you need to know and links to key resources.
Balu Rama Chandra

 

Outlier

ChatGPT: Optimizing Language Models for Dialogue

Last week, OpenAI introduced a language model called ChatGPT that interacts in a conversational way and can carry on a dialogue. It already has more than a million users, and the examples being shared online are impressive. This announcement offers a high-level view of how it works. Follow the links for details and demos. For interesting examples, check these out:

  • Explain a complicated regex
  • Write an academic essay
  • Build a Shiny app
  • Build a VM inside ChatGPT

OpenAI

 
 

Hiring? Join the Data Elixir Talent Collective and get regular drops of outstanding data practitioners and leaders who are open to new opportunities.

 
 


 
 
 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!