Data Elixir logo

ISSUE 444 · July 18, 2023

 

Posts & Tutorials

Getting started with Code Interpreter

OpenAI's Code Interpreter is a general-purpose toolbox that gives GPT-4 superpowers. With Code Interpreter, GPT-4 can ingest up to 100MB of your data and then use that data in Python scripts that it writes and executes. That allows GPT-4 to do all sorts of things it couldn’t do before. This post is a nice tour of what's possible now and how to use it.
One Useful Thing | Ethan Mollick
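
To give a feel for the kind of script this workflow produces, here is a minimal sketch of a data summary written with only the Python standard library. It is not taken from the post or from Code Interpreter itself; the sample data, the column names, and the summarize helper are all hypothetical.

```python
import csv
import io
import statistics

# Hypothetical stand-in for an uploaded file; a real session would read the upload from disk.
SAMPLE_CSV = """region,sales
north,120
south,95
north,143
south,101
"""


def summarize(csv_text, group_col, value_col):
    """Return {group: (count, mean)} for a numeric column grouped by another column."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row[group_col], []).append(float(row[value_col]))
    return {name: (len(values), statistics.mean(values)) for name, values in groups.items()}


summary = summarize(SAMPLE_CSV, "region", "sales")
for region, (count, mean) in sorted(summary.items()):
    print(f"{region}: n={count}, mean={mean:.1f}")
```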

 

VScode + Docker + Python= ❤️ ❤️ ❤️

This is a great guide for setting up a Python development environment with VS Code and Docker. It starts with a section that explains the advantages of each tool and how they work well together. From there, it's an easy-to-follow, step-by-step tutorial for setting everything up.
GitHub | Rami Krispin

 

Introduction to dimensionality reduction

Dimensionality reduction helps to simplify complex datasets and make them more manageable to work with. In this two-part introduction, Gabe Flomo shows how dimensionality reduction works and offers practical examples for common algorithms using Python.

  • Part 1: Building an intuition around dimensionality reduction
  • Part 2: A practical guide to dimensionality reduction techniques

Hex | Gabe Flomo
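
For a quick intuition, here is a minimal sketch of one common technique, principal component analysis, reducing 2-D points to a single dimension. It is not taken from the linked series. It assumes a tiny made-up dataset and finds the top component with power iteration, using only the standard library; the helper names are hypothetical.

```python
import math


def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]


def top_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov with power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Made-up points that lie close to the line y = x.
points = [[2.0, 1.9], [0.0, 0.1], [-1.0, -1.1], [1.0, 1.0], [-2.0, -1.9]]
centered = center(points)
component = top_component(covariance(centered))

# Each 2-D point becomes one number: its coordinate along the top component.
projected = [sum(a * b for a, b in zip(row, component)) for row in centered]
print(component, projected)
```

Because the points sit near a straight line, the single projected coordinate keeps almost all of the variance in the original two columns.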

 

Sponsored Link

Computational linguistics in the age of large language models


What are the challenges facing LLMs? Amazon senior principal scientist and ACL 2023 general chair Yang Liu highlights the problem of “hallucination”, or generating false assertions, and explains how scientists are trying to address it.

 
 
 

Reach Data Elixir readers by sponsoring an issue.

 

Tools & Code

An Open-source Plotting Library for Statistical Data

Lets-Plot is a Python plotting library for statistical data. It's based on the Grammar of Graphics and largely follows the ggplot2 API. This looks like a very nice plotting library that supports a wide range of chart types and features, such as data sampling, formatting, geocoding, notebook compatibility, and much more.
JetBrains | Let's Plot

 

Introduction to theft

{theft} is an R package that provides a structured analytical workflow for the extraction, analysis, and visualisation of time-series features. It's designed to be flexible and extensible, and it provides standardized access to key R packages as well as Python libraries.
Trent Henderson
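
If "time-series features" is a new idea, this minimal sketch shows the general approach of turning each series into a small set of summary numbers. It is written in Python with the standard library and does not use or mirror the {theft} API; the function names, the chosen features, and the sample series are all hypothetical.

```python
import statistics


def lag1_autocorrelation(series):
    """Correlation between the series and itself shifted by one step."""
    mean = statistics.mean(series)
    numerator = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    denominator = sum((x - mean) ** 2 for x in series)
    return numerator / denominator


def extract_features(series):
    """Summarize one time series as a dictionary of named features."""
    return {
        "mean": statistics.mean(series),
        "std": statistics.stdev(series),
        "lag1_autocorr": lag1_autocorrelation(series),
    }


# Two made-up series with very different temporal structure.
series_by_id = {
    "trend": [1, 2, 3, 4, 5, 6],
    "alternating": [1, -1, 1, -1, 1, -1],
}
features = {name: extract_features(values) for name, values in series_by_id.items()}
for name, row in features.items():
    print(name, row)
```

Once every series is a row of features like this, ordinary tools for tabular data (classification, clustering, plotting) can be applied to compare them.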

 

Papers

Regulating Frontier AI: To Open Source or Not?

Two important new papers grapple with how to govern emerging and increasingly powerful "frontier AI" models. The first paper is a collaboration between Big Tech players like OpenAI, Google, Microsoft, etc. It argues for self-regulation with government oversight. The second paper, by Jeremy Howard, counters with an open-source approach. This is a great post that thoughtfully summarizes the issues.
Andrew Maynard

 

Financial Machine Learning

This new survey paper explores the nascent literature on machine learning in financial markets. Along with an extensive review, the paper highlights the best examples of what this line of research has to offer and gives recommendations for future research.
Bryan T. Kelly (Yale SOM) and Dacheng Xiu (University of Chicago)

 

Resources

Advanced Python Mastery

This course introduces Python's more advanced features and is intended to help you understand how to control the behavior of the language and bend it in ways that serve the needs of your application. This is an exercise-driven course that's free and self-paced.
David Beazley

 

R for Data Science (2e)

A new, second edition of R for Data Science was recently released and it's a big update. In this edition, visualization is more thoroughly covered, the programming section has been rewritten to focus on function writing and iteration, and there's a new section for accessing data from databases, spreadsheets, and the web. Free to read online.
Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund

 
 


 
 
 
 

Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir® is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!