ISSUE 441 · June 20, 2023
In this post from Stanford, researchers evaluate foundation model providers such as OpenAI and Google on their compliance with the recently proposed EU regulations on AI. The post scores each provider's models on the key requirements and offers recommendations. Ultimately, the entire ecosystem will benefit from working towards compliance, but it's not clear how, or whether, that will happen.
Stanford Center for Research on Foundation Models
Discover how to create actionable insights using Yelp's robust data sets to analyze your marketplace, your customers, and grow your business. Explore use cases on how businesses leverage this rich data with AWS Data Exchange to make strategic business decisions.
Date: July 19, 2023
Time: 10:00 AM (PST)
10:00 AM (BST)
10:00 AM (SGT)
In this first post in a series on academic fraud, researchers explore a case where two different people independently faked data for two different studies in a paper about dishonesty! Besides the backstory, what's interesting are the techniques the researchers used to dissect Excel files. There's a lot more to those files than most people realize.
Data Colada | Uri Simonsohn, Leif Nelson, and Joseph Simmons
An effect size is a way to quantify the difference between two groups. While p-values can tell you whether an observed effect is statistically distinguishable from zero, effect sizes tell you how large that effect is. But to be useful, effect sizes need to be corrected for a variety of statistical artifacts, such as measurement error. This post walks through nearly all artifact corrections and includes equations, code snippets, and an interactive learning app.
Matthew B Jané
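To make the measurement-error idea in the item above concrete, here is a minimal sketch of one classic artifact correction: computing Cohen's d and then dividing it by the square root of the outcome measure's reliability. This is a textbook correction for attenuation, not necessarily the exact procedure the linked post uses. The function names and the sample numbers are illustrative assumptions.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def correct_for_unreliability(d, reliability):
    """Disattenuate d for measurement error in the outcome.

    `reliability` is the outcome measure's reliability coefficient
    (a value in (0, 1], e.g. Cronbach's alpha). Hypothetical helper.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return d / math.sqrt(reliability)


# Illustrative data only (not from the linked post).
treatment = [2, 4, 6]
control = [1, 3, 5]

observed_d = cohens_d(treatment, control)
corrected_d = correct_for_unreliability(observed_d, reliability=0.64)
```

Because reliability is at most 1, the corrected estimate is never smaller in magnitude than the observed one. An unreliable measure shrinks observed effects toward zero, and the correction undoes that shrinkage.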
A raincloud plot combines visualizations of the overall shape of a distribution, the raw data values, and relevant summary statistics. This is a nice explainer that explores how raincloud plots are useful and what to consider when designing them. The post introduces a larger project on raincloud plots that includes a paper and a notebook with examples.
This post explores the problem of drift in ML embeddings and a variety of techniques to monitor it. For each technique, there's a description of how it works, pros/cons, and experimental results.
Evidently | Elena Samuylova and Olga Filippova
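As a rough illustration of what monitoring embedding drift can look like, the sketch below compares the average embedding of a reference batch with that of a current batch using cosine distance. This is a common baseline, offered here under assumptions rather than as the linked post's implementation. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def centroid(embeddings):
    """Component-wise mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def centroid_drift(reference, current):
    """Drift score: cosine distance between the two batch centroids."""
    return cosine_distance(centroid(reference), centroid(current))


# Toy batches (illustrative only): the current batch points in a new direction.
reference_batch = [[1.0, 0.0], [1.0, 0.0]]
current_batch = [[0.0, 1.0], [0.0, 1.0]]

drift_score = centroid_drift(reference_batch, current_batch)
```

In practice the score would be compared with a threshold chosen from historical data. A single distance between centroids can miss changes in spread or shape, which is one reason to evaluate several techniques side by side.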
DB-GPT is an experimental open-source project that uses local LLMs to enable you to interact with your data in natural language. Use it to generate SQL, diagnose SQL issues, provide natural language Q/A with knowledge bases, chat with documents, etc. Privacy and security are core objectives and all of your data stays in your own environment.
GitHub | csunny
GPT Engineer is an AI agent that can write an entire codebase from a single prompt. Specify what you want it to build, the AI asks for clarification, and then it builds it. It's made to be easy to adapt, and it even learns how you want your code to look. The project has been out for less than a week and already has more than 22K stars on GitHub.
GitHub | Anton Osika
This new book introduces the theory and practice of spatial statistics using R. Covers packages for working with spatial data, the various types of spatial data and how to access them, making maps, spatial autocorrelation, Bayesian spatial models, and more. Free to read online.
This notebook-based course introduces Julia's machine learning ecosystem and will teach you how to write reproducible, unit-tested Julia code along the way. Prior experience with Julia is not required. Covers Julia fundamentals (e.g. plotting, data frames, classical ML), deep learning, a personal project, and finishes with debugging and profiling.
TU Berlin | Adrian Hill