ISSUE 441 · June 20, 2023

In the News

Do Foundation Models Comply with the EU AI Act?
In this post from Stanford, researchers evaluate foundation model providers like OpenAI and Google for their compliance with the recently proposed EU regulations on AI. The post scores each of the models on the key issues and offers recommendations. Ultimately, the entire ecosystem will benefit from working towards compliance but it's not clear how, or if, that will happen.

Sponsored Link

Webinar: How to generate business intelligence leveraging Yelp's rich first-party data on AWS
Discover how to create actionable insights using Yelp's robust data sets to analyze your marketplace, your customers, and grow your business. Explore use cases on how businesses leverage this rich data with AWS Data Exchange to make strategic business decisions.
Date: July 19, 2023

Posts & Tutorials

Data Falsificada (Part 1): "Clusterfake"
In this first post in a series on academic fraud, researchers explore a case where two different people independently faked data for two different studies in a paper about dishonesty! Besides the backstory, what's interesting are the techniques the researchers used to dissect Excel files. There's a lot more to those files than most people realize.

Artifact corrections for effect sizes
An effect size is a way to quantify the difference between two groups. While p-values can tell you whether an effect exists, effect sizes can tell you how large that effect is. But to be useful, effect sizes need to be corrected for a variety of statistical artifacts, such as measurement error. This post walks through nearly all artifact corrections and includes equations, code snippets and an interactive learning app.

What Makes Raincloud Plots Tick?
A raincloud plot combines visualizations of the overall shape of a distribution, the raw data values, and relevant statistics. This is a nice explainer that explores how raincloud plots are useful and things to think about for their design.
The post introduces a larger project on raincloud plots that includes a paper and a notebook with examples.

5 methods to detect drift in ML embeddings
This post explores the problem of drift in ML embeddings and a variety of techniques to monitor it. For each technique, there's a description of how it works, pros/cons, and experimental results.

Tools & Code

DB-GPT - Database Interactions with Private LLMs
DB-GPT is an experimental open-source project that uses local LLMs to enable you to interact with your data in natural language. Use it to generate SQL, diagnose SQL issues, provide natural language Q/A with knowledge bases, chat with documents, etc. Privacy and security are core objectives and all of your data stays in your own environment.

GPT Engineer
GPT Engineer is an AI agent that can write an entire codebase with a prompt. Specify what you want it to build, the AI asks for clarification, and then builds it. It's made to be easy to adapt and it even learns how you want your code to look. This has been out for less than a week and already has more than 22K stars on GitHub.

Resources

Spatial Statistics for Data Science
This new book introduces the theory and practice of spatial statistics using R. Covers packages for working with spatial data, the various types of spatial data and how to access it, making maps, spatial autocorrelation, Bayesian spatial models, and more. Free to read online.

Julia programming for ML
This notebook-based course introduces Julia's machine learning ecosystem and will teach you how to write reproducible, unit-tested Julia code along the way. Prior experience with Julia is not required. Covers Julia fundamentals (e.g. plotting, data frames, classical ML), deep learning, a personal project, and finishes with debugging and profiling.