ISSUE 440 · June 13, 2023
Generative AI has become wildly popular, but it can have unintended consequences and potentially cause real harm. Here's how to identify and mitigate the risks for your organization.
Harvard Business Review | Kathy Baxter and Yoav Schlesinger
Confused about finding the right data ingestion tool for your team? Download the free 5-day step-by-step guide to find your perfect data ingestion tool.
Data Jockey is a passion project that applies a wide variety of data tools and methodologies to music collections. The goal is to do things like match similar songs, build a recommendation engine, create playlists from thousands of songs, and derive clusters. There's a lot here, organized as a collection of Jupyter notebooks.
GitHub | GeorgeMcIntire
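One common way to "match similar songs" is to describe each track as a numeric feature vector and rank other tracks by cosine similarity. The sketch below assumes that approach; it is not necessarily what Data Jockey does, and the song names and three-number feature vectors are hypothetical.

```python
import math

# Hypothetical per-song feature vectors (illustrative values, not from Data Jockey).
SONGS = {
    "song_a": [0.8, 0.6, 0.7],
    "song_b": [0.75, 0.65, 0.72],
    "song_c": [0.1, 0.9, 0.2],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_u * norm_v)


def most_similar(query, songs, top_k=1):
    """Rank every other song by cosine similarity to the query song."""
    q = songs[query]
    scored = [
        (name, cosine_similarity(q, vec))
        for name, vec in songs.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With these toy vectors, `most_similar("song_a", SONGS)` ranks `song_b` first, because the two vectors point in nearly the same direction.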
In the latest post from the popular MLU-Explain series, Jared Wilber explores the fundamentals of feed-forward neural networks. The interactive explainer starts with the basic building blocks.
MLU-Explain | Jared Wilber
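For readers who want the core idea in code, here is a minimal forward pass through a tiny feed-forward network: each layer computes activation(w · x + b), and outputs flow in one direction only. The weights and the 2-2-1 layer sizes are hand-picked for illustration and are not taken from the explainer.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each output is activation(w . x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Pass the input through each layer in order (no cycles, hence 'feed-forward')."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Illustrative weights: 2 inputs -> 2 hidden ReLU units -> 1 sigmoid output.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.0], sigmoid),
]
```

For input `[1.0, 0.0]`, the hidden layer produces `[1.0, 0.5]` and the output is `sigmoid(1.5)`. Training would adjust these weights by backpropagation, which this sketch leaves out.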
A common task in applied statistics is to fit and interpret several statistical models at once. For instance, suppose you're using the Palmer Penguins data and want to compare bill length, bill depth, flipper length, and body mass across species. You could manually run a separate model for each measurement, but here's an easier way that automates the process in R and produces plots and a clean table of results.
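The post works in R. Here is a rough Python sketch of the same pattern, fitting one `outcome ~ species` model per measurement and collecting the coefficients in one table. It relies on a standard fact: with a single categorical predictor, least-squares fitted values are the group means. So under treatment coding, the intercept is the baseline group's mean and each coefficient is a difference from it. The rows below are made up for illustration and are not the real Palmer Penguins values. The alphabetical baseline mirrors R's usual default factor ordering.

```python
from collections import defaultdict

# Made-up rows for illustration only; NOT the real Palmer Penguins data.
ROWS = [
    {"species": "Adelie", "bill_length_mm": 38.0, "body_mass_g": 3700.0},
    {"species": "Adelie", "bill_length_mm": 40.0, "body_mass_g": 3500.0},
    {"species": "Gentoo", "bill_length_mm": 47.0, "body_mass_g": 5000.0},
    {"species": "Gentoo", "bill_length_mm": 49.0, "body_mass_g": 5200.0},
]


def fit_one_factor(rows, outcome, group="species"):
    """Least-squares fit of outcome ~ group using treatment coding.

    With one categorical predictor, OLS fitted values equal the group means,
    so the intercept is the baseline group's mean and each coefficient is
    that group's mean minus the baseline mean.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[group]] += row[outcome]
        counts[row[group]] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    baseline = sorted(means)[0]  # first level alphabetically
    coefs = {"(Intercept)": means[baseline]}
    for g in sorted(means):
        if g != baseline:
            coefs[f"{group}{g}"] = means[g] - means[baseline]
    return coefs


def fit_many(rows, outcomes):
    """Fit the same model once per outcome and gather the results in one table."""
    return {y: fit_one_factor(rows, y) for y in outcomes}
```

Calling `fit_many(ROWS, ["bill_length_mm", "body_mass_g"])` returns one row of coefficients per measurement. That is the kind of summary table the post automates, though its R code will look different.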
Group sequential testing allows interim looks at an A/B test, so you can judge its outcome and act accordingly before the test ends. That means decisions can be made faster and, as Nils Skotara explains in this post, their quality can also increase significantly.
Booking Data Science | Nils Skotara
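To make "interim looks" concrete, here is a minimal sketch of one classic design: an O'Brien-Fleming-shaped boundary, c·sqrt(K/k) at look k of K equally spaced looks, applied to a pooled two-proportion z-test. The early boundaries are strict, so stopping early requires overwhelming evidence, and the final boundary is close to the usual 1.96. This is an assumption-laden illustration, not necessarily the method described in the Booking post. The constant 2.040 is assumed to be the commonly tabulated O'Brien-Fleming value for 5 looks at two-sided alpha = 0.05.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic for variant B versus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se if se > 0 else 0.0


def obrien_fleming_bounds(num_looks, final_crit):
    """O'Brien-Fleming-shaped boundaries for equally spaced looks: c * sqrt(K / k)."""
    return [final_crit * math.sqrt(num_looks / k) for k in range(1, num_looks + 1)]


def run_sequential_test(looks, bounds):
    """Stop at the first look whose |z| crosses its boundary.

    `looks` holds cumulative (conv_a, n_a, conv_b, n_b) counts at each look.
    Returns (look_number, z) if a boundary is crossed, otherwise None.
    """
    for k, (counts, bound) in enumerate(zip(looks, bounds), start=1):
        z = two_proportion_z(*counts)
        if abs(z) >= bound:
            return k, z
    return None


# Assumption: 2.040 is the tabulated O'Brien-Fleming constant for
# 5 equally spaced looks at two-sided alpha = 0.05.
BOUNDS = obrien_fleming_bounds(5, final_crit=2.040)
```

Checking a plain 1.96 cutoff at every look would inflate the false-positive rate. Widening the early boundaries is how group sequential designs keep the overall error rate near the nominal level while still allowing an early stop.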
Stakeholders should be your partners in building, testing, and iterating on metrics. Use this template to record how metrics like leads, users, and customers are defined, and to document how other North Star metrics, like ARR, are tracked. Get the fillable template here.
Weave is an interactive toolkit for ML and data app development. Start with a table of data and build beautiful, performant interactive dashboards, switching easily between code and UI along the way. Get started with open-source Weave in just two lines of code.
Weights & Biases
Three popular R packages for handling spatial data (rgdal, rgeos, and maptools) will be retired after October 2023. Whether you're a user or a developer, if you rely on these packages, your workflows will break. Here's what you need to know, including solutions and resources.
geocompx | Jakub Nowosad
This new book by Brad Congelio shows how to do analytics with NFL data. It uses R, but if you're not an R user, a dedicated section covers what you need to follow along. Topics include NFL data access, data wrangling, NFL analytics, visualization, and advanced modeling. Free to read online.
This paper introduces FinGPT, an open-source large language model for the finance sector. FinGPT takes a data-centric approach to help researchers and practitioners develop financial LLMs (FinLLMs) that are much more practical than proprietary models like BloombergGPT.
arXiv | Hongyang Yang, Xiao-Yang Liu, Christina Dan Wang