ISSUE 440 · June 13, 2023

Trends

Managing the Risks of Generative AI
Generative AI has become wildly popular, but it can have unintended consequences and potentially cause real harm. Here's how to identify and mitigate the risks for your organization.

Sponsored Link

Step-by-step guide to find your data ingestion tool
Confused about finding the right data ingestion tool for your team? Download the free 5-day step-by-step guide to find your perfect data ingestion tool.

Posts & Tutorials

Data Jockey
Data Jockey is a passion project that applies a wide variety of data tools and methodologies to music collections. The goal is to do things like match similar songs, build a recommendation engine, create playlists from thousands of songs, derive clusters, etc. There's a lot here, organized as a collection of Jupyter notebooks.

Neural Networks
In this latest post from the popular MLU-Explain series, Jared Wilber explores the fundamentals of feed-forward neural networks. This is an interactive explainer, starting with basic building blocks.

Fitting many statistical models at once using dplyr
A common task in applied statistics is to fit and interpret a number of statistical models at once. For instance, suppose you're using the Palmer Penguins data and you want to compare bill length, bill depth, flipper length, and body mass between species. You could manually run a separate model for each, but here's an easier way that automates the process using R and produces plots and a nice table of results.

Sequential Testing at Booking.com
Group Sequential Testing allows for intermediate opportunities to judge the outcome of A/B tests and act accordingly. That means decisions can be made faster and, as Nils Skotara explains in this post, it can also significantly increase the quality of those decisions.

TEMPLATE | Data planning with your business teams
Stakeholders should be your partners in building, testing, and iterating on metrics. Record how metrics like leads, users, and customers are defined, and use it to document how other North Star metrics, like ARR, are being tracked. Get the fillable template here.

Tools & Code

Weave - Interactive Data Exploration Toolkit
Weave is an interactive toolkit for ML and data app development. Start with a table of data and create beautiful and performant interactive dashboards, easily switching between code and UI along the way. Get started with Weave open source with just two lines of code.

Changes to popular R packages for spatial data
Three popular R packages for handling spatial data won't be available after October. Whether you're a user or a developer, if you use these packages, your workflows will break. Here's what you need to know, including solutions and resources.

Resources

An Introduction to 🏈 NFL Analytics with R
This new book by Brad Congelio shows how to do analytics with NFL data. It uses R, but if you're not an R user, there's a section that covers what you need for this book. It covers NFL data access, data wrangling, NFL analytics, visualization, and advanced modeling. Free to read online.

Papers

FinGPT: Open-Source Financial Large Language Models
This paper introduces FinGPT, an open-source large language model for the finance sector. FinGPT takes a data-centric approach to help researchers and practitioners develop FinLLMs that are much more practical than proprietary models like BloombergGPT.