ISSUE 433 · April 25, 2023

In the News

Inside the secret list of websites that make AI like ChatGPT sound smart
Despite the rapid rise in chatbots' popularity, the details of how they are trained remain mostly hidden. In this article, The Washington Post digs into a dataset of 15 million websites used to train several high-profile AIs, revealing the range of inputs that shape what these AIs "know."

Tutorials & Opinions

More Design Patterns For Machine Learning Systems
Design patterns don't just make it easier to write good code. They also communicate the problem being addressed and how the code or component is intended to be used. In this continuation of his series, Eugene Yan explores common design patterns used in industry to build machine learning systems.

Data analysis with SQLite and Python
In a three-hour workshop at PyCon last week, Simon Willison presented a tutorial on data analysis using SQLite and Python. Here's the 9-page handout, which covers the basics of the sqlite3 module, sqlite-utils, and Datasette, plus a bit on Datasette Lite.

Writing performant code with tidy tools
When computational efficiency is the priority, switching from dplyr and tidyr functions to the backend tools underlying them can yield substantial speedups.

A data analyst workflow, part 1: SQL & tidyverse
A nice tutorial showing how a data analyst can use either SQL or the tidyverse for the initial stages of data exploration, then double down on the tidyverse with ggplot2 for deeper exploration.

Resources

Recommended Resources for Starting A/B Testing
There are a lot of resources for A/B testing, but where do you start? In this post, Emily Robinson and Eddie Wharton curate some of the best resources on the web to help teams design, implement, and analyze A/B tests effectively.

How to Run Surveys
A great guide to running surveys effectively for data collection. It covers sample selection, survey design, data collection, and data analysis.
It also highlights considerations for making sure your results are accurate, reliable, and useful.

Data Visualization

Visually Accessible Data Visualization
Creating data visualizations that are accessible to people with common types of color blindness lets more people understand your charts and products. In this post, Derek Torsani walks through the challenges of creating accessible charts and shares the strategies his team at Plaid uses.