ISSUE 417 · December 20, 2022

Happy Holidays everyone! Data Elixir is taking a break next week and will be back in your Inbox on January 3rd. However you celebrate the coming new year, I wish the very best for you and your families!
-Lon

Sponsored Link
Bring SQL, Python, no-code, and R together in one UI
From exploratory analyses to beautiful data apps to ML modeling and data science, Hex streamlines the entire analytics workflow so your team can focus on generating insights, driving decisions, and moving things forward. No more jumping between tools, struggling with versions, or sharing via screenshot. Try Hex free with a 14-day trial.

Tutorials, Projects & Opinions

TikTok’s Secret Sauce
It's easy to think that the most important part of a recommender is the underlying algorithm, but in TikTok's case, the real innovation is less obvious. Even if recommenders on other social media platforms are equally accurate, the recommendations feel more accurate on TikTok. Here's why.

Mitigating the winner’s curse in online experiments
Experimentation plays a key role in testing and deploying new ideas at Etsy, but it can lead to overestimating the impact of winning treatments. This phenomenon is known as the "winner's curse." This post explores how Bayesian statistics can be used to mitigate this effect and more accurately account for the business impact of experiments.

NormConf: Selected talks and lessons learned
In case you missed NormConf last week or just didn't have time for everything, here's a nice set of notes and replays from a handful of selected presentations. Includes Vicki's opening keynote, Katie Bauer's talk on working with PMs, Josh Wills's talk on model training, and Chris Albon's talk on invisible work.

A Failed Machine Learning Project
Projects generally only show up in public if they work. This post takes a different approach and explores how a simple ML project failed.

Code & Tools

Sematic
Sematic is an open-source development toolkit to help prototype and productionize machine learning pipelines. It uses native Python, helps you speed up your workflow and monitor all inputs and outputs, and can execute your pipelines locally or in the cloud.

Resources

High-Dimensional Probability
This course by Roman Vershynin explores high-dimensional probability and its applications in data science. There are 41 lectures here with detailed notes and homework assignments. The series is loosely based on his book with the same title, which can be downloaded for free.

List of AI and ML Conferences in 2023
Looking for a conference to attend, sponsor, or submit a talk to? This latest list from Tryolabs includes 42 machine learning conferences around the world. Each listing is linked to the associated website, and filters make it easy to search by continent or category.

Data Visualization

R packages for visualising spatial data
Nice write-up of R packages that are commonly used for visualizing spatial data. Covers a variety of uses, such as creating and using themes, interactive maps, working with elevation data, plotting raster objects, shading for 3D effects, and more.