ISSUE 371 · January 25, 2022

Insight

Good Data Citizenship Doesn’t Work
More data isn't necessarily better. The key is managing it in a way that's useful and doesn't distract you from reaching your goals. This post uses lots of examples to explore the issues & lessons learned along the way.

The Rise of A.I. Fighter Pilots
Artificial intelligence is being taught to fly warplanes. That's not really surprising, but how can anyone trust it?

Sponsored Link

Free Course: Natural Language Processing (NLP) for Semantic Search
Learn how to build semantic search applications! This free course from Pinecone covers everything you need to build state-of-the-art language models for semantic search.

Tutorials, Projects & Opinions

ML and NLP Research Highlights of 2021
There were a lot of advances in machine learning and natural language processing last year. This post doesn't cover everything, but it's a great collection of interesting highlights across multiple impactful areas. Each section includes a summary of the highlight, why it's important, links to key papers, diagrams, and thoughts about what's next.

My Machine Learning Process (Mistakes Included)
Writers generally scrub the mistakes out of their posts, which can make it seem as if everything always works out perfectly. Not this one. This tutorial shows how to train a machine learning model in the real world, "with all the mistakes and fruitless efforts included."

How vectorization speeds up your Python code
Python has a lot going for it, but it's not the fastest language. This is a great post that shows how to process a large amount of homogeneous data quickly in Python using vectorization. Learn what that means, when it applies, and how to do it.

Binding Apache Arrow to R
Apache Arrow is an open-source library that simplifies working with flat and hierarchical data. This is a fun and gentle introduction to the inner workings of Apache Arrow and how to use it from within R.
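To make the vectorization idea from the "How vectorization speeds up your Python code" item concrete, here is a minimal sketch, assuming NumPy is installed. It is not code from that post. It computes the same sum of squares twice: once with a pure-Python loop that interprets one iteration per element, and once with a single NumPy call that runs the loop in compiled code over a homogeneous float64 array. The function names are hypothetical, and the timings it prints will vary from machine to machine.

```python
import time

import numpy as np


def loop_sum_of_squares(values):
    """Pure-Python loop: the interpreter handles every element one at a time."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def vectorized_sum_of_squares(arr):
    """Vectorized version: np.dot runs the whole loop in compiled code."""
    return float(np.dot(arr, arr))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.random(1_000_000)  # one million homogeneous float64 values

    start = time.perf_counter()
    slow = loop_sum_of_squares(data.tolist())
    loop_time = time.perf_counter() - start

    start = time.perf_counter()
    fast = vectorized_sum_of_squares(data)
    vec_time = time.perf_counter() - start

    print(f"loop: {loop_time:.3f}s, vectorized: {vec_time:.4f}s")
    # Floating-point summation order differs, so compare with a tolerance.
    print("results match:", bool(np.isclose(slow, fast)))
```

The speedup comes from the data being homogeneous: because every element is a float64 stored contiguously, NumPy can skip per-element type checks and Python object overhead. That is also why vectorization applies best to large arrays of a single type and helps little with mixed-type Python objects.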
How Hasura improved conversion by 20% with PostHog
In 2021, Hasura’s Engineering and UX teams started self-hosting PostHog to collect product insights without needing to share user data with third parties. Using PostHog’s funnel analysis and session recording tools, Hasura’s team was able to improve conversion by 20% overnight.

Code & Tools

PRQL - Pipelined Relational Query Language
PRQL is a SQL alternative that compiles to SQL and works anywhere SQL does. Like SQL, it's readable, explicit, and declarative, but unlike SQL, it forms a logical pipeline of transformations and supports abstractions such as variables and functions. For a lively discussion of the project, check out this post on Hacker News.

tinygp - the tiniest of Gaussian Process libraries
tinygp is an extremely lightweight library for building Gaussian Process (GP) models in Python, built on top of JAX. It has a nice interface, it’s pretty fast, and, thanks to JAX, it supports GPU acceleration and automatic differentiation.

Resources

Python for Data Analysis: