ISSUE 327 · March 16, 2021

In the News

How Facebook got addicted to spreading misinformation
Artificial Intelligence at scale depends on vast resources of data and talent that few organizations are able to pull together. Facebook does it, but an organization as large as Facebook is easily tangled in complex internal dynamics that can hinder efforts for "Responsible AI." This is a well-researched, epic article that explores how Facebook failed.

Gartner cites ‘glut of innovation’ in DS and ML
A recent Gartner report evaluates DSML platforms and, among other things, cites a “glut of compelling innovations.” Here are the highlights.

Sponsored Link

Need high-quality training data for ML & AI?
Let your data teams focus on core research and business objectives while our in-house, advanced workforce handles the data annotation work for you. iMerit is trusted by Microsoft, Autodesk, and TripAdvisor for their computer vision and natural language processing data annotation needs. Get a free quote today.

Tutorials, Projects & Opinions

How to Write Better
Eugene Yan consistently writes some of the most useful data science articles you'll find on the web. His posts are well researched and his writing is super clear and organized. In this two-part series, he offers his framework for thinking through and organizing writing projects, especially as they relate to technical and design docs:
Eugene Yan

We Failed to Set Up a Data Catalog 3x. Here’s Why.
When the data team at Atlan decided to create a data catalog, they thought it would be easy. After three failed attempts over five years, they finally got it right. This post gives a high-level view of each iteration, how they approached it, and lessons learned along the way.

Pattern-based spatial analysis
Pattern-based spatial analysis allows for spatial search, change detection, and clustering of areas with similar patterns. It's commonly used with satellite imagery for environmental monitoring, real estate planning, healthcare analytics, etc. This five-part introduction is easy to follow and includes lots of examples, code snippets, and screenshots.

Instrumental variables analysis for non-economists
Instrumental Variables analysis is a popular way to do causal inference, but it's often explained using concepts that are mostly familiar to economists. In this introduction, Chris Said uses a visual approach to explain concepts in a way that tends to be more intuitive for data scientists and engineers.

Defensible Machine Learning
ML companies have the potential to revolutionize business, but they're hard to scale and they look and feel different from typical software businesses. Here's a simple framework to help founders and investors think through three types of defensible machine learning companies. Especially if you're thinking about ML startups, this is a must-read.

Code & Tools

Luminaire - A hands-off Anomaly Detection Library
Luminaire is a Python package that provides ML-driven solutions for monitoring time series data. Luminaire provides anomaly detection and forecasting capabilities that incorporate correlational and seasonal patterns as well as uncontrollable variations in the data over time.

image2csv
image2csv is a Python application that converts images of an array of numbers to corresponding csv files.
In other words, take a screenshot of a table and image2csv will create a csv file using the data in the table.

Resources

Data Pricing: from Economics to Data Science
This survey paper explores ways to systematically assess the value of data. At 82 pages, it's not a light read but it's well-organized and is a fantastic resource for anyone interested in putting a monetary value on data products. For a good introduction, check out Ben Lorica's recent interview with the author on The Data
Exchange Podcast.

From the Archives

How to send an 'E mail' - from 1984 via Database
This short video that's been bouncing around the web this week offers a glimpse into the world of email in 1984. Ultimately, it shows why email wasn't a big thing back then! We've come a long way since 1984 but then again, if you watch closely when he enters his password, you might think that some things haven't changed much at all.

Sign up to get Data Elixir's data science newsletter in your Inbox.

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply back to this email. To find specific content from prior issues or to research topics, check out the catalogued Archives on Data Elixir's Search Page.