ISSUE 338 · June 1, 2021

Insight

Nasty, brutish and short: The life of the modern CDO
What does a Chief Data Officer do exactly? This short post offers a glimpse into the world of a CDO and how they're often not set up for success within their organizations.

To regulate AI, try playing in a sandbox
In the EU's recently proposed AI regulations, the word "sandbox" is mentioned 38 times. The idea is to develop and test AI applications in low-stakes, monitored environments before rolling them out to the general public. Experts agree that it's a step in the right direction, but it's no silver bullet.

Sponsored Link

Mitigate Bias, Duplication, and Costly Errors with High-Quality Data
At Alegion, we understand that when you don't have high-quality training data, it can cost companies millions of dollars. That is why we have created a whitepaper on how our team of experts achieves quality training data, and how you can too. Read more about our four prioritized phases, the significance of quality metrics, and how to implement it all.

Tutorials, Projects & Opinions

Understanding the data (error) generating processes for data validation
In stats, we talk about the data generating process (DGP), yet data validation is often conducted without a theory of error generation. This post explores some failure models in ELT and implications for data consumers on effective validation.

JavaScript for Data Analysis
JavaScript may not be widely considered as important for working with data, but the tools are evolving and JavaScript already excels at enabling collaboration and communication on the web. In this post, Mike Bostock, creator of D3.js and the Observable platform, makes a case for JavaScript as a key language for data and what to expect in the coming years.

Interactive Gaussian Process Visualization
This interactive visualization of Gaussian processes helps make it easy to understand prior and posterior kernel functions.
The description of Gaussian processes is high-level, but follow the link in the "Background" section for an in-depth tutorial.

Can you build an ML model to monitor another model?
Can you train a machine learning model to predict your model's mistakes? It sounds reasonable on the surface. Machine learning models make mistakes. Let's take these mistakes and train another model to predict the missteps of the first one! Sort of a "trust detector," based on learnings from how our model did in the past.

Session-based Recommender Systems
Session-based recommendation algorithms provide recommendations based solely on a user's interactions in a single session. That's useful because it doesn't require a history of that user's preferences. This introduction shows how these algorithms work, how to evaluate them, and things to consider if you're building your own.

Apply today: Master of Data Science for Public Policy
Passionate about turning data into valuable insights for the common good? Join the next cohort of the Master of Data Science for Public Policy (MDS) at the Hertie School in Berlin, Germany. This two-year programme, taught entirely in English, welcomes students from any academic discipline who are interested in using data to solve real-world challenges. Are you curious about the MDS? Get in touch.

Code & Tools

AgentPy - Agent-based modeling in Python
AgentPy is an open-source library for the development and analysis of agent-based models in Python. The framework integrates the tasks of model design, interactive simulations, numerical experiments, and data analysis within a single environment, and is optimized for interactive computing with IPython and Jupyter.

NocoDB - The Open Source Airtable Alternative
NocoDB is an open-source Airtable alternative that turns your relational databases into "smart-spreadsheets." It connects to cloud services like S3 for storage and to services such as Slack, Twilio, and Discord via an App Store for workflow automations.
This project was open-sourced just last week and already has over 10K stars! Also, see the Hacker News discussion.

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email. To find specific content from prior issues or to research topics, check out the catalogued Archives on Data Elixir's Search Page.