ISSUE 341 · June 22, 2021

Insight

Experts doubt ethical AI will be broadly adopted anytime soon
This new report shows that a majority of tech innovators, business & policy leaders, and researchers & activists believe that the evolution of AI will continue to be focused on optimizing profits and social control. That's easy to understand, especially considering the range of responses here. Even so, many are optimistic about how AI will ultimately be beneficial. This is a thought-provoking longread.

Sponsored Link

Using external data in a post-vaccinated world
Join this webinar to discover how AWS Data Exchange provides the data to help you understand consumer behavior for better data insights and business decisions in a post-pandemic economy.

Tutorials, Projects & Opinions

What is a "data scientist" or "machine learning engineer", really?
Job titles tend to be overloaded and aren't super helpful for understanding the work that someone does. What does a "Data Scientist" actually do? How about a "Data Analyst"? Or an "ML Engineer"? In this project, Paige Bailey uses survey results from Kaggle, Anaconda and Stack Overflow to explore who does what, and how. Based on the responses, she found 6 distinct clusters and no apparent correlation to job title.

Understanding p-values Through Simulations
P-values are often misinterpreted or misused. This is a fantastic interactive tool for learning about and/or explaining how p-values work.

Applied NLP Thinking: How to Translate Problems into Solutions
Applied NLP is different from research NLP. With Applied NLP, you'll never ship anything valuable if you abstract out the application goals and treat the work as an optimization problem. This post is a non-technical introduction to thinking through real-world NLP problems and how to approach them.

Using Geospatial Data in R
Great tutorial for working with geospatial data using R. It starts with an overview of how geospatial data corresponds to places in the real world and then introduces a workflow that includes data management, data wrangling with simple features, and ways to visualize geospatial data. This is well-organized and includes lots of code snippets, linked references, and screenshots along the way.

StreamSets: Build Smart Data Pipelines for Free
Get early access to the new StreamSets DataOps Platform. Quickly build and deploy streaming, batch, CDC, ETL and ML pipelines across hybrid and multi-cloud. Plus, handle data drift automatically so jobs keep running even when schemas and structures change. Get early access now.

Code & Tools

Kats - One stop shop for time series analysis in Python
Kats is a lightweight, easy to use, and extendable framework that aims to be a one-stop shop for time series analysis, including detection, forecasting, feature extraction/embedding, multivariate analysis, etc. Kats is from Facebook's Infrastructure Data Science team and is open-sourced with an MIT license.

Exploring Data Lineage with OpenLineage
OpenLineage is an open platform for collecting and analyzing data lineage. It tracks metadata about datasets, jobs, and runs, and gives users the information they need to identify the root cause of complex issues. Here's how to get started.

Resources

Machine Learning Interviews Book
Regardless of what side of the table you're on, this is an awesome new resource for anyone who's interviewing for machine learning roles. Along with a wide range of sample questions, this includes discussion of typical roles, types of companies, interview processes, how to think through an offer, do's and don'ts, useful resources and much more. This is an open-source book that's being actively developed.

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply back to this email.

To find specific content from prior issues or to research topics, check out the catalogued Archives on Data Elixir's Search Page >>