— Notes —
Data Science Tech Stacks
If you haven't already participated in the Data Science Tech Stacks project, here's another chance to get insights into the tech stacks of organizations around the world.
So far, about 400 Data Elixir readers have taken the survey. On a 0-10 scale, most respondents rated their stack of tools a "7" or lower at meeting their needs, but 34 readers gave their stacks a 9 or higher. Interested in what those stacks look like? Take the survey and find out.
By the way, all responses are anonymized, and if you've previously completed the survey, I'll resend links to the updated responses next week.
— In the News —
This paper offers a wide-ranging look at the opportunities for Machine Learning to help combat climate change. Suggestions range from research questions to promising business opportunities. This is a well-organized long-read, with a lot of great ideas and references.
When there are problems in a large organization, it's harder than ever to know what's really going on. So teams with names like "Organizational Analytics" sift through the digital exhaust of calendars and email clients and try to piece together the reasons for organizational problems. But data alone isn’t insight. In this New York Times article, Neil Irwin explores how Microsoft finally got to the heart of employee discontent.
— Sponsored Link —
Come to the Strata Data Conference in New York and be at the epicenter of data and business. Special offer for Data Elixir members: save 20% on Gold, Silver, and Bronze passes with code DE20.
With code DE20, passes start at $716 until Best Price ends on June 28.
— Tools and Techniques —
Interactive notebooks are great for presenting narrative alongside code, but they're not so great for non-technical audiences. Voilà helps bridge the gap by turning Jupyter notebooks into standalone web applications and dashboards.
edaviz is a Python library for data exploration and visualization in Jupyter. edaviz is currently in private beta, but the demo video looks promising.
Krishna Gade, the CEO of Fiddler Labs, explores the challenges that organizations face when building AI applications. The post includes insights into development activities and specific gaps in developer tools.
PyTorch Hub provides basic building blocks for improving the reproducibility of machine learning research. It consists of a repository of pre-trained models with built-in support for Colab and integration with Papers With Code, and it covers a broad set of models, including classification and segmentation, generative models, and transformers.
Dimensional Research recently conducted a global survey of hundreds of data scientists and other AI professionals at large companies across 20 industries to learn about their experiences with ML development projects. The survey suggests that most organizations find ML model training more challenging than they expected.
— Resources —
The Second Edition of Hadley Wickham's Advanced R book is a comprehensive update that includes new chapters and deeper coverage of key topics. It's available to read for free online and the print version will be released later this month.
This curated list of applied machine learning and data science notebooks covers a wide variety of industries and applications. Very well organized and growing.
— Career —
As a co-founder of Sharpest Minds, Jeremie Harris talks with a lot of data scientists after their job interviews. In this post, he lays out the specific technical and nontechnical skills that companies want most but that most applicants don’t have.