Data Elixir

ISSUE 404  ·   September 13, 2022

 

Insight

Organizations need to deliberately create data

People sometimes say that "data is the new oil," but that line of thinking confines models to the data that's available for extraction. A better approach is to figure out what data you need and then figure out how to create it. This is a great post that explores the limitations of extracted data and how teams gain by deliberately creating the data they need.
Data Creation | Yali Sassoon

 

Takeaways from Gartner Data & Analytics Summit

Nice overview of highlights and four big ideas from the recent Gartner Data & Analytics Summit.
Prukalpa

 

Sponsored Link

TechCrunch x iMerit ML DataOps Summit 2022

Join 2,000+ data scientists, engineers, and ML professionals virtually at the iMerit ML DataOps Summit to hear from leaders at the forefront of deploying ML DataOps solutions that power machine learning and artificial intelligence. Register for free.

 

Reach Data Elixir readers by sponsoring an issue.

 

Tutorials, Projects & Opinions

5 questions to categorize machine learning interpretability approaches

After reading hundreds of papers and writing a book on machine learning interpretation, Christoph Molnar has identified some useful categories of interpretation techniques. In this post, he organizes his thinking into five simple questions that will help you assess which ML interpretation approaches are suitable for different use cases.
Mindful Modeler | Christoph Molnar

 

Getting Started with Apache Arrow in R

Nice collection of R resources, cheatsheets, and a tutorial for using Apache Arrow to work with data that's larger than memory. This is aimed at experienced R users who are new to Arrow.
Voltron Data

 

Want a data science project? 

This is the best take I've seen on the recently released treasure trove of hospital pricing data. Over 100TB of data was released and, by all accounts, it's a mess. In this post, Randy Au explores what's available, what needs to happen next, and how, ultimately, the lack of tools and structure has more to do with real-world data handling issues than with malice. There are important problems and opportunities here.
Counting Stuff | Randy Au

 

Djinn by Tonic.ai - AI-driven synthetic data models 

Whether it's privacy controls or a lack of high-quality data slowing you down, Djinn's AI-driven synthetic data models create private and augmented data within minutes of setup. Answer nuanced scientific questions, optimize business processes, and make better decisions.
// sponsored

 

Code & Tools

PySearch: Python Function Search by Description

PySearch is a free search engine for querying Python libraries using natural language descriptions. Just select the libraries you want to search and then use natural language or keywords to describe what you're looking for. Check out the examples to see it in action.
PySearch

 

Data Visualization

Mapping wind data with R

Great R tutorial that shows how to access, reshape and visualize wind data as streamlines. This is a step-by-step tutorial that includes code and links to key resources along the way.
Milos Popovic

 

Which fonts to use for your charts and tables

Sans-serif or serif typefaces? Lining or oldstyle figures? Narrow or wide? With lots of examples, this post explains which fonts work best for various types of data visualizations.
Datawrapper | Lisa Charlotte Muth

 

Career

The Difficult Life of the Data Lead

As data teams get bigger, more Data Leads are needed, but Data Leads have one of the hardest roles in data. They have to manage a team, work with stakeholders, and still stay hands-on. In this post, Mikkel Dengsøe explores the challenges of the role and ideas for making it better.
Mikkel Dengsøe

 

Join the Data Elixir Talent Collective

The Data Elixir Talent Collective is a reverse job board where top companies apply to you. Choose to be anonymous or public and get matched with opportunities that fit your specific interests. 

This is a free resource, but membership is limited. To apply, you need 3+ years of experience in data science, analytics, machine learning, visualization, or a related field. For more info, APPLY HERE.

If you’re hiring, apply now to find top candidates faster, sourced from the Data Elixir community. We're creating the highest signal-to-noise hiring resource for roles in the data ecosystem. Already, there are more than 100 mid- to senior-level candidates from a wide variety of organizations, from fast-moving startups to big companies like Google, Amazon, Apple, NVIDIA, and more. If you're hiring, APPLY HERE.

 
 


Data Elixir, LLC
P.O. Box 21255
Boulder, CO 80308

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions, send a note!