Tools and Techniques
David Robinson from Stack Overflow shares some great reasons for starting a data science blog. Check it out, and if you decide to create one, let me know and I'll include a list of new blogs in an upcoming issue of Data Elixir.
Data comes in a variety of states. Some datasets are organized and well-labeled while other datasets are messy and come with lots of caveats. Here's a worthwhile scheme for thinking through the state of your data and communicating that state to stakeholders in a way that's easy to understand and useful.
Pandas is one of the most important Python libraries for data science. Here's a step-by-step approach for mastering it.
Here's a great introduction to the "tidyverse." It includes an overview of "tidy" data principles, examples, key things to know, and worthwhile references.