— Insight —
Paco Nathan summarizes the findings from a few recent data science surveys that cover topics such as technologies being used, management practices, and challenges. There's a lot of good info here. One of the surveys had more than 11,000 responses from enterprise organizations around the world. Includes links to detailed reports.
In a recent conversation with Rachel Thomas of fast.ai, Chris Wiggins shared some ideas for influencing tech companies towards responsibility and ethics. From his perspective, "most engineers and data scientists still do not realize the collective power that they have."
— Sponsored Link —
Come to Strata Data Conference in San Francisco and be at the epicenter of data and business. Special for Data Elixir members: save 20% on Gold, Silver, and Bronze passes with code DE20.
Planning to be in Europe? Code DE20 also works for Strata London coming up April 29-May 2.
— Tools and Techniques —
Great article by Ben Lorica and Mike Loukides that explores machine learning security vulnerabilities, including common attacks and ways to lock things down. The biggest mistake people make is thinking that the models are just like any other type of software.
Want to become a better Pythonista? PyCoder’s is a free, weekly email newsletter for those interested in Python development and various topics around Python and the community.
Nice introduction to JupyterLab, the next generation of Jupyter.
This new Dataquest tutorial introduces Poisson Regression and shows how to use it in real-world applications.
What are some very useful, lesser known R packages for Data Science?
There have been a few Reddit discussions recently about favorite R packages that aren't well known. Here are a few of the highlights:
- CausalImpact - Inferring Causal Effects using Bayesian Structural Time-Series Models
- naniar - Provides tidy data structures, summaries, and visualizations for missing data
- mlr - Machine Learning infrastructure in R
- esquisse - RStudio add-in that makes it fast and easy to do interactive data exploration
Realize the promise of data analytics and find the opportunity in the numbers. The Master of Management Analytics from Smith School of Business is essential training to unleash the potential of data and generate competitive advantage.
— Resources —
The second edition of Joe Blitzstein's Introduction to Probability has just been released and is available to read for free online. This is a big book with lots of examples and exercises that also works well as a reference. A cheatsheet for the first edition came out a few years ago as a 10-page PDF, aptly called "The Only Probability Cheatsheet You'll Ever Need."
— Data Viz —
This is a super useful website that will help you decide what types of visualization techniques are best suited for your data. Includes visualization examples in a variety of categories, including deviation, correlation, ranking, distribution, spatial, etc. Click on the headings for use-cases with examples and then click on "Edit" to open a specific example in the Vega editor.
This step-by-step tutorial shows how to create an interactive visualization that you can embed in a Twitter post! This is a super creative idea with a lot of potential to be useful.
— Career —
After working as a data scientist at Airbnb for a few years, Jason Goodman offers some practical advice for people starting out in a data science career.