ISSUE 265 · December 31, 2019

In the News

One Nation, Tracked
This New York Times investigation is a must-read deep dive into a data industry that's perfectly legal but should alarm us all. Via apps on our phones, dozens of companies log the movements of tens of millions of people and can show where we spend our days, who we spent last night with, where we worship, whether or not we visit a psychiatrist, etc. Follow the links for info about locking down your phone.

AI is rushing into patient care - and could raise risks
Medical applications are a hot growth area for AI but in the "fail fast and fix things later" world of entrepreneurial tech, serious mistakes are being made.

Sponsored Link

ML Best Practices: The Keys To Achieving Quality Data Annotation
To maximize the potential in ML models, high-quality data is key. When measuring the quality of labeled data consider IOU, Accuracy, Recall, Precision, and F1 Score. An enterprise-grade labeling platform that employs a holistic annotation approach can scale quality annotation without diminishing complexity and compromising accuracy.

Tools and Techniques

The 'largest stock profit or loss' puzzle: efficient computation in R
In his latest post, David Robinson starts with a SQL interview question and shows how to make the solution as computationally efficient as possible using a tidyverse approach. This is a good think-through of different approaches.

Want to make good business decisions? Learn causality.
Great post on the StitchFix tech blog that uses a common marketing question to show how being causal information driven is more effective than being naively data driven.

Urban sensing: quantifying the sensing power of vehicle fleets
Smart cities are powered by sensors that measure things like air pollution, traffic congestion, temperature, humidity and road quality. Covering a large area is expensive but what if sensors could be mobile? How many taxis would it take to effectively "scan" an entire city?
This is a nice exploration of the problem and how to measure urban sensing.

Resources

SQL for Data Science Course
This free course by Kristen Kehrer starts with simple database queries and continues through data cleaning and feature engineering. Includes videos, cheat sheets and an interactive SQL browser for following along.

NLP Recipes
This collection of Jupyter notebooks showcases best practices and examples for common scenarios involving text and language. The intention is to significantly reduce development time for both researchers and practitioners.

Data Viz

The Curious Absence of Uncertainty from Many (Most?) Visualizations
Consider the last few visualizations you encountered outside of scientific publications. Did they depict uncertainty? Probably not. Here's what Jessica Hullman discovered when she asked why.

Inquiry into Matplotlib's Figures, Axes and Gridspec
Matplotlib is notoriously difficult to learn but if you're already familiar with it, this is a great guide to help you level-up your skills.

Career

Becoming an Independent Researcher
This post is a hard reality check on the path to Machine Learning Research. There's a lot of opportunity here but the competition and stakes are high. This followup thread from Julian Togelius is also worthwhile.

Job Board
Data Elixir is curated and maintained by @lonriesberg. For additional finds from around the web, follow Data Elixir on LinkedIn, Twitter or Facebook.