ISSUE 339 · June 8, 2021

In the News

Machine learning is booming in medicine. It’s also facing a credibility crisis.
Experts say fatal flaws in the methodology of many machine learning papers make it hard to have faith in published claims.

Sponsored Link

Online Data Science Programs from Drexel University
Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career. Learn more.

Tutorials, Projects & Opinions

How Airbnb Achieved Metric Consistency at Scale - Part 2
This second post in a series walks through the core design principles of Airbnb's in-house metric platform called "Minerva." Specifically, here's how Airbnb defines, versions, and back-fills datasets to effectively manage its data at scale. This is a useful post that includes lots of details and rationale for why things work the way they do.

Building Effective Data Science Teams
Whether you're the first data person at your organization or leading a team of hundreds, creating an effective data team depends on a lot more than the tech you're using. In this transcript of a recent panel discussion, data science leaders from a variety of organizations share their insights.

Saturn Cloud: End-to-End Data Science & Machine Learning Platform
Join thousands of data scientists creating faster and more scalable Python projects with Hosted Dask, GPUs, and more. Switch from working on a laptop to a cloud instance with hosted Jupyter, use jobs and deployments to run repeating tasks, host APIs, and deploy dashboards. Join for free here

Code & Tools

JupyterLite - browser-based interactive computing
JupyterLite is a WebAssembly (Wasm) powered Jupyter distribution that runs entirely in the browser. This is an early release, built from the ground up using JupyterLab components and extensions.
Real-time collaboration is said to be coming soon. Follow the links for a live demo.

Stemma: Helping you trust your data
Amundsen is the leading open-source data catalog. It has the largest and fastest-growing community and the most integrations. Stemma brings you Amundsen and more, with enterprise management and richer metadata.

Resources

Data Science at the Command Line, 2nd edition
This hands-on guide shows how command line tools can help you be more efficient and productive with data. The new second edition will be published in October and is available to read online while it's being finalized. Read the first edition here >>

R for Public Health
Nice collection of resources for people who are interested in working with R in a public health context. Includes courses, books, and key packages to know.

Data Visualization

Visualizing Distributions with Raincloud Plots
Great tutorial that shows how boxplots can be problematic, how to improve them, and alternative techniques that can be used to show summary statistics as well as the true distribution of the raw data. Includes lots of examples and sample code using ggplot2.

Scientific visualizations in the browser for Python
Dynamic web visualizations can be super effective, but if you're not a web expert, building them can be daunting. In this guide, Patrick Mineault presents a variety of web visualization frameworks, starting with the lowest barrier to entry possible. It's written from the perspective of a Python user but, for the most part, Python isn't required.

Sign up to get Data Elixir's data science newsletter in your Inbox >>

Data Elixir is curated and maintained by Lon Riesberg. If you have questions or suggestions for the newsletter, just reply to this email.

To find specific content from prior issues or to research topics, check out the catalogued Archives on Data Elixir's Search Page >>