ISSUE 364 · November 30, 2021

Insight

Storm in the stratosphere: how the cloud will be reshuffled
Do you think cloud stack consolidation is inevitable? Here's a reasonable take on how the next few years could play out.

The Sobering Truth About Your Business Ideas
Through rigorous measurement and analysis, data science has shown that most business ideas will fail. It's not because of the data team.

Sponsored Link

eBook: 101 Ways to Use Third-Party Data to Make Smarter Decisions
AWS Data Exchange has created a new eBook that compiles a broad range of use cases submitted by AWS Marketplace data providers. Learn the strategies, insights, and behaviors that can take companies to the next level with cost-effective third-party data solutions in the cloud.

Tutorials, Projects & Opinions

Doing Data Science for Social Good, Responsibly
There are a lot of "data for good" projects out there, and they can be incredibly helpful for both the participants and the organizations being served. But it's not all sunshine and roses. Here's what to watch for.

Predicting viewership for Doctor Who episodes
A tidymodels workflow makes many modeling tasks more convenient, but sometimes you want more flexibility and control over how you handle your modeling objects. Here's how.

Data Quality Validation for Python Dataframes
In this post, Miguel Cabrera explores libraries that check the data quality of pandas and Spark dataframes. It covers basic usage and the pros and cons of Great Expectations, Pandera, and Deequ/PyDeequ.

Financial market data analysis with pandas
A nice introductory tutorial that walks through a simple time-series analysis of the stock market using pandas.

Motion planning, scaling data pipelines, ML edge cases? Hear the latest.
Top AI and ML speakers from Facebook AI, Cruise, Zoox, GE Healthcare (and more) share how to successfully deploy machine learning data operations.
Register now for the iMerit ML DataOps Summit, a free, one-day virtual event on December 2nd in partnership with TechCrunch.

Code & Tools

BookNLP
BookNLP is a natural language processing pipeline that scales to books and other long documents in English. It handles a variety of tasks such as part-of-speech tagging, entity recognition, event tagging, and more.

Career

What you can and cannot control in your job hunt
When you're looking for a job, most of the hiring process is out of your control. This post, written from a data science interviewer's perspective, focuses on what you can control and the things you can do to stand out.

Data Visualization

pybaobabdt
If you're building decision trees or random forests in Python, this new package uses colors and link widths to make the visualizations easy to interpret, even when they're large. These are much nicer to work with than typical node-link visualizations.