— In the News —
Using deep learning techniques, researchers are now able to read the text of the genome and interpret how it works. In other words, they can look at a mutation and figure out why it causes a particular disease, rather than just noting the connection. This is a game-changing approach that will transform medicine.
The operating system from the movie Her might still be a ways off, but there have been some great advances in machine learning software this year. This is a well-linked article that highlights the year's biggest accomplishments.
Schools are increasingly turning to consulting firms that use Big Data to screen candidates. Aspiring teachers are simplified into dozens of data points, from their SAT scores to their appreciation for art to their ability to complete geometric patterns. That data is then fed into an algorithm that spits out a score, predicting how effective each candidate might be. This is surely just the beginning of what the future holds for anyone looking for a new job. Read on to find out how well it's working.
— Tools and Techniques —
For us, data science is more than a skill or profession. It is a calling. A way of life.
We challenge you to use data science for social good.
Solve a critical problem by mapping the ocean's microscopic ecosystem - a key step in measuring the health of the world's oceans. Accept the challenge. Make history.
Interested? Here's a tutorial to get you started.
Using large amounts of unannotated text, word2vec learns relationships between words automatically. It got quite a bit of attention when researchers at Google released it a year ago. This article explores how the space has changed since then.
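For a feel of how unannotated text can supply its own training signal, here is a minimal sketch of one step in word2vec's skip-gram variant: pairing each word with its neighbors inside a small window. The function name `skipgram_pairs`, the window size, and the sample sentence are assumptions made for illustration, and the sketch deliberately stops before the model training that turns these pairs into word vectors.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every word and its nearby words.

    Hypothetical helper for illustration only. No labels are needed:
    the surrounding words act as the prediction target for each center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the bounds of the sentence.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Made-up sample sentence, split on whitespace.
    sentence = "the king rules the land".split()
    for center, context in skipgram_pairs(sentence, window=1):
        print(center, "->", context)
```

Words that keep turning up with similar contexts across a large corpus end up with similar vectors once a model is trained on pairs like these, which is where the learned relationships come from.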
— Resources —
Partially Derivative is a weekly podcast about data and data science. This week they discuss 10 things they're looking forward to in 2015. Their list includes things like ggplot for Python, in-memory database manipulation, data science for the Internet of Things, reproducible research, and some interesting thoughts about the diversification of data science skills. This is their best episode so far.
Here's a shortlist of the best free books, online courses, tutorials, papers, websites, and tools for getting started with Deep Learning. Highly recommended.
Data MOOCs starting soon!
Thanks to Richard from the Data Detectives of Boulder for reminding me about the upcoming MOOC season. Coursera has a number of data-related courses including things like Pattern Discovery, Exploratory Data Analysis, Machine Learning, R Programming, and Statistical Inference. Udacity has some similar offerings and adds Data Visualization with D3.js, Hadoop, and Data Wrangling. Classes start soon.
— Data Viz —
HTML widgets work just like R plots except they produce interactive web visualizations. A line or two of R code is all that's needed. Current widgets include bindings for Leaflet Maps, Dygraphs, D3 network graphs, DataTables, three.js and sparklines. The package is available on GitHub and the project has an active developer community.
This is a brilliant set of tools. There are a variety of interactive visualizations here for exploring public data from Boston's subway system. The data itself is static but this is a great demonstration of the variety of ways that data can be sliced and diced for differing use cases. Along with the interactives, the description is easy to follow and the code is available on GitHub. Highly recommended.
There are some really great data viz conferences these days. In case you weren't able to attend EYEO, Tapestry, Strata, Visualized, OpenVis, the Graphical Web, the International Journalism Festival, or any of the others, here's your chance to catch the most important keynotes of the year.