Data Science and Machine Learning
Session Language | English
We have all heard the phrase "data is the new oil", but just like crude oil, data needs to be processed before it can be put to use. For unstructured data such as images, videos, and text, this preparation stage often includes annotation by a human workforce. Like any manual effort, this is a time-consuming and expensive process. To make matters worse, deep artificial neural networks, the current state of the art for both computer vision and natural language processing, are among the algorithms that require the largest amounts of data to train. One way to save money and time on data annotation is to involve what is normally the end product, the model, in the process. In this talk we are going to look at some approaches to model-assisted data annotation (namely, auto-labeling and active learning) as well as the common use cases that these methods are best suited for.