Labelbox’s $110 Million Fundraise and the Importance of Labeled Data

January 13, 2022

Labelbox, a startup that focuses on data labeling, announced its Series D funding on January 6, 2022, raising $110 million. SoftBank’s Vision Fund 2 led the round.

However, the company’s focus goes beyond labeling data and encompasses the entire training data space.


Why is training data important?

Within the world of AI, data reigns supreme. Machine learning, the most common flavor of AI, comes in two primary varieties:

  • Supervised learning: Supervised learning requires labeled data. For example, one might have images of printers that have been labeled as “printers.” Using this data (known as training data), one can train an algorithm to detect whether or not a given image is of a printer.
  • Unsupervised learning: Unsupervised learning does not require labeled data. In such a case, the algorithm detects patterns in the data without any explicit labels. For example, an algorithm might detect spam emails without any labeled data that explicitly designates emails as spam or not.
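The difference between the two varieties can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the two numeric features, the tiny dataset, and the choice of a nearest-centroid classifier (supervised) and two-cluster k-means (unsupervised). Real image or email models are far more involved.

```python
from statistics import mean

# Hypothetical toy features for illustration: (width_cm, weight_kg).
labeled = [
    ((40, 8), "printer"),
    ((45, 10), "printer"),
    ((30, 2), "not printer"),
    ((25, 1.5), "not printer"),
]
# The same points with the labels thrown away.
unlabeled = [features for features, _ in labeled]


def centroid(rows):
    """Average each feature column across a list of points."""
    return tuple(mean(column) for column in zip(*rows))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroids(examples):
    """Supervised: the labels decide which points are averaged together."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the new point."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


def two_means(points, iterations=10):
    """Unsupervised: group points into two clusters without any labels."""
    centers = [min(points), max(points)]  # deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for point in points:
            nearest = min((0, 1), key=lambda i: squared_distance(centers[i], point))
            groups[nearest].append(point)
        # Move each center to the middle of its group; keep it if the group is empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups


centroids = train_centroids(labeled)
print(predict(centroids, (42, 9)))  # classify a new, unseen item
print(two_means(unlabeled))         # two groups, but no names for them
```

Note that the unsupervised version finds the same two groups but cannot say which one is "printer"; that name only exists in the labels.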

Regarding the former, there can be no supervised learning without training data. But where do these labels come from? 

In some cases, these labels may come for free, as part and parcel of the data itself. For example, a retail company that tracks its sales will likely have a plethora of labeled data, with fields such as customer demographics and the price paid for a given product. However, in many cases, companies have an abundance of data but no labels to speak of. A business looking to train a computer vision algorithm that detects stop signs may even have thousands of images of the signs, but where will it get the labels from?

Data labeling software to the rescue

If a business has a robust unlabeled dataset, data labeling software can come in handy. It allows the business to create labels for its data, using either internal or external labelers. These tools provide a platform for labeling data of different varieties, such as images, video, and audio. Some platforms, such as Labelbox, offer model-assisted labeling, which imports prelabeled data so that labeling teams can review and adjust it directly.
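As a rough illustration of how prelabels and human review can fit together, here is a minimal sketch. It is not Labelbox's API: the record fields (`id`, `label`, `confidence`), the function names, and the idea of routing low-confidence prelabels to reviewers first are all assumptions made for this example.

```python
def triage_prelabels(prelabels, threshold=0.9):
    """Split model prelabels into an accepted list and a review queue.

    The 0.9 threshold is an arbitrary, hypothetical cutoff.
    """
    accepted, review = [], []
    for item in prelabels:
        if item["confidence"] >= threshold:
            accepted.append(item)
        else:
            review.append(item)
    return accepted, review


def apply_corrections(items, corrections):
    """Return copies of items with reviewer corrections (keyed by id) applied."""
    return [
        {**item, "label": corrections.get(item["id"], item["label"])}
        for item in items
    ]


# Hypothetical prelabels produced by an existing model.
prelabels = [
    {"id": "img-1", "label": "stop sign", "confidence": 0.97},
    {"id": "img-2", "label": "stop sign", "confidence": 0.55},
]

accepted, review = triage_prelabels(prelabels)
# A human reviewer adjusts the uncertain prelabel instead of labeling from scratch.
reviewed = apply_corrections(review, {"img-2": "yield sign"})
print(accepted + reviewed)
```

The appeal of this kind of workflow is that labelers spend their time correcting a model's guesses rather than drawing every label by hand.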

Once the data has been labeled, the labels can go through a QA process to ensure accuracy. After this, the labeled data can be used as training data for supervised learning, and the resulting models can be deployed in applications. In the example above, a business can use the labeled stop sign data to train a computer vision algorithm to detect stop signs, which can help ensure autonomous vehicles halt at stop signs.
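The article does not say what a QA process looks like in practice. One common approach, assumed here purely for illustration, is to have several labelers annotate the same item and accept a label only when enough of them agree. The function name and the two-thirds agreement cutoff are hypothetical.

```python
from collections import Counter


def consensus(votes, min_agreement=2 / 3):
    """Return (label, agreement) for one item labeled by several people.

    The label is None when the most popular answer falls below the
    agreement cutoff, signalling that the item needs another look.
    """
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    return (label if agreement >= min_agreement else None), agreement


# Hypothetical votes from three labelers per image.
votes_by_image = {
    "img-1": ["stop sign", "stop sign", "stop sign"],
    "img-2": ["stop sign", "stop sign", "yield sign"],
    "img-3": ["stop sign", "yield sign", "no sign"],
}

for image_id, votes in votes_by_image.items():
    label, agreement = consensus(votes)
    status = label if label is not None else "needs re-review"
    print(f"{image_id}: {status} (agreement {agreement:.0%})")
```

Only items that clear the cutoff would move on to become training data; the rest go back to the labeling team.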

According to its announcement, 80% of Labelbox’s business in 2021 came from enterprises. However, data labeling is not just an enterprise solution meant only for the big players. No matter the size of your company, as long as you have a large dataset (1,000 records as a bare minimum), you can benefit from this software.    

At G2, we saw that over 60% of the reviews for data labeling software in Q4 2021 came from small businesses.

When data labeling isn’t enough

A word of caution: just sticking a label on something will not cut it. At G2, we are seeing data labeling solutions like Labelbox remarket themselves as, or expand into, training data platforms, providing features such as:

  • Data management
  • Global view of labeling activity 
  • Dedicated workspaces for separate AI initiatives

Anyone can send off some data to a third-party data labeling service, but a training data platform is the way to go to ensure accuracy, security, and efficiency.

Read more: G2's guide to annotation

 Edited by Shanti S Nair

Matthew Miller is a research and data enthusiast with a knack for understanding and conveying market trends effectively. With experience in journalism, education, and AI, he has honed his skills in various industries. Currently a Senior Research Analyst at G2, Matthew focuses on AI, automation, and analytics, providing insights and conducting research for vendors in these fields. He has a strong background in linguistics, having worked as a Hebrew and Yiddish Translator and an Expert Hebrew Linguist, and has co-founded VAICE, a non-profit voice tech consultancy firm.