January 13, 2022
by Matthew Miller
Labelbox, a startup that focuses on data labeling, announced its Series D funding on January 6, 2022, raising $110 million. SoftBank’s Vision Fund II led the round.
Labelbox fits squarely into G2’s category for Data Labeling Software, which allows businesses to:
However, the company's focus goes beyond labeling data to encompass the entire training data space.
Within the world of AI, data reigns supreme. Looking at the most common flavor of AI, known as machine learning, there are two primary varieties:
Regarding the former, there can be no supervised learning without labeled training data. But where do these labels come from?
In some cases, labels come for free, as part and parcel of the data itself. For example, a retail company that tracks its sales will likely have a plethora of labeled data, with labels for customer demographics, the price paid for a given product, etc. In many cases, however, companies have an abundance of data but no labels to speak of. For instance, a business looking to train a computer vision algorithm that detects stop signs may have thousands of images of the signs, but where will it get the labels?
If a business has a robust unlabeled dataset, data labeling software can come in handy. It allows them to create labels for their data, using either internal or external labelers. These tools provide a platform for labeling data of different varieties, such as images, video, and audio. Some platforms, such as Labelbox, have model-assisted labeling to import prelabeled data for labeling teams to review and adjust directly.
Once the data has been labeled, the labels can go through a QA process to ensure accuracy. After this, the labeled data can be used as training data for supervised learning, and the resulting models can be deployed in applications. In the example above, a business can use the labeled stop sign data to train a computer vision algorithm to detect stop signs, which can help ensure autonomous vehicles automatically halt at them.
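To make the QA step more concrete, here is a minimal sketch of one common approach: collecting labels from several labelers per item and accepting a label only when enough of them agree. This is an illustration under stated assumptions, not how Labelbox or any specific platform implements QA. The function names, the two-thirds agreement threshold, and the sample file names are all hypothetical.

```python
from collections import Counter


def consensus_label(votes, min_agreement=2 / 3):
    """Return the majority label if enough labelers agree, else None.

    `min_agreement` is an assumed threshold chosen for illustration.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


def run_qa(annotations, min_agreement=2 / 3):
    """Split items into accepted labels and items that need human review."""
    accepted, needs_review = {}, []
    for item_id, votes in annotations.items():
        label = consensus_label(votes, min_agreement)
        if label is None:
            needs_review.append(item_id)
        else:
            accepted[item_id] = label
    return accepted, needs_review


# Hypothetical labels from three labelers per image.
annotations = {
    "img_001.jpg": ["stop_sign", "stop_sign", "stop_sign"],
    "img_002.jpg": ["stop_sign", "no_sign", "stop_sign"],
    "img_003.jpg": ["stop_sign", "no_sign", "yield_sign"],
}

accepted, needs_review = run_qa(annotations)
print("Accepted as training data:", accepted)
print("Sent back for review:", needs_review)
```

In this sketch, only the accepted items would move on as training data, while the disputed image goes back to a reviewer. Real platforms typically layer more on top, such as reviewer roles and benchmark items.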
According to Labelbox’s announcement, 80% of its business in 2021 came from enterprises. However, data labeling is not an enterprise-only solution meant for the big players. No matter the size of your company, as long as you have a large dataset (1,000 records as a bare minimum), you can benefit from this software.
At G2, we saw that over 60% of the reviews for data labeling software in Q4 2021 came from small businesses.
A word of caution: just sticking a label on something will not cut it. At G2, we are seeing data labeling solutions like Labelbox remarket themselves or expand into a training data platform, providing features such as:
Anyone can send off some data to a third-party data labeling service, but a training data platform is the way to go to ensure accuracy, security, and efficiency.
Edited by Shanti S Nair
Matthew Miller is a former research and data enthusiast with a knack for understanding and conveying market trends effectively. With experience in journalism, education, and AI, he has honed his skills in various industries. Currently a Senior Research Analyst at G2, Matthew focuses on AI, automation, and analytics, providing insights and conducting research for vendors in these fields. He has a strong background in linguistics, having worked as a Hebrew and Yiddish Translator and an Expert Hebrew Linguist, and has co-founded VAICE, a non-profit voice tech consultancy firm.