Labelbox, a startup that focuses on data labeling, announced its Series D funding on January 6, 2022, raising $110 million. SoftBank’s Vision Fund 2 led the round.
Labelbox fits squarely into G2’s category for Data Labeling Software, which covers products that:
- Integrate a managed workforce, data labeling service, or both
- Ensure labels are accurate and consistent
- Let users view analytics that monitor accuracy, labeling speed, or both
- Allow annotated data to be integrated into data science and machine learning platforms to build machine learning models
However, Labelbox’s focus goes beyond labeling data and encompasses the entire training data space.
Why is training data important?
Within the world of AI, data reigns supreme. Machine learning, the most common flavor of AI, comes in two primary varieties:
- Supervised learning: Supervised learning requires labeled data. For example, one might have images of printers that have been labeled as “printers.” Using this data (known as training data), one can train an algorithm to detect whether or not a given image is of a printer.
- Unsupervised learning: Unsupervised learning does not require labeled data. In such a case, the algorithm detects patterns in the data without any explicit labels. For example, an algorithm might detect spam emails without any labeled data that explicitly designates emails as spam or not.
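To make the distinction concrete, here is a minimal Python sketch rather than a production approach. Everything in it is an assumption made for illustration: each item is reduced to a single invented numeric feature, and the helpers `train_centroids`, `predict`, and `two_means` are hypothetical names, not part of any product mentioned here. The supervised half uses the labels to learn what a "printer" looks like; the unsupervised half only groups similar values and never sees a label.

```python
from statistics import mean

# Hypothetical toy data: one invented feature value per item, plus a label.
labeled = [(0.9, "printer"), (0.8, "printer"),
           (0.2, "not_printer"), (0.1, "not_printer")]


def train_centroids(examples):
    """Supervised: use the labels to compute one average feature value per class."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Return the label whose average feature value is closest to `value`."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k=2)."""
    low, high = min(values), max(values)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            index = 1 if abs(v - high) < abs(v - low) else 0
            groups[index].append(v)
        if not groups[0] or not groups[1]:
            break  # everything landed in one group; nothing left to refine
        low, high = mean(groups[0]), mean(groups[1])
    return groups


centroids = train_centroids(labeled)
print(predict(centroids, 0.7))            # printer
print(two_means([0.1, 0.2, 0.8, 0.9]))    # ([0.1, 0.2], [0.8, 0.9])
```

Note that the unsupervised result is just two unnamed groups. Deciding that one of them means "printer" still takes a person or a labeled example.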
There can be no supervised learning without labeled training data. But where do these labels come from?
In some cases, the labels come for free, as part and parcel of the data itself. For example, a retail company that tracks its sales will likely have a plethora of labeled data, with fields such as customer demographics and the price paid for a given product. However, in many cases, companies have an abundance of data but no labels to speak of. For instance, a business looking to train a computer vision algorithm that detects stop signs may have thousands of images of the signs, but where will it get the labels from?
Data labeling software to the rescue
If a business has a robust unlabeled dataset, data labeling software can come in handy. It allows the business to create labels for its data, using either internal or external labelers. These tools provide a platform for labeling data of different varieties, such as images, video, and audio. Some platforms, such as Labelbox, offer model-assisted labeling, which imports prelabeled data so that labeling teams can review and adjust it directly.
Once the data has been labeled, the labels can go through a QA process to ensure accuracy. After this, the labeled data can be used as training data for supervised learning, and the resulting models can be deployed in applications. In the example above, with the labeled stop sign data, a business can train a computer vision algorithm to detect stop signs, which can help ensure autonomous vehicles automatically halt at stop signs.
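One common form such a QA step can take is comparing labels from several labelers on the same item. The Python sketch below is an illustration under stated assumptions, not a description of how Labelbox or any other vendor implements QA: the function name `consolidate`, the 0.6 agreement threshold, and the sample votes are all hypothetical.

```python
from collections import Counter


def consolidate(votes, min_agreement=0.6):
    """Majority-vote QA for one item.

    Returns (label, agreement). The label is None when the share of
    labelers backing the most common label falls below `min_agreement`,
    signaling that the item should go back for human review.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, top_count = Counter(votes).most_common(1)[0]
    agreement = top_count / len(votes)
    return (label if agreement >= min_agreement else None), agreement


# Hypothetical votes from three labelers on one image.
label, agreement = consolidate(["stop_sign", "stop_sign", "yield_sign"])
print(label)  # stop_sign
```

Items that come back as `None` are the ones worth a reviewer's time; the rest can flow into the training set.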
According to its announcement, 80% of Labelbox’s business in 2021 came from enterprises. However, data labeling is not just an enterprise solution meant only for the big players. No matter the size of your company, as long as you have a large dataset (1,000 records as a bare minimum), you can benefit from this software.
At G2, we saw that over 60% of the reviews for data labeling software in Q4 2021 came from small businesses.
When data labeling isn’t enough
A word of caution: just sticking a label on something will not cut it. At G2, we are seeing data labeling solutions like Labelbox reposition themselves as, or expand into, training data platforms, providing features such as:
- Data management
- Global view of labeling activity
- Dedicated workspaces for separate AI initiatives
Anyone can send off some data to a third-party data labeling service, but a training data platform is the way to go to ensure accuracy, security, and efficiency.
Edited by Shanti S Nair