Data is everywhere, and if we want to do anything with it, we first have to find it. Google has built a multibillion-dollar business on this premise by indexing the world's information and letting you find it through a painfully simple interface.
Although much of Google's search algorithm is proprietary, its most recent update incorporates an open-source, neural-network-based technique for natural language processing (NLP) called Bidirectional Encoder Representations from Transformers (BERT), which Google open-sourced in late 2018.
In short, BERT processes each word in relation to all the other words in a sentence, rather than one by one in order.
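To make that concrete, here is a minimal sketch using the open-source Hugging Face transformers library (our own illustration; the library, model checkpoint, and example sentences are not part of Google Search). It shows that BERT gives the same word a different vector depending on its sentence, because every token attends to every other token in both directions:

```python
import torch
from transformers import BertTokenizer, BertModel

# Load the publicly released BERT base model and its tokenizer.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(sentence: str, word: str) -> torch.Tensor:
    """Return BERT's contextual vector for `word` within `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Locate the token position of `word` and return its hidden state.
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    idx = tokens.index(word)
    return outputs.last_hidden_state[0, idx]

# The same word gets different vectors in different sentences, because
# BERT reads the context on both sides of it at once.
a = embed("he sat by the river bank", "bank")
b = embed("she deposited cash at the bank", "bank")
print(f"cosine similarity: {torch.cosine_similarity(a, b, dim=0).item():.3f}")
```

A purely left-to-right language model, by contrast, could not use the words that come after "bank" to disambiguate it.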
In a blog post announcing the update, Google noted:
Particularly for longer, more conversational queries, or searches where prepositions like “for” and “to” matter a lot to the meaning, Search will be able to understand the context of the words in your query.
For now, the improvement applies only to English-language searches in the United States, with plans to expand to other languages and locales.
Mohammed Terry-Jack, a research engineer at London-based NLP startup Wluper, remarked that in most cases you won't notice a difference, but in the cases where you do (such as queries involving negation), the results are much better.
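One way to see BERT weighing a negation is to probe its masked-word predictions directly. This is again a hypothetical illustration using the Hugging Face fill-mask pipeline, not Google's search system:

```python
from transformers import pipeline

# A masked-language-model probe: BERT predicts the hidden word from the
# context on both sides, so inserting "not" shifts its predictions.
fill = pipeline("fill-mask", model="bert-base-uncased")

for prompt in ["the movie was [MASK].", "the movie was not [MASK]."]:
    top = fill(prompt, top_k=3)
    print(f"{prompt!r} -> {[result['token_str'] for result in top]}")
```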
AI in the wild
At G2, we’re particularly excited by this development as it reflects two key trends:
- The productionization of AI research: BERT has gone from an open-source research release to a production-ready tool that is now improving Google's search results.
- The monetization of open source: As we've previously discussed, businesses can build great, profitable technology on top of open source. Google's release of BERT into the open-source wild allowed the company to hone it with help from the community.