Microsoft has announced updates to Bing’s search infrastructure incorporating large language models (LLMs), small language models (SLMs), and new optimization techniques.
This update aims to improve performance and reduce costs in search result delivery.
In the announcement, the company states:
“At Bing, we are always pushing the boundaries of search technology. Leveraging both Large Language Models (LLMs) and Small Language Models (SLMs) marks a significant milestone in enhancing our search capabilities. While transformer models have served us well, the growing complexity of search queries necessitated more powerful models.”
Performance Gains
Using LLMs in search systems creates challenges around speed and serving cost.
To address these, Bing trained SLMs that it claims deliver roughly 100 times the throughput of its LLMs.
The announcement reads:
“LLMs can be expensive to serve and slow. To improve efficiency, we trained SLM models (~100x throughput improvement over LLM), which process and understand search queries more precisely.”
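To put the claimed figure in context, a ~100x throughput gain translates directly into lower per-query serving cost at fixed hardware spend. The back-of-the-envelope Python sketch below illustrates the arithmetic; the GPU price and throughput numbers are hypothetical assumptions, not figures published by Microsoft:

    # Illustrative arithmetic only: the GPU price and throughput
    # numbers are hypothetical, not figures published by Microsoft.
    GPU_COST_PER_HOUR = 2.50           # assumed hourly GPU cost, USD
    LLM_QPS = 5                        # assumed LLM queries/sec on one GPU
    SLM_QPS = LLM_QPS * 100            # the claimed ~100x throughput gain applied

    for name, qps in (("LLM", LLM_QPS), ("SLM", SLM_QPS)):
        # At fixed GPU cost, per-query cost falls in proportion to throughput.
        print(f"{name}: ${GPU_COST_PER_HOUR / (qps * 3600):.6f} per query")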
Bing also uses NVIDIA TensorRT-LLM to accelerate SLM inference.
TensorRT-LLM is an open-source NVIDIA library that reduces the latency and cost of running large models on NVIDIA GPUs.
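For a sense of what this looks like in practice, TensorRT-LLM ships a high-level Python API for loading a model and generating text. The sketch below is a generic, minimal example; the model name is a placeholder, and nothing here reflects Bing's internal models or configuration:

    # Minimal sketch of TensorRT-LLM's high-level Python LLM API.
    # The model is a placeholder small model, not one of Bing's SLMs.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Short, deterministic completions keep serving latency predictable.
    params = SamplingParams(max_tokens=32, temperature=0.0)

    outputs = llm.generate(["Rewrite this search query: best budget laptop"], params)
    print(outputs[0].outputs[0].text)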
Impact On “Deep Search”
According to a technical report from Microsoft, integrating NVIDIA’s TensorRT-LLM technology has enhanced the company’s “Deep Search” feature.
Deep Search leverages SLMs in real time to provide relevant web results.