Bing Search Updates: Faster, More Precise Results

Before optimization, Bing’s original transformer model had a 95th percentile latency of 4.76 seconds per batch (20 queries) and a throughput of 4.2 queries per second per instance.

With TensorRT-LLM, the latency was reduced to 3.03 seconds per batch, and throughput increased to 6.6 queries per second per instance.

This represents a 36% reduction in latency and a 57% increase in throughput, which Bing says translates into lower operational costs.
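The arithmetic behind those percentages can be checked directly from the reported figures (a minimal sketch; the variable names are illustrative, and the values are the ones cited above):

```python
# Verify the reported improvements from the published figures.
baseline_latency = 4.76   # seconds per 20-query batch, 95th percentile
optimized_latency = 3.03  # seconds per batch after TensorRT-LLM

baseline_qps = 4.2        # queries per second per instance
optimized_qps = 6.6       # queries per second per instance

latency_reduction = (baseline_latency - optimized_latency) / baseline_latency
throughput_gain = (optimized_qps - baseline_qps) / baseline_qps

print(f"Latency reduction: {latency_reduction:.0%}")  # ~36%
print(f"Throughput gain:   {throughput_gain:.0%}")    # ~57%
```

Both figures round to the percentages Bing reports, so the latency and throughput numbers are internally consistent.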

The company states:

“… our product is built on the foundation of providing the best results, and we will not compromise on quality for speed. This is where TensorRT-LLM comes into play, reducing model inference time and, consequently, the end-to-end experience latency without sacrificing result quality.”

Benefits For Bing Users

This update brings several potential benefits to Bing users:

Faster search results with optimized inference and quicker response times
Improved accuracy through enhanced SLM capabilities, delivering more contextualized results
Cost efficiency, allowing Bing to invest in further innovations and improvements

Why Bing’s Move to LLM/SLM Models Matters

Bing’s switch to LLM/SLM models and TensorRT optimization could impact the future of search.

As users ask more complex questions, search engines need to better understand and deliver relevant results quickly. Bing aims to do that using smaller language models and advanced optimization techniques.

While we’ll have to wait and see the full impact, Bing’s move sets the stage for a new chapter in search.

Featured Image: mindea/Shutterstock
