Salesforce has released the world’s first benchmark for large language models (LLMs) in CRM. The benchmark lets businesses evaluate LLMs on the four metrics that matter most for CRM: accuracy, cost, speed, and trust and safety.
The new benchmark helps businesses navigate the growing landscape of generative AI models for common sales and service use cases such as prospecting, lead nurturing and sales opportunity summaries. A public leaderboard accompanies the framework so businesses can compare models and identify the best LLM for their CRM needs. Salesforce plans to expand the benchmark with new use case scenarios and evaluations of fine-tuned LLMs.
“With AI, enterprise leaders need to find the right balance of performance, accuracy, responsibility and cost to unlock the full potential of generative AI,” said Silvio Savarese, EVP & Chief Scientist at Salesforce AI Research. “Salesforce’s LLM Benchmark for CRM is a big step forward in how businesses evaluate their AI strategy. It gives visibility into AI deployment and can accelerate time to value for CRM use cases. Our ongoing commitment is to evolve this benchmark alongside technological advancements.”
The need for the benchmark stems from the limitations of existing LLM evaluations, which often lack business relevance and assessments by human experts. Salesforce’s framework addresses these gaps by using real-world CRM data and expert evaluations, so businesses can make informed decisions about integrating generative AI into their CRM systems.
The benchmark evaluates models on four metrics:
Accuracy: scored on factuality, completeness, conciseness and instruction-following, each rated on a 4-point scale. Accurate predictions and recommendations make teams more effective and improve customer experiences. Even models with lower initial accuracy can be improved through prompt engineering and fine-tuning.
Cost: LLMs are categorized as high, medium or low cost based on operational expenses, so customers can weigh cost against their budget and resource strategy.
Speed: measures the LLM’s responsiveness and processing time. Faster responses mean a better user experience and let sales and service teams respond to customer needs more quickly.
Trust and safety: measures the model’s ability to protect customer data, comply with privacy regulations, secure information, and avoid bias and toxicity.
Built by Salesforce AI Research, the benchmark is distinguished by its human expert evaluations. Unlike other benchmarks, it draws on real-world CRM datasets from Salesforce and its customers, so businesses can make tactical decisions about using AI in their operations.
The leaderboard’s Tableau dashboard lets you filter results by CRM area, use case type, evaluation method (manual or automated) and LLM provider.
Einstein, Salesforce’s Autonomous AI Service Agent
In addition to the benchmark, Salesforce has released Einstein Service Agent, its first fully autonomous AI service agent. Designed to replace traditional chatbots, it can understand and act on a broad range of service issues without pre-programmed scenarios, giving customer service a significant boost.
“Einstein Service Agent is the future where human and digital agents work together to elevate customer experiences,” said Kishan Chetan, General Manager of Salesforce’s Service Cloud. “Our first ever autonomous AI agent will not only do service tasks on its own but also change how service teams work, making them more efficient and productive.”
Einstein Service Agent uses LLMs to analyze customer messages, autonomously decide on the next action and craft responses that match a company’s brand voice and guidelines. Service teams can offload routine inquiries, freeing human agents to focus on more complex issues.
Salesforce’s new benchmark and Einstein Service Agent are part of the company’s push to advance AI in CRM. By giving businesses tools to evaluate and deploy generative AI models, Salesforce aims to help them make decisions that drive growth and improve customer service outcomes.
Source: Salesforce News