LLM Benchmarks - Search News

This LLM framework takes a first stab at benchmarking Big AI’s compliance with the EU AI Act

While most countries’ lawmakers are still discussing how to put guardrails around artificial intelligence, the European Union is ahead of the pack, having passed a risk-based framework for regulating ...

VentureBeat

LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now A team of Abacus.AI, New York University, ...

Geeky Gadgets

New AgentBench LLM AI model benchmarking tool and leaderboards

If you are interested in learning more about how to benchmark AI large language models or LLMs. a new benchmarking tool, Agent Bench, has emerged as a game-changer. This innovative tool has been ...

Business Wire

Cognite Launches the Cognite Atlas AI™ LLM & SLM Benchmark Report for Industrial Agents

AUSTIN, Texas & OSLO, Norway--(BUSINESS WIRE)--Cognite, the global leader in AI for industry, today announced the launch of the Cognite Atlas AI™ LLM & SLM Benchmark Report for Industrial Agents. The ...

Security

Simbian launches new security benchmark with AI SOC LLM Leaderboard

Simbian today announced the “AI SOC LLM Leaderboard,” a comprehensive benchmark to measure LLM performance in Security Operations Centers (SOCs). The new benchmark compares LLMs across a diverse range ...

Forbes

NVIDIA H100 Dominates New MLPerf v3.0 Benchmark Results

To know how a system performs across a range of AI workloads, you look at its MLPerf benchmark numbers. AI is rapidly evolving, with generative AI workloads becoming increasingly prominent, and MLPerf ...

Infosecurity-magazine.com

Academics Develop Testing Benchmark for LLMs in Cyber Threat Intelligence

Large language models (LLMs) are increasingly used for cyber defense applications, although concerns about their reliability and accuracy remain a significant limitation in critical use cases. A team ...

VentureBeat

MLPerf 3.1 adds large language model benchmarks for inference

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More MLCommons is growing its suite of MLPerf AI benchmarks with the addition ...

Geeky Gadgets

AI Benchmarks Are Broken : The Leaderboard Illusion

What if the tools we trust to measure progress are actually holding us back? In the rapidly evolving world of large language models (LLMs), AI benchmarks and leaderboards have become the gold standard ...

SiliconANGLE

Nvidia claims first place in MLCommon’s first benchmarks for LLM inference, but Intel is a close second

MLCommons, the open engineering consortium for benchmarking the performance of chipsets for artificial intelligence, today unveiled the results of a new test that’s geared to determine how quickly ...

XDA Developers on MSN

I turned my self-hosted LLM from a glorified chat box into a real AI assistant

After months of testing local LLMs, I found that productivity depends on tools, not just models.

Some results have been hidden because they may be inaccessible to you

Show inaccessible results