(NEW YORK) — A team of graduate students at Columbia University announced plans to develop the Roar-ee Benchmark, a new framework designed to evaluate and compare large language models in the rapidly evolving AI landscape.
The student-led initiative will bring together AI and machine learning master's and PhD candidates who recognized a critical gap in the current LLM assessment ecosystem. As corporations struggle to identify optimal models for their specific use cases, the team aims to provide focused, academically backed analysis.
"With the emergence of new LLM models and updates, we felt a LLM benchmark, hosted and developed by a team of AI and LLM masters and PhD students at Columbia, would provide real insight for business exploring the best use case model for their industry, and what better skilled place than Columbia University," said Tom Bustamante, a project team member.
Named after the Columbia mascot, the Roar-ee Benchmark is being developed in partnership with Next Realm AI and Data Lair, which recently highlighted the initiative during a livestream presentation on November 1, 2025, discussing the potential benefits and use cases for industry enterprises.
Riddhi Dinesh Oza, a master's student in Data Science at Columbia University, a member of the Data Science Institute Student Council (DSISC), and one of the project's founding team members, expressed enthusiasm about the initiative. She emphasized that Columbia's robust resources, including its Data Science Institute and exceptional student body, create an ideal environment for establishing a credible LLM benchmark.
What is an LLM Benchmark?
LLM benchmarks serve as standardized testing environments that measure model performance across various tasks, providing essential data for developers and businesses looking to make informed deployment decisions.
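To make the idea concrete, the sketch below shows one common shape such a benchmark can take: a fixed set of tasks with reference answers, a scoring function, and a loop that reports a per-task average. This is a minimal illustration only; the task names, the `exact_match` scorer, and the `evaluate` function are hypothetical placeholders and do not describe the Roar-ee Benchmark's actual design.

```python
# Minimal sketch of an LLM benchmark harness (hypothetical example,
# not the Roar-ee Benchmark's implementation).

from typing import Callable, Dict, List

# Each task is a list of prompts paired with reference answers.
TASKS: Dict[str, List[dict]] = {
    "arithmetic": [
        {"prompt": "What is 17 + 25?", "reference": "42"},
        {"prompt": "What is 9 * 8?", "reference": "72"},
    ],
    "short_answer": [
        {"prompt": "What is the capital of France?", "reference": "Paris"},
    ],
}

def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 if the model's answer matches the reference exactly, else 0.0."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0

def evaluate(model: Callable[[str], str]) -> Dict[str, float]:
    """Run the model over every task and report the mean score per task."""
    results: Dict[str, float] = {}
    for task_name, examples in TASKS.items():
        scores = [exact_match(model(ex["prompt"]), ex["reference"]) for ex in examples]
        results[task_name] = sum(scores) / len(scores)
    return results

if __name__ == "__main__":
    # Stand-in "model" that always answers "42", just to show the evaluation flow.
    dummy_model = lambda prompt: "42"
    print(evaluate(dummy_model))
```

Real benchmarks differ mainly in scale and scoring: they cover far more tasks and often replace exact-match scoring with graded or model-assisted judgments, but the report-per-task structure shown here is the common starting point.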
The team is already fielding interest from skilled Columbia students and expects to release its initial framework before year-end, positioning Roar-ee as a trusted resource in the competitive AI sector.