S&P AI Benchmarks by Kensho

Finance Fundamentals

An industry-specific benchmark intended to help AI teams evaluate their Large Language Models' (LLMs) ability to understand and solve realistic finance problems.

Methodology

This benchmark tests LLMs on real-world business and finance applications, since these fields can require transparent and precise reasoning as well as a wide breadth of technical knowledge.

The benchmark is composed of three categories of assessment: core domain knowledge, quantity extraction, and quantitative reasoning. It was developed by financial and machine learning professionals, including Kensho's R&D team (NLP researchers) and S&P Global team members from the Market Intelligence, Ratings, and Corporate Finance teams.

Our evaluation set comprises 600 questions spanning the three task categories. Our primary ranking criterion is the average accuracy across all categories. You can also view the average score for each category.
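The ranking rule can be sketched as follows. This is a minimal illustration, not the leaderboard's scoring code: it assumes that "average accuracy across all categories" means the unweighted mean of the per-category accuracies, which can differ from pooled accuracy over all 600 questions when categories contain different numbers of questions. The `(category, is_correct)` input format and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical input: one (category, is_correct) pair per graded question.


def category_accuracies(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of correct answers within each category."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for category, is_correct in results:
        totals[category] += 1
        correct[category] += int(is_correct)
    return {c: correct[c] / totals[c] for c in totals}


def average_accuracy(results: Iterable[Tuple[str, bool]]) -> float:
    """Unweighted (macro) mean of the per-category accuracies.

    Assumption: each category counts equally, regardless of its size.
    """
    per_category = category_accuracies(results)
    return sum(per_category.values()) / len(per_category)


results = [
    ("Domain Knowledge", True),
    ("Domain Knowledge", False),
    ("Quantity Extraction", True),
    ("Quantitative Reasoning", False),
    ("Quantitative Reasoning", True),
]
print(category_accuracies(results))
print(average_accuracy(results))
```

Under this reading, a category with few questions influences the overall score as much as a large one; a pooled accuracy would instead weight every question equally.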

The research behind this finance benchmark draws on various sources, including MMLU, FinQA, and TAT-QA. These works, along with the detailed citations and motivation outlined in our paper, contributed significantly to the development of this benchmark.

Although LLM benchmark datasets already exist (e.g., MT-Bench, BIG-bench, and SuperGLUE), our Finance Fundamentals benchmark and its leaderboard focus exclusively on measuring a model’s ability to understand and perform tasks in finance and business. Our questions allow precise, objective evaluation because every response is either a number or, for multiple-choice questions, a selected option.
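Because every response is either an option index or a number, scoring can reduce to a direct comparison with the reference answer. The sketch below is hypothetical: the function name is illustrative, and the 1% relative tolerance for numeric answers is an assumption, since the benchmark description does not state how numeric answers are matched.

```python
import math
from typing import Union

Answer = Union[int, float]


def is_correct(predicted: Answer, gold: Answer, multiple_choice: bool,
               rel_tol: float = 0.01) -> bool:
    """Score one response against its reference answer.

    Multiple-choice answers are option indices and must match exactly.
    Numeric answers are compared with a relative tolerance; the 1%
    default is a hypothetical choice, not the benchmark's documented rule.
    """
    if multiple_choice:
        return predicted == gold
    return math.isclose(float(predicted), float(gold), rel_tol=rel_tol)


print(is_correct(2, 2, multiple_choice=True))
print(is_correct(100.5, 100.0, multiple_choice=False))
```

Setting `rel_tol=0.0` turns the numeric check into exact equality, which is the strictest possible reading of "precise" evaluation.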


Domain Knowledge: These tasks evaluate the model’s proficiency in financial domain knowledge, encompassing business and financial terminology, practices, and formulae. The 131 questions are drawn from FinKnow, a collection of multiple-choice question-answering datasets built from CFA practice exams and MMLU, and cover business ethics, microeconomics, and professional accounting.

Questions in this task use the format below, which includes an `options` field listing the multiple-choice options. Answers in this category are expected to be indices into the provided options.

{
	"id": str,
	"task": str,		    # 'FinKnow'
	"question": str,
	"options": List[str]
}
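A question in this format could be loaded and a model's answer checked as sketched below. `parse_question` and `validate_answer` are hypothetical helpers, the JSON serialization is an assumption, and so is zero-based indexing: the description above only says that answers are indices into `options`.

```python
import json
from typing import List, TypedDict


class FinKnowQuestion(TypedDict):
    """Fields listed in the FinKnow question format above."""
    id: str
    task: str          # 'FinKnow'
    question: str
    options: List[str]


def parse_question(raw: str) -> FinKnowQuestion:
    """Parse one JSON-encoded question and check the documented fields."""
    data = json.loads(raw)
    for field in ("id", "task", "question", "options"):
        if field not in data:
            raise ValueError(f"missing field: {field}")
    if not isinstance(data["options"], list) or not data["options"]:
        raise ValueError("options must be a non-empty list")
    return FinKnowQuestion(id=data["id"], task=data["task"],
                           question=data["question"], options=data["options"])


def validate_answer(q: FinKnowQuestion, answer: int) -> int:
    """Return answer if it is a valid index into q['options'].

    Assumes zero-based indexing, which the format description does not specify.
    """
    if not isinstance(answer, int) or not 0 <= answer < len(q["options"]):
        raise ValueError(f"answer {answer!r} is not a valid option index")
    return answer


# Placeholder record; not an actual benchmark question.
raw = '{"id": "example-1", "task": "FinKnow", "question": "...", "options": ["a", "b", "c"]}'
q = parse_question(raw)
print(validate_answer(q, 2))
```

If the benchmark turns out to use one-based indices, only the range check in `validate_answer` needs to change.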
