By Tela G. Mathias
Traditional service level agreements (SLAs) are how we measure technology performance in the mortgage industry, and really across software solutions generally. These agreements have historically focused on quantifiable metrics such as system uptime, response times, and service availability. The push to scale and expand adoption of generative artificial intelligence (genAI)-based solutions in mortgage has created a need for more sophisticated performance measures that go beyond traditional operational metrics.
While traditional SLAs effectively measure whether a web-based loan origination system (LOS) loads quickly or an automated underwriting system (AUS) remains accessible, they fall short in evaluating the quality and reliability of genAI outputs. A system can maintain perfect uptime while delivering inaccurate or biased results. This gap between operational performance and actual effectiveness necessitates a new framework for measuring AI system performance.
AI evaluations shift how we measure technology performance in mortgage lending. These systematic assessment methods focus on the quality and reliability of AI outputs rather than on operational performance alone. For instance, imagine a hypothetical genAI agent whose objective is to resolve consumer complaints about escrow shock, an unexpected and significant increase in a homeowner's monthly mortgage payment due to changes in their escrow account requirements.
This agent monitors email to identify complaints of this type, runs a root cause analyzer, creates a management action plan, kicks off a workflow for a human in the loop, and presents the contextual plan to the operator for review and communication to the homeowner. We might need an evaluation framework to measure, for example, how accurately the agent identifies genuine escrow shock complaints, whether its root cause analysis is grounded in the loan and escrow account data, whether the proposed action plan is appropriate and compliant, and how clear and accurate the recommended homeowner communication is.
Metrics like these fit well within an organization's responsible AI (RAI) framework and are especially useful for evaluating performance against the reliability pillar.
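To make this concrete, here is a minimal sketch of how one such reliability metric might be computed. The classify_complaint function and the labeled emails are hypothetical stand-ins for the real agent and a real test set; an actual evaluation would run against the production model and a much larger, representative sample.

```python
# Hypothetical evaluation harness for the escrow shock complaint agent.
# classify_complaint() stands in for the real genAI agent's triage step.

LABELED_EMAILS = [
    # (email text, expected label) -- a real test set would be far larger
    ("My payment jumped $400 this month after the escrow analysis.", "escrow_shock"),
    ("Please update my mailing address on the account.", "not_escrow_shock"),
    ("Why did my monthly amount go up so much? Did my taxes change?", "escrow_shock"),
    ("I want to ask about refinancing options.", "not_escrow_shock"),
]

def classify_complaint(email_text: str) -> str:
    """Placeholder for the genAI agent's complaint triage call."""
    keywords = ("escrow", "payment jumped", "monthly amount")
    text = email_text.lower()
    return "escrow_shock" if any(k in text for k in keywords) else "not_escrow_shock"

def triage_accuracy(examples) -> float:
    """Fraction of labeled emails the agent classifies correctly."""
    correct = sum(1 for text, expected in examples if classify_complaint(text) == expected)
    return correct / len(examples)

if __name__ == "__main__":
    score = triage_accuracy(LABELED_EMAILS)
    print(f"Complaint triage accuracy: {score:.0%}")
    # A reliability threshold can then be managed like an SLA target,
    # e.g., block a release if accuracy falls below the agreed floor.
    assert score >= 0.95, "Triage accuracy below acceptable threshold"
```

The point of the sketch is the shift in what gets measured: instead of asking whether the system responded, the metric asks whether the response was correct, and that number can be tracked and thresholded just as uptime is today.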
The emergence of open-source evaluation tools has made it at least feasible, even if technically challenging, for mortgage companies to implement RAI frameworks. Tools like promptfoo enable systematic testing of large language models, helping organizations define test cases for expected behavior, compare prompts and models side by side, and catch quality regressions before changes reach production.
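Below is a minimal, tool-agnostic sketch of what such systematic testing looks like in practice: test cases are declared up front, each with an assertion on the model's output, and the suite can run in a CI pipeline so a failing assertion blocks a prompt or model change. Promptfoo itself is typically driven by a YAML configuration and its command-line runner; the run_model function and the test cases here are hypothetical placeholders, not promptfoo's API.

```python
# Tool-agnostic sketch of declarative LLM test cases with assertions.
# run_model() is a hypothetical stand-in for the model or agent under test.

from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # assertion applied to the model output

def run_model(prompt: str) -> str:
    """Placeholder for a call to the LLM or agent being evaluated."""
    return "Escrow shock occurs when a tax or insurance increase raises the escrow portion of the payment."

TESTS = [
    TestCase(
        name="explains escrow shock",
        prompt="Explain escrow shock to a homeowner in plain language.",
        check=lambda out: "escrow" in out.lower(),
    ),
    TestCase(
        name="avoids unsupported guarantees",
        prompt="Will my payment ever go up again?",
        check=lambda out: "guarantee" not in out.lower(),
    ),
]

def run_suite(tests) -> bool:
    """Run every test case and report pass/fail for each assertion."""
    all_passed = True
    for t in tests:
        ok = t.check(run_model(t.prompt))
        print(f"{'PASS' if ok else 'FAIL'}: {t.name}")
        all_passed = all_passed and ok
    return all_passed

if __name__ == "__main__":
    # Non-zero exit code lets a CI pipeline fail the build on a regression.
    raise SystemExit(0 if run_suite(TESTS) else 1)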
As genAI continues to transform mortgage lending, the industry should adopt evaluation-based performance metrics that match the sophistication of these new technologies. This evolution from traditional SLAs to evaluation frameworks will help ensure that AI systems operate reliably and deliver trustworthy, compliant, and fair results.
Organizations that adapt their performance measurement approaches to include evaluations will be better positioned to leverage AI technologies effectively while maintaining high standards of accuracy and fairness. I believe regulators and housing agencies should look for evaluation-based performance frameworks in genAI-based systems, and I will encourage them to do so.