By Tela G. Mathias
Traditional service level agreements (SLAs) are how we measure technology performance in the mortgage industry, and really across software solutions generally. These agreements have historically focused on quantifiable metrics such as system uptime, response times, and service availability. The push to scale and expand adoption of generative artificial intelligence (genAI)-based solutions in mortgage has created a need for more sophisticated performance measures that go beyond traditional operational metrics.
While traditional SLAs effectively measure whether a web-based loan origination system (LOS) loads quickly or an automated underwriting system (AUS) remains accessible, they fall short in evaluating the quality and reliability of genAI outputs. A system can maintain perfect uptime while delivering inaccurate or biased results. This gap between operational performance and actual effectiveness necessitates a new framework for measuring AI system performance.
AI evaluations shift how we measure technology performance in mortgage lending. These systematic assessment methods focus on the quality and reliability of AI outputs rather than on operational performance alone. For instance, imagine a hypothetical genAI agent whose objective is to resolve consumer complaints about escrow shock, an unexpected and significant increase in a homeowner's monthly mortgage payment due to changes in their escrow account requirements.
This agent monitors email to identify complaints of this type, runs a root cause analyzer, creates a management action plan, kicks off a workflow for a human in the loop, and presents the contextual plan to the operator for review and communication to the homeowner. We might need an evaluation framework to measure, for example, how accurately the agent identifies genuine escrow shock complaints, whether its root cause analysis is grounded in the loan and escrow account data, whether the proposed action plan is appropriate and compliant, and how clear and accurate the recommended homeowner communication is.
Metrics like these fit well within an organization's responsible AI (RAI) framework and are especially useful for evaluating performance against the reliability pillar.
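To make this concrete, here is a minimal sketch of how one such reliability metric might be computed. The classify_complaint function and the labeled emails are hypothetical stand-ins for the real agent and a real test set; an actual evaluation would run against the production model and a much larger, representative sample.

```python
# Hypothetical evaluation harness for the escrow shock complaint agent.
# classify_complaint() stands in for the real genAI agent's triage step.

LABELED_EMAILS = [
    # (email text, expected label) -- a real test set would be far larger
    ("My payment jumped $400 this month after the escrow analysis.", "escrow_shock"),
    ("Please update my mailing address on the account.", "not_escrow_shock"),
    ("Why did my monthly amount go up so much? Did my taxes change?", "escrow_shock"),
    ("I want to ask about refinancing options.", "not_escrow_shock"),
]

def classify_complaint(email_text: str) -> str:
    """Placeholder for the genAI agent's complaint triage call."""
    keywords = ("escrow", "payment jumped", "monthly amount")
    text = email_text.lower()
    return "escrow_shock" if any(k in text for k in keywords) else "not_escrow_shock"

def triage_accuracy(examples) -> float:
    """Fraction of labeled emails the agent classifies correctly."""
    correct = sum(1 for text, expected in examples if classify_complaint(text) == expected)
    return correct / len(examples)

if __name__ == "__main__":
    score = triage_accuracy(LABELED_EMAILS)
    print(f"Complaint triage accuracy: {score:.0%}")
    # A reliability threshold can then be managed like an SLA target,
    # e.g., block a release if accuracy falls below the agreed floor.
    assert score >= 0.95, "Triage accuracy below acceptable threshold"
```

The point of the sketch is the shift in what gets measured: instead of asking whether the system responded, the metric asks whether the response was correct, and that number can be tracked and thresholded just as uptime is today.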
The emergence of open-source evaluation tools has made it at least feasible, even if technically challenging, for mortgage companies to implement RAI frameworks. Tools like promptfoo enable systematic testing of large language models, helping organizations define test cases for expected behavior, compare prompts and models side by side, and catch quality regressions before changes reach production.
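Below is a minimal, tool-agnostic sketch of what such systematic testing looks like in practice: test cases are declared up front, each with an assertion on the model's output, and the suite can run in a CI pipeline so a failing assertion blocks a prompt or model change. Promptfoo itself is typically driven by a YAML configuration and its command-line runner; the run_model function and the test cases here are hypothetical placeholders, not promptfoo's API.

```python
# Tool-agnostic sketch of declarative LLM test cases with assertions.
# run_model() is a hypothetical stand-in for the model or agent under test.

from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # assertion applied to the model output

def run_model(prompt: str) -> str:
    """Placeholder for a call to the LLM or agent being evaluated."""
    return "Escrow shock occurs when a tax or insurance increase raises the escrow portion of the payment."

TESTS = [
    TestCase(
        name="explains escrow shock",
        prompt="Explain escrow shock to a homeowner in plain language.",
        check=lambda out: "escrow" in out.lower(),
    ),
    TestCase(
        name="avoids unsupported guarantees",
        prompt="Will my payment ever go up again?",
        check=lambda out: "guarantee" not in out.lower(),
    ),
]

def run_suite(tests) -> bool:
    """Run every test case and report pass/fail for each assertion."""
    all_passed = True
    for t in tests:
        ok = t.check(run_model(t.prompt))
        print(f"{'PASS' if ok else 'FAIL'}: {t.name}")
        all_passed = all_passed and ok
    return all_passed

if __name__ == "__main__":
    # Non-zero exit code lets a CI pipeline fail the build on a regression.
    raise SystemExit(0 if run_suite(TESTS) else 1)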
As genAI continues to transform mortgage lending, the industry should adopt evaluation-based performance metrics that match the sophistication of these new technologies. This evolution from traditional SLAs to evaluation frameworks will help ensure that AI systems operate reliably and deliver trustworthy, compliant, and fair results.
Organizations that adapt their performance measurement approaches to include evaluations will be better positioned to leverage AI technologies effectively while maintaining high standards of accuracy and fairness. I believe regulators and housing agencies should look for evaluation-based performance frameworks in genAI-based systems, and I will encourage them to do so.