Combining Machine Learning with Credit Risk Scorecards

With all the hype around artificial intelligence, many of our clients are asking for evidence that AI can deliver better results in areas where other types of analysis are already well established, such as credit risk assessment. With 25 years of experience in AI and machine learning, we can certainly provide that proof.

My colleague Scott Zoldi recently wrote a blog post on how we use AI to build credit risk models. In this article, I would like to dig deeper into one of the examples he gave, to show some of the exploration we do to make sure we get all the power of machine learning without losing the transparency that is so important in credit risk.

How do you build a model with limited data?

A typical credit risk scorecard model generates a score reflecting the probability of default, using various customer characteristics as model inputs. These characteristics are pieces of customer information deemed relevant to assessing the probability of default, provided their use is also permitted by regulation. Each input is grouped into ranges of values, called bins, and a score weight is assigned to each bin. When scoring an individual, the score weights corresponding to that individual's information are summed to produce the score.
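The mechanics can be sketched in a few lines of code. This is only an illustration: the characteristics, bin boundaries, and score weights below are made-up example values, not taken from any real scorecard.

```python
# Illustrative scorecard scoring sketch. Bins are half-open ranges [lo, hi),
# and each bin carries a score weight; all numbers here are invented examples.
SCORECARD = {
    "age": [((18, 25), 10), ((25, 40), 25), ((40, 120), 40)],
    "utilization": [((0.0, 0.3), 35), ((0.3, 0.7), 20), ((0.7, 1.01), 5)],
}

def score(applicant: dict) -> int:
    """Sum the score weights of the bins the applicant's values fall into."""
    total = 0
    for feature, bins in SCORECARD.items():
        value = applicant[feature]
        for (lo, hi), weight in bins:
            if lo <= value < hi:
                total += weight
                break
    return total

print(score({"age": 30, "utilization": 0.2}))  # 25 + 35 = 60
```

The scored individual simply collects one weight per characteristic, which is what makes the final score easy to explain bin by bin.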

When constructing a scorecard model we need to bin characteristics into ranges of values, and the bins are designed to maximize the separation between known good cases and known bad cases. This separation is measured using the weight of evidence (WoE), a logarithmic ratio of the fraction of good cases to the fraction of bad cases in the bin. A WoE of 0 means the bin has the same distribution of good and bad cases as the overall population. The further this value is from 0, the more the bin is concentrated in one type of case relative to the other, compared to the overall population. A scorecard will usually have a handful of bins per characteristic, with a smooth progression of WoE across them.
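The WoE calculation itself is small enough to show directly. A minimal sketch, using the log-ratio definition above:

```python
import math

def weight_of_evidence(good_in_bin, bad_in_bin, total_good, total_bad):
    """WoE = ln(fraction of all goods in the bin / fraction of all bads in the bin)."""
    good_frac = good_in_bin / total_good
    bad_frac = bad_in_bin / total_bad
    return math.log(good_frac / bad_frac)

# A bin that mirrors the population (10% of goods, 10% of bads) has WoE = 0:
print(round(weight_of_evidence(100, 10, 1000, 100), 4))  # 0.0
# A bin concentrated in goods (30% of goods, 10% of bads) has WoE > 0:
print(round(weight_of_evidence(300, 10, 1000, 100), 4))  # 1.0986
```

Note that the formula needs enough goods and bads in every bin to be stable, which is exactly where the approach breaks down in the low-default setting described below.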

As Scott described in his article, our project was to develop credit risk models for a portfolio of home loans. Home lending slowed significantly after the recession, so we had few bad examples in the development sample and a default rate of only 0.2%. That made it difficult to build models using traditional scorecard techniques.

The main reason is a scorecard model's inability to interpolate information. The information must be explicitly provided to the scorecard model, and the standard way to do this is to supply good and bad counts for each bin so that a reliable WoE can be calculated. When there are too few goods or bads, as in this case, the approach yields a noisy, erratic distribution of WoE across the bins, which leads to poorly performing scorecard models.

Enter machine learning

Next, we used a machine learning algorithm called tree ensemble modeling, or TEM. TEM involves building several tree models, where each node of a tree splits on a variable into two sub-trees.

Each tree model we build in TEM is trained on a subset of the training dataset, and uses only a handful of input features. This limits the degrees of freedom of the tree model, yields a correspondingly shallow tree, and keeps the variable splits constrained. That, in turn, lets us respect more diligently the requirement on the minimum number of positive and negative cases behind each split.

The following diagram shows an artist's rendition of a TEM, representing several shallow trees in a group, or ensemble. The final score output of the ensemble model is usually an average of the scores of all the tree models making up the ensemble.
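To make the idea concrete, here is a from-scratch toy sketch of such an ensemble, not FICO's TEM implementation: each "tree" is a depth-1 stump fit on a bootstrap sample of the rows and a random subset of the features, and the ensemble score is the average of the stump outputs. All data values are invented.

```python
import random

def fit_stump(rows, labels, features):
    """Fit a depth-1 tree: pick the (feature, threshold) split minimizing
    squared error; each leaf predicts the mean label of its side."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:  # no valid split (constant feature) -> predict the mean
        mean = sum(labels) / len(labels)
        return lambda r: mean
    _, f, t, lm, rm = best
    return lambda r: lm if r[f] < t else rm

def fit_ensemble(rows, labels, n_trees=25, seed=0):
    """Bag shallow stumps: bootstrap the rows, subsample the features,
    and average the stump scores for the final ensemble score."""
    rng = random.Random(seed)
    n_feats = len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        feats = rng.sample(range(n_feats), k=max(1, n_feats // 2))
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return lambda r: sum(t(r) for t in trees) / len(trees)

# Toy data: (utilization, months since last delinquency), label 1 = default.
rows = [(0.1, 5), (0.2, 3), (0.8, 1), (0.9, 2)]
labels = [0, 0, 1, 1]
model = fit_ensemble(rows, labels)
print(0.0 <= model((0.85, 1)) <= 1.0)  # True: averaged score stays in [0, 1]
```

Each stump on its own is weak, but averaging many of them smooths the output into a usable score, which is the core of the ensemble idea.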

Such a model can have thousands of trees and tens of thousands of parameters that have no simple interpretation. Unlike with a scorecard, you cannot tell a borrower, a regulator, or even a risk analyst why someone scored the way they did. This inability to explain why someone got a particular score is a big limitation of an approach like TEM.

However, by building a machine learning model, we were able to confirm that our scorecard approach was losing a significant amount of predictive power. Although not practical to deploy, the machine learning score outperformed the scorecard. Our next challenge was to close the performance gap between the TEM model and the scorecard.

[Figure: Performance Chart]

Scorecarding machine learning

FICO has taken up this challenge several times before: how do you merge the practical advantages of a scorecard (explainability, the ability to capture domain knowledge, and ease of execution in a production environment) with the deep insights of machine learning and AI, which can discover patterns that scorecard development cannot?

Over the years, we have developed practical ways to solve this problem. For example, we have built mechanisms to impart domain knowledge to neural networks and other machine learning models.

To impart explainability, we built a tool called Scorecardizer. You can guess from the name what it does! Scorecardizer takes the patterns and insights uncovered through machine learning or AI and recodes them into a set of scorecards. Rather than relying on the WoE approach we discussed earlier, the tool tries to match the score distribution generated by a machine learning algorithm such as TEM. Thus, instead of supplying good and bad data points and computing the WoE directly, the distribution of the machine learning model's scores within each bin ultimately provides the WoE estimate.
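One way to picture this idea, as our own simplified illustration rather than the actual Scorecardizer internals: replace the raw good/bad counts in the WoE formula with "expected" counts derived from the machine learning model's predicted default probabilities, which remain informative even when actual defaults are too rare to count reliably per bin.

```python
import math

def woe_from_ml_scores(bin_probs, population_probs):
    """Estimate a bin's WoE from ML-predicted default probabilities.

    Instead of counting actual goods and bads (unreliable at a 0.2% default
    rate), sum the model's predicted probabilities to get expected bad and
    good counts, then apply the usual WoE log-ratio to those.
    """
    exp_bad = sum(bin_probs)
    exp_good = len(bin_probs) - exp_bad
    tot_bad = sum(population_probs)
    tot_good = len(population_probs) - tot_bad
    return math.log((exp_good / tot_good) / (exp_bad / tot_bad))

# Invented example: a population of ML-predicted default probabilities,
# and a bin the model scores as uniformly low-risk.
population = [0.002] * 900 + [0.02] * 100
low_risk_bin = [0.002] * 300
print(woe_from_ml_scores(low_risk_bin, population) > 0)  # True: goods-heavy bin
```

The payoff is that every bin gets a stable WoE estimate backed by the full weight of the machine learning model, rather than by a handful of observed defaults.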

Significantly, the final model is almost as predictive as the machine learning model. An out-of-time validation of the final model shows that it holds up well over time, as shown in the following figure.

[Figure: Performance Chart]

The end result of the Scorecardizer solution is a powerful and palatable model. Our hybrid approach overcame the limitations imposed by having too few bad cases. While it was previously impossible to build strong scorecards for problem spaces with so few such cases, the Scorecardizer approach allows us to do so whenever a machine learning algorithm can be built to extract more signal from such datasets.

Scorecardizer is just one of the approaches FICO uses to harness the power of AI in highly regulated areas where score reasons must be given. It represents our commitment to extending AI into new areas for our customers, something we have been doing for 25 years. To see how we do it today, check out our page on Artificial Intelligence and Machine Learning.

