ML & DS shades of Credit Risk Management

In this introductory article we define credit risk, walk through the credit pipeline, review six inputs to risk modelling, and explain what reserves are and why they matter. We also demonstrate the specific features of ML & DS application in credit risk management without diving deep into the subject area.

Then, we will review the bank-sensitive issues of modelling methodology, processing of credit risk components and approaches to calibration and validation. 
The publication is based on our expertise in development and implementation of analytical models in the banking industry.

So, let’s get down to business!

What do we put at risk?

Simply, credit risk is about predicting the likelihood of customers breaching their loan agreements.
Let's focus on the following three tasks that arise in credit risk management: 
  1. Rating modelling
  2. Credit offering
  3. Calculation of Expected Losses

Why do we take these three tasks?
  • They are always essential 
  • They are quite easy to transfer to other industries (telecommunications, industry, insurance)
  • There is plenty of room for ML & DL methods
For a general classification of the risks faced by financial institutions and the broader context, see the review [1].

Credit Pipeline

The scheme of the credit workflow is as follows:

The above scheme is simplified. For instance, the workflow review is limited to a single credit product and leaves aside marketing issues such as marketing optimization, product cannibalization and customer attrition. The pipeline also omits pre-scoring, expert rating adjustment and the application of stop factors by underwriters. Stop factors are restrictions tied to the bank's product structure that block a loan, for example when the customer is on a list of bankrupts or is delinquent on loans granted by other banks.

Rating Modelling

The objective of rating modelling is to develop a customer rating model for further categorization. Rating takes into account various negative events, such as decline in creditworthiness of the borrower, bankruptcy etc.

Rating models can be classified along six dimensions, covered in the sections below.

Based on the customer history available to the bank:
  1. Application scoring is used for new customers and for customers whose history at the financial institution is short, outdated or irrelevant. To develop a rating model of this type, it is important to have data from the customer profile and application form, payment history from other financial institutions provided by a credit bureau, and information on whether the customer appears on any negative lists, for instance the blacklist of legal entities kept by the Central Bank. Application scoring is used to decide whether to grant a loan to the applicant.
  2. Behavioral scoring is applied to assign a rating to customers with an active history. Behavioral attributes within the bank, such as the customer's turnover and payment discipline on other products of the bank, play the leading role in this model. Behavioral scoring is used to calculate and adjust the amount of reserved funds, which we will discuss later.

Based on requirements to the model output:
  1. Relative rating. The quality of categorization or relative order of customers within the rating is crucial. The absolute value has no impact on the final decision.
  2. Absolute rating. The absolute scoring value and the algorithm that converts it into a default probability are both significant. Banks often set a default probability threshold below which a loan can be granted, so the absolute value of the default probability must be estimated carefully for every customer.
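
The difference between the two requirements can be illustrated with a minimal sketch. The scores, default flags, calibration parameters and the 10% approval threshold below are all made-up values, and `auc` and `score_to_pd` are hypothetical helper names. The point is that two calibrations of the same score rank customers identically but lead to different lending decisions.

```python
import math


def auc(scores, defaults):
    """Share of (defaulter, non-defaulter) pairs where the defaulter scores higher."""
    pos = [s for s, d in zip(scores, defaults) if d == 1]
    neg = [s for s, d in zip(scores, defaults) if d == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def score_to_pd(score, intercept, slope):
    """Logistic calibration: maps a raw score to a default probability."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


# Illustrative data: a higher score means a riskier customer.
scores = [0.2, 0.5, 0.9, 1.4, 2.0, 2.5]
defaults = [0, 0, 1, 0, 1, 1]

# Two hypothetical calibrations of the same score.
pd_a = [score_to_pd(s, intercept=-3.0, slope=1.0) for s in scores]
pd_b = [score_to_pd(s, intercept=-1.0, slope=1.0) for s in scores]

# Relative quality is identical: a monotone transformation keeps the ordering.
print(auc(pd_a, defaults), auc(pd_b, defaults))

# Absolute values differ, so a PD threshold approves different customers.
threshold = 0.10
approved_a = sum(p <= threshold for p in pd_a)
approved_b = sum(p <= threshold for p in pd_b)
print(approved_a, approved_b)
```

If only the ordering matters, either calibration would do. Once a PD threshold enters the decision rule, the calibration itself becomes part of the model.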

Based on the impact of expert opinion on final decision:
  1. Statistical model. The weights are defined based on statistical analysis of retrospective data. Expert adjustments are made at the attributes selection and sample preparation stages.
  2. Expert model. The final weights of the factors are set manually or semi-automatically, considering historical default cases. Altman's Z-score is a classic example [2].
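
As an illustration of a fixed-weight model, here is a minimal sketch of the original Altman Z-score for publicly traded manufacturing firms, using the commonly quoted weights and zone cut-offs. The `Financials` container and the sample figures are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    """Balance-sheet and P&L items, all in the same currency units."""
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_equity: float
    sales: float
    total_assets: float
    total_liabilities: float


def altman_z(f: Financials) -> float:
    """Weighted sum of five financial ratios with fixed weights."""
    x1 = f.working_capital / f.total_assets
    x2 = f.retained_earnings / f.total_assets
    x3 = f.ebit / f.total_assets
    x4 = f.market_value_equity / f.total_liabilities
    x5 = f.sales / f.total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def zone(z: float) -> str:
    """Commonly quoted cut-offs for the original model."""
    if z > 2.99:
        return "safe"
    if z < 1.81:
        return "distress"
    return "grey"


# Made-up company figures, for illustration only.
company = Financials(
    working_capital=200, retained_earnings=300, ebit=150,
    market_value_equity=600, sales=1000,
    total_assets=1000, total_liabilities=500,
)
z = altman_z(company)
print(round(z, 3), zone(z))
```

The weights are constants rather than being re-estimated on the bank's own data, which is what distinguishes this kind of model from the statistical one above.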

Based on the level of automation of the decision-making process:
  1. For the majority of customers, the assigned rating is passed down the pipeline automatically without manual adjustments. A portion of customers undergoes manual checks as part of online monitoring of the model's operation.
  2. The assigned rating serves as an additional tool for the owner of the model and for the underwriter.

Based on the level of utilization of information on external environment:
  1. Stand-alone approach. The factors of the model do not take into account the customer's interaction with other customers. The model is built on behavioral attributes for financial products. The impact of the external environment is considered either through an adjustment procedure or through a set of flags indicating negative information related to other customers, without further detail.
  2. Supply chain finance approach. It considers information on the customer's ties with other borrowers. First of all, it takes into account transaction history, economic and/or legal affiliation with other customers, or family ties for individuals. The more information is available, the more accurate the forecast (not only at the customer level but also at the level of the deal [3]).

Based on the level of engagement into the general workflow:
  1. The output of the rating model is used locally. As a rule, the task is not integrated with other processes. This may lead to additional requirements for rating maintenance, such as adjustments. For instance, the rating of a company is adjusted depending on the level of government support.
  2. The output of the model serves as an input for the next process, i.e. the model is an integral part of a larger application. It is important to consider the specific features of this external process, since it may influence the requirements for development and validation of the rating model.

For a general overview of solutions to this problem, please refer to [1], [4], [5], [6]. We will cover the details of the project in the next article of the series, dedicated to development methodology.

We shall also briefly mention current trends in improving the quality of rating models under development:
  1. Search for new information and/or data sources, for example geo analytics [8], social networks [9], Operator of Fiscal Data (OFD).
  2. Utilization of advanced algorithms for modelling purposes. XGBoost increasingly replaces standard scorecards based on logistic regression.
  3. Utilization of advanced algorithms to find interconnections (graph analytics) and generate specific attributes (text mining).
  4. Operationalization of models, i.e. integration of the model into an automated development-implementation-monitoring-retraining pipeline, to reduce model risk and automate the process. These are the so-called ModelOps solutions [10].

Rating modelling rarely forms a separate task. It is increasingly viewed together with other tasks as part of an applied solution to more general problems (including credit offering). So, let us get down to credit offering.

Credit Offering or How to Make an Offer One Cannot Refuse

Rating model output, that is the absolute value of probability of default (PD), might be used to solve the credit offering issue. First and foremost, credit offering implies the task to set the initial limit for a customer.

Of course, the PD value alone, i.e. the forecast default probability, is not enough to define the optimal limit. To make a reasonable offer to customers, it is important to know the range of acceptable limits. The amount should, at a minimum, indirectly reflect the customer's needs and ability to service the debt.

Turnover of the customer's own funds on non-credit products might serve as a reference point.
What else should we know? Cost structure of the credit helps better understand the task. The scheme of the credit cost structure is given below (refer to [11]):
“Resource” is the cost of the money used to grant a loan, for instance the deposit rate that attracts investors and ensures the required money supply. “Margin” is the expected return on the loan. “Risk” is the deduction for borrower default. “Expenses” stands for acquisition and service costs.
In the above structure, rating modelling can be applied to define the size and structure of the “Risk” bar. “Resource” mostly depends on the key rate of the Central Bank. “Expenses” and “Margin” are product components and are often defined in the product data sheet. In other words, “Risk” is only one of the components influencing the final profitability of the deal.
What could be done with the other components? It looks like a new optimization task. Let us try to shape it. Note that there may be multiple formulations, and the choice should be guided first of all by the business task and the context of the development process.
Let us start with a simple scenario and then show where the solution could be developed further. The simplest task is to optimize the profitability of the deal.
Let us assume that the amount of the loan agreement is equal to L (limit). This agreement has a forecast probability of default PD. The customer's debt amounts to L at the moment of default.
Then the optimization task is as follows:

With PD fixed, the objective depends linearly on L, so one can argue that there is nothing to optimize.
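
To make the linearity concrete, assume a margin rate m earned on a repaid loan (m is not defined in the text and is introduced here only for illustration) and a full loss of L on default. One possible form of the expected profit is then:

```latex
\max_{L} \; \mathbb{E}[\mathrm{Profit}(L)]
  = (1 - PD)\, m\, L \;-\; PD \cdot L
  = \bigl[(1 - PD)\, m - PD\bigr]\, L
```

With PD constant, the bracket is a constant, so the expected profit simply scales with L: the best limit is either the largest allowed or none at all.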
However, in real life PD depends on L: the higher the limit, the harder it is to service the debt and, therefore, the higher the default probability. In this case we do indeed have an optimization task, though with some intricacies. The sample includes customers with different incomes, so absolute values alone are not enough. It is better to build the dependency not on the limit itself but on the level of debt load, i.e. the ratio of the limit L to the customer's income Ic, L / Ic.

The dependence PD(L / Ic) can be learned from historical or pilot data.
The optimization task can also be affected by product stop factors. For instance, an acceptable level of risk or probability of default might be set in the product data sheet. Optimization is then carried out within these preset limits.
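
The constrained task can be sketched as a simple grid search. Everything numeric here is an assumption made for illustration: the logistic shape of PD(L / Ic) and its coefficients, the margin rate, the full loss of L on default, and the PD cap standing in for a product stop factor. In practice the curve would be fitted to historical or pilot data.

```python
import math


def pd_from_debt_load(debt_load, intercept=-4.0, slope=3.0):
    """Hypothetical logistic curve: PD grows with the debt load L / Ic."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * debt_load)))


def expected_profit(limit, income, margin_rate):
    """Margin earned if the loan is repaid minus the limit lost on default."""
    pd = pd_from_debt_load(limit / income)
    return (1.0 - pd) * margin_rate * limit - pd * limit


def best_limit(income, margin_rate, max_pd, step=1_000):
    """Grid search over limits; returns (profit, limit) or None if nothing passes."""
    best = None
    for limit in range(step, 3 * income + 1, step):
        if pd_from_debt_load(limit / income) > max_pd:
            continue  # product stop: PD above the acceptable level
        profit = expected_profit(limit, income, margin_rate)
        if best is None or profit > best[0]:
            best = (profit, limit)
    return best


# Same customer under a tight and a loose PD cap.
tight = best_limit(income=100_000, margin_rate=0.2, max_pd=0.05)
loose = best_limit(income=100_000, margin_rate=0.2, max_pd=0.10)
print(tight, loose)
```

Under the tight cap the constraint binds and the offer sits at the largest limit that still satisfies it. Under the loose cap the profit optimum lies inside the allowed range.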

For more information you can google the following keywords: risk-based limit, credit-limit management profit-based approach.

The money has been offered and the loans have been granted to customers. Some of the loans become overdue. How can we manage this? We build cash reserves as a safety cushion.

Reserves and DS Role in Reserves Calculation

Risk definition is the key business of any bank. A bank decides whether it is ready to work with a customer depending on its risk appetite. In any case, the bank must form appropriate reserves out of cash or liquid securities to mitigate possible losses. In the worst-case scenario the bank loses the entire portfolio, but this is highly unlikely, so it is not efficient to reserve the full amount. A balance is required.

First, it is required to define the amount of money to be reserved. Thus, we have a task to provision the required capital to cover expected loss (EL).

Historic Note:
Expected loss in rubles is the product of three components, EL = PD × EAD × LGD:
  1. Probability of Default (PD)
  2. Exposure at Default (EAD) is the amount of the borrower's debt at the moment of default
  3. Loss Given Default (LGD) is the share of this amount that remains unpaid

We will meet the above formula again in this series of articles, since it is a recurring theme of the provisioning problem in credit risk management.
Once expected loss (EL or ECL) is decomposed this way, each of the variables can be modelled separately: PD via a binary classification model, and LGD and EAD via regression models. Thus, data science methods and machine learning algorithms can be used at the various modelling stages, including calibration and validation. Hello DS & ML!
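
The decomposition can be sketched for a toy portfolio as follows. The three loans and their PD, EAD and LGD values are made up. In practice each component would come from its own model.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    pd: float   # probability of default, between 0 and 1
    ead: float  # exposure at default, in rubles
    lgd: float  # loss given default, share of EAD between 0 and 1


def expected_loss(loan: Loan) -> float:
    """EL = PD x EAD x LGD for a single agreement."""
    return loan.pd * loan.ead * loan.lgd


# Illustrative portfolio with made-up parameters.
portfolio = [
    Loan(pd=0.02, ead=1_000_000, lgd=0.45),
    Loan(pd=0.10, ead=250_000, lgd=0.60),
    Loan(pd=0.01, ead=5_000_000, lgd=0.30),
]

# The portfolio reserve is the sum of expected losses over all agreements.
total_el = sum(expected_loss(loan) for loan in portfolio)
total_exposure = sum(loan.ead for loan in portfolio)
print(round(total_el), round(total_el / total_exposure, 4))
```

The reserve implied by the expected loss is a small fraction of the total exposure, which is why reserving the full portfolio would be inefficient.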

For those of you who like challenging tasks:

The specifications and manuals are behind us and the books have been read. So, where is DS? As promised, DS is in the details of the components, but that is a different story! The specific features of modelling the PD, LGD and EAD components will be covered in detail in the next article of the series. To conclude this introductory article, here is a table of statistical methods and machine learning application options broken down by risk management task.


Having written this introductory article, we, the authors, have come to the following conclusion: it is very difficult to describe briefly even three tasks arising from credit risk calculation. Why?

There is a thoroughly worked-out methodology for these tasks, which reinforces the ideas behind ML & DS application. These ideas develop approaches that give relevant answers to increasingly complicated market challenges. The instruments based on such approaches evolve from complementary techniques into vital decision-making tools. Together, these factors let us transfer the best practices and insights of risk modelling to other industries, such as telecommunications, insurance and industry. The details are coming up in the next articles of the series.


  • Default is failure to meet obligations under the loan agreement. Usually, 90 days past due is considered a default.
  • PD stands for probability of default.
  • EAD stands for exposure at default. It is the amount of credit obligations under the agreement as of the date of default, i.e. the account balance at that date, where balance = principal + accrued interest + accrued fees.
  • LGD stands for loss given default. It is the share of EAD that the customer does not repay within the recovery horizon.
  • EL stands for expected loss under the agreement.
  • ECL stands for expected credit loss, i.e. expected loss under the agreement during its entire lifetime.
  • Underwriter is an officer responsible for risk assessment and the final decision on a credit application.
  • Stop factor is a restriction that prevents the bank from providing a credit product to a customer.
  • SCF stands for supply chain finance. It is the system of interaction between a supplier and its counterparties.
  • RWA stands for risk-weighted assets and is used to assess capital adequacy.
  • IRB stands for the internal ratings-based approach to credit risk. It is used to assess the bank's regulatory capital adequacy ratio and is based on the internal ratings of the borrowers, i.e. the ratings set by the bank itself.
  • IFRS 9 is an international financial reporting standard that, alongside other provisions, requires assessment of expected credit loss adjusted for the lifetime of the agreement and the stages of impairment.
  • VaR stands for Value at Risk. It is the amount that losses will not exceed, with a specified probability, within a certain period of time.