09 August 2013

Default Risk Modeling: Little Experiment (Part 2: Modeling – Start)

In Part 1 of this series I started a little experiment in modeling U.S. corporate default rates by discussing various default risk indicators and the economic sense they make; read it first if you haven’t done so yet. This time I’m going to start the modeling itself. Who am I kidding? It cannot possibly be a serious statistical model, as I only use 22 data points (years 1991-2012). Still, it yields some interesting insights.

(Warning: this piece is not for those who hate subjects like math, statistics and econometrics. Yet if you want to be in finance, better get at least a basic understanding of those.)


The variables in the following tables and graphs are named as follows:
  • DF – result variable, i.e. the U.S. corporate default rate that we are aiming to model;
  • DF_t.1 – the U.S. corporate default rate a year before; time-lagged result variable as a potential predictor of the future; explanatory variable;
  • Credit_standards_t.1_4Q – change in credit standards for the large and middle-market firms during the past three months as reported by the banks most recently before the start of the modeled year (for modeling purposes: in December of the last year); explanatory variable;
  • Real_GDP_growth_Forecast – GDP forecast for the modeled year; explanatory variable;
  • VIX_t.1_Dec – fear index VIX representing market's expectation of stock market volatility, data as of last month before the start of the modeled year (for modeling purposes: in December of the last year); explanatory variable;
  • Corp_Bond_Spread_10y_t.1_Dec – corporate bond spread (Moody’s yield on seasoned corporate bonds – all industries, Baa minus Market yield on U.S. Treasury securities at 10-year constant maturity) in the last month before the start of the modeled year (for modeling purposes: in December of the last year); explanatory variable;
  • PMI_t.1_Dec – Purchasing Managers’ Index (PMI) for the U.S. in the last month before the start of the modeled year (for modeling purposes: in December of the last year).
I’ll consider all of the selected explanatory variables for inclusion in the model, but surely not all of them will end up there. Importantly, note that for forecasting the default rates for the coming year we can only use the data and information that we have at hand today; thus, all explanatory variables include a proper time lag.
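The time lag can be sketched in a few lines of Python. This is only an illustration: the numbers are placeholders (not real default rates) and the helper name `lagged_rows` is made up for this example.

```python
# Placeholder yearly default rates in percent, NOT real data.
default_rate = {2009: 5.0, 2010: 1.5, 2011: 1.0, 2012: 1.3}


def lagged_rows(series):
    """Pair each year's value with the previous year's value (the DF_t.1 idea)."""
    rows = []
    for year in sorted(series):
        if year - 1 in series:  # the first year has no lagged value available
            rows.append({"year": year, "DF": series[year], "DF_t.1": series[year - 1]})
    return rows


rows = lagged_rows(default_rate)
for row in rows:
    print(row)
```

The first year drops out because no earlier value exists, which is one reason why a lagged variable costs a data point in a sample this small.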

Criteria for the model

We want our model:
  • To have reasonable predictive power, i.e. even though it simplifies reality, a model should give us more valuable information than a naïve assumption such as default rates remaining constant at their long-term average level.
  • To be as simple as possible (but not simpler). Complex models are often assumed to perform better; however, in the end a model should also be understandable for business users. At the same time, we need to include all the information that is important for making reasonable predictions.
  • To make sense by design, i.e. to be in line with the theory and assumptions; for example, we don’t want to see counterintuitive relations between the explanatory variables and the modeled variable, and we don’t want to see variables that don’t add any value to the model.
  • To be robust, meaning not too sensitive to any particular input variable whose measurement involves a degree of subjectivity (such as the GDP forecast or the credit standards reported by the banks).
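The first criterion can be made concrete by comparing a model's forecast errors with those of the naïve "long-term average" forecast. A minimal sketch, assuming placeholder numbers and using root mean squared error (RMSE) as the yardstick, which is my choice for illustration only:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Placeholder figures in percent, NOT real default rates or real forecasts.
actual = [1.0, 3.0, 2.0, 4.0]
model_forecast = [1.5, 2.5, 2.0, 3.5]

# Naive benchmark: every year is forecast at the long-term average level.
long_term_average = sum(actual) / len(actual)
naive_forecast = [long_term_average] * len(actual)

model_rmse = rmse(actual, model_forecast)
naive_rmse = rmse(actual, naive_forecast)
print(f"model RMSE: {model_rmse:.3f}, naive RMSE: {naive_rmse:.3f}")
```

Note that the average here is computed from the same years it is tested on, which flatters the naïve forecast a little; a stricter comparison would use only the data known before each forecast year.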

There are several theoretical requirements for econometric models. Besides a model’s explanatory power and the statistical significance of its input variables, one has to look at the model residuals. “Residuals” refer to the forecast errors, i.e. the differences between the actual default rates and the values calculated (predicted or forecasted) by the model. Here is a quick checklist of criteria for a model’s residuals:
  • normally distributed…
  • … with zero mean…
  • … and constant dispersion,…
  • … not correlating with model input variables…
  •  … and not correlating with each other at different points in time.
There are also a number of formal tests and quantitative methods for identifying various model issues, usually named after someone. I’m taking a look at some of those indicators but mostly rely on graphical/visual methods and a sort of layman’s calculations. For one thing, the sample is very small and the modeled phenomenon rather specific, meaning that expert judgment is probably the best judgment.
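Some of the checklist items can be approximated with a few lines of arithmetic. The sketch below is a layman's calculation under stated assumptions: the residuals and the input variable are placeholder numbers, and the Durbin-Watson statistic is just one example of a test "named after someone" (values near 2 suggest no first-order autocorrelation in the residuals).

```python
def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def durbin_watson(residuals):
    """Durbin-Watson statistic; it always lies between 0 and 4."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(r * r for r in residuals)
    return num / den


# Placeholder residuals and one placeholder input variable, NOT real model output.
residuals = [0.2, -0.1, -0.3, 0.4, -0.2]
input_variable = [1.0, 2.0, 3.0, 4.0, 5.0]

print("mean of residuals:", mean(residuals))                      # should be close to 0
print("corr with input:", pearson(residuals, input_variable))     # should be close to 0
print("Durbin-Watson:", durbin_watson(residuals))                 # close to 2 is good
```

Normality and constant dispersion are easier to judge from a histogram and a residual plot than from any single number, especially with so few observations.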

Methodical issues

(In fact, one should think about those even before starting with the data collection.)

The key question is: are we dealing with a case of time series analysis or, alternatively, are the sequential default rates independent measurements that just happen to be made over time? There is a whole lot of fancy methods for time series modeling; perhaps we should choose something from this toolbox?

To answer that, one might take a look at the bigger picture of corporate default rates. This is a graph that I found: global speculative-grade default rates from 1920 to 2012 by Moody’s Investors Service, see below. (Source: “Annual Default Study: Corporate Default and Recovery Rates, 1920-2012”)

We can say that it looks like a time series. After all, there are widely accepted theories about credit cycles. However, we don’t see regular cycles; thus, theorists might call it a sort of …hmm, cyclostationary process where cycles are irregular / stochastic …or look at the default rate distributions as driven by an unobserved Markov chain, interpreted as the “credit cycle”.

Interestingly though, since the 1980s we can point to certain regularities; perhaps there is some rule of credit cycles becoming shorter and shorter? I don’t know about that, but I do know that there is a reason and that this reason has something to do with the ever-increasing non-financial debt…

Anyways, there do not seem to be definite autocorrelations in the modeled time series of U.S. corporate default rates since 1991. In other words, successive data points are not significantly correlated, except at the first lag (i.e. with the value of the year immediately before):

Hence, I’m treating the default rates as more or less (ok, rather a bit less) independent measurements that happen to be made over time and can be explained by a set of explanatory variables. Relationships between the successive data points should be sufficiently captured by the variable labeled DF_t.1.
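For readers who want to reproduce this kind of check, the sample autocorrelation at a given lag can be computed by hand. A minimal sketch, assuming the usual textbook estimator; the series used here is a toy sequence, not the default rate data:

```python
def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag (lag 0 gives 1.0)."""
    n = len(series)
    m = sum(series) / n
    den = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return num / den


toy_series = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy numbers, NOT default rates
for lag in range(0, 3):
    print(lag, round(autocorr(toy_series, lag), 3))

# Rough rule of thumb: with n observations, autocorrelations within about
# +/- 2 / sqrt(n) are usually not distinguishable from zero.
n_obs = 22
rough_band = 2 / n_obs ** 0.5
print("rough band for 22 observations:", round(rough_band, 3))
```

With only 22 observations the band is wide (roughly ±0.43), so only a fairly strong autocorrelation would stand out at all.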

Having made this decision and collected the data, I have to choose the functional form of my model.

As the model ought to be simple, I dismiss all interaction terms at once, even if they might add something to the model’s in-sample performance. (I don’t think that they would; rather, it would be a compromise between the model’s sensitivity and robustness.) In other words, I assume that the simultaneous influence of the model input variables on the default rate is additive. You know: I’m talking about a sort of Y = X1+X2+X3 function rather than Y = X1+X2*X3.

As the most basic form of modeling, we might think of linear regression, with ordinary least squares (OLS) as the method for estimating the unknown parameters. That’s what Excel does when you choose Data -> Data Analysis -> Regression. The resulting model would look something like this:

Y = β0 + β1*X1 + β2*X2 + …


(Y is the default rate that we are modeling; X1, X2 etc. are the model input variables; β0 is the model intercept; β1, β2 etc. are the model parameters that define relationships between the model input variables and the result variable.)
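Outside Excel, the same kind of OLS fit can be sketched with NumPy. The data below are placeholders generated from known coefficients so that the fit can be checked; they are not the default rate data, and the variable names are mine.

```python
import numpy as np

# Placeholder inputs (two explanatory variables), NOT real data.
X = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
true_beta = np.array([0.5, 1.0, -0.25])  # b0 (intercept), b1, b2
y = true_beta[0] + X @ true_beta[1:]     # noise-free on purpose, to verify the fit

# Add a column of ones so that the first coefficient is the intercept.
design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimize the sum of squared residuals.
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ beta
residuals = y - fitted

print("estimated coefficients:", np.round(beta, 4))
```

Because the placeholder data contain no noise, the estimated coefficients reproduce the ones used to generate them; with real data the residuals would of course not be zero.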

But when you think about the definition of the default rate – the percentage of companies non-defaulted at point t0 (e.g. 31 Dec 2012) that default during the year (e.g. in 2013) – then you realize that we have a little theoretical problem here. Namely, by definition, default rates can neither be below 0% (no one defaults) nor above 100% (everyone defaults). Yet a linear model can by design predict a counterintuitive default rate of -0.6%, for example.

One way of dealing with this little theoretical problem is to ignore it in the modeling process, and only later replace all negative default rate forecasts with zero and all forecasts above 100% with 100%. The need for doing so can reasonably be assumed to be fairly exceptional.
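This after-the-fact correction is a one-liner. A minimal sketch, with rates expressed as fractions (0.0 to 1.0) and a helper name chosen for this example:

```python
def clip_rate(rate, low=0.0, high=1.0):
    """Force a forecast default rate into the admissible range [low, high]."""
    return max(low, min(high, rate))


# Example forecasts as fractions: -0.6%, 1.88% and 120%.
raw_forecasts = [-0.006, 0.0188, 1.2]
clipped = [clip_rate(r) for r in raw_forecasts]
print(clipped)
```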

Another way is to use a different functional form for the model that would already take care of the issue. It is an industry standard to assume that default probabilities are described by the logistic function. An observed default rate can be thought of as an average realized probability of default; thus, we might use the logistic regression instead of the linear regression. In that case, the default rates would be modeled as follows:

Y=1/ (1+EXP(-Z))

(EXP is the exponential function; Z is the same equation that was described above: β0+β1*X1+β2*X2+…)
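To see why this functional form takes care of the 0%-100% problem, here is a small sketch of the logistic function and its inverse (the logit). The function names are mine; the formula is the one given above.

```python
import math


def logistic(z):
    """Map any real number Z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of the logistic function: map a rate in (0, 1) back to Z."""
    return math.log(p / (1.0 - p))


for z in (-10.0, -4.0, 0.0, 4.0, 10.0):
    print(z, round(logistic(z), 5))

# Round trip with the 1.88% average default rate expressed as a fraction.
z_avg = logit(0.0188)
print(round(z_avg, 4), round(logistic(z_avg), 4))
```

However extreme Z gets, the output never leaves the interval between 0 and 1, so no clipping is needed afterwards.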

One disadvantage of the logistic regression is that the meaning of model parameters and the importance of individual variables are less obvious than in linear regression; explaining the model to non-quantitative people most probably proves quite a challenge to anyone trying to do it. Try to comprehend an explanation like this:
“For every one unit increase in variable X1, the odds of defaulting versus not defaulting would increase by EXP(β1) times.”  
Most probably you’d need a calculator. You’d also need to know the difference between the probability of default and the odds of defaulting. (The latter is the ratio of two probabilities: the probability of default divided by the probability of non-default, i.e. the probability of survival.)
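A small worked example may help with the odds interpretation. The starting probability and the coefficient β1 below are hypothetical values picked for illustration, not estimates from any model.

```python
import math


def odds(p):
    """Odds of defaulting: probability of default over probability of survival."""
    return p / (1.0 - p)


def prob_from_odds(o):
    """Convert odds back into a probability."""
    return o / (1.0 + o)


p0 = 0.02      # hypothetical starting probability of default (2%)
beta1 = 0.5    # hypothetical coefficient of X1

# A one-unit increase in X1 multiplies the odds by EXP(beta1).
new_odds = odds(p0) * math.exp(beta1)
p1 = prob_from_odds(new_odds)

print("odds before:", round(odds(p0), 5))
print("odds after: ", round(new_odds, 5))
print("probability after:", round(p1, 5))
```

For small probabilities the odds and the probability are almost the same number, which is why the distinction is easy to overlook; they drift apart as the probability grows.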

I have seen both – linear regression and logistic regression – being used for modeling default rates. For the sake of simplicity, I’ll stick with linear regression, even while knowing its limitations at the extremes. There are, however, reasons why banks may prefer logistic regression, such as the possibility to smoothly include macro variables in the statistical PD models that rank-order borrowers based on their idiosyncratic characteristics.

Exploring correlations

Before “running” the model calculation algorithms, one ought to explore the pair-wise correlations within the data a bit. Here we go: the numerical correlation matrix followed by the visual representation. That gives us plenty to look at, doesn't it?


The good news is that several relationships between the explanatory variables and the default rate appear to be strong (see the first column). I’d specifically point out credit standards and VIX as indicators of the next year’s default risk. The Purchasing Managers’ Index (PMI) also seems pretty good. Some others are so-so. The scatterplot matrix suggests that a linear model should be pretty ok.
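A pair-wise correlation matrix of this kind can be computed with NumPy's `corrcoef`. The sketch below uses three of the variable names from the list above, but the numbers are placeholders and not the data behind the tables and graphs in this post.

```python
import numpy as np

names = ["DF", "VIX_t.1_Dec", "PMI_t.1_Dec"]

# One row per year, one column per variable. Placeholder values, NOT real data.
data = np.array([
    [1.0, 15.0, 55.0],
    [2.0, 20.0, 52.0],
    [4.0, 30.0, 45.0],
    [1.5, 18.0, 54.0],
    [3.0, 25.0, 48.0],
])

# rowvar=False tells NumPy that the variables are in the columns.
corr = np.corrcoef(data, rowvar=False)

for name, row in zip(names, corr):
    print(f"{name:>12}", np.round(row, 2))
```

The first column (or row) of the result corresponds to the correlations of each variable with DF, which is the column to read first.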

At the same time …OMG: all the explanatory variables are so strongly correlated with each other that we are bound to have trouble with the modeling issue called multicollinearity. The usual ways of manipulating the data by de-trending, seasonal adjustments, taking differences or logarithms etc. do not help: we don’t have a definite trend or seasonality to get rid of.
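One common way to put a number on multicollinearity is the variance inflation factor (VIF): regress each explanatory variable on all the others and compute 1 / (1 - R²). This is a standard diagnostic that I add here for illustration; it is not taken from the analysis above, and the data are placeholders.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (columns = variables)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(X)), others])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # A perfectly explained variable has an infinite VIF.
        factors.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return factors


# Placeholder data: two unrelated variables versus two nearly identical ones.
unrelated = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
nearly_identical = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]

print("unrelated:       ", np.round(vif(unrelated), 2))
print("nearly identical:", np.round(vif(nearly_identical), 2))
```

A VIF of 1 means a variable is unrelated to the others; values above roughly 5 to 10 are often treated as a warning sign, though any such threshold is a rule of thumb.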

Leaving aside modeling-related considerations, one ought to understand the economics behind it: why is virtually anything correlated with everything? Is it pure chance or causality? My short answer is that it is causality rooted in the principles of money creation and in the way we measure things. At the same time, the causal relationships are not exactly functional, as a subjective human factor is involved in the responses to monetary and economic stimuli. Reactions to the various incentives in turn lead to feedback loops, virtuous or vicious depending on the current standing of the economy.

For an illustration, this is how (credit) booms tend to end:
The central bank (the Fed in the U.S.) hikes interest rates when it sees one part of the economy or another overheating, be it the tech sector, the housing market or whatever. Then it hikes rates once more and then again, as nothing happens at first; it takes time until market participants react – a year or two or even longer. But as the money supply gradually tightens and base interest rates rise, defaults among the weakest borrowers pick up, and investors and creditors become more and more careful. This leads to increased volatility on the financial markets, tightening credit standards etc. When noticing signs of cooling, central banks stop tightening monetary policy, but often it’s already too late… A downward spiral has been triggered: tightened credit standards lead to a declining economy and result in more and more defaults; banks find themselves facing capital constraints and restrict credit availability even further. (If everyone suddenly becomes aware of the bubble that had been built up in the boom years, Great Depressions and Great Recessions can happen.) Apparently, private consumption declines, PMI goes down, unemployment rises and all the bad effects of a financial and economic crisis can be observed. Then there is no way but to ease monetary policy – and ease a lot. It is assumed or hoped that at some point the vicious cycle comes to an end and the virtuous cycle can be restarted.

There are at least two implications to the default risk modeling:
  • Inclusion of correlated variables may be justified, at least to a certain extent. There is logic behind various indicators reinforcing each other – they will do so in the future as well.
  • Wait a minute! I probably have forgotten an important variable from the list of variables to be considered for the model: Federal Funds Rate.

Adding the missing variable: the Federal Funds Rate

How exactly to include the Federal Funds Rate in a default risk model deserves a topic of its own. Indeed, if we simply took the Fed’s rates and tried to relate them to the statistics of defaults, we’d get nonsense results (à la: the higher the rate, the lower the default risk). So there is no other way but to look closer into the data.

The Federal Reserve Bank of St. Louis is kind enough to provide us – among many other nice graphs – with a figure depicting the Effective Federal Funds Rate since 1954, with shaded areas in the background indicating U.S. recessions:

Around the late 1970s / early 1980s we can see a decisive trend change in the rates, which resulted from Volcker’s radical steps to break the back of the inflation of the 1970s. Ever since, the Fed has at least been pursuing stable prices. Yet when the non-financial debt burden is high – which it is – and high inflation is to be avoided, there is no way but for interest rates to trend downward. This is an unspoken feature of the design of our monetary system. (You may recall my first posts under the label “Basics”. Note that, with some corrections in crisis times, the weakest borrowers have become weaker and weaker over time as an inevitable side effect of the build-up of the non-financial debt burden – even if this isn’t reflected in an upward trend in default rates, given the long-term decline in interest rates.) See it in the data: in each recovery since the early 1980s, the new peak in the Fed funds rate is lower than the previous one.

So the thing for us to do is to find the trend line. My logic is that rates staying above the trend long enough lead to more defaults down the road because they trigger the vicious cycle. I’m not implying that the Fed is deliberately causing crises; hikes in rates are rather a reaction to something having “boiled over” here or there; I use the Fed funds rate as I don’t know of any better, more or less universal indicator for identifying booms that are likely to become busts soon. Anyways, the following figure provides an illustration.
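How the trend line is drawn is a modeling choice. As one possible sketch, and purely as an assumption on my part for this example, a straight line can be fitted by least squares and each year flagged as above or below it. The rates are placeholders, not actual Fed funds data.

```python
def linear_trend(years, rates):
    """Least-squares straight line through (year, rate) points: slope, intercept."""
    n = len(years)
    mean_year = sum(years) / n
    mean_rate = sum(rates) / n
    cov = sum((y - mean_year) * (r - mean_rate) for y, r in zip(years, rates))
    var = sum((y - mean_year) ** 2 for y in years)
    slope = cov / var
    intercept = mean_rate - slope * mean_year
    return slope, intercept


# Placeholder rates in percent, NOT actual Fed funds data.
years = [2000, 2001, 2002, 2003, 2004]
rates = [6.0, 4.0, 5.0, 2.0, 3.0]

slope, intercept = linear_trend(years, rates)
above_trend = [r > intercept + slope * y for y, r in zip(years, rates)]

print("slope per year:", round(slope, 3))
print(dict(zip(years, above_trend)))
```

A different trend definition (for example a line through successive peaks, or a smoothed curve) would change which years count as "above trend", so the choice should be stated alongside any results.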


There have been three periods when default rates exceeded the calculated average of 1.88%: 1990-1991, 1999-2002 and 2008-2009. As expected, each of them was preceded by a period of above-trend base rates. Even though there is variation, on average it has taken two consecutive years of above-trend Fed funds rate before default rates pick up significantly, and three years before they peak; despite the Fed lowering interest rates, defaults remain at elevated levels for one more year or so out of inertia.

Based on these insights, I create a qualitative variable with the name FedFundsRate_Q which takes the following values:
  • “2” if the Fed funds rate at the beginning of the year had been above the trend line for at least the three preceding years (it doesn’t matter whether rates are above or below trend in the current year);
  • “1” if the Fed funds rate at the beginning of the year had been above the trend line for the two preceding years, or if it is the year immediately following a year of level “2”, given that rates are now below trend;
  • “0” otherwise.
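The coding rule above can be written down as a small function. This is my reading of the rule, not a definitive implementation: I assume one above/below-trend flag per year, count the consecutive above-trend years immediately preceding the modeled year, and treat the year right after a level “2” as level “1”. Under this reading the "rates are now below trend" condition holds automatically, because a year following a level “2” year only loses that level if the preceding year was below trend. The function name and the input flags are hypothetical.

```python
def fed_funds_q(above_trend):
    """Return the FedFundsRate_Q level (0, 1 or 2) for each year.

    above_trend: list of booleans in chronological order, True if the
    Fed funds rate was above its trend line in that year.
    """
    levels = []
    for t in range(len(above_trend)):
        # Count consecutive above-trend years immediately before year t.
        run = 0
        k = t - 1
        while k >= 0 and above_trend[k]:
            run += 1
            k -= 1

        if run >= 3:
            level = 2
        elif run == 2:
            level = 1
        elif levels and levels[-1] == 2:
            level = 1  # year immediately following a level-2 year
        else:
            level = 0
        levels.append(level)
    return levels


# Hypothetical flags for nine consecutive years, NOT derived from real data.
flags = [True, True, True, False, False, False, True, True, False]
print(fed_funds_q(flags))
```

The first few years of any sample have too little history for a reliable level, so in practice the flags should start a few years before the first modeled year.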
The next figure, the scatter plot of the newly created qualitative variable and the U.S. corporate default rate over my observation period (1991-2012) shows what I got:

Apparently, we have found a relation between the Fed funds rate and the U.S. corporate default rate.

As a remark: we are witnessing something very odd about the current state of affairs. Namely, the trend line for the base interest rate has been below zero since the beginning of 2012! This means that even if Mr. Bernanke and/or his successor(s) keep the base rate at its current near-zero level (let alone hike rates), it would be too high to prevent another credit crunch around 2015, unless the Fed literally continues buying junk loans and bonds and/or officially abandons the idea of stable prices altogether...

The modeling exercise will continue in Part 3

