Value-at-risk and GARCH

Christian Groll

Seminar für Finanzökonometrie, Ludwig-Maximilians-Universität München.

All rights reserved.


Although the previous parts of the script are not meant to be a rigorous scientific investigation of the topics treated, they nevertheless give some first indications that simple regression models or curve-fitting techniques do not provide the desired informational gains in financial applications. The deterministic and predictable part of stock prices is usually too small compared with the large uncertainty involved. Hence, stock market prices are commonly perceived as stochastic, so that they are best described in the framework of probability theory. This part of the course introduces some of the basic concepts used to capture the stochasticity of stock prices.

Required functions

garchEstimation

Monte Carlo simulation

As part of standard MATLAB, the function rand() enables simulation of uniformly distributed pseudo-random variables. This is one of the most important tools in financial econometrics. The key to understanding this importance is that these uniformly distributed simulated values can be transformed into samples of any given distribution function by a simple transformation.

% init params
n = 1000;     % sample size
nBins = 20;   % number of bins

% simulate from uniform distribution
simU = rand(n, 1);

% check appropriateness
hist(simU, nBins)

% calculate expected number of points per bin
expNum = n/nBins;

% include horizontal line at expected number of points per bin
line([0 1], expNum*[1 1], 'Color', 'r')
title(['sample size: ' num2str(n) ' / expected points per bin: '...
    num2str(expNum)])

In order to simulate random numbers from different distributions, one has to make use of the fact that the inverse cumulative distribution function F^(-1), applied to a uniformly distributed random variable simU, generates a random variable with cumulative distribution function F. This result is called the inverse probability integral transformation (PIT) and is best understood through a graphical visualization. Here, we use the normal distribution as an example.

% init params
n2 = 10;    % sample size
mu = 4;     % params distribution
sigma = 1;

% show first n2 points of simU on y-axis
close
scatter(zeros(n2, 1), simU(1:n2), 'r.')
hold on;

% plot cumulative distribution function
grid = 0:0.01:8;
vals = normcdf(grid, mu, sigma);
plot(grid, vals)

% scatter resulting values after inverse PIT
X = norminv(simU, mu, sigma);
scatter(X(1:n2), zeros(n2, 1), 'r.')

% plot lines between points to indicate relation
for ii=1:n2
    % horizontal line, beginning in uniform sample point
    line([0 X(ii)], [simU(ii) simU(ii)], 'Color', 'r',...
       'LineStyle', ':')

    % vertical line, beginning in new point, up to function
    line([X(ii) X(ii)], [0 simU(ii)], 'Color', 'r', 'LineStyle', ':')
end
title('Inverse probability integral transformation')

As can be seen, points in the middle of the [0 1] interval encounter a very steep slope of the cumulative distribution function, so that the distances between those points are effectively reduced. Hence, the associated area on the x-axis will exhibit an accumulation of sample points. In contrast, points in the outer quantiles of the distribution encounter a very shallow slope and hence are spread further apart from each other.

In order to show that this distortion happens exactly in the required way to produce sample points of the predetermined distribution, one can compare the relative frequencies of the sample points with the probability density function of the targeted distribution.

figure('position', [50 50 1200 600])

% plot histogram, normalized to entail area of one
subplot(1, 2, 1);
[counterU locationsU] = hist(simU, nBins);  % count points in bins
width = diff(locationsU(1:2));          % get bin width
bar(locationsU, counterU/(n*width), 1);   % rescale bins to sum up
        % to area 1. Without adjustment, the area is the number of
        % points times the bin width.
hold on;

% include underlying uniform distribution
plot([0 1], [1 1], 'r', 'LineWidth', 2)
title('relative frequencies of uniform distribution')

% plot histogram of transformed sample points, normalized
subplot(1, 2, 2);
[counterX locationsX] = hist(X, nBins);
width = diff(locationsX(1:2));
bar(locationsX, counterX/(n*width), 1);
hold on;

% include underlying normal distribution
grid = mu-4:0.1:mu+4;
plot(grid, normpdf(grid, mu, sigma), 'r', 'LineWidth', 2);
title('relative frequencies of normal distribution')

Hence, given any distribution function, we are now able to simulate samples from it. However, in practice we usually encounter a different problem: we observe some data, but do not know the underlying distribution function, which has to be estimated from the data.

We will now implement code to estimate the parameters of a given distribution function by maximum likelihood. In order to get an impression of how well the estimation performs, we will first apply it to simulated data with a known underlying distribution.

The likelihood of a given sample point is defined as the value of the probability density function at that point. Furthermore, the likelihood of a sample with more than one point is defined as the product of the individual likelihoods.

% init params example
n = 1000;   % number sample points
mu = 2;
sigma = 1;
points = normrnd(mu, sigma, n, 1);    % simulate data points

% likelihood first point only
lh1 = (sigma^2*2*pi)^(-0.5)*exp((-0.5)*(points(1)-mu)^2/sigma^2);
fprintf(['\nThe likelihood of the first point is given by '...
    num2str(lh1) '.\n'])
fprintf(['It also can be calculated using an '...
    'existing MATLAB function:\n'...
    'normpdf(points(1), mu, sigma) = ' ...
    num2str(normpdf(points(1), mu, sigma))...
    '.\n'])
The likelihood of the first point is given by 0.34788.
It also can be calculated using an existing MATLAB function:
normpdf(points(1), mu, sigma) = 0.34788.
% calculate lh for all individual points
lhs = normpdf(points, mu, sigma);

% multiply first k points for each k up to 10
cumulatedLikelihoods = cumprod(lhs(1:10))'
cumulatedLikelihoods =

  Columns 1 through 7

    0.3479    0.0805    0.0145    0.0031    0.0013    0.0002    0.0001

  Columns 8 through 10

    0.0000    0.0000    0.0000

As can be seen, through multiplication the absolute value of the overall likelihood becomes vanishingly small very fast. This poses numerical problems for the calculation, since MATLAB only works with a finite number of decimal places. Hence, parameter estimation is always done by maximization of the log-likelihood function. Note that multiplication of individual likelihoods becomes summation under the transformation to log-likelihoods.

% calculate log-likelihood for sample points
llh = -0.5*log(sigma^2*2*pi)-0.5*(points-mu).^2/sigma^2;

% show cumulate sum
cumulatedSum = cumsum(llh(1:10))'
cumulatedSum =

  Columns 1 through 7

   -1.0559   -2.5198   -4.2323   -5.7628   -6.6836   -8.3598   -9.4273

  Columns 8 through 10

  -11.3262  -12.9851  -16.6507

As can be seen, using the log-likelihood does not cause the same numerical problems. Now, let's create an anonymous function that takes the parameters of the normal distribution and the sample points as input, and calculates the negative log-likelihood. This function can subsequently be used in the optimization routine.

% anonymous function
nllh = @(params, data)sum(0.5*log(params(2)^2*2*pi)+...
    0.5*(data-params(1)).^2/params(2)^2); % note: in order to
            % fulfill the requirements of the subsequent
            % optimization, the argument that shall be optimized
            % has to be given first, with all parameters
            % contained in one vector!

% check correctness with existing MATLAB function
negLogLikelihoods = ...
    [nllh([mu sigma], points) normlike([mu sigma], points)]
negLogLikelihoods =

   1.0e+03 *

    1.4363    1.4363

Obviously, both values are identical.

Since the parameter sigma of the normal distribution may not take on negative values, we are confronted with a constrained optimization problem. Hence, we have to use the function fmincon.

% simulate random points from normal distribution
points = normrnd(mu, sigma, 1000, 1);

% standard normal as first guess
params0 = [0 1];

% specify bounds on parameters
lb = [-inf 0.1];
ub = [inf inf];

% set options for optimization
opt = optimset('algorithm', 'interior-point', 'TolX', 1e-14);

% three ways to perform optimization: wrapping the anonymous function
paramsHat1 = ...
   fmincon(@(params)nllh(params, points), params0, ...
   [], [], [], [], lb, ub, [], opt);

% different way: passing the data as additional argument (legacy syntax)
paramsHat2 =...
    fmincon(nllh, params0, ...
    [], [], [], [], lb, ub, [], opt, points);

% using existing MATLAB function
paramsHat3 =...
    fmincon(@(params)normlike(params, points), params0, ...
    [], [], [], [], lb, ub, [], opt);

% showing wrong syntax using existing function
try
    paramsHat4 =...
        fmincon(normlike, params0, ...
        [], [], [], [], lb, ub, [], opt, points);
catch err
    fprintf(['MATLAB displays the following error:\n"'...
        err.message '"\n']);
end
Local minimum possible. Constraints satisfied.

fmincon stopped because the size of the current step is less than
the selected value of the step size tolerance and constraints are 
satisfied to within the default value of the constraint tolerance.




Local minimum possible. Constraints satisfied.

fmincon stopped because the size of the current step is less than
the selected value of the step size tolerance and constraints are 
satisfied to within the default value of the constraint tolerance.




Local minimum possible. Constraints satisfied.

fmincon stopped because the size of the current step is less than
the selected value of the step size tolerance and constraints are 
satisfied to within the default value of the constraint tolerance.



MATLAB displays the following error:
"Requires two input arguments"

While the first three optimization calls terminate regularly, the fourth syntax produces an error as predicted. Comparison of the estimated parameters of the successful optimizations shows equally good results for all three.

% compare results of different optimizations with real values
compareToRealParams = [mu sigma; paramsHat1; ...
    paramsHat2; paramsHat3]
compareToRealParams =

    2.0000    1.0000
    1.9978    0.9765
    1.9978    0.9765
    1.9978    0.9765

Now that we have successfully estimated the underlying normal distribution, we want to try the same estimation procedure for a Student's t-distribution.

% init params
n = 1000;
nu = 4;

% simulate values per inverse PIT
X = tinv(rand(n, 1), nu);

% determine individual likelihoods for vector of observations
lh = @(nu, X)gamma((nu+1)/2)*(nu*pi)^(-0.5)/gamma(nu/2)...
    *(1+X.^2/nu).^(-(nu+1)/2);

Note that the function lh itself returns a vector. See for example:

outputDimension = size(lh(4, X))
outputDimension =

        1000           1

Define a function nllhT that calculates the negative overall log-likelihood based on the output of the function lh.

% taking log of each entry and summing up
nllhT = @(nu, X)-sum(log(lh(nu, X)));

Specify optimization settings and conduct optimization.

% init guess
param0 = 10;

% boundary constraints
lb = 0.1;
ub = inf;

% optimization
paramHat = fmincon(nllhT, param0, [], [], [], [], ...
    lb, ub, [], [], X);
Warning: To use the default trust-region-reflective algorithm you must supply
the gradient in the objective function and set the GradObj option to 'on'.
FMINCON will use the active-set algorithm instead. For information on applicable
algorithms, see Choosing the Algorithm in the documentation. 

Local minimum possible. Constraints satisfied.

fmincon stopped because the predicted change in the objective function
is less than the default value of the function tolerance and constraints 
are satisfied to within the default value of the constraint tolerance.



No active inequalities.

Note that MATLAB displays a warning message, indicating that the default search algorithm might not be appropriate here. You can switch to any of the suggested algorithms by changing the respective property of the optimization options structure via optimset(), as sketched below. Nevertheless, the results obtained are not too bad after all.
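For instance, the warning can be avoided by selecting one of the suggested algorithms explicitly. A minimal sketch (the choice of 'sqp' here is just one possibility, and the variable names are introduced only for illustration):

% select a suitable algorithm explicitly to avoid the warning
optT = optimset('algorithm', 'sqp');

% redo the optimization with the explicit algorithm choice
paramHatCheck = fmincon(@(nu)nllhT(nu, X), param0, [], [], [], [], ...
    lb, ub, [], optT);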

% compare with real parameter
compareToRealParam = [nu paramHat]
compareToRealParam =

    4.0000    3.8345

Application to real data

Now we want to examine whether observed returns in the real world approximately follow a normal distribution. As a first example, we will choose the German stock market index (DAX) for our investigation.

% specify ticker symbol as string variable
tickSym = '^GDAXI';     % specify stock data of interest

% specify beginning and ending as string variables
dateBeg = '01011990';       %  day, month, year: ddmmyyyy
dateEnd = datestr(today, 'ddmmyyyy');  % today as last date

% load data
daxCrude = hist_stock_data(dateBeg, dateEnd, tickSym);

% process data
[DAX.dates DAX.prices] = processData(daxCrude);

% calculate percentage log returns
DAX.logRet = 100*(log(DAX.prices(2:end))-log(DAX.prices(1:end-1)));

Fitting the parameters of the normal distribution to the data.

% init bounds
lb = [-inf 0.001];
ub = [inf inf];

% optimization settings
opt = optimset('display', 'off', 'algorithm', 'sqp');

% estimate normal distribution parameters
paramsHat = ...
   fmincon(nllh, params0, ...
   [], [], [], [], lb, ub, [], opt, DAX.logRet);

% compare parameters with estimation from existing function
[muHat sigmaHat] = normfit(DAX.logRet);
paramComparison = [paramsHat; muHat sigmaHat]
paramComparison =

    0.0306    1.4536
    0.0306    1.4537

In order to examine the goodness-of-fit of the estimated normal distribution, some visualizations can be helpful. First, a normalized histogram of the data is compared to the probability density function of the fitted distribution. Moreover, as the largest financial risks are located in the tails of the distribution, the figure also shows a magnified view of the lower tail.

% show normalized histogram with density of normal distribution
[counterX locationsX] = hist(DAX.logRet, 50);
width = diff(locationsX(1:2));

% show bar diagram in first subplot
figure('position', [50 50 1200 600])
subplot(1, 2, 1);
bar(locationsX, counterX/(numel(DAX.logRet)*width), 1);
hold on;

% init grid
grid =...
    paramsHat(1)-5*paramsHat(2):0.01:paramsHat(1)+5*paramsHat(2);

% include underlying normal distribution
plot(grid, normpdf(grid, paramsHat(1), paramsHat(2)), ...
    'r', 'LineWidth', 2);
title('relative frequencies with normal distribution')

% show lower tail region amplified
subplot(1, 2, 2);
hold on;
bar(locationsX, counterX/(numel(DAX.logRet)*width), 1);
plot(grid, normpdf(grid, paramsHat(1), paramsHat(2)), ...
    'r', 'LineWidth', 2);
axis([paramsHat(1)-7*paramsHat(2) paramsHat(1)-2*paramsHat(2)...
    0 0.02])
title('lower tail')
hold off

As the magnified view of the lower tail shows, the relative frequency of extremely negative realizations is much larger than the normal distribution would suggest. This indicates that the shape of the normal distribution is not able to provide a good fit to the observed stock returns. Another way to visualize the deviations between the normal model and reality is the QQ-plot.

In order to compare the quantiles of the empirical returns with the quantiles of a given distribution function using MATLAB's qqplot function, we need to specify the respective distribution by handing over simulated sample points from that distribution.

close   % close previous figure
figure('position', [50 50 1200 600])
subplot(1, 2, 1)

% generate sample points of given distribution
sample = normrnd(paramsHat(1), paramsHat(2), numel(DAX.logRet), 1);

% qq-plot: empirical returns vs. estimated normal distribution
qqplot(DAX.logRet, sample);
title('built-in qqplot')
xLimits = get(gca, 'xLim');
yLimits = get(gca, 'yLim');

To avoid the imprecision caused by specifying the distribution via sample values, and to get a more detailed understanding of the concept of qq-plots, we now want to reproduce the shown graphic with our own code. The main idea of a qq-plot is to plot the quantiles corresponding to an equidistant grid of probabilities against each other. While the quantiles of the normal distribution can be calculated exactly, the quantiles referring to the distribution of the observed data points are approximated by their empirical counterparts.

% arrange entries in ascending order
sorted = sort(DAX.logRet);

% init grid of equidistant probabilities
associatedCDFvalues = ((1:numel(sorted))/(numel(sorted)+1));

% associated ecdf values increase with fixed step size
format rat
associatedCDFvalues(1:4)
format short
ans =

       1/5761         2/5761         3/5761         4/5761  

% get associated normal quantile values
convertToNormalQuantiles = ...
    norminv(associatedCDFvalues, paramsHat(1), paramsHat(2));

% generate qq-plot
subplot(1, 2, 2)
scatter(sorted, convertToNormalQuantiles, '+'); shg
xlabel('empirical quantiles')
ylabel('quantiles normal distribution')
set(gca, 'xLim', xLimits, 'yLim', yLimits);
title('manually programmed qq-plot')

% include straight line to indicate matching quantiles
line(xLimits, xLimits, 'LineStyle', '-.', 'Color', 'r')

Again, the qq-plot shows that the empirical quantiles in the lower tail are lower than suggested by the normal distribution. This is a commonly observable pattern in financial data, and it is referred to as fat tails.

VaR

With these insights in mind, we now want to estimate the value-at-risk for the German stock index. The value-at-risk associated with a given asset is nothing other than a lower quantile of the distribution of returns. Hence, a value-at-risk at the 95% confidence level denotes a return level that is fallen short of in at most 5% of cases. This value will be estimated in three different ways.

% init different value-at-risk confidence levels
quants = [0.005 0.01 0.05];

% estimate VaR based on fitted values of normal distribution
varNorm = norminv(quants, paramsHat(1), paramsHat(2));

Estimate DoF parameter for Student's t-distribution.

% determine individual likelihoods for a points vector
lh = @(nu, X)gamma((nu+1)/2)*(nu*pi)^(-0.5)/gamma(nu/2)...
    *(1+X.^2/nu).^(-(nu+1)/2);

% define function taking log of each entry and summing up
nllhT = @(nu, X)-sum(log(lh(nu, X)));

% init guess
param0 = 10;

% boundary constraints
lb = 0.1;
ub = inf;

% set options for optimization
opt = optimset('algorithm', 'sqp');

% optimization
paramHatT = fmincon(nllhT, param0, [], [], [], [], ...
    lb, ub, [], opt, DAX.logRet);
Local minimum possible. Constraints satisfied.

fmincon stopped because the size of the current step is less than
the default value of the step size tolerance and constraints are 
satisfied to within the default value of the constraint tolerance.



Associated value-at-risk estimations are given as the quantiles of the fitted Student's t-distribution and can be calculated by inversion.

% apply inverse cumulative distribution function
varT = icdf('t', quants, paramHatT);

As a third alternative, the value-at-risk can be estimated directly from the observed returns as empirical quantiles.

% estimate VaR based on empirical quantiles
varEmp = quantile(DAX.logRet, quants);

% compare different VaR estimations
compareVaRs = [quants' varNorm' varT' varEmp']

% display labels of columns below
fprintf('\nConfLevels Norm Distr T Distr   Emp Quants\n')
compareVaRs =

    0.0050   -3.7135   -5.0768   -5.2577
    0.0100   -3.3509   -4.0550   -4.4108
    0.0500   -2.3603   -2.2208   -2.3123


ConfLevels Norm Distr T Distr   Emp Quants

As can be seen, the estimated VaR values differ considerably across estimators. In contrast to the normal distribution, the Student's t-distribution assigns more probability to the tail regions, which results in more extreme quantile values. However, this result does not automatically imply that the Student's t-distribution is the more appropriate distribution. In order to assess the goodness of both distributions, we have to compare them in a backtesting procedure.
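Before doing so, the heavier tails of the fitted Student's t-distribution can be illustrated directly by comparing the probability of a large negative return under both fitted models. The following small sketch reuses paramsHat and paramHatT from above; the threshold of -5 percent is chosen arbitrarily, and note that the t-distribution is fitted here without a location-scale adjustment.

% compare probability of a log return below -5 percent
pNorm = normcdf(-5, paramsHat(1), paramsHat(2));
pT = tcdf(-5, paramHatT);

% display tail probabilities of both fitted distributions
tailProbs = [pNorm pT]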

Backtesting

For reasons of space we conduct the visual analysis only for one confidence level. However, figures for the other confidence levels can easily be obtained by changing the index variable lev.

% init confidence level as index of variable quants
lev = 2;

figure('position', [50 50 1200 600]);

% show exceedances for normal distribution
subplot(1, 2, 1);

% indicate exceedances with logical vector
exceed = DAX.logRet <= varNorm(lev);

% show exceedances in red
scatter(DAX.dates([logical(0); exceed]), DAX.logRet(exceed), '.r')
        % first date entry left out since DAX.dates refers to
        % longer time series of prices
hold on;

% show non-exceedances in blue
scatter(DAX.dates([logical(0); ~exceed]), ...
    DAX.logRet(~exceed), '.b')
datetick 'x'
set(gca, 'xLim', [DAX.dates(2) DAX.dates(end)]);

% include VaR estimation
line([DAX.dates(2) DAX.dates(end)], varNorm(lev)*[1 1], ...
    'Color', 'k')
title(['Exceedance frequency ' ...
    num2str(sum(exceed)/numel(DAX.logRet), 2) ' instead of '...
    num2str(quants(lev))])

% same plot with results based on Student's t-distribution
subplot(1, 2, 2);
exceed2 = DAX.logRet <= varT(lev);
scatter(DAX.dates([logical(0); exceed2]), ...
    DAX.logRet(exceed2), '.r')
hold on;
scatter(DAX.dates([logical(0); ~exceed2]), ...
    DAX.logRet(~exceed2), '.b')
datetick 'x'
set(gca, 'xLim', [DAX.dates(2) DAX.dates(end)]);
line([DAX.dates(2) DAX.dates(end)], varT(lev)*[1 1], ...
    'Color', 'k')
title(['Exceedance frequency ' ...
    num2str(sum(exceed2)/numel(DAX.logRet), 2) ' instead of '...
    num2str(quants(lev))])

For further investigation of the goodness of the respective estimation approaches, all exceedance frequencies are compared in a table.

% calculate exceedance frequencies for normal distribution
normFrequ = ...
    [sum((DAX.logRet <= varNorm(1))/numel(DAX.logRet));...
    sum((DAX.logRet <= varNorm(2))/numel(DAX.logRet));...
    sum((DAX.logRet <= varNorm(3))/numel(DAX.logRet))];

% calculate exceedance frequencies for Student's t-distribution
tFrequ = ...
    [sum((DAX.logRet <= varT(1))/numel(DAX.logRet));...
    sum((DAX.logRet <= varT(2))/numel(DAX.logRet));...
    sum((DAX.logRet <= varT(3))/numel(DAX.logRet))];

% calculate exceedance frequencies for empirical quantiles
empFrequ = ...
    [sum((DAX.logRet <= varEmp(1))/numel(DAX.logRet));...
    sum((DAX.logRet <= varEmp(2))/numel(DAX.logRet));...
    sum((DAX.logRet <= varEmp(3))/numel(DAX.logRet))];

% display table
fprintf('\nExceedance frequencies:\n')
fprintf('Norm distr   t distr   Emp distr\n')
for ii=1:numel(quants)
    fprintf('%1.5f     %1.5f     %1.5f\n', ...
        normFrequ(ii), tFrequ(ii), empFrequ(ii));
end
Exceedance frequencies:
Norm distr   t distr   Emp distr
0.01458     0.00642     0.00503
0.01997     0.01215     0.01007
0.04792     0.05538     0.05000

This analysis shows two commonly observable characteristics. First, the exceedance frequencies associated with the empirical estimator match the targeted levels almost exactly. However, this should not lead you to erroneously conclude that the empirical estimator is unambiguously the best estimator, since the perfect fit is only achieved in-sample, that is, only on the data that we used for estimation. This is a classical overfitting problem, since we adjusted the estimated quantiles to exactly match the observed data points in the first place. Examining the out-of-sample properties of the empirical estimator, as sketched below, shows that it does not perform equally well when applied to "unseen" data.
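A minimal sketch of such an out-of-sample check could look as follows: the empirical quantiles are estimated on the first half of the sample only, and the exceedance frequencies are then measured on the second half (the equal split is an arbitrary choice for illustration).

% split sample into estimation and evaluation part
nObs = numel(DAX.logRet);
inSample = DAX.logRet(1:floor(nObs/2));
outSample = DAX.logRet(floor(nObs/2)+1:end);

% estimate empirical VaR in-sample only
varEmpIn = quantile(inSample, quants);

% measure exceedance frequencies out-of-sample
outOfSampleFrequ = ...
    [sum(outSample <= varEmpIn(1))/numel(outSample);...
    sum(outSample <= varEmpIn(2))/numel(outSample);...
    sum(outSample <= varEmpIn(3))/numel(outSample)]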

And second, while all three estimators seem to approximately match the demanded exceedance frequencies, the occurrences of the exceedances are clustered in time. That is, exceedances are not distributed uniformly over time; observing an exceedance on one day increases the likelihood of an additional exceedance within the next days. This weakness of all estimators arises from the fact that we dropped all information in the sample with respect to the time dimension, since we treated returns as i.i.d. realizations of a certain random variable. Hence, estimating risk more adequately requires modeling the observations as a time series in order to be able to match temporally evolving high-volatility phases.
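The clustering can also be made visible more formally: if exceedances occurred independently over time, the autocorrelation function of the exceedance indicator should be close to zero at all lags. A short sketch, reusing the logical vector exceed from the backtest of the normal distribution above:

% autocorrelation function of the exceedance indicator
figure('position', [50 50 1200 600])
autocorr(double(exceed))
title('autocorrelation of exceedance indicator')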

Simulation study: empirical quantiles

In order to examine the properties of the empirical quantile estimator, we will now perform a short simulation study. The idea is to sample observations from a given distribution function, apply the empirical quantile estimator to the sample data, and check whether we are able to approximately retrieve the quantile values of the original underlying distribution.

% init params
reps = 10000;       % number of repetitions
sampSize = 2500;    % approximately 10 years of data
dist = 't';         % underlying distribution
param = 5;          % parameters: dimension depending on distr.
nBins = 30;         % number of bins in histogram plot

% get real quantile values
realQuants = icdf(dist, quants, param);

% preallocate result matrix
estQuants = zeros(reps, numel(quants));  % for each repetition we
            % will get estimated values for each confidence level

% using for-loop, because of limited memory
for ii=1:reps
    % simulate data sample
    simU = rand(sampSize, 1);
    sampData = icdf(dist, simU, param);

    % get empirical quantiles for this repetition
    estQuants(ii, :) = quantile(sampData, quants);
end

% plot results
figure('position', [50 50 1200 600])
for ii=1:numel(quants)
    subplot(1, numel(quants), ii);
    hist(estQuants(:, ii), nBins)

    % include line indicating real quantile value
    yLimits = get(gca, 'yLim');
    line(realQuants(ii)*[1 1], yLimits, 'Color', 'r')

    % calculate mean squared error
    mse = sum((estQuants(:, ii)-realQuants(ii)).^2)/reps;

    title(['Quantile: ' num2str(quants(ii), 2) ',  MSE: ' ...
        num2str(mse, 3)])
end

As indicated by the mean squared error as well as by the different scales of the x-axes, the goodness of the empirical quantile estimator decreases with decreasing quantile values. This is no surprise, since a sample of size 2500 on average only produces 12.5 observations below its 0.005 quantile. Since the number of observations below the quantile value is effectively what the estimation is based on, the average performance of the estimator is rather poor.
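This back-of-the-envelope calculation can be made explicit: the expected number of observations below each quantile value, together with its binomial standard deviation, shows how few data points the extreme quantile estimates effectively rest on.

% expected number of observations below each quantile value
expObs = sampSize*quants

% binomial standard deviation of this count
stdObs = sqrt(sampSize*quants.*(1-quants))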

Autoregressive processes

The main idea of time series analysis is to not consider observations as independent over time, but to allow past returns to affect the distribution of today's return. The most common way to do this is to render the first and second moments conditional. We will start with time-varying first moments. A very helpful device to detect serial dependence is the estimated autocorrelation function of the process.

% init params
nLags = 20;     % number of lags

% estimate autocorrelation function
autoCorrCoeff = zeros(1, nLags);
for ii=1:nLags
    % get correlation between days of distance ii
    autoCorrCoeff(ii) = ...
        corr(DAX.logRet(1:end-ii), DAX.logRet(ii+1:end));
end

% plot estimated autocorrelation function
close
figure('position', [50 50 1200 600])
subplot(1, 2, 1)
stem(1:nLags, autoCorrCoeff, 'r.', 'MarkerSize', 12)
set(gca, 'yLim', [-0.2 1]);
set(gca, 'xGrid', 'on', 'yGrid', 'on')

% plot autocorrelation using existing MATLAB function
subplot(1, 2, 2)
autocorr(DAX.logRet)

Though there might be some evidence for autocorrelation at lags 3 and 4, no clear pattern is observable. We will now try to replicate the observed autocorrelation function. First, we will investigate the properties of an AR(1) process by simulating a sample path of it.

% init params
a = 0.4;    % autoregression coefficient
n = 10000;   % path length
sigma = 0.8;    % standard deviation residual
y0 = 0;         % starting value

% simulate data
epsilons = normrnd(0, sigma, n, 1);
y = zeros(n, 1);
y(1, 1) = y0;

% because of the recursive nature of autoregressive processes we
% have to use for-loop
for ii=2:n
    y(ii) = a*y(ii-1)+epsilons(ii);
end

% plot path
close
figure('position', [50 50 1200 600])
subplot(1, 2, 1);
plot(y(1:150))
title('Simulated sample path')

% plot associated empirical autocorrelation function
subplot(1, 2, 2);
autocorr(y)

As can be seen, the autoregressive term at lag 1 causes significantly non-zero autocorrelations at the first two lags. Since we observed a negative third lag and a positive fourth lag in the real data, we will take a look at a fourth-order autoregressive process.

% init params
a3 = -0.1;
a4 = 0.1;
y0 = 0;
y1 = 0;
y2 = 0;
y3 = 0;

% simulate data
epsilons = normrnd(0, sigma, n, 1);
y = zeros(n, 1);
y(1:4, 1) = [y0; y1; y2; y3];

for ii=5:n
    y(ii) = a3*y(ii-3)+a4*y(ii-4)+epsilons(ii);
end

% plot sample path
close
figure('position', [50 50 1200 600])
subplot(1, 2, 1);
plot(y(1:150))
title('simulated sample path')

% plot empirical autocorrelation function
subplot(1, 2, 2);
autocorr(y)

As both graphics indicate, AR-processes offer great flexibility when it comes to replicating first-moment features of real data. However, it is very difficult to distinguish between artificial patterns that evolved purely stochastically and genuinely existing autocorrelation structures.

Although the empirical autocorrelation function seems to indicate that AR-processes could be able to replicate some first-moment characteristics of observable financial time series, there are some considerable differences in the patterns of the second moments. One way to see this is by looking at the empirical autocorrelation functions of squared returns.

% compare autocorrelation functions of squared returns
close
figure('position', [50 50 1200 600])
subplot(1, 2, 1);
autocorr(DAX.logRet.^2)
title('real data')
subplot(1, 2, 2)
autocorr(y.^2)
title('simulated AR-data')

As can be seen, squared returns exhibit long-lasting and persistent autocorrelations. This result can be interpreted as follows: correlation between squared returns is an indication of correlation between absolute return sizes. That is, if we observed high absolute returns on previous days, there is a high likelihood of a high absolute return today, too. In other words: if previous returns exhibit high volatility (high absolute values), today's return is more likely to have a large absolute value as well, while low volatility on previous days is likely to be followed by a return with a small absolute value. This observation matches the volatility clusters observed before.

Fitting AR(2) processes

This is part of the homework.

GARCH(1,1)

Since autoregressive processes are not able to replicate the persistent volatilities observable in financial time series, we need to come up with another model, called GARCH (generalized autoregressive conditional heteroscedasticity). The idea behind GARCH models is to render the second moments conditional. That is, the model shall capture the pattern of financial time series that periods of high volatility are more likely to be followed by further large absolute returns. Hence, the model treats the volatility prevailing at a certain point in time as conditional on past realizations, with a positive relationship between past volatility and conditional current volatility. Today's variance h(t) is modelled as a linear function of yesterday's variance h(t-1) and yesterday's squared observation y(t-1)^2.

y(t) = sqrt(h(t))*epsilon(t)
h(t) = k + GARCH*h(t-1) + ARCH*y(t-1)^2

In order to get a first impression about the model, we will first simulate some data with given parameters.

Since the conditional distribution depends on yesterday's observation and standard deviation, the first simulated value of our path also requires a specification of the previous day's values.

% starting values
y0 = 0;
sigma0 = 1;

% init params
sampSize = 10000;       % path length

% GARCH parameters: h(t)=k+GARCH*h(t-1)+ARCH*y(t-1)^2
garch = 0.6;
arch = 0.2;
k = 0.2;

% preallocate y and sigma
y = zeros(sampSize, 1);
sigmas = zeros(sampSize, 1);

% simulate standardized residuals
epsilons = randn(sampSize, 1);

% take first values as given
y(1) = y0;
sigmas(1) = sigma0;

for ii=2:sampSize
    % calculate new sigma
    sigmas(ii) = ...
        sqrt(k+garch*sigmas(ii-1)^2 + arch*y(ii-1)^2);

    % multiply standardized residual with current standard dev.
    y(ii) = epsilons(ii)*sigmas(ii);
end

% visualization of simulated data patterns
close
figure('position', [50 50 1200 800])

% show sample path
subplot(3, 2, 1:2);
plot(y(1:600))
title('sample path')

% show simulated series of conditional standard deviations
subplot(3, 2, 3:4);
plot(sigmas(1:600))
title('conditional standard deviations')

% show autocorrelation functions
subplot(3, 2, 5);
autocorr(y)
title('autocorr returns')
set(gca, 'yLim', [-0.2 0.6]);
subplot(3, 2, 6);
autocorr(y.^2)
set(gca, 'yLim', [-0.2 0.6]);
title('autocorr squared returns')

When looking at the visualization, some points become evident. First, the time series does exhibit volatility clusters, albeit on a small scale. While short periods with relatively high standard deviations do appear, the high-volatility phases seem to die out faster than in reality. Second, the model in general seems appropriate for generating persistent dependencies in absolute returns, measurable through the autocorrelation function applied to squared returns. Nevertheless, the persistence measured by the autocorrelation also drops faster than for comparable real data.

To break away from the distortions introduced by arbitrarily choosing the parameters of the model, we now want to fit a GARCH model to empirical data via maximum likelihood. However, in order to get a better understanding of the individual steps of the fitting procedure, we first implement some stand-alone parts for simulated data.

% determine time series
tSeries = y;

% treat initial standard deviation and parameters as given
initSigma = 1;
params = [k garch arch];

The main point here is that once we are given the time series, the parameter values and the initial standard deviation, we are able to reconstruct the series of sigmas.

% preallocate reconstructed sigmas
retrieveSigmas = zeros(numel(tSeries), 1);
retrieveSigmas(1) = initSigma;

% reconstruction
for ii=2:numel(tSeries)
    % current sigmas depend on last sigma and last observation
    % through parameter values of the model
    retrieveSigmas(ii) = sqrt(params(1)+...
        params(2)*retrieveSigmas(ii-1)^2 + ...
        params(3)*tSeries(ii-1)^2);
end

As a verification, the sigmas of the simulated time series can be compared with the reconstructed ones.

figure('position', [50 50 1200 600])
subplot(2, 1, 1);
plot(sigmas)
title('simulated sigmas')
subplot(2, 1, 2);
plot(retrieveSigmas)
title('retrieved sigmas')

Given the reconstructed series of time-varying sigma values, we can now calculate the log-likelihood of the observed time series. Observations are assumed to be conditionally normally distributed, with standard deviation given by sigma.

% calculate negative log-likelihood
nllh = sum(0.5*log(retrieveSigmas.^2*2*pi)+...
    0.5*(tSeries.^2./retrieveSigmas.^2));

Since the series of retrieved sigmas depends on the selection of parameters and initial values, the negative log-likelihood is also influenced by these inputs. Hence, we can come up with a two-step estimation procedure: in a first step the series of sigmas is retrieved, and in a second step the associated negative log-likelihood is calculated. Bundling both steps in one function, we get an objective function usable for optimization. This function is implemented as garchEstimation(); a sketch is shown below. Now we will perform the estimation of the GARCH parameters.
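The function is provided with the course material; a minimal sketch of what it could look like, given the two steps just described, is shown here. The treatment of the initial values, with initVals containing the pre-sample observation and standard deviation, is an assumption and may differ from the distributed version.

function nllh = garchEstimation(params, data, initVals)
% negative Gaussian log-likelihood of a GARCH(1,1) model (sketch)
% params:   [k garch arch]
% data:     observed return series
% initVals: [y0 sigma0], pre-sample observation and standard deviation

nObs = numel(data);

% step 1: reconstruct the series of conditional standard deviations
sigmas = zeros(nObs, 1);
sigmas(1) = initVals(2);    % pre-sample sigma as starting value; the
                            % pre-sample observation initVals(1) is
                            % not needed in this variant
for ii=2:nObs
    sigmas(ii) = sqrt(params(1)+params(2)*sigmas(ii-1)^2+...
        params(3)*data(ii-1)^2);
end

% step 2: negative log-likelihood under conditional normality
nllh = sum(0.5*log(sigmas.^2*2*pi)+0.5*(data.^2./sigmas.^2));
end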

% init params
params0 = [0.1 0.1 0.1];

% linear parameter restrictions: params(2)+params(3)<1
A = [0 1 1];
b = 1;

% lower and upper bounds
lb = [0.01 0.01 0.01];
ub = [0.99 0.99 0.99];

% init vals
data = y;
initVals = [0 1];

% optimization settings
opt = optimset('algorithm', 'sqp');

% optimization
paramsHat = fmincon(...
    @(params)garchEstimation(params, data, initVals), params0, ...
    A, b, [], [], lb, ub, [], opt);

compareToRealParams = [paramsHat; params]
Local minimum possible. Constraints satisfied.

fmincon stopped because the size of the current step is less than
the default value of the step size tolerance and constraints are 
satisfied to within the default value of the constraint tolerance.




compareToRealParams =

    0.2059    0.6038    0.1894
    0.2000    0.6000    0.2000

Given our estimated values we are again able to extract a series of sigma values. These extracted sigma values give us an indication of whether there have been periods of high volatility in the time series or not. The following cell will perform the same analysis for real stock data.

% specify data
data = DAX.logRet;

% optimization settings
opt = optimset('algorithm', 'sqp');

% optimization
[paramsHat nllhVal] = fmincon(...
    @(params)garchEstimation(params, data, initVals), params0, ...
    A, b, [], [], lb, ub, [], opt);

% extract estimated sigma series
retrieveSigmas = zeros(numel(data), 1);
retrieveSigmas(1) = initVals(2);

% reconstruction
for ii=2:numel(data)
    % current sigmas depend on last sigma and last observation
    % through parameter values of the model
    retrieveSigmas(ii) = sqrt(paramsHat(1)+...
        paramsHat(2)*retrieveSigmas(ii-1)^2 + ...
        paramsHat(3)*data(ii-1)^2);
end

% plot observed values with extracted sigmas
close
figure('position', [50 50 1200 600])

subplot(2, 1, 1);
plot(DAX.dates(2:end), DAX.logRet)
datetick 'x'
set(gca, 'xLim', [DAX.dates(2) DAX.dates(end)]);

subplot(2, 1, 2);
plot(DAX.dates(2:end), retrieveSigmas)
datetick 'x'
set(gca, 'xLim', [DAX.dates(2) DAX.dates(end)]);
Local minimum possible. Constraints satisfied.

fmincon stopped because the size of the current step is less than
the default value of the step size tolerance and constraints are 
satisfied to within the default value of the constraint tolerance.



Since GARCH is one of the most important models in econometrics, all the standard applications associated with it are already implemented in MATLAB's Econometrics Toolbox. The framework is built on the specification of GARCH models via a structure variable, which is the basic component used to pass model specifications between functions. As its default, MATLAB uses the same GARCH(1,1) model with a constant in the variance equation that we have chosen above, with the one difference that it also incorporates a constant term in the mean equation. We now fit the default model to the data without further specifications and compare it with our results so far.

% maximum likelihood estimation of parameters
[coeff, errors, llf, innovations, sigmas, summary] = ...
    garchfit(DAX.logRet);
____________________________________________________________
   Diagnostic Information

Number of variables: 4

Functions 
Objective:                            internal.econ.garchllfn
Gradient:                             finite-differencing
Hessian:                              finite-differencing (or Quasi-Newton)
Nonlinear constraints:                armanlc
Nonlinear constraints gradient:       finite-differencing

Constraints
Number of nonlinear inequality constraints: 0
Number of nonlinear equality constraints:   0
 
Number of linear inequality constraints:    1
Number of linear equality constraints:      0
Number of lower bound constraints:          4
Number of upper bound constraints:          4

Algorithm selected
   medium-scale: SQP, Quasi-Newton, line-search


____________________________________________________________
   End diagnostic information

                                Max     Line search  Directional  First-order 
 Iter F-count        f(x)   constraint   steplength   derivative   optimality Procedure 
    0      5      9714.45        -0.05                                         
    1     16      9676.37     -0.06484       0.0156         -559     3.87e+03   
    2     24       9558.4     -0.08613        0.125    -1.64e+03     3.22e+03   
    3     34      9534.85      -0.1147       0.0312         -769     2.46e+03   
    4     43      9517.37      -0.1075       0.0625         -476     1.97e+03   
    5     49      9437.74     -0.07251          0.5         -465     1.52e+03   
    6     55      9389.59     -0.03626          0.5         -735          848   
    7     66      9388.81     -0.03569       0.0156         -212     1.97e+03   
    8     72      9374.79     -0.01785          0.5         -350     1.17e+03   
    9     83      9374.77      -0.0183       0.0156        -28.8          246   
   10     90      9374.55     -0.01699         0.25         -146         56.3   
   11     95      9374.55     -0.01705            1        -16.7         7.05   
   12    100      9374.55     -0.01701            1        -2.59         3.79   
   13    105      9374.55     -0.01703            1        -1.13         1.64   
   14    110      9374.55     -0.01703            1       -0.387       0.0509  Hessian modified  

Local minimum possible. Constraints satisfied.

fmincon stopped because the predicted change in the objective function
is less than the selected value of the function tolerance and constraints 
are satisfied to within the selected value of the constraint tolerance.



No active inequalities.

Displaying the estimated coefficient structure allows comparison with the values estimated based on our own functions.

% display coefficient structure
coeff

% own values
fprintf('\nIn contrast to that, own values are:\n')
fprintf('       K: %1.4f\n', paramsHat(1))
fprintf('   GARCH: %1.4f\n', paramsHat(2))
fprintf('    ARCH: %1.4f\n', paramsHat(3))

% compare log-likelihoods
logLikelihoods = [-nllhVal llf]
coeff = 

          Comment: 'Mean: ARMAX(0,0,0); Variance: GARCH(1,1) '
     Distribution: 'Gaussian'
                C: 0.0654
    VarianceModel: 'GARCH'
                P: 1
                Q: 1
                K: 0.0320
            GARCH: 0.9006
             ARCH: 0.0824


In contrast to that, own values are:
       K: 0.0317
   GARCH: 0.9008
    ARCH: 0.0825

logLikelihoods =

   1.0e+03 *

   -9.3863   -9.3745

As the displayed values show, there are no substantial deviations, since the constant term in the mean equation is rather small. However, the garchfit function conveniently returns some additional outputs, which are useful for further investigations. For example, the previous plots can easily be replicated.

close
figure('position', [50 50 1200 600])
subplot(2, 1, 1)
plot(DAX.dates(2:end), DAX.logRet)
datetick 'x'
set(gca, 'xLim', [DAX.dates(2) DAX.dates(end)]);

subplot(2, 1, 2)
plot(DAX.dates(2:end), sigmas)
datetick 'x'
set(gca, 'xLim', [DAX.dates(2) DAX.dates(end)]);

Furthermore, it is easy to derive the standardized returns, in which the distortions from the time-varying conditional first two moments have been removed. Since the distribution of these standardized returns is the foundation of our maximum likelihood estimation procedure, it is reasonable to examine our assumption of conditionally normally distributed returns.

% init params
nBins = 30;
n = numel(DAX.logRet);     % number of observations

% get standardized returns
stdRets = innovations./sigmas;

% compare with normal distribution
[counterX locationsX] = hist(stdRets, nBins);
width = diff(locationsX(1:2));
close all
bar(locationsX, counterX/(n*width), 1);
hold on;

% include best fitting normal distribution
[muhat sigmahat] = normfit(stdRets);
grid = muhat-4:0.1:muhat+4;
plot(grid, normpdf(grid, muhat, sigmahat), 'r', 'LineWidth', 2);
title('relative frequency normal distribution')

Since the figure does not exhibit substantial deviations, our previously made assumption of conditionally normally distributed returns seems tenable. The next step is to examine the performance of the GARCH model in a backtesting procedure.

% preallocate VaR vector
vars = zeros(numel(quants), numel(DAX.logRet));

for ii=1:numel(DAX.logRet)
    % get sigma value
    curr_sigma = sigmas(ii);
    vars(:, ii) = norminv(quants', coeff.C, curr_sigma);
end

for ii=1:numel(quants)
    % get exceedances
    exceeds = (DAX.logRet' <= vars(ii, :));

    % include in figure
    figure('position', [50 50 1200 600])
    plot(DAX.dates([logical(0) ~exceeds]), ...
        DAX.logRet(~exceeds), '.')
    hold on;
    plot(DAX.dates([logical(0) exceeds]), ...
        DAX.logRet(exceeds), '.r', 'MarkerSize', 12)
    datetick 'x'
    set(gca, 'xLim', [DAX.dates(2) DAX.dates(end)], ...
        'yLim', [floor(min(DAX.logRet))-1 0]);

    % include line for VaR estimations
    hold on;
    plot(DAX.dates(2:end), vars(ii, :), '-k')

    % calculate exceedance frequency
    frequ = sum(exceeds)/numel(DAX.logRet);

    title(['Exceedance frequency: ' num2str(frequ, 3)...
        ' instead of ' num2str(quants(ii), 3)])
end
close all