Working with financial data: regression analysis and curve fitting

Christian Groll

Seminar f�r Finanz�konometrie, Ludwig-Maximilians-Universit�t M�nchen.

All rights reserved.

Contents

This is the second part of the MATLAB course. Here we will show how to download real data and how this data can be further processed.

Then, as first application, we will examine whether some theoretical relationships between risk and return can be found in german stock data. This analysis will be based on regression models.

Subsequently, we will try to find deterministic trends in stock market data based on curve fitting approaches. Only the next script will show common approaches to modelling of stock market returns as stochastic variables.

Required functions

hist_stock_data
processData
LPPL
LPPLfit
constrFunc
LPPLinteractively

Load historic DAX prices

The following code provides an example of the usage of the function hist_stock_data, which is able to download historic stock prices and trading volumes based on the data provided by Yahoo!finance. In order to make the data comply with our requirements, some additional treatments are needed first.

% specify ticker symbol as string variable
tickSym = '^GDAXI';     % specify stock data of interest

% specify beginning and ending as string variables
dateBeg = '01011990';   %  day, month, year: ddmmyyyy
dateEnd = '01072013';

Alternatively, dates also can be determined dynamically, with regard to today. Since the MATLAB command "today" measures time on a numeric scale, the value displayed is a number.

% display date of today
fprintf(['\nToday is ' num2str(today) '.\n'])
    % Note: fprintf is able to display a string to the command
    % window, without having to assign it to a variable or MATLAB's
    % temporary placeholder "ans" (short for "answer") first. In
    % order to get the input string, in this case we first have to
    % concatenate smaller strings into one large string.
Today is 735471.

In order to make the numeric data value more meaningful, we can transform it with the function datestr() into a date expression. As a string, this can be directly assigned to the variable declaring the end of the data period requested.

% dynamic assignment to end of period
dateEnd = datestr(today, 'ddmmyyyy')  % today as last date
dateEnd =

25082013

However, instead of using "today", you also can use the command "date", which returns the date as string right away.

fprintf(['\nToday is ' date '.\n'])
Today is 25-Aug-2013.

In order to download data from Yahoo!finance, we make use of the function hist_stock_data. This function can be found at the MATLAB File Exchange at http://www.mathworks.com/matlabcentral/fileexchange/. The File Exchange is a place where users can find and share content related to MATLAB development.

% load data
daxCrude = hist_stock_data(dateBeg, dateEnd, tickSym);

The function hist_stock_data returns a structure variable. A more detailed insight into the formatting of the output can be achieved with queries.

daxCrude
exampleDateEntries = daxCrude.Date(1:4)
daxCrude = 

      Ticker: '^GDAXI'
        Date: {5761x1 cell}
        Open: [5761x1 double]
        High: [5761x1 double]
         Low: [5761x1 double]
       Close: [5761x1 double]
      Volume: [5761x1 double]
    AdjClose: [5761x1 double]


exampleDateEntries = 

    '2013-08-23'
    '2013-08-22'
    '2013-08-21'
    '2013-08-20'

As the second query shows, historic prices are ordered with latest observations first. This configuration is disadvantageous for further work, since plotting of the prices would show the latest observations to the left. Moreover, instead of storing the dates as a cell array of string variables, we will achieve more flexibility if we store dates as serial dates, which is the same conversion we already encountered with the today command. In this numeric scale, each date is assigned to a uniquely specified number. As anchor point of this system, January 1st, 0000, is assigned to the value one.

fprintf(['Number 1 is equal to the date ' datestr(1) '.\n'])
Number 1 is equal to the date 01-Jan-0000.

In order to switch between dates given as strings and numeric serial dates the functions datestr and datenum can be used. Now we want to convert the date strings to serial dates.

serialDates = datenum(daxCrude.Date, 'yyyy-mm-dd');
% second argument specifies input format of string dates

In accordance with common convention prices and dates shall be arranged in increasing order, with most recent data at the end. Instead of manually encoding a for-loop, the MATLAB function flipud can be used to flip both matrices upside down. The results will be assigned to fields of a new structure variable called dax.

% flip most recent entries to the end
dax.dates = flipud(serialDates);    % initializes structure dax
dax.prices = flipud(daxCrude.Close);

Plotting financial data

When plotting financial data, we usually want the x-axis to be denoted in dates instead of numeric values. This can be done with help of the command "datetick", which interprets values of the respective axis as serial dates, and converts the labels of the individual ticks into meaningful date strings.

Further adjustments to graphical representations can be achieved by manual configuration of figure sizes, as well as additional graphics in one figure window. Both concepts are applied in the following illustration.

figure('position',[50 50 1200 600]) % create gray window, left
                                    % corner at latitude 50,
                                    % height 50, with width 1200
                                    % and height 600

subplot(1, 2, 1);     % Include two different white windows within
                    % the gray figure window. 1,2 denotes
                    % arrangement (one row, two columns of white
                    % windows), while the last number (1) denotes
                    % the currently used window.

% use plot command without further adjustments
plot(dax.prices) % since no x-values are specified, MATLAB
                % automatically numbers observations from 1 to
                % numel(dax.dates).

subplot(1, 2, 2);
plot(dax.dates, dax.prices)
datetick 'x'    % exact format of date labels can be chosen with
                % additional input, e.g. try datetick('x', 29) and
                % datetick('x', 10)
xlabel('dates')
ylabel('prices')
title('historic DAX values')

% crop x-axis to relevant size only
set(gca, 'xLim',[dax.dates(1) dax.dates(end)])

As can be seen at the command line used to crop the x-axis, though MATLAB renders the x-axis labels to date strings, it still needs references denoted in numeric values. That is, it is not possible to directly tell MATLAB to restrict the axis to 01.01.2000 to 31.12.2002 for example. Indexing with date strings is generally not possible. Hence, simple graphical modifications may become quite cumbersome. As first example, the maximum value during the period shall be highlighted with a red point. The general way to do this will be to first find the entry with the highest point in the price vector, which will be given as index value relative to the price matrix. Then, the index has to be converted into the respective index of the serial dates vector. In most cases, lengths of price and serial dates vectors will coincide, so that nothing needs to be done in this step. At last, this index is used to get the value of the serial dates vector at the respective entry.

Logical indexing

In order to find certain specified values in a given matrix, MATLAB usually makes use of the concept of logical indexing. In logical notation, a value of zero represents "no", while a value of "1" has to be interpreted as "yes". Hence, for example, when checking whether the entries of a matrix fulfill a given condition, MATLAB checks the condition on each entry, and returns a logical matrix of same size filled with zeros and ones.

% init matrix
matr = [1 2 3 4 5 6]

% check if entries are greater than 3
greaterThan3 = matr>3

% matrix greaterThan3 is logical variable
logicalCheck = islogical(greaterThan3)
matr =

     1     2     3     4     5     6


greaterThan3 =

     0     0     0     1     1     1


logicalCheck =

     1

Such logical matrices created from logical or relational operations can be used to extract values of a given matrix. However, these extracted values are always returned arranged in vectors, so that information about the position in the original matrix is lost.

% get values greater than 3
valuesGreater3 = matr(greaterThan3)
valuesGreater3 =

     4     5     6

In order to compare numerical values, MATLAB uses relational operators.

% equal to
equalThree = (matr == 3)   % since single equal signs are already
                        % used for assignments, MATLAB notation
                        % requires two equality signs to check
                        % for equality

% less or equal
lessEqualThree = (matr <= 3)   % greater or equal: >=

% not equal to
notEqualThree = (matr ~= 3)
equalThree =

     0     0     1     0     0     0


lessEqualThree =

     1     1     1     0     0     0


notEqualThree =

     1     1     0     1     1     1

Matrix manipulation also can be done based on logical indexing. For example, set each value of matr below three to zero.

% new matrix given by
matr(matr<3)=0

% multiply each entry greater than 4 with 10
matr(matr>4) = 4*matr(matr>4)
matr =

     0     0     3     4     5     6


matr =

     0     0     3     4    20    24

In order to check more than one condition, MATLAB also includes operators to combine logical matrices.

% create two random logicals
log1 = (rand(3)>0.5)
log2 = (rand(3)>0.5)

% elementwise AND: both logicals have to be 1
AND = (log1 & log2)

% elementwise OR: at least one of both logicals has to be 1
OR = (log1 | log2)

% NONE: elementwise AND inverted with tilde command
NONE = ~AND
log1 =

     1     1     1
     1     0     1
     1     0     1


log2 =

     1     0     0
     1     1     1
     1     1     1


AND =

     1     0     0
     1     0     1
     1     0     1


OR =

     1     1     1
     1     1     1
     1     1     1


NONE =

     0     1     1
     0     1     0
     0     1     0

In order to not lose information about the position of entries within a matrix, you can use the find() function, which returns the indices of the first k entries which fulfill a certain condition. However, note that indices are given in single index notation.

matr
greater20 = (matr>=20) % shows logical matrix: indices could be read
                    % off manually
find(matr>=20)     % automatically returns all indices with logical
                % value one
find(matr>=20, 1)   % returns only first index with logical value one
matr =

     0     0     3     4    20    24


greater20 =

     0     0     0     0     1     1


ans =

     5     6


ans =

     5

Returning to the original intention to highlight the maximum price value, this gives

% find index of maximum price
ind = find(dax.prices == max(dax.prices), 1);

% find associated serial date
maxDate = dax.dates(ind);

% include in subplot(1, 2, 2)
subplot(1, 2, 2)
hold on;    % allows to add elements, without deleting previous
            % graphic

plot(maxDate, max(dax.prices), '.r', 'MarkerSize', 18)
shg         % show current graphic: pops up figure window again
hold off;   % next graphic will not be included again

Despite of going this quite cumbersome programmatic way, MATLAB also allows interactive request in figures. This way, press the "Data Cursor" button in the figures toolbar, select the global maximum on your own by clicking on it, then right click on datatip, and "Export Cursor Data to Workspace". Set name to maxPrice. This exports information about the maximum stock price into a structure called maxPrice.

Instead of absolute prices, investors usually are more interested in returns of the dax, where absolute gains / losses are put in relation to the capital invested. Since each return needs one starting value and one ending value, the length of the time series will decrease by one. Hence, also the date vector has to be adapted.

% transform prices to discrete percentage returns
dax.disRet = 100*(dax.prices(2:end) - dax.prices(1:end-1))./...
    dax.prices(1:end-1);

% date vector for returns
dax.retDates = dax.dates(2:end);

% compare prices and returns in figure
close   % closes previously used figure window

% plot prices
ax(1) = subplot(2, 1, 1); % storage of axes handle at ax(1) allows
                        % accessing it later
plot(dax.retDates, dax.prices(2:end))
datetick 'x'
set(gca, 'xLim',[dax.retDates(1) dax.retDates(end)])
title('historic DAX prices')
xlabel('dates')
ylabel('prices')

% plot returns
ax(2) = subplot(2, 1, 2);
plot(dax.retDates, dax.disRet)
datetick 'x'
set(gca, 'xLim',[dax.retDates(1) dax.retDates(end)])
title('historic DAX returns')
xlabel('dates')
ylabel('returns')

% connect axes of both graphs: zooming in applies to both plots
linkaxes([ax(1) ax(2)], 'x')

As can be seen, the return series exhibits phases of different volatilities. While most of the time rather small returns occur, there are also times were both positive and negative returns are persistently larger. This observation is usually refered to as volatility clusters, and is understood as a stylized fact about stock market data. However, when focussing on longer investment horizons, we usually neglect the information about the exact times of occurrence of each return, and take a look at the distribution of returns only. This is done with a histogram.

close   % closes previously used window
hist(dax.disRet, 30)     % sample size justifies 30 bins

In this figure, we want to include a line indicating the mean return over the observed period, as well as two lines indicating regions with more than two standard deviations away from the mean.

% calculate mean return
meanRet = sum(dax.disRet)/numel(dax.disRet);

% calculate standard deviation
stdDev = sum((dax.disRet-meanRet).^2)/(numel(dax.disRet)-1);
stdDev = sqrt(stdDev);

% check results with existing MATLAB functions
deviations = [(meanRet - mean(dax.disRet))...
    (stdDev-sqrt(var(dax.disRet)))]

% include in graphic
yLimits = get(gca, 'yLim');
line(meanRet*[1 1], yLimits, 'Color', 'r')
line((meanRet+2*stdDev)*[1 1], yLimits, 'Color', 'r')
line((meanRet-2*stdDev)*[1 1], yLimits, 'Color', 'r')
text(meanRet+2*stdDev, yLimits(end)/2, '2 standard deviations')
deviations =

     0     0

Regression analysis

One of the most important models also in econometrics is the linear model. Hence, the following lines show the implementation and estimation of a linear regression model in MATLAB. First, we sample from a specified linear model.

% init params
nSim = 1000;       % sample size
muX = 12;       % params explanatory variable
sigmaX = 2.3;
coeff = 0.8;    % regression coefficient
intcept = 4.3;  % regression intercept

% simulate explanatory variable
xMatr = normrnd(muX, sigmaX, nSim, 1);

% simulate standard normally distributed innovations
epsilon = randn(nSim, 1);

% calculate Y according to linear model
yMatr = intcept + coeff*xMatr + epsilon;    % do not use for loop

Now we want to estimate the parameters of the model based on the values simulated.

% because of intercept, expand matrix of explanatory variables
xMatr = [ones(nSim, 1) xMatr];

% OLS estimation, naive way
paramsHat = inv(xMatr'*xMatr)*xMatr'*yMatr;
% usual estimation formula

% avoiding single matrix inversion as mlint warning suggests
paramsHat2 = (xMatr'*xMatr)\(xMatr'*yMatr);      % faster way
paramsHat3 = xMatr\yMatr;                % best way

% calculate regression line
xLimits = [floor(min(xMatr(:, 2))) ceil(max(xMatr(:, 2)))];
                                 % use nearest
                                 % neighbouring integer numbers
grid = xLimits(1):0.1:xLimits(2);
vals = paramsHat(1)+paramsHat(2)*grid;

% plotting data
close
scatter(xMatr(:, 2), yMatr, '.');   % used for visualizing points
                                    % cloud

% include regression line
hold on;    % plot in same figure
plot(grid, vals, 'LineWidth', 2, 'Color', 'r')   % larger line width
set(gca, 'xLim', xLimits)
xlabel('regressor variable')
ylabel('dependent variable')
title(['Linear model: estimated beta is ' num2str(paramsHat(2))])

Because of the risk-aversion of investors, theoretical models often conclude that riskier assets should in general coincide with higher expected returns, since investors demand higher compensation for the risk involved. As a first application of the linear model, we want to investigate whether this positive relationship can be verified for German stock data. Therefore, we will download historical data of all 30 components of the German stock market index DAX, estimate their mean return and return standard deviation, and regress the mean returns on the standard deviations. Note that standard deviation is only one way to measure inherent risk, and one common criticism is that the symmetrical nature of standard deviation measures positive deviations the same way as negative ones.

% specify start and end point of investigation period
dateBeg = '01011990';
dateEnd = '01072011';

% download data of all components: dax_comp is structure array
daxComp = hist_stock_data(dateBeg, dateEnd, 'ADS.DE', 'ALV.DE',...
    'BAS.DE', 'BAYN.DE', 'BEI.DE', 'BMW.DE', 'CBK.DE', 'DAI.DE', ...
    'DB1.DE',...
    'DBK.DE', 'DPW.DE', 'DTE.DE', 'EOAN.DE', 'FME.DE', 'FRE.DE',...
    'HEI.DE', 'HEN3.DE', 'IFX.DE', 'LHA.DE', 'LIN.DE', 'MAN.DE',...
    'MEO.DE', 'MRK.DE', 'MUV2.DE', 'RWE.DE', 'SAP', 'SDF.DE',...
    'SIE.DE', 'TKA.DE', 'VOW3.DE');

When downloading data of so many different stocks at Yahoo!finance, we usually will observe different sample sizes of the individual time series. This also has to be taken into account when stocks of different countries are involved, since deviating holidays will lead to different sample sizes. Let's first investigate the sample sizes.

% preallocate storage variables for first dates and samples sizes
firstDates = zeros(size(daxComp));
sampleSizes = zeros(size(daxComp));

% extract first date and sample size of each component
for ii=1:numel(firstDates)
    firstDates(ii) = datenum(daxComp(ii).Date(end));
    sampleSizes(ii) = numel(daxComp(ii).Date);
end

% display first dates as strings to command window
fprintf('\nThe respective first observations are given by:\n')

for ii=4:4:numel(daxComp)
    % display four dates per row
    fprintf([datestr(firstDates(ii-3), 'dd-mmm-yyyy') ', '...
        datestr(firstDates(ii-2), 'dd-mmm-yyyy') ', ' ...
        datestr(firstDates(ii-1), 'dd-mmm-yyyy') ', ' ...
        datestr(firstDates(ii), 'dd-mmm-yyyy') '\n'])
end

% if numel(daxComp) is not divisible by 4
remaining = mod(numel(daxComp), 4);
nMultiplesOfFour = (numel(daxComp) - remaining) / 4;
nAlreadyShown = nMultiplesOfFour * 4;
for ii=1:remaining
    if(ii==1)
        str = datestr(firstDates(ii + nAlreadyShown), 'dd-mmm-yyyy');
    else
        str = [str ', ' datestr(firstDates(ii + nAlreadyShown),...
                                'dd-mmm-yyyy')];
    end
end
fprintf(str)

% get ticker symbol of component with minimum sample size
tSym = daxComp(find(sampleSizes == min(sampleSizes))).Ticker;

% display with sample sizes
fprintf(['\nThe minimum sample size occurs for ' tSym ...
    '.\nThere are only %2i observations.\n'], min(sampleSizes))
The respective first observations are given by:
03-Jan-2000, 03-Jan-2000, 03-Jan-2000, 03-Jan-2000
03-Jan-2000, 03-Jan-2000, 03-Jan-2000, 03-Jan-2000
05-Feb-2001, 03-Jan-2000, 20-Nov-2000, 03-Jan-2000
03-Jan-2000, 03-Jan-2000, 03-Jan-2000, 03-Jan-2000
01-Jan-2003, 14-Mar-2000, 03-Jan-2000, 03-Jan-2000
03-Jan-2000, 03-Jan-2000, 03-Jan-2000, 03-Jan-2000
28-Nov-2000, 03-Aug-1998, 03-Jan-2000, 03-Jan-2000
03-Jan-2000, 28-Dec-2007
The minimum sample size occurs for VOW3.DE.
There are only 656 observations.

This index refers to VOW3.DE, standing for Volkswagen. Since all other sample sizes are large enough, we simply exclude Volkswagen from the analysis.

% delete Volkswagen from data
indexOfMinimumSampleSize = find(sampleSizes == min(sampleSizes));
daxComp(indexOfMinimumSampleSize) = [];
firstDates(indexOfMinimumSampleSize) = [];
sampleSizes(indexOfMinimumSampleSize) = [];

% get new minimum
fprintf(['\nThe new minimum now is %2i, which seems to be\n'...
    'sufficient for reasonable analysis.\n'], min(sampleSizes))
The new minimum now is 2199, which seems to be
sufficient for reasonable analysis.

Exercise:

Since the availability of data for individual DAX components changes from time to time, it is not guaranteed, that there will always be only one company with insufficient data. A better way hence would be defining a certain minimal sample size as threshold. Then, all companies with less data should be removed automatically.

In order to eliminate data points with missing values and to adjust the data to the usual convention with chronologically increasing points in time, we make use of the function processData(). Also, string dates are converted to serial dates, and the already used data of the German stock index is included.

tic
[daxDates daxPrices] = processData([daxComp daxCrude]);
toc
Elapsed time is 0.829616 seconds.

The following two queries give an impression about the nature of the output of the function.

% both output are numeric variables
numericVars = [isnumeric(daxDates) isnumeric(daxPrices)]

% get dimensions
size(daxDates)
size(daxPrices)
numericVars =

     1     1


ans =

        2095           1


ans =

        2095          30

Hence the data consist of about 2000 observations of 30 different stocks (29 DAX components and the DAX itself), and daxDates is the vector of respective dates in serial dates format. This information will be stored more meaningful and robust in a structure called daxStocks, together with respective returns, return dates and ticker symbols.

% assign existing data to daxStocks fields
daxStocks.dates = daxDates;
daxStocks.prices = daxPrices;

% transform to percentage discrete returns
daxStocks.disRet = 100*diff(daxPrices)./daxPrices(1:end-1,:);

% diff() calculates differences between successive matrix entries
c = rand(2)
differences = diff(c)

% get ticker symbols
daxStocks.ticker = {daxComp.Ticker daxCrude.Ticker};
c =

    0.6529    0.3000
    0.3815    0.3401


differences =

   -0.2714    0.0401

Now that historical returns are given suitable form, we can easily estimate expected returns and standard deviations. Note that most statistical functions act columnwise. Hence it is always preferable to store observations of a given variable in a column vector, and use different columns for different variables.

% estimate returns and sigmas of DAX components
expRets = mean(daxStocks.disRet );
sigmaHats = sqrt(var(daxStocks.disRet));

% show in figure, standard deviations on x-axis
close   % close last figure
scatter(sigmaHats, expRets, '.')

% highlight DAX itself
hold on;
scatter(sigmaHats(end), expRets(end), 30,[1 0 0], 'filled')

% estimate regression line
betaHat = [ones(numel(sigmaHats), 1) sigmaHats']\expRets';

% calculate regression line
xLimits = get(gca, 'XLim');
grid = linspace(xLimits(1), xLimits(end), 200);   % divide
                                    % specified interval in 200
                                    % parts of equal size
yVals = [ones(numel(grid), 1) grid']*betaHat;

% include regression line in red
plot(grid, yVals, 'r')

% get R^2 from existing MATLAB function
stats = regstats(expRets, sigmaHats',...
    'linear', 'rsquare');
title(['R-square of regression: ' num2str(stats.rsquare)])
xlabel('estimated standard deviations')
ylabel('estimated mean returns')

Although the regression line exhibits an increasing slope as theory suggests, the R-squared of the regression is rather small. Evidence for a positive relation between return and risk is rather weak.

CAPM

The capital asset pricing model tries to explain asset pricies. It is set up on the assumption, that investors only get compensated for that part of an asset's risk that can not get diversified away in a portfolio. Shortly speaking, each assets partly exhibits comovements with the market, called systematic risk. Since this risk component underlies each asset, it can not be diversified away. Hence, investors need to be compensated for it. In contrast to that, the remaining risk inherent in an asset is called the idiosyncratic risk. This component is stock specific, and hence not correlated with idiosyncratic components of other firms. Hence, in a large portfolio of assets, this component could be diversified away.

In order to measure each assets' comovement with the market, we perform a linear regression of the daily returns on daily returns of a market index. Note that the theory is based on dependence to a market portfolio, where our market index here is only an imperfect substitution.

% preallocate vector for estimated betas
betas = zeros(1, 29);
for ii=1:29
    betas(ii) = regress(daxStocks.disRet(:, end),...
        daxStocks.disRet(:, ii));   % no intercept involved
end

% plot betas with expected returns
close
scatter(betas, expRets(1:end-1), '.')

% estimate regression coefficients with intercept
betaHat = [ones(numel(betas), 1) betas']\expRets(1:end-1)';

% include regression line
xLimits = get(gca, 'XLim');
grid = linspace(xLimits(1), xLimits(end), 200);
yVals = [ones(numel(grid), 1) grid']*betaHat;

hold on;
plot(grid, yVals, 'r')
xlabel('estimated beta coefficients')
ylabel('estimated mean returns')
title('CAPM disproved?')

Note that this analysis is only a very rough investigation of the validity of the CAPM, with many sources of error involved (only substitute for market portfolio, applied to returns instead of excess returns,...). In fact, the purpose merely was to come up with some easy example of regression analysis in finance. So do not make the mistake to interpret the investigations as scientifically rigurous and adequate approach. As part of a more thorough investigation at least also returns of larger time horizons would have to be examined.

Stock price prediction based on curve fitting

While the previous part was concerned with looking for an explanatory variable for stock returns, we now will try to find regularities in stock prices that allow to make predictions on future price movements. That is, in course of its evolution, any stock price seems to follow some trend at some point of time. Looking at charts of stock prices one usually might be tempted to assume that such trends could be identified in real-time, thereby allowing for speculative trading opportunities. The idea in this chapter is to fit certain functions to historic stock price paths. Given that the function seems to be a good approximation to past prices, chance might be that it will still be an approximation in the future, so that our function could be used as stock price predictor. However, the approach taken here is slightly different. Based on curve fitting tools, positive trends in stock prices shall be identified. But instead of trying to exactly predict future prices, we only try to identify points in time where the current dynamic changes. That is, we only try to predict break-offs of rising stock prices, without bothering with the exact type of regime evolving after the break-off.

Given that returns fluctuate around a constant positive value, prices should exhibit exponential growth. Such growth rates best can be seen on logarithmic scale, since they correspond to a straight line here. Hence, we first extend the data structure with an additional field logPrices. Visualization shows that DAX prices tend to exhibit super-exponential growth during certain periods.

% get log prices
dax.logPrices = log(dax.prices);

% specify subperiod as strings
begT = '01-Jun-1993';
endT = '29-Jul-1998';

% find indices associated with considered period
indS = find(dax.dates > datenum(begT, 'dd-mmm-yyyy'), 1);
indE = find(dax.dates > datenum(endT, 'dd-mmm-yyyy'), 1);

Note: it is not possible to access the prices with indexing based on the dates of the time series. Hence, dates always have to be converted to chronological indices first. However, the finance toolbox of MATLAB also includes financial time series objects (fints) that can be indexed by date strings. For example, myfts({'05/11/99', '05/21/99', '05/31/99'}) extracts the values of the fints object myfts at the specified dates.

% create figure window
close
figure('Position', [50 50 1200 600])

% plot DAX prices with subperiod highlighted
ax(1) = subplot(2, 1, 1);
plot(dax.dates, dax.prices, 'Color', [1 0.8 0.8]);
hold on;
plot(dax.dates(indS:indE), dax.prices(indS:indE));
datetick 'x'
title('linear scale')

% plot log DAX prices with subperiod highlighted
ax(2) = subplot(2, 1, 2);
plot(dax.dates, dax.logPrices, 'Color', [1 0.8 0.8]);
hold on;
plot(dax.dates(indS:indE), dax.logPrices(indS:indE)); shg
datetick 'x'
title('logarithmic scale')

% connect axes of both graphs: zooming in applies to both plots
linkaxes([ax(1) ax(2)], 'x');

Although it would be easier to fit a straight line to log prices we want to estimate to best fitting exponential growth for normal prices using an optimization. Hence, the exponentially growing function f(x)= a_1*exp(a_2*x) shall be fitted to the stock prices. Therefore, parameters a_1 and a_2 will be chosen such that the mean squared error between the exponential function and the historic price chart is minimized.

% create new grid for subperiod, starting at 1
daysSinceBeg = 1:numel(dax.dates(indS:indE));   % stock market
        % prices are treated as equidistant, with no distinction
        % between Friday / Monday or Monday / Tuesday

% define exponential function as anonymous function
expFun = @(x, params) params(1)*exp(x.*params(2));

% evaluating exponential function similar to normal functions
fprintf(['Calling the anonymous function according to '...
    'usual syntax\nexpFun(3,[0.5 0.5])\nreturns the value'...
    ' %1.2f.\n'], expFun(3,[0.5 0.5]))
Calling the anonymous function according to usual syntax
expFun(3,[0.5 0.5])
returns the value 2.24.
% define mean squared error function as anonymous function
errFun = @(params, x, prices)...
    sum((prices(:) - expFun(x(:), params)).^2);  % for any price
        % series given by prices and associated x values the
        % error function computes the mean squared error between
        % exponential function with parameters params and the
        % price series

% init guess for optimization
params0 = [dax.prices(indS) ...
    log(dax.prices(indE) - dax.prices(indS))/...
    (dax.dates(indE) - dax.dates(indS))];
        % params(2) chosen so that it fulfills the equation:
        % exp((daysSinceBeg(end)-daysSinceBeg(1))*a_2)
        %           != prices(end)-prices(1)

% specify options for optimization
opt = optimset('display', 'off', 'TolX', 1e-18, 'TolFun', 1e-8);

% optimization
[bestParams expMSE] = fminsearch(errFun, params0, opt,...
    daysSinceBeg, dax.prices(indS:indE));

Note: since the objective function, which shall be minimized, also depends on the grid values of x and the given price vector prices, these data has to be given as fixed input into the optimization, since the optimization shall only be applied to the parameter values. Therefore, the parameters of interest have to appear in the objective function as one vector and as first input. Additional inputs are included in the optimization routine fminsearch as additional inputs at last positions. However, this syntax is only allowed when the objective function is given as function handle to an anonymous function. An example of a similiar optimization task involving an already existing MATLAB function will be given further below.

% calculate associated exponential function values
expVals = expFun(daysSinceBeg, bestParams);

% include in given figure
subplot(2, 1, 1);
plot(dax.dates(indS+daysSinceBeg), expVals, 'r'); % Note:
        % dax.dates(indS) + daysSinceBeg does not work, since
        % dax.dates is not numbered consecutively. dax.dates
        % refers to business days, not consecutive days!
xlabel('dates')
ylabel('prices')

subplot(2, 1, 2);
plot(dax.dates(indS+daysSinceBeg), log(expVals), 'r'); shg
xlabel('dates')
ylabel('prices')

% calculate mean squared error on logarithmic scale
mse = sum((dax.logPrices(indS+daysSinceBeg)-log(expVals(:))).^2);

% display mean squared error
fprintf(['\nThe mean squared error between the exponential fit'...
        ' and\nthe stock price path is %3.4f.\n'], mse);
The mean squared error between the exponential fit and
the stock price path is 32.2208.

With the straight line as benchmark, one can see that the stock price path exhibits a convex curvature during the subperiod. This pattern indicates super-exponential growth rates. Such growth rates usually are associated with stock market bubbles. Our intention now will be to identify evolving stock market bubbles, and try to predict the time they burst. According to Didier Sornette and his colleagues, stock market bubbles can be approximated with super-exponentially growing log-periodic power law (LPPL) functions. These are super-exponentially growing functions with finite-time singularities and oscillating behaviour, given by the formula: f(x) = a_1 + a_2*(a_3-x)^(a_4)* (1+a_5*cos(a_6*log(a_3-a_8*x)+a_7). In order to get an impression about the appropriateness of a LPPL function, we will fit it the subperiod and compare its mean squared error to the error of a simple exponential fucntion. Furthermore, we will examine whether the date of the estimated finite-time singularity could be used as indicator of a forthcoming change in regimes.

% fit LPPL model to subperiod
params = lpplFit(dax.logPrices(indS:indE));

% calculate approximation values to stock prices
[vals derivs] = lpplFunc(params);

% create associated grid
grid = dax.dates(indS + ( 1:(params(3)/params(8)-1) ));
    % Note: params(3)/params(8) denotes the time in business days
    % from beginning of subperiod until finite-time singularity.

% include in given figure
subplot(2, 1, 2);
plot(grid, vals, 'g'); shg

% include line for finite time singularity
yLimits = get(gca, 'yLim');
line(dax.dates(indS + floor(params(3)/params(8)) )*[1 1], yLimits,...
    'Color', 'k')

% calculate mean squared error on logarithmic scale
mseLppl = sum( (dax.logPrices(indS + daysSinceBeg) -...
    (vals(daysSinceBeg)')).^2);

fprintf(['\nIn contrast to the MSE of ' num2str(mse) ...
    ' obtained before,\n we now get a MSE of only '...
    num2str(mseLppl) '.\n'])
In contrast to the MSE of 32.2208 obtained before,
 we now get a MSE of only 3.8615.

When looking at the figure, we can see that the fitted LPPL model at the time of the end of the subperiod could indicates an impending regime change, since the critical point given by the finite-time singularity lies only days ahead.

In order to examine the validity of the LPPL model on further stock market indices, you can uncomment the following lines of code and interactively conduct experiments on historic data. As examples of further accurate subperiod fitting, take a look at Hang Seng index from 15-Dec-2004 to 21-Nov-2007, which leads to an estimated regime change 52 business days ahead, or the German stock market index from 15-Oct-1992 to 29-Jul-1998.

% Interactive examination of further stock market indices.

%tickerSyms = cell(8, 1); tickerSyms = {'^GDAXI';'^SMSI';'^SSMI';... '^OMXSPI';'^NDX';'^DJI';'^HSI';'^SSEC'};

indexNames = {'DAX'; 'Madrid General';... 'Swiss Market'; 'Stockholm General'; 'NASDAQ'; ... 'Dow Jones Industrial'; 'Hang Seng';... 'Shanghai Composite'};

for ii=1:numel(tickerSyms) fprintf(['\nIndex investigated: ' indexNames{ii} '\n']) data = hist_stock_data(begT, endT, tickerSyms{ii});

   if (~isempty(data))
      [data_dates data_prices] = processData(data);
      params = LPPLinteractively(data_prices, data_dates)
      title([indexNames{ii} ' -- Press key to continue'])
      hold off
   end
   pause
end

%% Movie

frames = LPPLmovie(data_prices, data_dates, 50);

%% movie(frames, 10, 2)