Financial markets have a strong cyclical component linked to the business cycle, and may undergo temporary or permanent shifts related to both endogenous and exogenous shocks. As a result, the distribution of returns is likely to differ depending on which state of the world we are in, so it is advantageous to model such dynamics within a framework which can accommodate and explain these different states. Within time series econometrics, the switching between states has been based either on unobserved components, giving rise to the Markov Switching (MS) models introduced by Hamilton (1989), or on observed components, leading to the Threshold Autoregressive (TAR) models popularized by Tong (1980). This post, and the package it introduces, considers only observed-component switching, though there are a number of good packages in R for Markov Switching.

The smooth transition AR model was introduced in Teräsvirta (1994), and has been used in numerous studies, from GDP modelling to forex and equity market volatility. See the presentation from the talk above, which includes a number of selected references.

The s-state model considered in the **twinkle** package takes the following form:

\[
y_t = \sum\limits_{i = 1}^s \left[\left(\phi_i' y_t^{\left(p\right)} + \xi_i' x_t + \psi_i' \varepsilon_t^{\left(q\right)}\right) F_i\left(z_{t-d};\gamma_i,\alpha_i,c_i,\beta_i\right)\right] + \varepsilon_t
\]

where:

\[
\begin{gathered}
y_t^{\left(p\right)} = \left(1,\tilde y_t^{\left(p\right)}\right)',\quad \tilde y_t^{\left(p\right)} = \left(y_{t-1},\ldots,y_{t-p}\right)',\quad \phi_i = \left(\phi_{i0},\phi_{i1},\ldots,\phi_{ip}\right)' \\
\varepsilon_t^{\left(q\right)} = \left(\varepsilon_{t-1},\ldots,\varepsilon_{t-q}\right)',\quad \psi_i = \left(\psi_{i1},\ldots,\psi_{iq}\right)' \\
x_t = \left(x_1,\ldots,x_l\right)',\quad \xi_i = \left(\xi_{i1},\ldots,\xi_{il}\right)' \\
\end{gathered}
\]

and we allow for a variance mixture so that \( {\varepsilon _t} \sim iid\left( {0,{\sigma _i},\eta } \right) \) with \( \eta \) denoting any remaining distributional parameters which are common across states. The softmax function is used to model multiple states such that:

\[
\begin{gathered}
F_i\left(z_{t-d};\gamma_i,\alpha_i,c_i,\beta_i\right) = \frac{e^{\pi_{i,t}}}{1+\sum\limits_{j = 1}^{s-1} e^{\pi_{j,t}}},\quad i = 1,\ldots,s-1 \\
F_s\left(z_{t-d};\gamma_s,\alpha_s,c_s,\beta_s\right) = \frac{1}{1+\sum\limits_{j = 1}^{s-1} e^{\pi_{j,t}}} \\
\end{gathered}
\]

where the state dynamics \( \pi_{i,t} \) also include the option of lag-1 autoregression:

\[
\pi_{i,t} = \gamma_i\left(\alpha_i' z_{t-d} - c_i\right) + \beta_i \pi_{i,t-1},\quad \gamma_i > 0
\]

with initialization conditions given by:

\[\pi_{i,0} = \frac{\gamma_i\left(\alpha_i' \bar z - c_i\right)}{1-\beta_i},\quad \left|\beta_i\right| < 1\]

The parameter \( \gamma_i \) is a scaling variable determining the smoothness of the transition between states, while \( c_i \) is the threshold intercept about which switching occurs.
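To make the softmax construction above concrete, here is a minimal sketch (in Python for brevity, with made-up \( \pi_{i,t} \) values): the \( s \) state weights always sum to one, and for \( s=2 \) the construction reduces to the familiar logistic function.

```python
import math

def transition_probs(pi):
    """Softmax state weights: states 1..s-1 carry dynamics pi_i, state s is the base."""
    denom = 1.0 + sum(math.exp(p) for p in pi)
    F = [math.exp(p) / denom for p in pi]  # F_1, ..., F_{s-1}
    F.append(1.0 / denom)                  # F_s (reference state)
    return F

# three states with arbitrary illustrative dynamics: weights sum to one
F = transition_probs([0.3, -1.2])
print(sum(F))  # 1.0

# with s = 2 the single weight is exactly the logistic function of pi_1
pi1 = 0.7
F2 = transition_probs([pi1])
assert abs(F2[0] - 1.0 / (1.0 + math.exp(-pi1))) < 1e-12
```

This is why the two-state model below is simply a logistic STAR.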

The transition variable(s) \( z_t \) may be a vector of external regressors each with its own lag, or a combination of lags of \( y_t \), in which case the model is 'rudely', as one participant in the conference noted, called 'self-exciting'. Should \( z_t \) be a vector, then for identification purposes \( \alpha_1 \) is fixed to 1. Additionally, as will be seen later, it is also possible to pass a function \( f\left(.\right) \) which applies a custom transformation to the lagged values in the case of the self-exciting model. While it may appear at first that the same can be achieved by pre-transforming those values and passing the transformed variables in the switching equation, this leaves you dead in the water when it comes to simulation and n-ahead forecasting, where the value of the transition variable depends on the lagged simulated value of that same variable.

Finally, the transition function \( F_i \) has usually been taken to be either the logistic or the exponential transform function. As the two figures below illustrate, the logistic nests the TAR model as \( \gamma_i\to \infty \) and collapses to the linear case as \( \gamma_i\to 0 \). The exponential, on the other hand, collapses to the linear case as \( \gamma_i \) approaches either limit, and has a symmetric shape which is sometimes preferred for exchange rate modelling because of the perceived symmetric exchange rate adjustment behavior. Currently, only the logistic is considered in the **twinkle** package.
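A quick numeric illustration of those limits for the two-state logistic weight (a sketch in Python; the values of \( z \), \( c \) and \( \gamma \) are arbitrary):

```python
import math

def F_logistic(z, gamma, c=0.0):
    """Two-state logistic transition weight: F(z) = 1 / (1 + exp(-gamma * (z - c)))."""
    return 1.0 / (1.0 + math.exp(-gamma * (z - c)))

z_above, z_below = 0.5, -0.5
# gamma large: effectively a TAR step function around the threshold c
print(F_logistic(z_above, gamma=100))   # ~1
print(F_logistic(z_below, gamma=100))   # ~0
# gamma near zero: F(z) ~ 0.5 everywhere, i.e. the two states collapse to one linear model
print(F_logistic(z_above, gamma=0.01))  # ~0.5
```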

I follow a similar construction paradigm for the **twinkle** package as I have in my other packages. Namely, an S4 framework which includes steps for specifying the model, estimation, inference and tests, visual diagnostics, filtering, forecasting, simulation and rolling estimation/forecasting. I consider these to be the minimum methods for creating a complete modelling environment for econometric analysis. Additional methods are summarized in the following table:

Method | Description | Class | Example |
---|---|---|---|
setfixed<- | set fixed parameters | [1] | >setfixed(spec)<-list(s1.phi0=0) |
setstart<- | set starting parameters | [1] | >setstart(spec)<-list(s1.phi0=0) |
setbounds<- | set parameter bounds | [1] | >setbounds(spec)<-list(s1.phi0=c(0,1)) |
nonlinearTest | Luukkonen test | [1][2] | >nonlinearTest(fit, robust=TRUE) |
modelmatrix | model matrix | [1][2] | >modelmatrix(fit, linear=FALSE) |
coef | coef vector | [2][3] | >coef(fit) |
fitted | conditional mean | [2][3][4][5][6] | >fitted(fit) |
residuals | residuals | | >residuals(fit) |
states | conditional state probabilities | [2][3][4][5][6] | >states(fit) |
likelihood | log likelihood | [2][3] | >likelihood(fit) |
infocriteria | normalized information criteria | [2][3][7] | >infocriteria(fit) |
vcov | parameter covariance matrix | [2] | >vcov(fit) |
convergence | solver convergence | [2][7] | >convergence(fit) |
score | numerical score matrix | [2] | >score(fit) |
sigma | conditional sigma | [2][3][4][5][6] | >sigma(fit) |
as.data.frame | conditional density | [7] | >as.data.frame(roll) |
quantile | conditional quantiles | [2][3][4][5][6][7] | >quantile(fit) |
pit | conditional probability integral transformation | [2][3][7] | >pit(fit) |
show | summary | [1][2][3][4][5][6][7] | >fit |

The classes in column 3 are: [1] STARspec, [2] STARfit, [3] STARfilter, [4] STARforecast, [5] STARsim, [6] STARpath, [7] rollSTAR.

The model specification function has a number of options which I will briefly discuss in this section.

```r
args(starspec)
## function (mean.model = list(states = 2, include.intercept = c(1, 
##     1), arOrder = c(1, 1), maOrder = c(0, 0), matype = 'linear', 
##     statevar = c('y', 's'), s = NULL, ylags = 1, statear = FALSE, 
##     yfun = NULL, xreg = NULL, transform = 'logis'), variance.model = list(dynamic = FALSE, 
##     model = 'sGARCH', garchOrder = c(1, 1), submodel = NULL, 
##     vreg = NULL, variance.targeting = FALSE), distribution.model = 'norm', 
##     start.pars = list(), fixed.pars = list(), fixed.prob = NULL, 
##     ...)
```

Up to 4 **states** may be modelled, with a choice of inclusion or exclusion of an intercept in each state (**include.intercept**), the number of AR (**arOrder**) and MA (**maOrder**) parameters per state, and whether to include external regressors in each state (**xreg** should be a pre-lagged xts matrix). Note that the default for the moving average terms is to include them outside the states, but this can be changed by setting **matype**='state'. The **statevar** option denotes what the state variable is, with 'y' being the self-exciting model and 's' a set of external regressors passed as a pre-lagged xts matrix to **s**. If the choice is 'y', then **ylags** should be a vector of the lags for the variable, with a choice like `c(1,3)` denoting lag-1 and lag-3. Finally, the **yfun** option was discussed in the previous section and is a custom transformation function for y returning the same number of points as given.

The variance can be either static (the default) or **dynamic** (logical), in which case it can be one of 3 GARCH **model**s ('sGARCH', 'gjrGARCH' or 'eGARCH') or 'mixture' as discussed previously. The rest of the options follow from those in the **rugarch** package in the case of GARCH type variance.

The same distributions as those in **rugarch** are implemented, and there is the option of passing fixed or starting parameters to the specification, although the methods **setfixed<-** and **setstart<-** allow these to be set once the specification has been formed. There is also a **setbounds<-** method for setting parameter bounds, which for the unconstrained solvers (the default to use with this type of model) means using a logistic bounding transformation. Finally, the **fixed.prob** option allows the user to pass a set of fixed state probabilities as an xts matrix, in which case the model is effectively linear and may be estimated quite easily.
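The logistic bounding transformation mentioned above is a standard trick for using unconstrained solvers on box-constrained parameters; a minimal sketch (the function names here are mine, not the package's):

```python
import math

def to_bounded(theta, lb, ub):
    """Map an unconstrained value on the real line into (lb, ub) via the logistic function."""
    return lb + (ub - lb) / (1.0 + math.exp(-theta))

def to_unbounded(x, lb, ub):
    """Inverse map, used to hand the solver an unconstrained parameter."""
    return -math.log((ub - lb) / (x - lb) - 1.0)

# the solver works on the whole real line, yet the model always sees a value in (0, 1)
for theta in (-30.0, 0.0, 30.0):
    x = to_bounded(theta, 0.0, 1.0)
    assert 0.0 < x < 1.0

# round trip: the two maps are inverses of each other
assert abs(to_unbounded(to_bounded(1.3, -2.0, 5.0), -2.0, 5.0) - 1.3) < 1e-9
```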

It is useful to know the naming convention of the parameters used in the package should starting, fixed or bounds values need to be set. These are summarized in the list below and generally follow the notation used in the representation of the model in the previous section:

- All state based variables are preceded by their state number (s1.,s2.,s3.,s4.)
- Conditional Mean Equation:
- intercept: phi0 (e.g. s1.phi0, s2.phi0)
- AR(p): phi1, …, phip (e.g. s1.phi1, s1.phi2, s2.phi1, s2.phi2)
- MA(q): psi1, …, psiq (e.g. s1.psi1, s1.psi2, s2.psi1, s2.psi2). Note that in the case of **matype**='linear', the states are not used.
- X(l): xi1, …, xil (e.g. s1.xi1, s2.xi2, s3.xi1)

- State Equation:
- scaling variable: gamma (e.g. s1.gamma)
- Threshold: c (e.g. s1.c)
- Threshold Variables (k): alpha2, …, alphak (e.g. s1.alpha2, s1.alpha3). Note that the first variable (alpha1) is constrained to be 1 for identification purposes so cannot be changed. This will always show up in the summary with NAs in the standard errors since it is not estimated.
- Threshold AR(1): beta (e.g. s1.beta)

- Variance Equation:
- sigma (s): If dynamic and mixture then s1.sigma, s2.sigma etc. If static then just sigma.
- GARCH parameters follow same naming as in the rugarch package

- Distribution:
- skew
- shape

I'll define a simple specification to use in this post, based on examples from the twinkle.tests folder in the source distribution (under the inst folder). It uses a weekly realized measure of volatility of the Nasdaq 100 index for the period 02-01-1996 to 12-10-2001, and the specification is for a simple lag-1 self-exciting model with AR(2) dynamics in each state.

```r
require(quantmod)
data(ndx)
ndx.ret2 = ROC(Cl(ndx), na.pad = FALSE)^2
ndx.rvol = sqrt(apply.weekly(ndx.ret2, FUN = 'sum'))
colnames(ndx.rvol) = 'RVOL'
spec = starspec(mean.model = list(states = 2, arOrder = c(2, 2), statevar = 'y', ylags = 1))
```

Before proceeding, I also check for the presence of STAR nonlinearity using the test of Luukkonen et al. (1988), which is implemented in the **nonlinearTest** method with an option for a version robust to heteroscedasticity:

```r
tmp1 = nonlinearTest(spec, data = log(ndx.rvol))
tmp2 = nonlinearTest(spec, data = log(ndx.rvol), robust = TRUE)
testm = matrix(NA, ncol = 4, nrow = 2, dimnames = list(c('Standard', 'Robust'),
    c('F.stat', 'p.value', 'Chisq.stat', 'p.value')))
testm[1, ] = c(tmp1$F.statistic, tmp1$F.pvalue, tmp1$chisq.statistic, tmp1$chisq.pvalue)
testm[2, ] = c(tmp2$F.statistic, tmp2$F.pvalue, tmp2$chisq.statistic, tmp2$chisq.pvalue)
print(testm, digits = 5)
##          F.stat   p.value Chisq.stat   p.value
## Standard 3.7089 0.0014366     21.312 0.0016123
## Robust   2.5694 0.0193087     15.094 0.0195396
```

We can safely reject the linearity assumption under the standard test at the 1% significance level, and under the robust assumption at the 5% significance level. Note that this example is taken from the excellent book of Zivot (2007) (chapter on nonlinear models) and the numbers should also agree with what is printed there.

Estimating STAR models is a challenging task, and for this purpose a number of options have been included in the package.

```r
args(starfit)
## function (spec, data, out.sample = 0, solver = 'optim', solver.control = list(), 
##     fit.control = list(stationarity = 0, fixed.se = 0, rec.init = 'all'), 
##     cluster = NULL, n = 25, ...) 
## NULL
```

The **data** must be an xts object with the same time indices as any data already passed to the **STARspec** object, and contain only numeric data without any missing values. The **out.sample** argument indicates how many data points to optionally leave out of the estimation (from the end of the dataset) for use in out-of-sample forecasting later on, when the estimated object is passed to the **starforecast** routine. Perhaps the most important choice to be made is the type of **solver** to use and its control parameters (**solver.control**), which should not be omitted. The following solvers and 'strategies' are included:

- **optim**. The preferred choice is the BFGS solver. The choice of solver is controlled by the **method** option in the **solver.control** list. All parameter bounds are enforced through the use of a logistic transformation.
- **nlminb**. I have had little luck getting the same performance as the BFGS solver.
- **solnp**. Will most likely find a local solution.
- **cmaes**. Even though it is a global solver, it requires careful tweaking of the control parameters (and there are many). This is the **parma** package version of the solver.
- **deoptim**. Another global solver. May be slow and require tweaking of the control parameters.
- **msoptim**. A multistart version of optim with the option of using the **cluster** argument for parallel evaluation. The number of multistarts is controlled by the **n.restarts** option in the **solver.control** list.
- **strategy**. A special purpose optimization strategy for STAR problems using the BFGS solver. It cycles between keeping the state variables fixed and estimating the linear variables (conditional mean, variance and any distribution parameters), keeping the linear variables fixed and estimating the state variables, and a random re-start optimization to guard against local solutions. The argument **n** in the routine controls the number of times to cycle through this strategy. The **solver.control** list should pass control arguments for the BFGS solver. This is somewhat related to the concentrated sum of squares methodology in terms of the estimation strategy, but it does not minimize the sum of squares, opting instead for a proper likelihood evaluation.

The **strategy** and **msoptim** solver strategies should be the preferred choice when estimating STARMA models.
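The cycling idea behind the strategy can be illustrated on a toy problem: alternately minimize a smooth objective over one block of parameters with the other block held fixed. This is a generic block-wise sketch in Python with closed-form block minimizers, not the package's actual BFGS-based routine:

```python
def cycle_minimize(n_cycles=50):
    """Toy objective f(a, b) = (a - 1)^2 + (b + 2)^2 + 0.5 * a * b.
    Alternately minimize over a (b fixed) and over b (a fixed), mimicking the
    fixed-state / fixed-linear cycling of the 'strategy' approach."""
    a, b = 0.0, 0.0
    for _ in range(n_cycles):
        a = 1.0 - 0.25 * b   # argmin over a: solves df/da = 2(a - 1) + 0.5 b = 0
        b = -2.0 - 0.25 * a  # argmin over b: solves df/db = 2(b + 2) + 0.5 a = 0
    return a, b

a, b = cycle_minimize()
print(round(a, 4), round(b, 4))  # converges to the joint optimum (1.6, -2.4)
```

In the package the inner "argmin" steps are full BFGS solves over each parameter block, and random re-starts are thrown in between cycles.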

I continue with the example already covered in the specification section and estimate the model, leaving 50 points for out of sample forecasting and filtering later on:

```r
mod = starfit(spec, data = log(ndx.rvol), out.sample = 50, solver = 'strategy',
    n = 8, solver.control = list(alpha = 1, beta = 0.4, gamma = 1.4, reltol = 1e-12))
show(mod)
plot(mod)
## 
## *---------------------------------*
## *          STAR Model Fit         *
## *---------------------------------*
## states       : 2
## statevar     : y
## statear      : FALSE
## variance     : static
## distribution : norm
## 
## Optimal Parameters (Robust Standard Errors)
## ------------------------------------
##            Estimate  Std. Error    t value Pr(>|t|)
## s1.phi0    -3.54380    0.034260 -103.43760 0.000000
## s1.phi1    -0.64567    0.426487   -1.51393 0.130043
## s1.phi2     0.10950    0.319605    0.34262 0.731886
## s2.phi0    -2.51982    0.849927   -2.96475 0.003029
## s2.phi1     0.10902    0.214009    0.50944 0.610444
## s2.phi2     0.17944    0.062210    2.88447 0.003921
## s1.gamma    3.22588    1.941072    1.66190 0.096532
## s1.c       -2.52662    0.347722   -7.26620 0.000000
## s1.alpha1   1.00000          NA         NA       NA
## sigma       0.39942    0.019924   20.04776 0.000000
## 
## LogLikelihood : -126.3
## 
## Akaike       1.0738
## Bayes        1.1999
## Shibata      1.0714
## Hannan-Quinn 1.1246
## 
## r.squared         :  0.3167
## r.squared (adj)   :  0.2913
## RSS               :  40.2
## skewness (res)    : -0.235
## ex.kurtosis (res) :  0.4704
## 
## AR roots
##         Moduli1 Moduli2
## state_1   1.274   7.170
## state_2   2.076   2.684
```

Filtering a dataset with an already estimated set of parameters has been already extensively discussed in a related post for the **rugarch** package. The method takes the following arguments:

```r
args(starfilter)
## function (spec, data, out.sample = 0, n.old = NULL, rec.init = 'all', ...)
```

The most important argument is probably **n.old**, which denotes, in the case that the new dataset is composed of the old dataset (on which estimation took place) plus new data, the number of points in the original dataset. This ensures that the same initialization values are used for certain recursions, so that the exact same results are returned for those points belonging to the original dataset. The following example illustrates:

```r
specf = spec
setfixed(specf) <- as.list(coef(mod))
N = nrow(ndx.rvol) - 50
modf = starfilter(specf, data = log(ndx.rvol), n.old = N)
print(all.equal(fitted(modf)[1:N], fitted(mod)))
## [1] TRUE
print(all.equal(states(modf)[1:N], states(mod)))
## [1] TRUE
```

Nonlinear models are considerably more complex than their linear counterparts to forecast. For 1-ahead this is quite simple, but for n-ahead there is no closed form solution as in the linear case. Consider a general nonlinear first order autoregressive model:

\[
y_t = F\left(y_{t-1};\theta\right) + \varepsilon_t
\]

The 1-step ahead forecast is simply:

\[
\hat y_{t+1\left|t\right.} = E\left[y_{t+1}\left|\Im_t\right.\right] = F\left(y_t;\theta\right)
\]

However, for n-step ahead, and using the Chapman-Kolmogorov relationship \( g\left(y_{t+h}\left|\Im_t\right.\right) = \int_{-\infty}^\infty g\left(y_{t+h}\left|y_{t+h-1}\right.\right) g\left(y_{t+h-1}\left|\Im_t\right.\right) dy_{t+h-1} \), we have:

\[E\left[y_{t+h}\left|\Im_t\right.\right] = \int_{-\infty}^\infty E\left[y_{t+h}\left|y_{t+h-1}\right.\right] g\left(y_{t+h-1}\left|\Im_t\right.\right) dy_{t+h-1}\]

where there is no closed form relationship since \( E\left[ {F\left( . \right)} \right] \ne F\left({E\left[ . \right]} \right) \).

The trick is to start at h=2:

\[

{{\hat y}_{t + 2\left| t \right.}} = \frac{1}{T}\sum\limits_{i = 1}^T {F\left( {{{\hat y}_{t + 1\left| t \right.}} + {\varepsilon _i};\theta } \right)}

\]

and using either quadrature integration or monte carlo summation obtain the expected value. Use that value for the next step, rinse and repeat.
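The recursion can be sketched for a toy nonlinear AR(1) using monte carlo summation (the transition function `F` below is an arbitrary logistic example of my own, not the fitted STAR model):

```python
import math
import random

def F(y, theta=(0.5, 0.8, 2.0)):
    """Toy nonlinear AR(1): a linear part plus a logistic-weighted shift."""
    phi, shift, gamma = theta
    return phi * y + shift / (1.0 + math.exp(-gamma * y))

def mc_forecast(y_T, h, sims=20000, sigma=0.3, seed=42):
    """n-ahead forecast by monte carlo summation: at each horizon, average
    F(previous forecast + simulated shock) and feed the mean forward."""
    rng = random.Random(seed)
    forecasts = []
    y_hat = F(y_T)  # 1-ahead needs no integration
    forecasts.append(y_hat)
    for _ in range(2, h + 1):
        y_hat = sum(F(y_hat + rng.gauss(0.0, sigma)) for _ in range(sims)) / sims
        forecasts.append(y_hat)
    return forecasts

fc = mc_forecast(0.1, h=10)
print(len(fc))  # 10
```

The averaging step is exactly where \( E\left[F\left(.\right)\right] \ne F\left(E\left[.\right]\right) \) bites: skipping the simulated shocks and iterating F on the point forecast alone would give a biased path.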

In the **twinkle** package, both quadrature and monte carlo summation are options in the **starforecast** method:

```r
args(starforecast)
## function (fitORspec, data = NULL, n.ahead = 1, n.roll = 0, out.sample = 0, 
##     external.forecasts = list(xregfor = NULL, vregfor = NULL, 
##         sfor = NULL, probfor = NULL), method = c('an.parametric', 
##         'an.kernel', 'mc.empirical', 'mc.parametric', 'mc.kernel'), 
##     mc.sims = NULL, ...) 
## NULL
```

with added options for either parametric, empirical or kernel fitted distribution for the residuals. The method also allows for multiple dispatch methods by taking either an object of class **STARfit** or one of class **STARspec** (with fixed parameters and a dataset). The example below illustrates the different methods:

```r
forc1 = starforecast(mod, n.roll = 2, n.ahead = 20, method = 'an.parametric', mc.sims = 10000)
forc2 = starforecast(mod, n.roll = 2, n.ahead = 20, method = 'an.kernel', mc.sims = 10000)
forc3 = starforecast(mod, n.roll = 2, n.ahead = 20, method = 'mc.empirical', mc.sims = 10000)
forc4 = starforecast(mod, n.roll = 2, n.ahead = 20, method = 'mc.parametric', mc.sims = 10000)
forc5 = starforecast(mod, n.roll = 2, n.ahead = 20, method = 'mc.kernel', mc.sims = 10000)
show(forc1)
par(mfrow = c(2, 3))
plot(forc1)
plot(forc2)
plot(forc3)
plot(forc4)
plot(forc5)
## 
## *------------------------------------*
## *        STAR Model Forecast         *
## *------------------------------------*
## Horizon       : 20
## Roll Steps    : 2
## STAR forecast : an.parametric
## Out of Sample : 20
## 
## 0-roll forecast [T0=2000-10-27]:
##      Series
## T+1  -2.684
## T+2  -2.820
## T+3  -2.948
## T+4  -3.061
## T+5  -3.157
## T+6  -3.231
## T+7  -3.286
## T+8  -3.324
## T+9  -3.350
## T+10 -3.368
## T+11 -3.379
## T+12 -3.387
## T+13 -3.392
## T+14 -3.395
## T+15 -3.397
## T+16 -3.398
## T+17 -3.399
## T+18 -3.400
## T+19 -3.400
## T+20 -3.400
```

The nice thing about the monte carlo method is that the density of each point forecast is now available, and it is used in the plot method to draw quantiles around that forecast. It can be extracted from the slot *object@forecast$yDist*, which is a list of length **n.roll**+1 of matrices of dimensions **mc.sims** by **n.ahead**.

Simulation in the **twinkle** package, like in **rugarch**, can be carried out directly on the estimated **STARfit** object or on a specification object of class **STARspec** with fixed parameters. Achieving equivalence between the two depends on the start-up initialization conditions, and is always a good check of reproducibility and code correctness, as shown in the example that follows:

```r
sim = starsim(mod, n.sim = 1000, rseed = 10)
path = starpath(specf, n.sim = 1000, prereturns = tail(log(ndx.rvol)[1:N]), rseed = 10)
all.equal(fitted(sim), fitted(path))
## [1] TRUE
all.equal(states(sim), states(path))
## [1] TRUE
```

The **fitted** method extracts the simulated series as an **n.sim** by **m.sim** matrix, while the **states** method extracts the simulated state probabilities (it optionally takes a "type" argument for extracting the simulated raw dynamics or the conditional simulated mean per state) and can be passed the argument **sim** to indicate which **m.sim** run to extract (default: 1). Passing the correct prereturns values and the same seed as in **starsim** initializes the simulation from the same values, as the equality test between the two methods shows. Things become a little more complicated when using external regressors or GARCH dynamics, but with careful preparation the results should again be the same.

The final key modelling method, useful for backtesting, is the rolling estimation and 1-step ahead forecasting method, which has a number of options to define the type of estimation window to use, as well as a **resume** method which re-estimates any windows that did not converge during the original run. This type of method has already been covered in related posts on **rugarch**, so I will reserve a more in-depth demo for a later date.

This post provided an introduction to the use of the **twinkle** package which should hopefully make it to CRAN from bitbucket soon. It is still in beta, and it will certainly take some time to mature, so please report bugs or feel free to contribute patches. The package departs from traditional implementations, sparse as they are, in the area of STAR models by offering extensions in the form of (MA)(X) dynamics in the conditional mean, (AR) dynamics in the conditional state equation, a mixture model for the variance, and a softmax representation for the multi-state model. It brings a complete modelling framework, developed in the **rugarch** package, to STAR model estimation with a set of methods which are usually lacking elsewhere. It also brings, at least for the time being, a promise of user engagement (via the R-SIG-FINANCE mailing list) and maintenance.

Download and installation instructions can be found here.

[1] Hamilton, J. D. (1989). *A new approach to the economic analysis of nonstationary time series and the business cycle*. Econometrica: Journal of the Econometric Society, 357-384.

[2] Tong, H., & Lim, K. S. (1980). *Threshold Autoregression, Limit Cycles and Cyclical Data*. Journal of the Royal Statistical Society. Series B (Methodological), 245-292.

[3] Teräsvirta, T. (1994). *Specification, estimation, and evaluation of smooth transition autoregressive models*. Journal of the American Statistical Association, 89(425), 208-218.

[4] Luukkonen, R., Saikkonen, P., & Teräsvirta, T. (1988). *Testing linearity against smooth transition autoregressive models*. Biometrika, 75(3), 491-499.

[5] Zivot, E., & Wang, J. (2007). *Modeling Financial Time Series with S-PLUS®* (Vol. 191). Springer.

My own keynote presentation covered smooth transition ARMA models and their implementation in a new package, and some results from an empirical application related to the forecast of the level and state of realized monthly volatility of the S&P500. I am in the process of adding some changes and enhancements following some very useful feedback during the conference and hope to release it to CRAN soon.

Beyond the talks, the most interesting part of the conference was the networking opportunity and the chance to meet new and interesting people and catch up with old acquaintances. If you couldn’t make it this year, do try to come next year, it is well worth it. In the meantime, if you are interested in R and finance, you might want to try and participate on the R-SIG-FINANCE mailing list. It’s a great place to ask questions, contribute ideas and engage in interesting discussions.

The clip below is from our last night’s outing at the Jazz Showcase (thanks to Dirk and Jerome)…


The latest version of rugarch (1.3-0) is available from the teatime repository on r-forge.

The realized GARCH (realGARCH) model of HHS2012 provides an excellent framework for the joint modelling of returns and realized measures of volatility. Unlike the naive augmentation of GARCH processes with realized measures, the realGARCH model relates the observed realized measure to the latent volatility via a measurement equation, which also includes asymmetric reaction to shocks, making for a very flexible and rich representation. Formally, let:

\[
\begin{gathered}
y_t = \mu_t + \sigma_t z_t,{\text{ }} z_t \sim i.i.d\left(0,1\right) \\
\log\sigma_t^2 = \omega + \sum\limits_{i = 1}^q \alpha_i \log r_{t-i} + \sum\limits_{i = 1}^p \beta_i \log\sigma_{t-i}^2 \\
\log r_t = \xi + \delta\log\sigma_t^2 + \tau\left(z_t\right) + u_t,{\text{ }} u_t \sim N\left(0,\lambda\right) \\
\end{gathered}
\]

where we have defined the dynamics for the returns (\( y_t \)), the log of the conditional variance (\( \sigma^2_t \)) and the log of the realized measure (\( r_t \)). Note that the notation varies slightly from that of HHS2012, and the **rugarch** vignette contains a footnote on the differences. The asymmetric reaction to shocks comes via the \( \tau\left(.\right) \) function, which is based on the Hermite polynomials and truncated at the second level to give a simple quadratic form:

\[ \tau\left(z_t\right) = \eta_1z_t + \eta_2\left(z^2_t-1\right)

\]

which has the very convenient property that \( E\tau\left(z_t\right)=0 \). The function also forms the basis for the creation of a type of news impact curve \( \nu \left( z \right) \), defined as:

\[
\nu\left(z\right) = E\left[\log\sigma_t\left|z_{t-1} = z\right.\right] - E\left[\log\sigma_t\right] = \delta\tau\left(z\right)
\]

so that \( 100\times\nu\left(z\right) \) is the percent change in volatility as a function of the standardized innovations.
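A quick numeric check of these two functions, using the approximate benchmark estimates reported later in this post (\( \eta_1 \approx -0.0678 \), \( \eta_2 \approx 0.0702 \), \( \delta \approx 1.0375 \)):

```python
def tau(z, eta1=-0.0678, eta2=0.0702):
    """Truncated Hermite form: tau(z) = eta1 * z + eta2 * (z^2 - 1)."""
    return eta1 * z + eta2 * (z * z - 1.0)

def nu(z, delta=1.0375):
    """News impact: nu(z) = delta * tau(z); 100 * nu(z) is a percent change."""
    return delta * tau(z)

# E[tau(z)] = eta1 * E[z] + eta2 * (E[z^2] - 1) = 0 for any z with mean 0, variance 1.
# At z = +/-1 the quadratic term vanishes, so the curve is driven by eta1 alone:
print(round(nu(1.0), 4), round(nu(-1.0), 4))  # -0.0703 0.0703
# negative eta1 => negative shocks raise volatility more: the usual leverage effect
assert nu(-1.0) > nu(1.0)
```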

A key feature of this model is that it preserves the ARMA structure which characterizes many standard GARCH models, adapted here from Proposition 1 of the HHS2012 paper:

\[
\begin{gathered}
\log\sigma_t^2 = \mu_\sigma + \sum\limits_{i = 1}^{p \vee q} \left(\delta\alpha_i + \beta_i\right)\log\sigma_{t-i}^2 + \sum\limits_{j = 1}^q \alpha_j w_{t-j} \\
\log r_t = \mu_r + \sum\limits_{i = 1}^{p \vee q} \left(\delta\alpha_i + \beta_i\right)\log r_{t-i} + w_t - \sum\limits_{j = 1}^{p \vee q} \beta_j w_{t-j} \\
\mu_\sigma = \omega + \xi\sum\limits_{i = 1}^q \alpha_i,\quad \mu_r = \delta\omega + \left(1 - \sum\limits_{i = 1}^p \beta_i\right)\xi \\
\end{gathered}
\]

where \( w_t=\tau\left(z_t\right)+u_t \), \( \mu_\sigma = \omega + \xi\sum\limits_{i = 1}^q \alpha_i \) and \( \mu_r = \delta\omega + \left(1 - \sum\limits_{i = 1}^p \beta_i\right)\xi \), with the convention \( \beta_i = 0 \) for \( i>p \) and \( \alpha_j = 0 \) for \( j>q \). It is therefore a simple matter to show that the persistence \( \hat P \) of the process is given by:

\[
\hat P = \sum\limits_{i = 1}^p \beta_i + \delta\sum\limits_{i = 1}^q \alpha_i
\]

while the unconditional (long-run) variance may be written as:

\[
\hat\sigma^2 = \frac{\omega + \xi\sum\limits_{i = 1}^q \alpha_i}{1 - \hat P}
\]
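As a back-of-the-envelope check, plugging the benchmark HHS2012 estimates reported later in this post into the persistence formula (a sketch in Python):

```python
# benchmark realGARCH(2,1) estimates from HHS2012, reported later in this post
alpha = [0.450676773, -0.176042911]  # alpha_1, alpha_2
beta = [0.701210369]                 # beta_1
delta = 1.037491982

P_hat = sum(beta) + delta * sum(alpha)
print(round(P_hat, 4))  # 0.9861: very persistent, but still stationary (P < 1)
```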

For the demonstration, I will use the SPY dataset which accompanied the HHS2012 paper, including realized volatility based on the realized kernel (Parzen) method of Barndorff-Nielsen et al. (2008) (this dataset is now included in **rugarch** as of version 1.3-0). In general, the **highfrequency** package can be used on intraday data to generate any number of realized volatility measures (and is currently the only package AFAIK that provides this functionality in R).

```r
library(rugarch)
require(xts)
data(spyreal)
spec = ugarchspec(mean.model = list(armaOrder = c(0, 0), include.mean = FALSE),
    variance.model = list(model = 'realGARCH', garchOrder = c(2, 1)))
setbounds(spec) <- list(alpha2 = c(-1, 1))
fit = ugarchfit(spec, spyreal[, 1] * 100, solver = 'hybrid', realizedVol = spyreal[, 2] * 100)
```

Notice how the additional realized measure is passed via the *realizedVol* argument, which must be an xts object whose index matches that of the returns *data*. Also note the use of the hybrid solver, which is the recommended method for solving this type of model.

I’m going to compare the results with those of HHS2012:

```r
cf = coef(fit)
se = fit@fit$matcoef[, 2]
names(se) = names(cf)
# HHS2012 divide the period into in- (T=1492) and out-of-sample, even though they
# are estimating the model using the entire dataset (T=1659)...no comment.
benchmark.LL = c('logL' = -2388.75, 'pLogL' = -1710.34)
benchmark.pars = c('logh0' = -0.289213754, 'omega' = 0.041246175, 'alpha1' = 0.450676773,
    'alpha2' = -0.176042911, 'beta1' = 0.701210369, 'xi' = -0.179994885,
    'delta' = 1.037491982, 'eta1' = -0.067809509, 'eta2' = 0.07015778,
    'lambda' = sqrt(0.145369801))
benchmark.se = c('logh0' = 0.278787604, 'omega' = 0.015515016, 'alpha1' = 0.038027571,
    'alpha2' = 0.06335417, 'beta1' = 0.055974942, 'xi' = 0.043611245,
    'delta' = 0.057726499, 'eta1' = 0.010309694, 'eta2' = 0.006620828,
    'lambda' = 0.00594321)
# rugarch does not estimate h0; instead it uses either mean(residuals^2),
# or a choice of variance targeting with options
rugarch.LL = c('logL' = sum(-fit@fit$log.likelihoods[1:1492]),
    'pLogL' = sum(-fit@fit$partial.log.likelihoods[1:1492]))
rugarch.pars = c('logh0' = NA, 'omega' = cf['omega'], 'alpha1' = cf['alpha1'],
    'alpha2' = cf['alpha2'], 'beta1' = cf['beta1'], 'xi' = cf['xi'],
    'delta' = cf['delta'], 'eta1' = cf['eta11'], 'eta2' = cf['eta21'],
    'lambda' = cf['lambda'])
rugarch.se = c('logh0' = NA, 'omega' = se['omega'], 'alpha1' = se['alpha1'],
    'alpha2' = se['alpha2'], 'beta1' = se['beta1'], 'xi' = se['xi'],
    'delta' = se['delta'], 'eta1' = se['eta11'], 'eta2' = se['eta21'],
    'lambda' = se['lambda'])
names(rugarch.pars) = names(rugarch.se) = c('logh0', 'omega', 'alpha1', 'alpha2',
    'beta1', 'xi', 'delta', 'eta1', 'eta2', 'lambda')
parsdf = cbind(benchmark.pars, rugarch.pars, benchmark.pars - rugarch.pars)
sedf = cbind(benchmark.se, rugarch.se, benchmark.se - rugarch.se)
LRE.vars = -log(abs(rugarch.pars - benchmark.pars)/abs(benchmark.pars), base = 10)
LRE.se = -log(abs(rugarch.se - benchmark.se)/abs(benchmark.se), base = 10)
test = cbind(LRE.vars, LRE.se)
tmp1 = t(cbind(rugarch = rugarch.pars, benchmark = benchmark.pars))
tmp2 = t(cbind(rugarch = rugarch.se, benchmark = benchmark.se))
# print the results:
## parameters:
##             logh0  omega alpha1  alpha2  beta1      xi delta    eta1   eta2 lambda
## rugarch        NA 0.0451 0.4766 -0.2027 0.7050 -0.1898 1.027 -0.0616 0.0718 0.3815
## benchmark -0.2892 0.0412 0.4507 -0.1760 0.7012 -0.1800 1.038 -0.0678 0.0702 0.3813
## 
## standard errors:
##            logh0  omega alpha1 alpha2  beta1     xi  delta   eta1   eta2 lambda
## rugarch       NA 0.0137 0.0289 0.0450 0.0381 0.0390 0.0403 0.0097 0.0062 0.0066
## benchmark 0.2788 0.0155 0.0380 0.0634 0.0560 0.0436 0.0577 0.0103 0.0066 0.0059
## 
## Log Relative Error Test:
##           logh0  omega alpha1 alpha2  beta1     xi delta  eta1  eta2 lambda
## LRE.vars     NA 1.0293 1.2408 0.8191 2.2627 1.2640 2.012 1.041 1.623 3.1413
## LRE.se       NA 0.9355 0.6212 0.5374 0.4947 0.9733 0.519 1.204 1.208 0.9435
```

The results are pretty close, and all parameters are found to be highly significant. Note that unlike HHS2012, **rugarch** does not include \( h_0 \) in the parameter estimation set but instead, like the other models, estimates it based on the filtered squared residuals. Finally, we reproduce the news impact curve from their paper:

```r
ni = newsimpact(fit, z = seq(-2, 2, length.out = 100))
plot(ni$zx, ni$zy, ylab = ni$yexpr, xlab = ni$xexpr, type = 'l',
    main = 'News Impact realGARCH')
abline(v = 0)
abline(h = 0)
grid()
```

The interested reader should read the HHS2012 paper which has some very good insights. In particular, the coefficient on the log conditional variance in the measurement equation (\(\delta\)) provides interesting insights depending on whether open-close or close-close returns are used.

In **rugarch**, it is possible to create rolling 1-ahead forecasts (assuming the *out.sample* option was used in the estimation), long-run n-ahead forecasts, and combinations of the two. In the realGARCH model, n.ahead>1 forecasts contain uncertainty because \( u_t \) is not known. Simulation methods are therefore used, and the *ugarchforecast* routine takes additional arguments: *n.sim*, the number of random samples to use per period for the generation of the discrete-time density of the forecast variance (and also the realized volatility), and *returnDistribution* (logical), which indicates whether to include these estimates of the density in the returned object. In either case, the point estimate is always returned and is based on the mean of the sampled values.
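To make the simulation-based forecast concrete, here is a stylized sketch in Python of the same idea: an AR(1)-type log-variance recursion (a reduced form of the realGARCH transition once the measurement equation is substituted in), simulated forward n.ahead periods with the point forecast taken as the cross-draw mean. The parameter values are hypothetical, not the estimates above.

```python
import numpy as np

# Stylized log-variance recursion: log h_t = a + b*log h_{t-1} + s*v_{t-1},
# v ~ N(0,1) -- a reduced-form stand-in for the realGARCH transition once the
# measurement equation is substituted in. Parameters are hypothetical.
# Since v is unknown beyond one step, the n-ahead forecast is built by
# simulating n_sim paths and averaging, which is what ugarchforecast does.
rng = np.random.default_rng(55)
a, b, s = 0.002, 0.96, 0.10                # unconditional E[log h] = a/(1-b)
n_ahead, n_sim = 200, 20000
logh = np.full(n_sim, a / (1 - b) + 1.0)   # start one unit above the mean
paths = np.empty((n_ahead, n_sim))
for t in range(n_ahead):
    logh = a + b * logh + s * rng.standard_normal(n_sim)
    paths[t] = logh
point = paths.mean(axis=1)                 # point forecast: mean across draws
# the long-horizon forecast mean-reverts towards the unconditional level a/(1-b)
print(point[0], point[-1])
```

The fan of simulated paths at each horizon is exactly what the *sigmaDF* array in the forecast object stores.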

The first example below demonstrates the equivalence of the ugarchfilter and ugarchforecast methods (the latter using either a uGARCHfit or a uGARCHspec object).

```r
fit = ugarchfit(spec, spyreal[, 1] * 100, out.sample = 25, solver = 'hybrid',
    realizedVol = spyreal[, 2] * 100)
specf = spec
setfixed(specf) <- as.list(coef(fit))
filt = ugarchfilter(specf, data = spyreal[, 1] * 100, n.old = nrow(spyreal) - 25,
    realizedVol = spyreal[, 2] * 100)
forc1 = ugarchforecast(fit, n.ahead = 1, n.roll = 25)
# forecast from spec
forc2 = ugarchforecast(specf, n.ahead = 1, n.roll = 25, data = spyreal[, 1] * 100,
    out.sample = 25, realizedVol = spyreal[, 2] * 100)
filts = tail(sigma(filt), 25)
colnames(filts) = 'filter'
forcs1 = xts(sigma(forc1)[1, ], move(as.Date(names(sigma(forc1)[1, ])), by = 1))
forcs2 = xts(sigma(forc2)[1, ], move(as.Date(names(sigma(forc2)[1, ])), by = 1))
colnames(forcs1) = 'fit2forecast'
colnames(forcs2) = 'spec2forecast'
ftest = cbind(filts, forcs1, forcs2)
# last forecast is completely out of sample, so not available from the
# filter method (which filters given T-1)
print(round(ftest, 5))
##            filter fit2forecast spec2forecast
## 2008-07-28 1.1065       1.1065        1.1065
## 2008-07-29 1.0131       1.0131        1.0131
## 2008-07-30 0.9885       0.9885        0.9885
## 2008-07-31 1.0828       1.0828        1.0828
## 2008-08-01 1.0685       1.0685        1.0685
## 2008-08-04 1.1434       1.1434        1.1434
## 2008-08-05 1.0460       1.0460        1.0460
## 2008-08-06 1.0351       1.0351        1.0351
## 2008-08-07 0.9206       0.9206        0.9206
## 2008-08-08 0.9933       0.9933        0.9933
## 2008-08-11 1.0083       1.0083        1.0083
## 2008-08-12 0.9368       0.9368        0.9368
## 2008-08-13 0.9564       0.9564        0.9564
## 2008-08-14 1.0243       1.0243        1.0243
## 2008-08-15 0.9903       0.9903        0.9903
## 2008-08-18 0.9432       0.9432        0.9432
## 2008-08-19 0.9751       0.9751        0.9751
## 2008-08-20 0.9453       0.9453        0.9453
## 2008-08-21 1.0326       1.0326        1.0326
## 2008-08-22 0.9930       0.9930        0.9930
## 2008-08-25 0.8638       0.8638        0.8638
## 2008-08-26 0.9082       0.9082        0.9082
## 2008-08-27 0.9154       0.9154        0.9154
## 2008-08-28 0.8658       0.8658        0.8658
## 2008-08-29 0.8235       0.8235        0.8235
## 2008-09-01     NA       0.8103        0.8103
```

The advantage of being able to call the ugarchforecast method with a uGARCHspec object (with fixed parameters) should be obvious: any time new data arrives, we can simply augment the old data and create new forecasts without having to re-estimate the model (unless we wish to update the parameter set). In the next example, a long-horizon forecast is created and compared to the unconditional value:

```r
set.seed(55)
forc3 = ugarchforecast(fit, n.ahead = 400, n.sim = 5000)
plot(sigma(forc3), type = 'l', main = 'realGARCH long-run forecast')
abline(h = sqrt(uncvariance(fit)), col = 2)
legend('topright', c('long-run forecast', 'unconditional value'), col = 1:2,
    lty = c(1, 1), bty = 'n')
```

The final example shows how the forecast density is extracted from the forecast object. This is an array of dimension (n.ahead x n.sim x n.roll+1).

```r
set.seed(55)
forc4 = ugarchforecast(fit, n.ahead = 25, n.sim = 10000)
# dim(forc4@forecast$sigmaDF)
sigmaDF = forc4@forecast$sigmaDF
meansig = sqrt(exp(rowMeans(log(sigmaDF[, , 1]^2))))
boxplot(t(sigmaDF[, , 1]), main = '25-ahead volatility forecast (realGARCH)',
    col = 'orange')
points(as.numeric(meansig), col = 'green')
# note that for the 1-ahead there is no uncertainty (unless we were doing this
# Bayes-style so that parameter uncertainty would have an impact)
```

There are two ways to simulate from GARCH models in the **rugarch** package: either directly from an estimated object (uGARCHfit class) or from a GARCH spec (uGARCHspec with fixed parameters), similar to the ugarchforecast method. However, instead of a single dispatch method, there is ugarchsim (for the uGARCHfit class) and ugarchpath (for the uGARCHspec class). The example that follows shows how to generate equivalent values from the two methods:

```r
spec = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE),
    variance.model = list(model = 'realGARCH', garchOrder = c(1, 1)))
fit = ugarchfit(spec, spyreal[, 1] * 100, out.sample = 25, solver = 'hybrid',
    realizedVol = spyreal[, 2] * 100)
specf = spec
setfixed(specf) <- as.list(coef(fit))
T = nrow(spyreal) - 25
sim1 = ugarchsim(fit, n.sim = 1000, m.sim = 1, n.start = 0,
    startMethod = 'sample', rseed = 12)
sim2 = ugarchsim(fit, n.sim = 1000, m.sim = 1, n.start = 0,
    startMethod = 'sample', rseed = 12,
    prereturns = as.numeric(tail(spyreal[1:T, 1] * 100, 1)),
    presigma = as.numeric(tail(sigma(fit), 1)),
    preresiduals = as.numeric(tail(residuals(fit), 1)),
    prerealized = as.numeric(tail(spyreal[1:T, 2] * 100, 1)))
sim3 = ugarchpath(specf, n.sim = 1000, m.sim = 1, n.start = 0, rseed = 12,
    prereturns = as.numeric(tail(spyreal[1:T, 1] * 100, 1)),
    presigma = as.numeric(tail(sigma(fit), 1)),
    preresiduals = as.numeric(tail(residuals(fit), 1)),
    prerealized = as.numeric(tail(spyreal[1:T, 2] * 100, 1)))
print(cbind(head(sigma(sim1)), head(sigma(sim2)), head(sigma(sim3))))
##       [,1]   [,2]   [,3]
## T+1 1.0735 1.0735 1.0735
## T+2 0.9836 0.9836 0.9836
## T+3 1.1091 1.1091 1.1091
## T+4 1.0306 1.0306 1.0306
## T+5 0.9609 0.9609 0.9609
## T+6 0.8771 0.8771 0.8771
```

Again, it should be obvious what the applications are for ugarchpath in a live environment.

The final part of this article provides a small empirical application comparing the realGARCH and eGARCH models for risk management.

```r
library(parallel)
cl = makePSOCKcluster(5)
spec1 = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE),
    variance.model = list(model = 'realGARCH', garchOrder = c(1, 1)))
spec2 = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE),
    variance.model = list(model = 'eGARCH', garchOrder = c(1, 1)))
roll1 = ugarchroll(spec1, spyreal[, 1] * 100, forecast.length = 500,
    solver = 'hybrid', refit.every = 25, refit.window = 'recursive',
    realizedVol = spyreal[, 2] * 100, cluster = cl)
roll2 = ugarchroll(spec2, spyreal[, 1] * 100, forecast.length = 500,
    refit.every = 25, refit.window = 'recursive', cluster = cl)
report(roll1)
## VaR Backtest Report
## ===========================================
## Model:             realGARCH-norm
## Backtest Length:   500
## Data:
##
## ==========================================
## alpha:             1%
## Expected Exceed:   5
## Actual VaR Exceed: 10
## Actual %:          2%
##
## Unconditional Coverage (Kupiec)
## Null-Hypothesis: Correct Exceedances
## LR.uc Statistic: 3.914
## LR.uc Critical:  3.841
## LR.uc p-value:   0.048
## Reject Null:     YES
##
## Conditional Coverage (Christoffersen)
## Null-Hypothesis: Correct Exceedances and
##                  Independence of Failures
## LR.cc Statistic: 4.323
## LR.cc Critical:  5.991
## LR.cc p-value:   0.115
## Reject Null:     NO
report(roll2)
## VaR Backtest Report
## ===========================================
## Model:             eGARCH-norm
## Backtest Length:   500
## Data:
##
## ==========================================
## alpha:             1%
## Expected Exceed:   5
## Actual VaR Exceed: 16
## Actual %:          3.2%
##
## Unconditional Coverage (Kupiec)
## Null-Hypothesis: Correct Exceedances
## LR.uc Statistic: 15.467
## LR.uc Critical:  3.841
## LR.uc p-value:   0
## Reject Null:     YES
##
## Conditional Coverage (Christoffersen)
## Null-Hypothesis: Correct Exceedances and
##                  Independence of Failures
## LR.cc Statistic: 16.527
## LR.cc Critical:  5.991
## LR.cc p-value:   0
## Reject Null:     YES
plot(as.xts(as.data.frame(roll1)[, 'Sigma', drop = FALSE]),
    main = 'realGARCH vs eGARCH\n(out-of-sample volatility forecast)',
    auto.grid = FALSE, minor.ticks = FALSE)
lines(as.xts(as.data.frame(roll2)[, 'Sigma', drop = FALSE]), col = 2)
legend('topleft', c('realGARCH', 'eGARCH'), col = 1:2, lty = c(1, 1), bty = 'n')
grid()
```

Peachy!

As widely reported in various applications and papers, there is no denying the value of including realized measures in the modelling process, particularly when done in a flexible and rich framework such as that offered by the realized GARCH model.

Happy 2014.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., & Shephard, N. (2008). Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica, 76(6), 1481-1536.

Hansen, P. R., Huang, Z., & Shek, H. H. (2012). Realized GARCH: a joint model for returns and realized measures of volatility. Journal of Applied Econometrics, 27(6), 877-906.

Simply put, risk parity (RP) is a method of allocating equal risk shares to each asset in the portfolio. In more traditional allocation schemes, equity, being the riskiest asset (and hence providing the highest reward), has typically received the lion’s share. With RP, equalization of risk contribution means that equity and other similarly risky assets receive a reduced allocation and low risk assets such as government bonds an increased allocation. As a result, in order to achieve ‘equity-like’ total returns, leverage has typically been used in this context, possibly using a target risk level. Under certain circumstances, namely when all assets have the same risk-to-reward ratios, the RP allocation is equivalent to the tangent portfolio. In the case where only the risks are equal, the RP allocation is generally equivalent to the equal weight (1/n) portfolio, whilst it coincides with the minimum risk portfolio when cross diversification is highest (though that may be a tricky concept to describe when going beyond the mean-variance paradigm).

Of course, like all portfolio problems, we are speaking of our expectations of future risk. And this, argue the proponents of RP, is one of its key benefits, since variance is much more stable and easier to forecast than returns. With RP lying somewhere between the minimum risk and optimal risk portfolios, making use only of ‘risk’ rather than reward in the calculation, and backed up by an apparently stellar performance resulting from the recent decline in US Treasury rates, one might wonder why portfolio allocation is still taught at university.

In this author’s opinion, RP is not a proper model of asset allocation, does not contain that key ingredient called an active forecast, which is after all what managers are rewarded for producing, and is likely to create complacency because of its oversimplified approach to the problem of forecast uncertainty. In this article I provide two applications based on different datasets with different characteristics in order to highlight some issues with the RP approach. I examine different risk\(^1\) measures within a simple parametric DGP framework for generating forecasts, and find that even with such a simple approach RP cannot outperform the optimal risk-reward strategy, or even the minimum risk strategy in a well diversified universe.

Formally, consider the marginal contribution to risk (MCR) of each asset (\(i\) of \(n\)) given a risk measure \( \rho\left(x\right) \):

\[

MC{R_i} = \frac{{\partial \rho \left( x \right)}}{{\partial {x_i}}}

\]

which when multiplied by the asset’s share and summed leads to the total portfolio risk (TR):

\[

TR = \sum\limits_{\forall i} {{x_i}MC{R_i}}

\]
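Since the risk measures considered here are positively homogeneous of degree one, Euler's theorem guarantees this decomposition. A quick numerical check for the volatility case \( \rho(x)=\sqrt{x'\Sigma x} \), sketched in Python with an illustrative covariance matrix:

```python
import numpy as np

# For rho(x) = sqrt(x' S x) (portfolio volatility), the marginal contribution
# to risk is the gradient MCR = S x / sqrt(x' S x). Since rho is positively
# homogeneous of degree one, Euler's theorem gives sum_i x_i * MCR_i = rho(x).
S = np.array([[0.040, 0.006, 0.012],
              [0.006, 0.090, 0.000],
              [0.012, 0.000, 0.160]])   # an illustrative covariance matrix
x = np.array([0.5, 0.3, 0.2])
rho = np.sqrt(x @ S @ x)
mcr = S @ x / rho                        # analytic gradient of rho at x
tr = np.sum(x * mcr)                     # total risk via Euler decomposition
print(rho, tr)                           # the two numbers coincide
```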

One way to solve the general RP problem, already suggested by Maillard, Roncalli and Teïletche (2010), is as a squared-error minimization problem which can be easily solved with an SQP solver:

\[

\begin{gathered}

\mathop {{\text{min}}}\limits_{\mathbf{x}} {\text{ }}\sum\limits_{j = 1}^n {\sum\limits_{i = 1}^n {{{\left( {{x_i}MC{R_i} - {x_j}MC{R_j}} \right)}^2}} }\\

s.t. \\

\sum\limits_{\forall i} {{x_i} = b} \\

{\mathbf{x}} \geqslant 0 \\

\end{gathered}

\]

where \( b \) is the budget constraint. It should be clear that MCR is just the gradient of the risk function. Since all NLP risk functions in the **parma** package have analytic gradients (see the vignette for details), the extension of RP to measures beyond variance is quite trivial. For the case of variance, a fast algorithm using Newton’s method proposed by Chaves, Hsu and Shakernia (2012) is also available, and tests conducted indicate that it is up to 100x faster than the equivalent SQP optimization (though we are already talking about tenths of a second anyway).
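For the variance case, a minimal Python sketch of the Newton approach (in the spirit of Chaves, Hsu and Shakernia (2012); an illustration, not the parma implementation) solves the first-order condition \( \Sigma y = 1/y \) elementwise and then normalizes, which yields equal risk contributions:

```python
import numpy as np

# Risk parity (variance case): solve S y = 1/y elementwise by Newton's method,
# then normalize. At the solution y_i * (S y)_i = 1 for all i, so after
# normalization every asset has the same risk contribution x_i * (S x)_i.
# Minimal sketch in the spirit of Chaves, Hsu and Shakernia (2012); the parma
# implementation differs in its details.
def risk_parity(S, tol=1e-12, maxit=50):
    y = 1.0 / np.sqrt(np.diag(S))            # inverse-vol starting point
    for _ in range(maxit):
        F = S @ y - 1.0 / y                  # first-order condition
        J = S + np.diag(1.0 / y**2)          # Jacobian of F
        step = np.linalg.solve(J, F)
        while np.any(y - step <= 0):         # damp steps that leave y > 0
            step *= 0.5
        y -= step
        if np.max(np.abs(F)) < tol:
            break
    return y / y.sum()                       # budget constraint sum(x) = 1

S = np.array([[0.040, 0.006, 0.012],
              [0.006, 0.090, 0.000],
              [0.012, 0.000, 0.160]])        # illustrative covariance matrix
x = risk_parity(S)
rc = x * (S @ x)                             # risk contributions
print(x, rc / rc.sum())                      # contributions are equal
```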

The dataset consists of weekly (Friday) returns of international equity ETFs for the period November 2000 to December 2013. It is a highly correlated dataset with low diversification possibilities in a long-only setup. As such, it serves to illustrate the similarities between the equally weighted and RP portfolios.

Figure 1 aptly illustrates the very high correlation of the dataset whilst Figure 2 provides an indication of the risk and risk-return profile of the dataset, though one should bear in mind that this is for the entire period and does not necessarily reflect the situation at any particular point in time.

The backtest uses a static normal copula to model the covariance, with first stage AR(1)-GARCH(1,1)-JSU dynamics. The choice of a static rather than dynamic correlation model was motivated by the size of the dataset which is quite small, whilst for the conditional first stage GARCH dynamics the JSU distribution was used to account for univariate higher moments. Finally, and again motivated by the size of the dataset, an empirical transformation was used (see Genest, Ghoudi and Rivest (1995) and the **rmgarch** vignette for more details). At each time T (a Friday), data from 1:T was used (hence an expanding window choice) for the modelling, after which the T+1 simulated forecast density (the scenario) was generated for use in the optimization model. For the Mean-Variance model, the covariance of the simulated T+1 forecast density was used, whilst for all other risk measures the actual scenario was used in the optimization.

Three models were evaluated: the minimum risk (MR), risk parity (RP) and optimal risk-reward (OPT) using fractional programming. Within those models, four risk measures were evaluated: mean-variance (EV), mean absolute deviation (MAD) and lower partial moments of orders 2 and 4 (LPM2 and LPM4) representing different aversions to below threshold losses\( ^{2} \).
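Given the simulated T+1 scenario matrix, each of these risk measures reduces to a simple sample statistic of the portfolio return. A Python sketch with hypothetical scenario data (the threshold for the LPM measures is set to the portfolio mean, as in the backtest):

```python
import numpy as np

# Scenario-based risk measures for weights w over a simulated scenario matrix
# R (n.sim x n.assets), with the LPM threshold set to the portfolio mean.
# The scenario data below are illustrative only.
rng = np.random.default_rng(7)
R = rng.normal(0.001, 0.02, size=(10000, 3))     # hypothetical T+1 scenario
w = np.array([0.5, 0.3, 0.2])
rp = R @ w                                       # portfolio scenario returns
theta = rp.mean()                                # threshold = portfolio mean
ev   = rp.var()                                  # variance (EV)
mad  = np.mean(np.abs(rp - theta))               # mean absolute deviation
lpm2 = np.mean(np.maximum(theta - rp, 0.0)**2)   # lower partial moment, order 2
lpm4 = np.mean(np.maximum(theta - rp, 0.0)**4)   # lower partial moment, order 4
print(ev, mad, lpm2, lpm4)
```

For a symmetric scenario, LPM2 is roughly half the variance; the asymmetry of real return distributions is precisely what the LPM measures are designed to penalize.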

The weights generated by each model were then used to buy the underlying assets at the close of the day **after** (i.e. Monday) and held until the following re-formation period (i.e. the next Monday). In this way, there was no contemporaneous signal-portfolio formation bias, which may be significant for weekly and higher frequency models. Trading costs were not included, nor was price impact and other related costs.

Table 1 provides a summary of the results (using the equal weight as the benchmark), with the key points being:

- Among the allocation models, there is a negligible difference between the different risk measures. This may reflect the frequency of the dataset and/or the DGP used (normal copula).
- The RP and equal weight portfolios appear very close, as might be expected from the very close ex-ante variance of the assets and high correlation.
- The MR portfolio shows dismal performance, and this is in line with the little diversification potential of this dataset.
- The OPT portfolio appears superior to the RP model on all measures. Statistical significance of this statement was not checked, and with the exception of the Sharpe ratio one is challenged to do so for a large number of measures even on moderately sized datasets.

| SET-1 | RP[EV] | RP[MAD] | RP[LPM2] | RP[LPM4] | MR[EV] | MR[MAD] | MR[LPM2] | MR[LPM4] | OPT[EV] | OPT[MAD] | OPT[LPM2] | OPT[LPM4] | EW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CAGR | 7.04 | 6.99 | 6.97 | 7.01 | 3.44 | 2.35 | 3.34 | 4.60 | 10.22 | 10.00 | 9.91 | 9.70 | 7.73 |
| SD | 32.99 | 32.98 | 32.95 | 32.94 | 28.15 | 28.14 | 27.95 | 28.01 | 34.89 | 34.82 | 34.83 | 34.81 | 34.03 |
| Up | 54.17 | 54.03 | 53.88 | 54.03 | 55.34 | 54.90 | 54.90 | 55.64 | 54.61 | 54.90 | 55.05 | 54.76 | 54.32 |
| MaxDrawdown | 77.51 | 77.52 | 77.46 | 77.39 | 74.54 | 75.49 | 74.44 | 73.35 | 71.37 | 71.20 | 71.08 | 70.77 | 77.74 |
| CAPM[alpha] | -0.62 | -0.67 | -0.69 | -0.65 | -3.49 | -4.53 | -3.57 | -2.36 | 2.98 | 2.79 | 2.68 | 2.49 | 0.00 |
| CAPM[beta] | 0.97 | 0.97 | 0.97 | 0.97 | 0.79 | 0.79 | 0.79 | 0.79 | 0.96 | 0.96 | 0.96 | 0.96 | |
| Timing | 0.98 | 0.98 | 0.98 | 0.98 | 0.87 | 0.86 | 0.87 | 0.87 | 1.05 | 1.05 | 1.05 | 1.05 | |
| Sharpe | 0.16 | 0.16 | 0.16 | 0.16 | 0.06 | 0.02 | 0.06 | 0.10 | 0.24 | 0.23 | 0.23 | 0.22 | 0.17 |
| Information | 0.16 | 0.16 | 0.16 | 0.16 | 0.06 | 0.02 | 0.06 | 0.10 | 0.24 | 0.24 | 0.23 | 0.23 | 0.18 |
| Calmar | 0.09 | 0.09 | 0.09 | 0.09 | 0.05 | 0.03 | 0.04 | 0.06 | 0.14 | 0.14 | 0.14 | 0.14 | 0.10 |
| Skew | -0.28 | -0.28 | -0.28 | -0.28 | -0.62 | -0.67 | -0.59 | -0.54 | -0.04 | -0.05 | -0.03 | -0.02 | -0.22 |
| Kurtosis | 6.81 | 6.80 | 6.77 | 6.76 | 10.92 | 10.79 | 10.15 | 10.16 | 8.16 | 7.90 | 7.81 | 7.73 | 6.21 |

Finally, Figure 3 provides the ‘eye-catching’ illustration of terminal wealth trajectories of the various allocation models under the variance risk measure.

This is a more typical dataset for active asset allocation based on a more diverse universe of asset classes, though we are still limited in this application by readily available instruments and history within an open-data paradigm. The set covers the period June-2006 to December-2013, and I have again used weekly (Friday) returns. The backtest follows the exact methodology of Set 1.

Figure 4 displays good diversification potential in terms of correlations, whilst Figure 5 provides an indication of the risk and risk-return profile of the dataset, with the same caveats as mentioned for Figure 2.

The results in Table 2 provide for some very interesting insights. The higher diversification potential of this dataset has resulted in an exact opposite ranking for the MR portfolio which now has the highest Sharpe ratio, though a much lower CAGR, and the OPT portfolio again beats the RP portfolio. However, the RP portfolio now comfortably outperforms the equal weight portfolio, in line with expectations, given that the risks in the unconditional dataset were not very close (unlike set 1).

| SET-2 | RP[EV] | RP[MAD] | RP[LPM2] | RP[LPM4] | MR[EV] | MR[MAD] | MR[LPM2] | MR[LPM4] | OPT[EV] | OPT[MAD] | OPT[LPM2] | OPT[LPM4] | EW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CAGR | 9.35 | 9.38 | 9.29 | 9.37 | 8.95 | 8.98 | 8.91 | 8.96 | 11.87 | 11.55 | 11.47 | 12.25 | 10.41 |
| SD | 8.17 | 8.17 | 8.08 | 8.23 | 5.14 | 5.13 | 5.15 | 5.46 | 8.89 | 8.94 | 8.76 | 9.35 | 23.31 |
| Up | 60.46 | 60.20 | 60.46 | 59.95 | 62.24 | 61.48 | 62.24 | 64.29 | 65.31 | 65.56 | 65.56 | 66.58 | 54.08 |
| MaxDrawdown | 24.68 | 24.86 | 24.59 | 24.83 | 15.21 | 15.07 | 15.53 | 17.34 | 24.83 | 24.77 | 24.40 | 26.07 | 55.83 |
| CAPM[alpha] | 4.67 | 4.68 | 4.67 | 4.75 | 6.37 | 6.41 | 6.32 | 6.05 | 7.66 | 7.39 | 7.35 | 7.94 | |
| CAPM[beta] | 0.30 | 0.30 | 0.29 | 0.29 | 0.10 | 0.10 | 0.10 | 0.13 | 0.24 | 0.24 | 0.23 | 0.25 | |
| Timing | 0.73 | 0.73 | 0.73 | 0.73 | 0.34 | 0.34 | 0.35 | 0.45 | 0.64 | 0.63 | 0.63 | 0.63 | |
| Sharpe | 0.98 | 0.98 | 0.98 | 0.98 | 1.48 | 1.49 | 1.47 | 1.40 | 1.18 | 1.14 | 1.15 | 1.16 | 0.39 |
| Information | 0.99 | 1.00 | 0.99 | 0.99 | 1.49 | 1.50 | 1.48 | 1.41 | 1.20 | 1.15 | 1.17 | 1.18 | 0.39 |
| Calmar | 0.38 | 0.38 | 0.38 | 0.38 | 0.59 | 0.60 | 0.57 | 0.52 | 0.48 | 0.47 | 0.47 | 0.47 | 0.19 |
| Skew | -0.67 | -0.70 | -0.69 | -0.66 | -0.38 | -0.35 | -0.38 | -0.80 | -1.17 | -1.35 | -1.28 | -0.85 | -0.09 |
| Kurtosis | 3.16 | 3.16 | 3.33 | 3.54 | 4.75 | 4.73 | 4.74 | 5.71 | 7.39 | 7.88 | 8.13 | 9.02 | 6.48 |

It is certainly fashionable to publish about uncertainty of inputs in the portfolio literature. Michaud (1989) did it, DeMiguel, Garlappi and Uppal (2009) did it (see this blog for a critique), and so have the proponents of Risk Parity. I don’t see how an investment manager can justify the use of any of these methodologies since he is rewarded for the quality of his inputs (active bets), though they can and certainly should serve a role in sensitivity analysis. If everyone could just allocate resources without a forecast then we would not need investment/resource managers. Even with simple econometric based forecasts such as those generated from a ‘simple’ AR(1) model we still managed to beat the RP and EW portfolios. Perhaps a little more attention should be given to the modelling of the underlying dynamics and a little less to ‘fashionable’ trends in asset allocation and catchy phrases meant to provide yet another sales drive, particularly given the resulting highly levered bets on a certain asset class whose rally depends heavily on a printing press which is fast running out of steam.

If you want the code to replicate the results or the RP code for use with **parma**, contact me by email stating your real name and affiliation.

Charnes, A., & Cooper, W. W. (1962). Programming with linear fractional functionals. Naval Research logistics quarterly, 9(3-4), 181-186.

Chaves, D., Hsu, J., Li, F., & Shakernia, O. (2012). Efficient Algorithms for Computing Risk Parity Portfolio Weights. The Journal of Investing, 21(3), 150-163.

DeMiguel, V., Garlappi, L., & Uppal, R. (2009). Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy?. Review of Financial Studies, 22(5), 1915-1953.

Genest, C., Ghoudi, K., & Rivest, L. P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika, 82(3), 543-552.

Maillard, S., Roncalli, T., & Teïletche, J. (2010). The Properties of Equally Weighted Risk Contribution Portfolios. The Journal of Portfolio Management, 36(4), 60-70.

Michaud, R. O. (1989). The Markowitz optimization enigma: Is ‘optimized’ optimal?. Financial Analysts Journal, 45(1), 31-42.

Stoyanov, S. V., Rachev, S. T., & Fabozzi, F. J. (2007). Optimal financial portfolios. Applied Mathematical Finance, 14(5), 401-436.

\( ^{1} \) Despite the numerous papers written on risk parity, including the seminal one by Maillard, Roncalli and Teïletche (2010) (although the concept appears to have originated as far back as 1996 with Ray Dalio of Bridgewater Associates, with the actual term ‘Risk Parity’ coined by E. Qian of PanAgora in 2006), risk is certainly not equivalent to variance. It can be any number of measures deemed appropriate (see the **parma** vignette section for a discussion of risk and deviation measures), and this blog article shows how the **parma** package can be used to calculate RP under a number of different measures in addition to variance, including Mean Absolute Deviation (MAD) and Lower Partial Moments (LPM).

\( ^{2} \) The threshold was based on the portfolio mean which is a case with especially desirable properties as discussed in the **parma** vignette.

The conditional co-moments of \( \mathbf{r}_t \) of order 3 and 4 are represented as tensor matrices

\[ \label{eq1}

\mathbf{M}_{t}^3 = \mathbf{A} \mathbf{M}_{f,t}^3(\mathbf{A} \otimes \mathbf{A})', \quad \\

\mathbf{M}_{t}^4 = \mathbf{A} \mathbf{M}_{f,t}^4(\mathbf{A} \otimes \mathbf{A} \otimes \mathbf{A})'

\]

where \( \mathbf{M}_{f,t}^3 \) and \( \mathbf{M}_{f,t}^4 \) are the \( N \times N^2 \) conditional third comoment matrix and the \( N \times N^3 \) conditional fourth comoment matrix of the factors, respectively. \( \mathbf{M}_{f,t}^3 \) and \( \mathbf{M}_{f,t}^4 \), are defined as

\[ \begin{eqnarray}

\mathbf{M}_{f,t}^3 & =&

\begin{bmatrix}

\mathbf{M}_{1,f,t}^3,\mathbf{M}_{2,f,t}^3,\ldots,\mathbf{M}_{N,f,t}^3

\end{bmatrix}\\

\mathbf{M}_{f,t}^4 & = &

\begin{bmatrix}

\mathbf{M}_{11,f,t}^4,\mathbf{M}_{12,f,t}^4,\ldots,\mathbf{M}_{1N,f,t}^4|\ldots|\mathbf{M}_{N1,f,t}^4,\mathbf{M}_{N2,f,t}^4,\ldots,\mathbf{M}_{NN,f,t}^4

\end{bmatrix}\\

\end{eqnarray}

\]

where \( \mathbf{M}_{k,f,t}^3, k=1,\ldots,N \) and \( \mathbf{M}_{kl,f,t}^4, k,l=1,\ldots,N \) are the \( N\times N \) submatrices of \( \mathbf{M}_{f,t}^3 \) and \( \mathbf{M}_{f,t}^4 \), respectively, with elements

\[

\begin{eqnarray*}

m_{ijk,f,t}^3 & = & E[f_{i,t}f_{j,t}f_{k,t}|\mathfrak{F}_{t-1}] \\

m_{ijkl,f,t}^4 & = & E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}].

\end{eqnarray*}

\]

Since the factors \( f_{it} \) can be decomposed as \( z_{it}\sqrt{h_{it}} \), and given the assumptions on \( z_{it} \), it follows that \( E[f_{i,t}f_{j,t}f_{k,t}|\mathfrak{F}_{t-1}] = 0 \). It is also true that for \( i \neq j \neq k \neq l \), \( E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}] = 0 \), while for \( i=j \) and \( k=l \) (with \( i \neq k \)), \( E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}] = h_{it}h_{kt}. \)

Thus, under the assumption of mutual independence, all elements in the conditional co-moments matrices with at least 3 different indices are zero. Finally, we standardize the conditional co-moments to obtain conditional coskewness and cokurtosis of \( \mathbf{r}_t \)

\[

\mathbf{S}_{ijk,t} = \frac{m_{ijk,t}^3}{({\sigma_{i,t}}{\sigma _{j,t}}{\sigma _{k,t}})}, \quad \\

\mathbf{K}_{ijkl,t} = \frac{m_{ijkl,t}^4}{({\sigma_{i,t}}{\sigma _{j,t}}{\sigma _{k,t}}{\sigma _{l,t}})},

\]

where \( \mathbf{S}_{ijk,t} \) represents the coskewness between elements \( i,j,k \) of \( \mathbf{r}_t \), \( \sigma_{i,t} \) the standard deviation of \( \mathbf{r}_{i,t} \), and in the case of \( i=j=k \) represents the skewness of asset \( i \) at time \( t \); similarly for the cokurtosis tensor \( \mathbf{K}_{ijkl,t} \).

The flattened tensors grow quite quickly in size as n (the number of factors) and m (the moment order) become larger. Populating the factor matrices with the values from the ACD dynamics in a fast and efficient manner is key if this model is to be called ‘feasible’ and estimation ‘large-scale’. For the third co-moment matrix this is very simple, since the column based vectorized index of the location of the non-zero entries is found to be \( ((1:n)-1)n^2 + ((1:n)-1)n + (1:n) \).
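A quick brute-force check of this index formula (a Python sketch; indices are kept 1-based to match the R-style notation):

```python
import numpy as np

# Brute-force check of the closed-form index: mark the m_{iii} entries of an
# n x n x n tensor, flatten it to the n x n^2 layout [M_1 | M_2 | ... | M_n],
# and read off the column-major ("column based") vectorized positions.
n = 5
T = np.zeros((n, n, n))
for i in range(n):
    T[i, i, i] = 1.0                              # mark m_{iii}
M = np.concatenate([T[:, j, :] for j in range(n)], axis=1)   # n x n^2
brute = [int(v) for v in np.flatnonzero(M.flatten(order='F')) + 1]
formula = [(i - 1) * n**2 + (i - 1) * n + i for i in range(1, n + 1)]
print(brute, formula)                             # the two index sets agree
```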

These represent the (column based vectorized) location in the third co-moment matrix of the factor unstandardized skewness estimated from the ACD model (and determined jointly by the skew and shape dynamics). In the case of the fourth co-moment matrix of the factors, the situation is slightly more involved since in addition to the entries \( \{i=j=k=l\} \) for which the column based vectorized location is found to be \( ((1:n)-1)n^3 + ((1:n)-1)n^2 + ((1:n)-1)n + (1:n) \), there are also the entries \( \{i=j,k=l\} \) to consider as discussed in the previous section. To this end, consider the \( n\times n^3 \) matrix of the flattened tensor \( M_{f,ijkl}^4 \), with the indices illustrated as in the \( 4\times4^3 \) matrix below:

\[ \small

\begin{array}{*{14}{c}}

{j:} & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & {…} & 4 & 4 \\

{k:} & 1 & 1 & 1 & 1 & 2 & 2 & 2 & 2 & {…} & 4 & 4 \\

{l:} & 1 & 2 & 3 & 4 & 1 & 2 & 3 & 4 & {…} & 3 & 4 \\

{i:1} & {\{ 1111\} } & {\{ 1112\} } & {\{ 1113\} } & {\{ 1114\} } & {\{ 1121\} } & {\{ 1122\} } & {\{ 1123\} } & {\{ 1124\} } & {…} & {\{ 1443\} } & {\{ 1444\} } \\

{i:2} & {\{ 2111\} } & {\{ 2112\} } & {\{ 2113\} } & {\{ 2114\} } & {\{ 2121\} } & {\{ 2122\} } & {\{ 2123\} } & {\{ 2124\} } & {…} & {\{ 2443\} } & {\{ 2444\} } \\

{i:3} & {\{ 3111\} } & {\{ 3112\} } & {\{ 3113\} } & {\{ 3114\} } & {\{ 3121\} } & {\{ 3122\} } & {\{ 3123\} } & {\{ 3124\} } & {…} & {\{ 3443\} } & {\{ 3444\} } \\

{i:4} & {\{ 4111\} } & {\{ 4112\} } & {\{ 4113\} } & {\{ 4114\} } & {\{ 4121\} } & {\{ 4122\} } & {\{ 4123\} } & {\{ 4124\} } & {…} & {\{ 4443\} } & {\{ 4444\} } \\

\end{array}

\normalsize

\]

We are interested in finding the index location of all the pairs where \( \{i=j,k=l\} \) and their permutations, e.g. {1122},{1212},{2211},{2112},{1221},{2121}.

To do this, we first note the following:

- There are \( \frac{n!}{2(n-2)!} \) unique pairs.
- Each pair has 6 permutations.
- The number of unique pairs whose difference is \( d \) is \( n-d \) e.g. for \( d=1 \),\( n=4 \), we have 3 unique pairs {2,1},{3,2} and {4,3}.
- The first pair has columnwise vector based index \( v=n+2 \) e.g. for \( n=4 \), the first pair is {2112} in the example matrix above with columnwise vector based index of 6.
- The \( p^{th} \) unique pair, representing the first in the set of pairs with \( n-d \) differences, has vector based index \( (p-1)+(n+1) \), which given the starting pair with index \( n+2 \) means that we can calculate the indices of each unique pair in the matrix given the previous unique pair’s position.
- Within each \( p^{th} \) unique pair, there are 6 permutations. Starting from the index of the first pair in the series of 6, denoted \( v_1 \), the second permutation has index \( v_1+((n-1)n)d \), the third \( v_2+(n-1)d \), the fourth \( v_3+d(n+1)(n-1)^2 \), the fifth \( v_4+(n-1)d \) and the sixth \( v_5+((n-1)n)d \).
- The first pair of the next set of pairs whose difference is \( n-d \) has an index which is equal to \( n^3+n^2+n+1 \) more than the previous pair.
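Rather than trusting the incremental bookkeeping above, the full index set can also be recovered by brute force, which makes a useful check (a Python sketch; the package itself uses the incremental formulas for speed):

```python
import itertools
import math

# Brute-force recovery of the column-major vectorized indices (1-based) of all
# entries of the flattened n x n^3 fourth co-moment tensor whose multi-index
# has the pattern {i=j, k=l} with i != k, in any of its 6 permutations.
# Element m_{ijkl} sits at row i, column (j-1)*n^2 + (k-1)*n + l, and an
# n x n^3 matrix vectorizes (row i, col c) -> (c-1)*n + i.
def vec_index(i, j, k, l, n):
    c = (j - 1) * n**2 + (k - 1) * n + l
    return (c - 1) * n + i

n = 4
pairs = list(itertools.combinations(range(1, n + 1), 2))
# there are n!/(2(n-2)!) unique pairs, each with 6 distinct permutations
assert len(pairs) == math.factorial(n) // (2 * math.factorial(n - 2))
idx = set()
for a, b in pairs:
    for perm in set(itertools.permutations((a, a, b, b))):
        idx.add(vec_index(*perm, n))
print(min(idx))   # the first such entry, {2112}, sits at index n + 2 = 6
```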

This fast method for calculating the location of each entry and populating it accordingly is already implemented in the **rmgarch** package for the GO-GARCH (NIG/GH) model. In addition, in order to avoid memory problems when it comes time to perform the Kronecker multiplications using the mixing matrix \( \mathbf{A} \) on the factor higher co-moments to arrive at the asset higher co-moments, the method of Buis and Dyksen (1996) is used, which is also implemented in the klin package.

A very interesting application made possible as a result of the model’s properties is the estimation of the (higher moment) time varying betas from the CAPM model. Consider a universe of \( n \) assets with returns \( \mathbf{r}_t \), with benchmark \( b \) whose return is \( r_{b,t} \) and determined by a linear combination based on a pre-specified weighting vector \( \mathbf{w}_t \):

\[

r_{b,t} = \mathbf{w}_t' \mathbf{r}_t

\]

such that \( \mathbf{w}_t'\mathbf{1}=1 \). It could be the case that \( \mathbf{w}_t \) represents the weights of the benchmark index or some other pre-determined weighting scheme (e.g. equal weight). To calculate the CAPM betas with respect to the benchmark, we need not model the pairwise combination of assets-benchmark but instead only the assets which comprise the benchmark. The following formulae outline the steps:

- Define the conditional covariance: \( \mathbf{M}_t^2 = \mathbf{A}\mathbf{H}_t\mathbf{A}' \in \mathbb{R}^{N \times N} \), where \( \mathbf{H}_t = E\left[ f_t f'_t \left| \Im_{t-1} \right. \right] \)
- Define the conditional 3rd co-moment: \( \mathbf{M}_t^3 = \mathbf{A}\mathbf{M}_{f,t}^3\left( \mathbf{A}' \otimes \mathbf{A}' \right) \in \mathbb{R}^{N \times N^2} \)
- Define the conditional 4th co-moment: \( \mathbf{M}_t^4 = \mathbf{A}\mathbf{M}_{f,t}^4\left( \mathbf{A}' \otimes \mathbf{A}' \otimes \mathbf{A}' \right) \in \mathbb{R}^{N \times N^3} \)
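As a quick illustration of these mappings, the following base R sketch builds the asset co-moment matrices from a hypothetical 2-factor, 3-asset mixing matrix; the factor inputs are random placeholders (not rmgarch output), used only to verify the dimensions:

```r
# Hypothetical example: map factor co-moments to asset co-moments via the
# mixing matrix A, checking the resulting dimensions N x N, N x N^2, N x N^3.
set.seed(1)
N <- 3; m <- 2
A   <- matrix(rnorm(N * m), N, m)        # N x m mixing matrix
Hf  <- diag(c(1.2, 0.8))                 # factor conditional covariance (m x m)
M3f <- matrix(rnorm(m * m^2), m, m^2)    # factor 3rd co-moment (m x m^2)
M4f <- matrix(rnorm(m * m^3), m, m^3)    # factor 4th co-moment (m x m^3)

M2 <- A %*% Hf  %*% t(A)                          # N x N
M3 <- A %*% M3f %*% (t(A) %x% t(A))               # N x N^2
M4 <- A %*% M4f %*% (t(A) %x% t(A) %x% t(A))      # N x N^3
```

The `%x%` operator is base R's Kronecker product, which is what makes the memory-saving method of Buis and Dyksen relevant once \( N \) grows.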

The conditional beta Covariance:

\[
\beta_{i,t} = \frac{E\left[\left(R_{i,t} - \bar{R}_{i,t}\right)\left(R_{m,t} - \bar{R}_{m,t}\right)\right]}{E\left[\left(R_{m,t} - \bar{R}_{m,t}\right)^2\right]} = \frac{\mathbf{m}_{i,t}^2\,\mathbf{w}}{\mathbf{w}'\mathbf{M}_t^2\,\mathbf{w}}
\]

where \( \mathbf{m}_{i,t}^2 := \operatorname{row}_i \mathbf{M}_t^2 \quad \left( i = 1,\ldots,N \right) \).

The conditional beta Coskewness:

\[
s_{i,t} = \frac{E\left[\left(R_{i,t} - \bar{R}_{i,t}\right)\left(R_{m,t} - \bar{R}_{m,t}\right)^2\right]}{E\left[\left(R_{m,t} - \bar{R}_{m,t}\right)^3\right]} = \frac{\mathbf{m}_{i,t}^3\left(\mathbf{w} \otimes \mathbf{w}\right)}{\mathbf{w}'\mathbf{M}_t^3\left(\mathbf{w} \otimes \mathbf{w}\right)}
\]

where \( \mathbf{m}_{i,t}^3 := \operatorname{row}_i \mathbf{M}_t^3 \quad \left( i = 1,\ldots,N \right) \).

The conditional beta Cokurtosis:

\[
k_{i,t} = \frac{E\left[\left(R_{i,t} - \bar{R}_{i,t}\right)\left(R_{m,t} - \bar{R}_{m,t}\right)^3\right]}{E\left[\left(R_{m,t} - \bar{R}_{m,t}\right)^4\right]} = \frac{\mathbf{m}_{i,t}^4\left(\mathbf{w} \otimes \mathbf{w} \otimes \mathbf{w}\right)}{\mathbf{w}'\mathbf{M}_t^4\left(\mathbf{w} \otimes \mathbf{w} \otimes \mathbf{w}\right)}
\]

where \( \mathbf{m}_{i,t}^4 := \operatorname{row}_i \mathbf{M}_t^4 \quad \left( i = 1,\ldots,N \right) \).
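The three beta formulae translate directly into base R, assuming the co-moment matrices and a weight vector are in hand (function names here are illustrative, not part of the rmgarch API); note that the vector of all numerators \( \mathbf{m}_{i,t}^k(\mathbf{w} \otimes \cdots) \) is obtained in one matrix product:

```r
# Higher-moment betas of each asset against the weighted benchmark,
# given M2 (N x N), M3 (N x N^2), M4 (N x N^3) and weights w.
beta_cov <- function(M2, w)
    as.vector(M2 %*% w) / as.numeric(t(w) %*% M2 %*% w)
beta_skew <- function(M3, w)
    as.vector(M3 %*% (w %x% w)) / as.numeric(t(w) %*% M3 %*% (w %x% w))
beta_kurt <- function(M4, w)
    as.vector(M4 %*% (w %x% w %x% w)) / as.numeric(t(w) %*% M4 %*% (w %x% w %x% w))

# Sanity check: identity covariance with equal weights gives unit betas.
beta_cov(diag(2), rep(0.5, 2))
```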

Thus, considering that we can include dimensionality reduction at the PCA whitening stage (the *n.comp* argument in the rmgarch::fastica algorithm), the IFACD model (and the restricted GO-GARCH sub-model with non-time-varying higher moments) provides a clear avenue for estimating a large-scale, statistical-factor-based, time-varying higher-moment CAPM model. A demonstration is available here.

In the IFACD model, the conditional weighted density can be calculated using the inversion of the NIG/GH characteristic function via FFT as discussed in Ghalanos, Rossi and Urga (2014) and the rmgarch vignette (for the GO-GARCH model). Methods for working with the weighted density are already available in the **rmgarch** package including methods for the density, distribution and quantile (dfft, pfft and qfft) on estimated, forecast and simulated objects. An alternative avenue for the calculation of the weighted quantile is to use the Cornish-Fisher expansion which makes use of the conditional higher moments:

\[
VaR_{t,\alpha} = \mu_t + \sigma_t\left( \phi + \left(\phi^2 - 1\right)\frac{S_t}{6} + \left(\phi^3 - 3\phi\right)\frac{K_t}{24} + \left(5\phi - 2\phi^3\right)\frac{S_t^2}{36} \right)
\]

where \( \phi=\Phi^{-1}\left( \alpha \right) \) represents the quantile of the standard normal distribution evaluated at the coverage level \( \alpha \), \( S_t \) the skewness at time \( t \) and \( K_t \) the excess kurtosis at time \( t \). The weighted moments \( (\mu_t,\sigma_t,S_t,K_t) \) can be calculated as follows:

\[
\begin{gathered}
\mu_t = \mathbf{w}_t' \mathbf{M}_t^1, \\
\sigma_t^2 = \mathbf{w}_t' \mathbf{\Sigma}_t \mathbf{w}_t, \\
S_t = \frac{\mathbf{w}_t' \mathbf{M}_t^3 \left(\mathbf{w}_t \otimes \mathbf{w}_t\right)}{\left(\mathbf{w}_t' \mathbf{\Sigma}_t \mathbf{w}_t\right)^{3/2}}, \\
K_t = \frac{\mathbf{w}_t' \mathbf{M}_t^4 \left(\mathbf{w}_t \otimes \mathbf{w}_t \otimes \mathbf{w}_t\right)}{\left(\mathbf{w}_t' \mathbf{\Sigma}_t \mathbf{w}_t\right)^2} \\
\end{gathered}
\]

where \( \mathbf{M}_{t}^1 \) is the conditional mean vector, \( \mathbf{M}_{t}^3 \) and \( \mathbf{M}_{t}^4 \) are the third and fourth co-moment matrices described in the previous section, and \( \mathbf{\Sigma}_{t} \) the conditional covariance (the notation \( \mathbf{M}_{t}^2 \) has also been used).
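A base R sketch tying the weighted moments to the Cornish-Fisher quantile (`cf_var` is a hypothetical helper, not an rmgarch function; I convert the raw fourth moment to excess kurtosis before it enters the expansion, since that is what the expansion expects):

```r
# Cornish-Fisher VaR from the conditional co-moments at a single time t.
cf_var <- function(alpha, M1, Sigma, M3, M4, w) {
    mu <- as.numeric(t(w) %*% M1)                                 # weighted mean
    s2 <- as.numeric(t(w) %*% Sigma %*% w)                        # weighted variance
    S  <- as.numeric(t(w) %*% M3 %*% (w %x% w)) / s2^(3/2)        # weighted skewness
    K  <- as.numeric(t(w) %*% M4 %*% (w %x% w %x% w)) / s2^2 - 3  # excess kurtosis
    z  <- qnorm(alpha)
    mu + sqrt(s2) * (z + (z^2 - 1) * S / 6 + (z^3 - 3 * z) * K / 24 +
        (5 * z - 2 * z^3) * S^2 / 36)
}

# Gaussian check (N = 1): zero skew and raw kurtosis 3 recover the normal quantile.
cf_var(0.05, M1 = 0, Sigma = 1, M3 = 0, M4 = 3, w = 1)
```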

A demonstration is available here.

Buis, P. E., & Dyksen, W. R. (1996). Efficient vector and parallel manipulation of tensor products. ACM Transactions on Mathematical Software (TOMS), 22(1), 18-23.

Ghalanos, A., Rossi, E. & Urga G. (2014). Independent Factor Autoregressive Conditional Density model. Econometric Reviews (forthcoming).

A new package for dynamic binary models (**dbm**) is also out and a blog post will follow shortly.

As of 2014, the development repository has moved to bitbucket (see the downloads page for how to download and install).

-Alexios

In a much cited paper DeMiguel, Garlappi, and Uppal (2009) argued that it is difficult to beat the equal weighting strategy, considering a number of mostly mean-variance related models on limited history monthly data. The premise of their argument was based on the uncertainty in estimation of quantities such as covariance. While it is certainly true that uncertainty about such quantities is related to the size of the dataset, using a very short history of monthly data to make that point and publish the paper does not appear to me to provide for a scholarly answer to the paper’s title (“Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?”). Additionally, they do not even test the significance in the difference of the Sharpe ratios which makes it rather difficult to draw *any* conclusions.

In defense of optimization, and a direct reply to that study, which was criticized for using very short histories for the estimation of the covariance, Kritzman, Page, and Turkington (2010) provide a very thorough empirical application across a number of portfolios using the mean-variance criterion, including one comprising daily returns of the point in time constituents of the S&P500 from 1998 to 2008, and show substantially large differences (though again not statistically tested) in Sharpe ratios against the equal weighting strategy. The arguments against 1/N are readily generalized to any static weight allocation scheme, such as employed by most benchmark indices, with clear implications for benchmark policy choice and investment manager reward schemes. The question of the efficiency of benchmark indices was addressed some time ago by Grinold (1992) who asked whether index benchmarks are efficient, using the test of Gibbons, Ross, and Shanken (1989) on the country indices of Australia, Germany, Japan, U.K. and U.S. for the period ending 1991 (with start dates as early as 1973 for the U.S.). He found that 4 out of the 5 indices were not efficient ex-post. More recently, Demey, Maillard, and Roncalli (2010) also argued that neither capitalization nor price weighted indices are efficient, exposing investors to what should be diversifiable risk.

In this blog article, I will try to provide one empirical application, using a popular Global Tactical Asset Allocation model based on an approach proposed by Faber (2007), to show the value of a quantitative approach to asset allocation over the naive 1/N strategy. I make use of the **rmgarch** and **parma** packages, together with the **blotter** package for realistic portfolio accounting. Because of the complexity of the setup, the code to generate the tables and figures will be provided separately as a link (see bottom of page) rather than displayed inline, and there are a number of custom functions written on top of blotter. While the quantstrat package provides one excellent avenue for implementing such strategies, I have not used it in this case, opting for a more custom solution which provides for more transparency in what is actually happening under the hood. Feel free to drop me a note if you find any errors or have any suggestions.

One of the issues in many of the backtests one reads about is a failure to take into account companies which have delisted. This leads to the so called survivorship bias problem when backtesting equities, and is even more pronounced in hedge fund databases where reporting is voluntary (e.g. funds which do badly in one year may decide not to report). A realistic test of the efficiency of a benchmark index would require having access to the point in time constituents of that index, which would probably require access to a proprietary database. In this application, I will instead consider a set of market aggregates, which alleviates this problem, and also provides for two additional benefits. First, the implementation of the strategy becomes scalable and there are no issues of liquidity (for most of the indices), and second, the use of aggregates eliminates some of the idiosyncratic noise found in individual securities, which makes the identification of trends using a moving average filter much easier. Because ETFs usually have a limited history going back to the early 2000s, I have opted to use where appropriate a Vanguard fund as the instrument of choice for testing. Nevertheless, almost half of the securities do not trade for most of the 1990s and early 2000s, making it a little difficult to draw firm conclusions for the whole period since portfolio optimization benefits from a large and diversified group of securities.

The universe is comprised of the following assets:

- VFINX (US Equity s:1987), VEURX (EU Equity s:1990), VPACX (Asia Pacific Equity s:1990), VEIEX (Emerging Market Equity s:1995)
- SH (ProShares Short S&P 500 s:2006)
- VGSIX (REIT s:1996)
- IXC (Global Energy s:2001), GLD (Gold s:2004)
- VWEHX (High Yield Corporate s:1989), VCLT (Long Term Corporate s:2009)
- DBV (Carry Trade G10 s:2006)

This represents a select exposure to global Equities, US Corporates, Commodities, US Real Estate and Rates/Currencies. The use of the Short S&P 500 index is to obtain some additional diversification in times of global crises, since the strategy is long only.

A first look at some of the statistics of this dataset reveals that it exhibits the typical set of stylized facts.

Ticker | Mean | Median | Min | Max | Stdev | Skewness | Kurtosis | LB(1) | LB(2) | ARCH-LM(1) | JB
---|---|---|---|---|---|---|---|---|---|---|---
VFINX | 0.0003 | 0.0007 | -0.2286 | 0.1096 | 0.012 | -1.2924 | 30.34 | 0.0001 | 0 | 0 | 0
VEURX | 0.0003 | 0.0008 | -0.1091 | 0.1174 | 0.0132 | -0.3212 | 11.367 | 0.0137 | 0.0011 | 0 | 0
VPACX | 0.0001 | 0 | -0.0856 | 0.1277 | 0.0134 | 0.2265 | 9.119 | 0.3344 | 0.0758 | 0 | 0
VEIEX | 0.0003 | 0.0011 | -0.126 | 0.1367 | 0.0146 | -0.3943 | 13.32 | 0 | 0 | 0 | 0
SH | -0.0004 | -0.0008 | -0.1168 | 0.09 | 0.0146 | -0.1571 | 12.154 | 0 | 0 | 0 | 0
VGSIX | 0.0004 | 0 | -0.2149 | 0.169 | 0.0184 | -0.2011 | 23.696 | 0 | 0 | 0 | 0
IXC | 0.0004 | 0.001 | -0.1412 | 0.1434 | 0.0173 | -0.3857 | 12.174 | 0.0021 | 0 | 0 | 0
GLD | 0.0005 | 0.0008 | -0.0919 | 0.107 | 0.0131 | -0.3751 | 8.728 | 0.4471 | 0.7443 | 0 | 0
VWEHX | 0.0003 | 0 | -0.0521 | 0.0231 | 0.0032 | -0.6722 | 21.772 | 0 | 0 | 0 | 0
VCLT | 0.0003 | 0.0006 | -0.0352 | 0.0224 | 0.0064 | -0.2832 | 4.768 | 0.0067 | 0.0218 | 0.0003 | 0
DBV | 0 | 0.0006 | -0.085 | 0.0642 | 0.0097 | -0.7927 | 14.815 | 0.0001 | 0.0003 | 0 | 0

There is serial correlation (Ljung-Box test) and ARCH effects (ARCH-LM test) in addition to clear rejection of normality (Jarque-Bera test). This is of course not surprising for daily returns, though the question of such effects at lower frequencies is debatable (there is usually not enough data to form adequate tests, whilst aggregation methods require a host of additional assumptions e.g. weak GARCH etc). Given the presence of such dynamics in the dataset, with varying degrees of skewness and kurtosis across the securities, it is certainly not reasonable to opt for a 1/N approach to the allocation of capital.
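The tests reported in the table can be reproduced in base R; here is a sketch on a simulated series (the ARCH-LM test is proxied by a Ljung-Box test on squared returns, and the Jarque-Bera statistic is computed by hand to avoid extra package dependencies):

```r
set.seed(42)
r <- rnorm(1000, sd = 0.012)                  # stand-in for a daily return series
lb_p   <- Box.test(r,   lag = 1, type = "Ljung-Box")$p.value  # serial correlation
arch_p <- Box.test(r^2, lag = 1, type = "Ljung-Box")$p.value  # ARCH effects (proxy)
S  <- mean((r - mean(r))^3) / sd(r)^3         # sample skewness
K  <- mean((r - mean(r))^4) / sd(r)^4         # sample kurtosis
jb <- length(r) * (S^2 / 6 + (K - 3)^2 / 24)  # Jarque-Bera statistic
jb_p <- pchisq(jb, df = 2, lower.tail = FALSE)
```

On the actual return series in the table, all three tests reject strongly for most tickers; on the Gaussian stand-in above they should not.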

The figure shows the pairwise complete correlations of the securities used in this application which range from the highly correlated (Equities, Energy, REITS and Carry Trade) to the low correlation (Fixed Income and Precious Metals) and the negatively correlated (Equity Short).

There are 2 competing considerations to take into account when modelling security dynamics and generating forecasts. Obviously, more data is preferred to less when using models such as GARCH in order to obtain good estimates of the persistence. On the other hand, the higher the frequency of the data, the more noise enters any signal generating mechanism such as the moving average system. In addition, we want to optimize and rebalance at a lower (monthly) frequency so as to limit transaction costs and turnover. The signal generating mechanism I apply is based on the crossover of a slow and fast exponential moving average on monthly prices, the lengths of which are determined dynamically at each point in time. This filter is applied to all securities with a minimum of 600 daily data points, and those with a positive signal are then considered part of the modelling universe. If there are 3 or fewer securities in the universe, the optimization is not undertaken and instead a fixed weight is assigned to each security (and this may add up to less than 100% investment). The modelling of the universe of positive signal prospects is undertaken using an AR(1)-GARCH(1,1)-student model, with the joint dynamics captured by a static student copula (using Kendall’s tau for the correlation) and a parametric transformation (effectively a CCC-Student model). In order to generate the monthly forecast scenario from the higher frequency daily modelling process, 6000 25-day-ahead simulated points are generated and collapsed by summation to approximate the monthly forecast return distribution. This scenario matrix of size 6000 x n.securities is then optimized using a fractional mean-LPM(4) model with short-sale constraints and position bounds.
The LPM(4) model was chosen in order to capture extreme losses below the portfolio mean (the LPM threshold is the portfolio mean), and a minimum bound of 2.5% was also imposed so that all securities with a positive signal belong to the final trade set. An upper bound of 50% was imposed on each security as a concentration limit (this is high because of the small universe of securities used and the effective universe once filtered for positive trends). The fractional optimization model and the LPM measure are described in detail in the vignette of the parma package, which also provides some insights into the scaling property of this measure when the threshold is the portfolio mean. The steps are detailed below:

- Convert prices to monthly.
- Determine, for each security, and each time point, the optimal slow and fast window lengths of the EMA filter using a Total Return to Volatility metric, and generate the signals at the end of each month.
- Model those securities with positive signals and generate the month ahead simulated scenario density.
- Optimize the scenario using a fractional mean-LPM(4) model and obtain the allocation weights.
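The signal and risk-measure steps above can be sketched in a few lines of base R (a hand-rolled EMA crossover, a simulated scenario matrix standing in for the copula-GARCH simulation, and an evaluation of the LPM(4) measure with the threshold at the portfolio mean; the window lengths and weights are illustrative, not the dynamically optimized ones used in the backtest):

```r
# Step 2: EMA crossover signal on monthly prices.
ema <- function(x, n) {
    lambda <- 2 / (n + 1)
    out <- x
    for (i in 2:length(x)) out[i] <- lambda * x[i] + (1 - lambda) * out[i - 1]
    out
}
set.seed(7)
price  <- cumprod(1 + rnorm(120, 0.005, 0.04))              # ~10 years of monthly prices
signal <- tail(ema(price, 3), 1) > tail(ema(price, 10), 1)  # TRUE = positive trend

# Steps 3-4: a stand-in scenario matrix and the LPM(4) measure for fixed weights
# (the parma package solves for the weights; here we only evaluate the measure).
scenario <- matrix(rnorm(6000 * 4, sd = 0.05), ncol = 4)    # 6000 x n.securities
w  <- c(0.4, 0.3, 0.2, 0.1)
rp <- scenario %*% w                                        # portfolio scenario returns
lpm4 <- mean(pmax(mean(rp) - rp, 0)^4)^(1/4)                # LPM(4), threshold = mean
```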

Once the weights are dynamically determined, the model is ready to backtest. The allocation/rebalancing is performed the day after the weights are determined, which in this case is the first tradeable day of the month. In order to provide for a realistic backtest, a $20 roundtrip commission is charged per trade and a 1% yearly management fee is deducted monthly on a pro-rata basis. The blotter package is used for the order generation and tracking of the portfolio equity, with a number of custom wrapper functions. The same approach is applied to the equal weighted portfolio so that the strategies are directly comparable and individually provide a realistic representation of their respective performance. In addition to the optimal (OP) and equal weight (EQ) models, an alternative (OPalt) model is also tested which is similar to the OP model but takes into account the forecasted portfolio reward/risk measure generated by the model scenario and optimal weights, ‘deleveraging’ from full investment to a 2/3 investment in times when the reward/risk ratio is below ½. The reason I have included this portfolio is to gain some additional insight into the timing value of the information in the econometric model, as opposed to that of the EMA filter which is known to lag turning points (the compromise made for eliminating costly noise in the form of whipsaw patterns).
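The OPalt adjustment amounts to a simple post-optimization rule, sketched below (the function and argument names are hypothetical, not from any of the packages used):

```r
# Scale back from full investment to 2/3 investment when the forecasted
# portfolio reward/risk ratio falls below 1/2; otherwise keep the weights.
deleverage <- function(w, reward, risk) {
    if (reward / risk < 0.5) w * (2/3) else w
}
deleverage(c(0.5, 0.5), reward = 0.02, risk = 0.08)  # ratio 0.25 < 0.5: sums to 2/3
```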

Figure 1 shows the total number of securities at any given time in the OP portfolio. The red bars represent those times when there were fewer than 4 securities in the portfolio, in which case the optimization was not run and equal weighting was given to each security.

1994-2013 | OPalt | OP | EQ | S&P500 | 2000-2013 | OPalt | OP | EQ | S&P500
---|---|---|---|---|---|---|---|---|---
CAGR | 8.94 | 11.16 | 10.34 | 6.52 | CAGR | 8.45 | 9.89 | 8.21 | 0.24
Vol | 8.43 | 10.92 | 11.65 | 19.5 | Vol | 9.19 | 11.88 | 12.82 | 21.18
Sharpe | 0.71 | 0.76 | 0.65 | 0.27 | Sharpe | 0.7 | 0.68 | 0.52 | 0.02
LW p-value | 0.113 | 0.154 | | | LW p-value | 0.03 | 0.078 | |
BestM | 6.63 | 9.82 | 9.82 | 10.5 | BestM | 6.3 | 9.04 | 6.96 | 10.5
WorstM | -6.91 | -10.54 | -10.54 | -19.14 | WorstM | -5.89 | -8.84 | -7.01 | -19.14
UpM | 69.4 | 68.53 | 63.36 | 63.09 | UpM | 67.5 | 67.5 | 62.5 | 59.01
DnM | 30.6 | 31.47 | 36.64 | 36.91 | DnM | 32.5 | 32.5 | 37.5 | 40.99
MaxDraw | 14.5 | 19.04 | 19.04 | 59.65 | MaxDraw | 14.5 | 18.65 | 18.24 | 59.65

2003-2013 | OPalt | OP | EQ | S&P500 | 2010-2013 | OPalt | OP | EQ | S&P500
---|---|---|---|---|---|---|---|---|---
CAGR | 11.49 | 13.93 | 12.21 | 5.79 | CAGR | 7.36 | 8.23 | 4.19 | 11.77
Vol | 9.83 | 12.49 | 13.68 | 20.52 | Vol | 7.39 | 8.92 | 11.93 | 17.81
Sharpe | 0.99 | 0.98 | 0.79 | 0.3 | Sharpe | 0.99 | 0.92 | 0.4 | 0.71
LW p-value | 0.065 | 0.1 | | | LW p-value | 0.054 | 0.098 | |
BestM | 6.3 | 9.04 | 6.96 | 10.5 | BestM | 3.88 | 5.76 | 6.1 | 10.5
WorstM | -5.89 | -8.84 | -7.01 | -19.14 | WorstM | -5.89 | -8.84 | -7.01 | -8.37
UpM | 70.97 | 70.97 | 64.52 | 64.8 | UpM | 67.5 | 67.5 | 57.5 | 63.41
DnM | 29.03 | 29.03 | 35.48 | 35.2 | DnM | 32.5 | 32.5 | 42.5 | 36.59
MaxDraw | 14.5 | 18.65 | 18.24 | 59.65 | MaxDraw | 10.13 | 14.83 | 18.24 | 20.03

Tables 1 and 2 present the results for various subperiods of the 2 optimal portfolios, the equal weighted portfolio and the S&P 500 index (as captured by the VFINX fund). I use subperiod analysis since half of the securities in the universe only start in the early to mid 2000s, which means that the 1990-2000 period is contaminated with a small number of assets and hence more likely to have equal weights (when the number of signals is less than 4 the model allocates equally to each). The results provide for a very clear picture of the incremental value of each portfolio. The OP and EQ portfolios appear, over the whole sample, to have the same drawdowns. This is likely related to the fact that both portfolios have the same signal generating mechanism, which somewhat lags the turning points and is based on monthly data (so that daily P&L accounting is likely to reveal larger intra-month fluctuations). The OPalt portfolio on the other hand, which makes use of information from the DGP and optimal weight vector, has lower drawdowns, which indicates that there is value in making use of information from this model beyond what the signals provide for timing purposes. The Sharpe ratios for the optimal portfolios are higher than that of the equal weighted portfolio in all subperiods, but significantly different at the 10% level only in the periods after 2000 (the ‘LW p-value’ row contains the p-values from the test of Ledoit and Wolf (2008) of the optimal portfolio against the equal weighted portfolio under the null hypothesis that their Sharpe ratio differences are zero), again for the reasons hypothesized earlier. Overall, one may cautiously conclude from this small example that there is value in adopting a disciplined quantitative approach to the allocation of capital beyond the naive 1/N approach.

Figure 2a gives the contribution to P&L of each security over 2 subperiods, whilst Figure 2b gives the *active* contribution (over the equal weighted portfolio) of each security to P&L, which gives some additional insight into where the bets were concentrated.

Figure 3a shows the average yearly allocation per security of the optimal portfolio, whilst Figure 3b the **active** allocation (over the equal weighted portfolio). The figures show a clear difference between the optimal and equal allocation methods, with the former having made positive active bets on Gold and High Yield Corporates over the entire period, and underweighted Asia Pacific Equities in the 1990s.

Finally, Figure 4 provides the evolution of Terminal Wealth for the 3 portfolios, though this is always somewhat less informative than the subperiod analysis since it always depends on the starting point and will not usually be a good indicator of the average risk-return profile, as can be deduced from the OPalt portfolio which has the lowest Terminal Wealth but the highest Sharpe Ratio and lowest drawdowns.

There is really no substitute for a disciplined approach to the modelling of security dynamics and stochastic programming to obtain the optimal weights. Even when there is not enough data, it is certainly possible to map the data available to either a set of higher frequency factors or closely related prospects with more data. This article provided for a very simple example of what is possible. Interesting extensions to the approach adopted might include:

- A more sophisticated signal generating mechanism, taking into account explanatory variables (not just looking at the price trend) for turning point or directional analysis.
- More advanced money management overlay to limit intra-month losses. The blotter wrapper function allows for this and a simple example is included (though not used) in the code. To my knowledge it remains an open question whether stop losses add or subtract value, and it is my experience that including them within a strategy for intra-month risk reduction should be done with caution.
- No short sale constraint. Technically, there is no reason not to remove the constraint, but the realistic modelling of this is more complicated because of the need to track margin etc. The blotter wrapper is not yet setup to deal with this case.

There are 5 files in the zip (unstarched_code). Place them in an empty directory which you will designate as your working directory in R (‘setwd’). The main file which you will need to run is the ‘unstarched_script_run.R’ file. Be patient, it takes some time to complete, mainly because of the portfolio trades/accounting, the progress of which will be displayed on screen. I estimate about 60 minutes to complete.

Demey, P., Maillard, S., & Roncalli, T. (2010). Risk-based indexation. Available at SSRN 1582998.

DeMiguel, V., Garlappi, L., & Uppal, R. (2009). Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy?. Review of Financial Studies, 22(5), 1915-1953.

Faber, M. T. (2007). A quantitative approach to tactical asset allocation. Journal of Wealth Management.

Gibbons, M. R., Ross, S. A., & Shanken, J. (1989). A test of the efficiency of a given portfolio. Econometrica: Journal of the Econometric Society, 1121-1152.

Grinold, R. C. (1992). Are Benchmark Portfolios Efficient?. The Journal of Portfolio Management, 19(1), 34-40.

Kritzman, M., Page, S., & Turkington, D. (2010). In defense of optimization: the fallacy of 1/N. Financial Analysts Journal, 66(2), 31.

Ledoit, O., & Wolf, M. (2008). Robust performance hypothesis testing with the Sharpe ratio. Journal of Empirical Finance, 15(5), 850-859.

- What is the rationale and source for time variation in the skew and shape parameters of a conditional density?
- Can one single source of randomness adequately model both the odd and even moment variation?
- What is the impact on the coefficients of the lower moments?
- What are the implications of the time varying density for the standardized residuals on long run relationships, simulation methods and the consistency of the parameters?
- Do the marginal benefits, in applied work, outweigh the costs of the difficult estimation?

The first question is quite open and difficult to answer. In a GARCH model, most of the variation in the underlying hypothesized density, its expansion and contraction with the news arrival, is captured by the conditional variance dynamics. Given a suitably chosen skewed and shaped distribution, extreme events can also be accommodated within this framework as can changes in the asymmetry of the distribution. If we believe that there is value in allowing higher moment parameters to vary, which we can at times infer from the estimation and forecast performance, is this the result of some underlying inadequacy of the GARCH model or does the ‘true’ and unknown data generating process actually contain such dynamics (and why)?

For the second question, stochastic volatility models have already provided one way to test this at least for GARCH models, though at a considerable computational cost. For ACD models, this is probably still an open research question.

For the third question, some evidence was presented in Harvey and Siddique (1999) where the inclusion of time varying skewness affected the persistence of the conditional variance and caused some of the asymmetries in the variance to disappear (through a reduced asymmetry coefficient in the variance dynamics). I would also actually expect that for standard GARCH models with very high persistence, the inclusion of time variation in the shape parameter (and hence more directly on the higher even moments) to lead to a reduction in such persistence as some of the extreme variations from shocks could be better accommodated. I also wonder whether the inclusion of time variation in the skew parameter would have an impact on the first moment parameters, particularly when their dynamics might include threshold effects.

The fourth question is very difficult to answer, particularly the consistency of the parameters. There is little theoretical work in the ACD literature on this and there have only been some footnotes in the research about simulations confirming that the parameters of ACD models tested had the standard \( \sqrt{N} \) consistency. I remain very skeptical on this point considering the ‘3-dimensional’ nature of ACD simulation. That is, for each period \( t \) that is generated in a simulation, the density of the standardized residuals is varying, unlike GARCH model simulation where a long path results in a sampling from the standardized distribution.

The final question is the one most recent papers have focused on, and an extensive literature is available in the vignette of the **racd** package. The results are mixed, though one would not necessarily conclude this from just reading the abstract of any one of the papers where there is the natural tendency to portray the results in the best possible light.

Following the rather long intro, the remaining article will provide for an introduction to the usage of the **racd** package which shares many features with the **rugarch** package (since it extends it). If you have some unique insights into these models, would like to add something or comment, feel free to drop me an email. The **racd** package is currently hosted in the teatime repository on r-forge (see r-downloads) and there are no plans at present to release it to CRAN.

The model specification follows closely the implementation in the **rugarch** package with the exception of the ‘distribution.model’ option which is now a list with a number of choices for defining the conditional skew and shape dynamics. For the conditional variance equation, the GARCH models which are included are the standard GARCH (‘sGARCH’), component GARCH (‘csGARCH’) and the multiplicative component GARCH (‘mcsGARCH’) for intraday data. The reason for only including these models has to do with the calculation of their long run properties and persistence which do not require simulation in light of the time variation of the higher moments.

```r
# setup
library(racd)
library(rugarch)
library(parallel)
library(xts)
data(sp500ret)
plot2xts = function(x, ...) {
    plot(x, y = NULL, type = 'l', auto.grid = FALSE, major.ticks = 'auto',
        minor.ticks = FALSE, major.format = FALSE, bar.col = 'grey',
        candle.col = 'white', ann = TRUE, axes = TRUE, cex.main = 0.9,
        cex.axis = 0.9, ...)
    grid()
}
args(acdspec)
## function (variance.model = list(model = 'sGARCH', garchOrder = c(1,
##     1), external.regressors = NULL, variance.targeting = FALSE),
##     mean.model = list(armaOrder = c(1, 1), include.mean = TRUE,
##     archm = FALSE, arfima = FALSE, external.regressors = NULL),
##     distribution.model = list(model = 'snorm', skewOrder = c(1,
##     1, 1), skewshock = 1, skewshocktype = 1, skewmodel = 'quad',
##     skew.regressors = NULL, shapeOrder = c(0, 1, 1), shapeshock = 1,
##     shapeshocktype = 1, shapemodel = 'quad', shape.regressors = NULL,
##     exp.rate = 1), start.pars = list(), fixed.pars = list())
```

The distribution.model list contains the details of the conditional higher moment specification:

- model. This is the conditional distribution, with the same choice of skewed and shaped distributions as in the rugarch package.
- skewOrder (skewmodel) and shapeOrder (shapemodel). Denote the order of the skew and shape (models) under the different parameterizations available and described in the package’s vignette. Not all models have 3 parameters. For example, the ‘xar’ shape model has 2 parameters whilst the ‘xar’ in skew has 3 (since it is based on the signed rather than absolute shocks).
- skewshocktype and shapeshocktype. A value of 1 denotes the use of squared shocks while any other value denotes absolute value shocks.
- skewshock and shapeshock. A value of 1 denotes the use of the standardized residuals as shocks while any other value denotes the use of the residuals.
- skew.regressors and shape.regressors. A matrix of regressors for the skew and shape dynamics. This should be considered experimental at present, until further testing.
- exp.rate. The scaling value for the exponential transformation of the unbounded shape dynamics (the skew dynamics use a logistic transformation without extra parameters).

For the test example, I’ll use the following specification:

```r
spec = acdspec(mean.model = list(armaOrder = c(0, 1)),
    variance.model = list(variance.targeting = TRUE),
    distribution.model = list(model = 'nig', skewOrder = c(1, 1, 1),
        shapeOrder = c(1, 1, 1), skewmodel = 'quad', shapemodel = 'pwl'))
```

As described in the vignette, estimation in ACD models is highly nonlinear and there is no guarantee of a global optimum. For this reason, it is suggested that the problem be estimated from different starting parameters, which is why the routine includes a number of solvers preceded by ‘ms’ to denote a multistart strategy. The included solvers are optim (and msoptim), ucminf (and msucminf), solnp (and mssolnp), nlminb (and msnlminb) and a local implementation of cmaes (a bound constrained global solver). For the unconstrained solvers (optim and ucminf), the parameters are transformed into a bounded domain via the logistic map. The use of parallel evaluation in the multistart solvers is enabled by passing a cluster object from the parallel package, as the example which follows illustrates:

data(sp500ret)
cl = makePSOCKcluster(10)
fit = acdfit(spec, sp500ret, solver = 'msoptim',
    solver.control = list(restarts = 10), cluster = cl)
## 
## *---------------------------------*
## *          ACD Model Fit          *
## *---------------------------------*
## 
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model  : sGARCH(1,1)
## Mean Model   : ARFIMA(0,0,1)
## Distribution : nig
## 
## Conditional Skew Dynamics
## -----------------------------------
## ACD Skew Model : quad(1,1,1)
## Shock type: Std.Residuals
## 
## Conditional Shape Dynamics
## -----------------------------------
## ACD Shape Model : pwl(1,1,1)
## Shock type: Std.Residuals
## 
## Optimal Parameters
## ------------------------------------
##          Estimate  Std. Error  t value Pr(>|t|)
## mu       0.000579    0.000109   5.3235 0.000000
## ma1      0.009191    0.014274   0.6439 0.519643
## alpha1   0.089522    0.008213  10.8999 0.000000
## beta1    0.902153    0.009429  95.6765 0.000000
## skcons  -0.080024    0.031859  -2.5118 0.012011
## skalpha1 0.284640    0.055745   5.1061 0.000000
## skgamma1 0.027768    0.013604   2.0412 0.041233
## skbeta1  0.570018    0.093124   6.1210 0.000000
## shcons   0.443515    0.193872   2.2877 0.022157
## shalpha1 0.014926    0.018471   0.8081 0.419033
## shgamma1 0.783159    0.119464   6.5556 0.000000
## shbeta1  0.657811    0.088193   7.4588 0.000000
## omega    0.000001          NA       NA       NA
## 
## Robust Standard Errors:
##          Estimate  Std. Error  t value Pr(>|t|)
## mu       0.000579    0.000107  5.43342 0.000000
## ma1      0.009191    0.014895  0.61709 0.537178
## alpha1   0.089522    0.010646  8.40916 0.000000
## beta1    0.902153    0.012237 73.72243 0.000000
## skcons  -0.080024    0.030544 -2.61994 0.008795
## skalpha1 0.284640    0.060626  4.69498 0.000003
## skgamma1 0.027768    0.015036  1.84676 0.064782
## skbeta1  0.570018    0.087479  6.51609 0.000000
## shcons   0.443515    0.318005  1.39468 0.163112
## shalpha1 0.014926    0.009775  1.52695 0.126773
## shgamma1 0.783159    0.144317  5.42667 0.000000
## shbeta1  0.657811    0.146672  4.48492 0.000007
## omega    0.000001          NA       NA       NA
## 
## LogLikelihood : 18154
## 
## Information Criteria
## ------------------------------------
##                  ACD   GARCH
## Akaike       -6.5697 -6.5540
## Bayes        -6.5554 -6.5480
## Shibata      -6.5698 -6.5540
## Hannan-Quinn -6.5647 -6.5519
## 
## [truncated remaining output]
## 
## Elapsed time : 1.818

As can be inferred from the robust standard errors, the majority of the higher moment parameters are significant, and the information criteria indicate an improvement over the non time varying GARCH equivalent model.

In addition to the standard extractor functions such as ‘fitted’, ‘sigma’, ‘residuals’, ‘quantile’ and ‘pit’, the methods ‘shape’ and ‘skew’ extract the conditional time varying skew and shape xts vectors, with an option for returning either the ‘transformed’ (default TRUE) or unbounded values. Additionally, the methods ‘skewness’ and ‘kurtosis’ return the implied conditional time varying skewness and excess kurtosis xts vectors. The following figure provides a visual illustration of the estimated dynamics:

par(mfrow = c(3, 2), mai = c(0.75, 0.75, 0.3, 0.3))
plot2xts(fitted(fit), col = 'steelblue', main = 'Conditional Mean')
plot2xts(abs(as.xts(sp500ret)), col = 'grey', main = 'Conditional Sigma')
lines(sigma(fit), col = 'steelblue')
plot2xts(skew(fit, transform = FALSE), col = 'grey', main = 'Skew')
lines(skew(fit), col = 'steelblue')
legend('topleft', c('Unbounded', 'Bounded'), col = c('grey', 'steelblue'),
    lty = c(1, 1), bty = 'n', cex = 0.8)
plot2xts(shape(fit, transform = FALSE), col = 'grey', main = 'Shape', ylim = c(0, 10))
lines(shape(fit), col = 'steelblue')
legend('topleft', c('Unbounded', 'Bounded'), col = c('grey', 'steelblue'),
    lty = c(1, 1), bty = 'n', cex = 0.8)
plot2xts(skewness(fit), col = 'steelblue', main = 'Skewness')
plot2xts(kurtosis(fit), col = 'steelblue', main = 'Kurtosis (ex)')

Forecasting in the **racd** package can only be done on an estimated (ACDfit) object (unlike **rugarch**, where a specification object can also be used), but for 1-ahead forecasting it is possible to use the *acdfilter* method instead. For n-ahead (n>1) forecasts, the higher moment dynamics are forecast via simulation, since there is no closed form solution, as explained in the vignette. There is nothing particularly interesting to note about the *acdforecast* method here, so I will go directly to the rolling forecast and backtesting method (*acdroll*). This has a number of extra options compared to the ugarchroll method, which are explained in detail in the vignette. Suffice it to say, these extra options are related to restrictions on the dynamics to facilitate convergence.

roll = acdroll(spec, sp500ret, n.start = 2000, refit.every = 100,
    refit.window = 'recursive', solver = 'msoptim',
    solver.control = list(restarts = 10), cluster = cl,
    fixARMA = TRUE, fixGARCH = TRUE, fixUBShape = TRUE, UBShapeAdd = 3,
    compareGARCH = 'LL')
# check convergence(roll); if it is not zero, resubmit via the 'resume' method.
gspec = ugarchspec(mean.model = list(armaOrder = c(0, 1)),
    variance.model = list(variance.targeting = TRUE), distribution = 'nig')
rollg = ugarchroll(gspec, sp500ret, n.start = 2000, refit.every = 100,
    refit.window = 'recursive')
show(roll)
## 
## *-------------------------------------*
## *              ACD Roll               *
## *-------------------------------------*
## No.Refits      : 36
## Refit Horizon  : 100
## No.Forecasts   : 3523
## GARCH Model    : sGARCH(1,1)
## Mean Model     : ARFIMA(0,0,1)
## 
## ACD Skew Model  : pwl(1,1,1)
## ACD Shape Model : pwl(1,1,1)
## Distribution    : nig
## 
## Forecast Density:
##               Mu  Sigma   Skew  Shape Shape(GIG) Realized
## 1995-02-03 2e-04 0.0055 0.0501 0.6855          0   0.0123
## 1995-02-06 2e-04 0.0060 0.3130 0.1493          0   0.0052
## 1995-02-07 2e-04 0.0060 0.3292 0.4318          0  -0.0007
## 1995-02-08 3e-04 0.0059 0.2219 0.8460          0   0.0008
## 1995-02-09 3e-04 0.0058 0.1515 0.9846          0  -0.0021
## 1995-02-10 3e-04 0.0057 0.0580 1.0217          0   0.0026
## 
## ..........................
##                 Mu  Sigma    Skew  Shape Shape(GIG) Realized
## 2009-01-23  0.0008 0.0273 -0.1161 1.2626          0   0.0054
## 2009-01-26  0.0003 0.0264 -0.1010 1.6388          0   0.0055
## 2009-01-27  0.0003 0.0255 -0.0890 1.7417          0   0.0109
## 2009-01-28  0.0001 0.0248 -0.0623 1.6254          0   0.0330
## 2009-01-29 -0.0004 0.0253  0.0364 0.6087          0  -0.0337
## 2009-01-30  0.0013 0.0258 -0.1027 1.3595          0  -0.0231
## 
## Elapsed: 31.52 mins

vartable = rbind(
    as.data.frame(VaRTest(alpha = 0.01, actual = roll@forecast$VaR[, 'realized'],
        VaR = roll@forecast$VaR[, 'alpha(1%)'])[c(1, 2, 11, 12)],
        row.names = c('ACD(1%)')),
    as.data.frame(VaRTest(alpha = 0.05, actual = roll@forecast$VaR[, 'realized'],
        VaR = roll@forecast$VaR[, 'alpha(5%)'])[c(1, 2, 11, 12)],
        row.names = c('ACD(5%)')),
    as.data.frame(VaRTest(alpha = 0.01, actual = rollg@forecast$VaR[, 'realized'],
        VaR = rollg@forecast$VaR[, 'alpha(1%)'])[c(1, 2, 11, 12)],
        row.names = c('GARCH(1%)')),
    as.data.frame(VaRTest(alpha = 0.05, actual = rollg@forecast$VaR[, 'realized'],
        VaR = rollg@forecast$VaR[, 'alpha(5%)'])[c(1, 2, 11, 12)],
        row.names = c('GARCH(5%)')))
print(vartable, digits = 3)
##           expected.exceed actual.exceed cc.LRp       cc.Decision
## ACD(1%)                35            28  0.357 Fail to Reject H0
## ACD(5%)               176           203  1.000 Fail to Reject H0
## GARCH(1%)              35            26  0.215 Fail to Reject H0
## GARCH(5%)             176           191  0.571 Fail to Reject H0

print(rbind(
    as.data.frame(BerkowitzTest(qnorm(as.numeric(pit(roll))), tail.test = TRUE,
        alpha = 0.01)[c(1, 2, 4, 6)], row.names = 'ACD'),
    as.data.frame(BerkowitzTest(qnorm(as.numeric(pit(rollg))), tail.test = TRUE,
        alpha = 0.01)[c(1, 2, 4, 6)], row.names = 'GARCH')), digits = 4)
##          uLL    rLL     LRp            Decision
## ACD   -163.2 -164.3 0.35633 fail to reject NULL
## GARCH -157.8 -160.3 0.07761 fail to reject NULL

HL = cbind(HLTest(as.numeric(pit(roll)))$statistic,
    HLTest(as.numeric(pit(rollg)))$statistic)
colnames(HL) = c('ACD', 'GARCH')
print(HL, digits = 4)
##            ACD  GARCH
## M(1,1)  0.2349  5.787
## M(2,2)  6.2283 19.640
## M(3,3)  9.6166 26.599
## M(4,4) 11.2113 28.929
## M(1,2)  0.8564  4.306
## M(2,1)  8.4187 24.476
## W      18.3033 11.897

At the 1% and 5% coverage levels, neither the ACD nor the GARCH model can be rejected (the null being the correct number, and independence, of the VaR exceedances), with higher p-values for the ACD model. In terms of the goodness of fit of the tail of the density, using the tail test of Berkowitz (2001) at the 1% quantile, the ACD model generates a considerably higher p-value than the GARCH model, which can be rejected at the 10% level of significance. Note that the ‘pit’ method returns the probability integral transformation of the realized data given the conditional forecasted density. Another test which uses the ‘pit’ method is that of Hong and Li (2005), a Portmanteau type test which evaluates the goodness of fit of the whole density. While both models are rejected as providing a ‘correct fit’ (W statistic), the moment based statistics (M statistics) indicate that the ACD model has a considerably better fit, as can be inferred from the lower values (the statistics are distributed as N(0,1)).
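The conditional coverage test reported in the table combines correct coverage with independence of the exceedances; its coverage half is the unconditional likelihood ratio of Kupiec. A dependency-free sketch (`kupiec_lr` is my own helper, not a function of either package):

```python
import math

def kupiec_lr(n_obs, n_exceed, alpha):
    # Kupiec unconditional coverage LR statistic; H0: the VaR exceedance
    # rate equals alpha. Under H0, LR ~ chi-square(1), so the 5% critical
    # value is 3.841. Assumes 0 < n_exceed < n_obs.
    p_hat = n_exceed / n_obs
    def binom_loglik(p):
        return (n_obs - n_exceed) * math.log(1.0 - p) + n_exceed * math.log(p)
    return -2.0 * (binom_loglik(alpha) - binom_loglik(p_hat))

# the ACD 1% row above: 3523 forecasts, 28 exceedances vs ~35 expected
stat = kupiec_lr(3523, 28, 0.01)
```

With 28 exceedances against roughly 35 expected, the statistic stays well below the chi-square(1) critical value, consistent with the “Fail to Reject H0” decisions in the table.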

Unlike GARCH models, where there is one call to the random number generator of size n.sim, in ACD models there are n.sim calls of size 1 because of the time variation in the conditional density of the standardized residuals. Simulation may be carried out either from a fitted object (using *acdsim*) or from a specification object (using *acdpath*). The latter is not enabled for the mcsGARCH model. The following example provides a short illustration of the method and shows how to obtain equivalence between simulation from a fit and from a spec:

sim1 = acdsim(fit, n.sim = 250, m.sim = 5, rseed = 100:104)
# re-define the spec without variance targeting (only used in the estimation routine)
spec = acdspec(mean.model = list(armaOrder = c(0, 1)),
    variance.model = list(variance.targeting = FALSE),
    distribution.model = list(model = 'nig', skewOrder = c(1, 1, 1),
        shapeOrder = c(1, 1, 1), skewmodel = 'quad', shapemodel = 'pwl'))
setfixed(spec) <- as.list(coef(fit))
sim2 = acdpath(spec, n.sim = 250, m.sim = 5, rseed = 100:104,
    prereturns = tail(sp500ret[, 1], 1),
    presigma = as.numeric(tail(sigma(fit), 1)),
    preshape = as.numeric(tail(shape(fit), 1)),
    preskew = as.numeric(tail(skew(fit), 1)))
# check equivalence
c(all.equal(fitted(sim1), fitted(sim2)), all.equal(sigma(sim1), sigma(sim2)),
    all.equal(skew(sim1), skew(sim2)), all.equal(shape(sim1), shape(sim2)))
## [1] TRUE TRUE TRUE TRUE
S = skewness(sim1)
K = kurtosis(sim1)
R = fitted(sim1)
V = sigma(sim1)
par(mfrow = c(2, 2), mai = c(0.75, 0.75, 0.3, 0.3), cex.axis = 0.8)
matplot(S, type = 'l', col = 1:5, main = 'Simulated Skewness', xlab = '')
matplot(K, type = 'l', col = 1:5, main = 'Simulated Kurtosis (ex)', xlab = '')
matplot(apply(R, 2, 'cumsum'), type = 'l', col = 1:5, main = 'Simulated Paths',
    ylab = '', xlab = '')
matplot(V, type = 'l', col = 1:5, main = 'Simulated Sigma', xlab = '')
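The one-draw-per-period structure described above can be sketched in a few lines (Python for brevity; a Gaussian with time-varying sigma stands in for the time-varying NIG density, and `simulate_path` is illustrative only):

```python
import random

def simulate_path(sigmas, seed=10):
    # one innovation per step, because the conditional density's parameters
    # change each period; a vectorized draw of size n.sim is not possible
    rng = random.Random(seed)
    return [rng.gauss(0.0, s) for s in sigmas]

path = simulate_path([0.01, 0.02, 0.03])
```

Fixing the seed (as rseed does above) makes the per-step draws, and hence the whole path, reproducible.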

Time varying higher moments within a GARCH modelling framework (ACD) provide a natural extension to time variation in the conditional mean and variance. Whether the added marginal benefits of their inclusion justify the complexity of their estimation remains an open empirical question, which the **racd** package will hopefully enable to be researched in greater depth and with greater transparency.

Berkowitz, J. (2001). Testing density forecasts, with applications to risk management. Journal of Business & Economic Statistics, 19(4), 465-474.

Hansen, B. E. (1994). Autoregressive Conditional Density Estimation. International Economic Review, 35(3), 705-30.

Harvey, C. R., & Siddique, A. (1999). Autoregressive conditional skewness. Journal of Financial and Quantitative Analysis, 34(4), 465-497.

Hong, Y., and Li, H. (2005), Nonparametric specification testing for continuous-time models with applications to term structure of interest rates, Review of Financial Studies, 18(1), 37-84.

library(rugarch)
Sys.setenv(TZ = 'GMT')
library(quantmod)
R_i = read.csv('C_2008_1minret.csv')
R_i = xts(R_i[, 2], as.POSIXct(R_i[, 1]))
getSymbols('C', from = '2000-01-01')
C = adjustOHLC(C, use.Adjusted = TRUE)
R_d = ROC(Cl(C), na.pad = FALSE)

Consider the correlogram of the absolute 1-min returns for Citigroup during the sample period in question:

par(cex.main = 0.85, col.main = 'black')
acf(abs(as.numeric(R_i)), lag.max = 4000,
    main = '1-min absolute returns\nCitigroup (2008 Jan-Feb)', cex.lab = 0.8)

The regular pattern is quite clear, repeating approximately every 390 periods (1-day) and showing an increase in volatility around the opening and closing times. GARCH, and more generally ARMA type models can only handle an exponential decay, and not the type of pattern seen here. Several approaches have been suggested in the literature in order to de-seasonalize the absolute returns such as the flexible Fourier method of Andersen and Bollerslev (1997), and the periodic GARCH model of Bollerslev and Ghysels (1996). Unfortunately I have found none of these, or closely related models particularly easy to work with. More recently, Engle and Sokalska (2012) (henceforth ES2012) introduced the multiplicative component GARCH model as a parsimonious alternative, which I have now included in the **rugarch** package (ver 1.01-6). This article discusses its implementation, challenges and specific details of working with this model, which allows a rather simple but powerful way to use GARCH for regularly spaced intraday returns.

Consider the continuously compounded return \( r_{t,i} \), where \( t \) denotes the day and \( i \) the regularly spaced time interval at which the return was calculated. Under this model, the conditional variance is a multiplicative product of daily, diurnal and stochastic (intraday) components, so that the return process may be represented as:

\[ \begin{gathered}

{r_{t,i}} = {\mu _{t,i}} + {\varepsilon _{t,i}}\\

{\varepsilon _{t,i}} = \left( {{q_{t,i}}{\sigma _t}{s_i}} \right){z_{t,i}}

\end{gathered} \]

where \( q_{t,i} \) is the stochastic intraday volatility, \( \sigma_t \) the daily (exogenously determined) forecast volatility, \( s_i \) the diurnal volatility in each regularly spaced interval \( i \), and \( z_{t,i} \) the i.i.d. (0,1) standardized innovation, which conditionally follows some appropriately chosen distribution. In ES2012, the forecast volatility \( \sigma_t \) is derived externally from a multifactor risk model, but it is just as possible to generate such forecasts from a daily GARCH model. The seasonal (diurnal) part of the process is defined as:

\[ {s_i} = \frac{1}{T}\sum\limits_{t = 1}^T {\left( {\varepsilon _{_{t,i}}^2/\sigma _t^2} \right)}. \]

Dividing the residuals by the diurnal and daily volatility gives the normalized residuals (\( \bar\varepsilon \)):

\[ {{\bar \varepsilon }_{t,i}} = {\varepsilon _{t,i}}/\left( {{\sigma _t}{s_i}} \right) \]

which may then be used to generate the stochastic component of volatility \( q_{t,i} \) with GARCH motion dynamics. In the **rugarch** package, unlike the paper of ES2012, the conditional mean and variance equations (and hence the diurnal component on the residuals from the conditional mean filtration) are estimated jointly. Furthermore, and unlike ES2012, it is possible to include ARMAX dynamics in the conditional mean, though because of the complexity of the model and its use of time indices, ARCH-m is not currently allowed. Finally, as an additional departure from ES2012, the diurnal component in the **rugarch** package is estimated using the median rather than the mean function (since version 1.2-3), providing a more robust alternative given the type and length of the data typically used. The next sections provide a demonstration of the model using the Citigroup dataset.
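The diurnal estimator and the mean/median choice can be sketched directly from the formula for \( s_i \) above (Python for a dependency-free illustration; `diurnal_component` is my own helper, not the package’s code):

```python
import statistics

def diurnal_component(resid, daily_sigma, use_median=True):
    # resid[t][i]: conditional-mean residual on day t, intraday bin i
    # daily_sigma[t]: that day's forecast daily sigma
    # returns s_i per bin: the median (rugarch's robust choice) or the
    # mean (as in ES2012) across days of eps^2 / sigma_t^2
    n_days, n_bins = len(resid), len(resid[0])
    agg = statistics.median if use_median else statistics.mean
    return [agg(resid[t][i] ** 2 / daily_sigma[t] ** 2 for t in range(n_days))
            for i in range(n_bins)]
```

The median version is far less sensitive to a single extreme day in a given intraday bin, which matters for the long, outlier-prone samples typically used here.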

Unlike all other GARCH models implemented in **rugarch**, the mcsGARCH model requires the user to pass an xts object of the forecast daily variance of the data for the period under consideration.

# Find the unique days in the intraday sample
n = length(unique(format(index(R_i), '%Y-%m-%d')))
# define a daily spec
spec_d = ugarchspec(mean.model = list(armaOrder = c(1, 1)),
    variance.model = list(model = 'eGARCH', garchOrder = c(2, 1)),
    distribution = 'nig')
# use the ugarchroll method to create a rolling forecast for the period in question:
roll = ugarchroll(spec_d, data = R_d['/2008-02-29'], forecast.length = n,
    refit.every = 5, refit.window = 'moving', moving.size = 2000,
    calculate.VaR = FALSE)
# extract the sigma forecast
df = as.data.frame(roll)
f_sigma = as.xts(df[, 'Sigma', drop = FALSE])
# now estimate the intraday model
spec = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE),
    variance.model = list(model = 'mcsGARCH'), distribution = 'nig')
# DailyVar is the required xts object of the forecast daily variance
fit = ugarchfit(data = R_i, spec = spec, DailyVar = f_sigma^2)

The following plots show the decomposition of the volatility into its various components. The regular pattern of the Total volatility would have been impossible to capture using standard GARCH models. Note that although the volatility series are stored as xts objects, they cannot be properly plotted using the standard plot.xts function which is why I make use of the axis function with a numeric series.

ep <- axTicksByTime(fit@model$DiurnalVar)
par(mfrow = c(4, 1), mar = c(2.5, 2.5, 2, 1))
plot(as.numeric(fit@model$DiurnalVar^0.5), type = 'l', main = 'Sigma[Diurnal]',
    col = 'tomato1', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
grid()
plot(as.numeric(fit@model$DailyVar^0.5), type = 'l', main = 'Sigma[Daily-Forecast]',
    col = 'tomato2', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
grid()
plot(fit@fit$q, type = 'l', main = 'Sigma[Stochastic]', col = 'tomato3',
    xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
grid()
plot(as.numeric(sigma(fit)), type = 'l', main = 'Sigma[Total]', col = 'tomato4',
    xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
grid()

The biggest challenge in writing code for the forecast was dealing with the aligning and matching of times, particularly future time/dates, since the model depends on the diurnal component which is time specific. As a key component of the forecast routine, I wrote a little function which creates a sequence of time/dates, similar to seq.POSIXt, but with the extra option of defining the time interval which dictates the start and end of the trading day. For example, considering the opening and closing times of the NYSE, 09:30 to 16:00, I would like to be able to create a set of n future periods starting from T0 within only this interval, and excluding weekends. The function is defined as:

**ftseq(T0, length.out, by, interval, exclude.weekends = TRUE)**

where *T0* is a POSIXct date/time of the starting period, *length.out* the periods ahead to consider, *by* the *difftime* (e.g. “mins”), and interval a character vector of the start and end times which *T0* must belong to and is a multiple of *by*.

# create the interval
interval = format(seq(as.POSIXct('2008-01-02 09:31:00'),
    as.POSIXct('2008-01-02 16:00:00'), by = 'min'), '%H:%M:%S')
ForcTime = ftseq(T0 = as.POSIXct('2008-02-29 16:00:00'),
    length.out = 390 * 2 + 1, by = 'mins', interval = interval)
tail(ForcTime, 25)
##  [1] '2008-03-04 15:37:00 GMT' '2008-03-04 15:38:00 GMT'
##  [3] '2008-03-04 15:39:00 GMT' '2008-03-04 15:40:00 GMT'
##  [5] '2008-03-04 15:41:00 GMT' '2008-03-04 15:42:00 GMT'
##  [7] '2008-03-04 15:43:00 GMT' '2008-03-04 15:44:00 GMT'
##  [9] '2008-03-04 15:45:00 GMT' '2008-03-04 15:46:00 GMT'
## [11] '2008-03-04 15:47:00 GMT' '2008-03-04 15:48:00 GMT'
## [13] '2008-03-04 15:49:00 GMT' '2008-03-04 15:50:00 GMT'
## [15] '2008-03-04 15:51:00 GMT' '2008-03-04 15:52:00 GMT'
## [17] '2008-03-04 15:53:00 GMT' '2008-03-04 15:54:00 GMT'
## [19] '2008-03-04 15:55:00 GMT' '2008-03-04 15:56:00 GMT'
## [21] '2008-03-04 15:57:00 GMT' '2008-03-04 15:58:00 GMT'
## [23] '2008-03-04 15:59:00 GMT' '2008-03-04 16:00:00 GMT'
## [25] '2008-03-05 09:31:00 GMT'

As can be seen, the first time is immediately after T0 (T0 is not included), and the sequence only runs for the defined interval, and optionally (default TRUE) skips weekends. This comes in very handy in the forecast routine.
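The logic ftseq implements can be re-created in a few lines of Python’s datetime, which may help clarify its behaviour (`trading_seq` is an illustrative re-implementation under my own assumptions, not ftseq’s actual code):

```python
from datetime import datetime, timedelta, time

def trading_seq(t0, length_out, step, open_t, close_t, exclude_weekends=True):
    # emit `length_out` timestamps strictly after t0 (t0 excluded), stepping
    # by `step` but staying inside the [open_t, close_t] session, and
    # optionally skipping weekends
    out, t = [], t0
    while len(out) < length_out:
        t = t + step
        if t.time() > close_t or t.time() < open_t:
            # roll over to the next day's first in-session point
            nxt_day = t.date() + timedelta(days=1) if t.time() > close_t else t.date()
            t = datetime.combine(nxt_day, open_t)
        while exclude_weekends and t.weekday() >= 5:
            t = t + timedelta(days=1)
        out.append(t)
    return out
```

Starting from the Friday 2008-02-29 16:00 close with a 1-minute step, the first generated point is Monday 2008-03-03 09:31, matching the behaviour shown in the ftseq example.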

Like the estimation method, the forecast routine requires that you supply the forecast daily volatility for the period under consideration. However, because of the possible use of out.sample in the estimation routine, it is not known beforehand whether this will actually be needed: there may be enough intraday data in the out.sample period, given the combination of n.ahead and n.roll chosen, so that a user supplied forecast is not required. If you do not supply it and it is needed, the routine will let you know with an informative error message. Finally, the presence of the diurnal component complicates the long run unconditional forecast of the underlying variance, so the **uncvariance** and related methods will always return the value for the component variance rather than the total variance (unlike the csGARCH model, which returns both).
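For a stationary GARCH(1,1) component, the unconditional variance that **uncvariance** reports is the familiar long-run value \( \omega/(1-\alpha-\beta) \); a minimal worked sketch (the helper name and parameter values below are illustrative only):

```python
def unconditional_variance(omega, alpha, beta):
    # long-run variance of a stationary GARCH(1,1) component:
    # omega / (1 - alpha - beta), valid only when alpha + beta < 1
    persistence = alpha + beta
    if persistence >= 1.0:
        raise ValueError("non-stationary: alpha + beta must be < 1")
    return omega / (1.0 - persistence)

# e.g. omega = 1e-05, alpha = 0.09, beta = 0.90 -> 1e-05 / 0.01 = 0.001
v = unconditional_variance(1e-05, 0.09, 0.90)
```

In the mcsGARCH case this is the long-run value of the stochastic component only; the diurnal and daily pieces would still have to be layered on top to speak about total variance, which is exactly why the method does not attempt it.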

fit2 = ugarchfit(data = R_i, spec = spec, DailyVar = f_sigma^2, out.sample = 300)
# don't supply DailyVar, in order to trigger the informative error
forc = ugarchforecast(fit2, n.ahead = 10, n.roll = 299)
## Error: DailyVar requires forecasts for: 2008-03-03 ...resubmit.

Notice the error, which indicates that we need a daily variance forecast for 2008-03-03. Since we don’t have one from the rolling backtest, we estimate a daily model and generate the 1-ahead forecast with *ugarchforecast*:

fit_d = ugarchfit(spec_d, data = R_d['2002/2008-02-29'])
forc_d = ugarchforecast(fit_d, n.ahead = 1)
f_sigma = xts(as.numeric(sigma(forc_d)), as.POSIXct('2008-03-03'))
# intraday forecast
forc = ugarchforecast(fit2, n.ahead = 10, n.roll = 299, DailyVar = f_sigma^2)
show(forc)
## 
## *------------------------------------*
## *       GARCH Model Forecast         *
## *------------------------------------*
## Model: mcsGARCH
## Horizon: 10
## Roll Steps: 299
## Out of Sample: 10
## 
## 0-roll forecast [T0=2008-02-29 11:00:00]:
##          Series Sigma[Total] Sigma[Stochastic]
## T+1  -7.681e-06    0.0015132            0.8702
## T+2  -7.664e-06    0.0057046            0.8718
## T+3  -7.657e-06    0.0055551            0.8734
## T+4  -7.654e-06    0.0058834            0.8750
## T+5  -7.653e-06    0.0063295            0.8766
## T+6  -7.653e-06    0.0013036            0.8781
## T+7  -7.653e-06    0.0012846            0.8797
## T+8  -7.653e-06    0.0011227            0.8812
## T+9  -7.652e-06    0.0008177            0.8827
## T+10 -7.652e-06    0.0009259            0.8842

Note that plot methods for this model are not yet fully implemented for reasons described previously.

Unlike standard GARCH simulation, the interval time is important in intraday GARCH since we are generating paths which follow very specific, regularly sampled time points. Additionally, simulated or forecast daily variance again needs to be supplied for the simulation period under consideration. This is an xts object, and can optionally have *m.sim* columns so that each independent simulation draws on its own independent simulation of the daily variance. The following example code shows the simulation of 10,000 points at 1-min intervals into the future and illustrates the effect of the seasonality component:

T0 = tail(index(R_i), 1)
# model$dtime contains the set of unique interval points in the dataset
# (and is available from all rugarch objects for the mcsGARCH model);
# model$dvalues contains the diurnal component for each interval
ftime = ftseq(T0, length.out = 10000, by = fit@model$modeldata$period,
    interval = fit@model$dtime)
dtime = unique(format(ftime, '%Y-%m-%d'))
sim_d = ugarchsim(fit_d, n.sim = length(dtime), m.sim = 1)
var_sim = xts(as.matrix(sigma(sim_d)^2), as.POSIXct(dtime))
sim = ugarchsim(fit, n.sim = 10000, n.start = 0, m.sim = 1,
    DailyVar = var_sim, rseed = 10)
ep <- axTicksByTime(sim@simulation$DiurnalVar)
par(mfrow = c(4, 1), mar = c(2.5, 2.5, 2, 1))
plot(as.numeric(sim@simulation$DiurnalVar^0.5), type = 'l', main = 'Sigma[Diurnal]',
    col = 'tomato1', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
grid()
plot(as.numeric(sim@simulation$DailyVar^0.5), type = 'l', main = 'Sigma[Daily-Simulated]',
    col = 'tomato2', xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
grid()
plot(sim@simulation$qSim[, 1], type = 'l', main = 'Sigma[Stochastic]', col = 'tomato3',
    xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
grid()
plot(as.numeric(sigma(sim)), type = 'l', main = 'Sigma[Total]', col = 'tomato4',
    xaxt = 'n', ylab = 'sigma', xlab = ' ')
axis(1, at = ep, labels = names(ep), tick = TRUE)
grid()

The ugarchroll function is quite useful for testing a model’s adequacy in a backtest application, and the code below illustrates this for the mcsGARCH model for the data and period under consideration.

n = length(index(R_d['2008-01-01/2008-03-01']))
spec_d = ugarchspec(mean.model = list(armaOrder = c(1, 1)),
    variance.model = list(model = 'sGARCH'), distribution = 'std')
roll = ugarchroll(spec_d, data = R_d['/2008-02-29'], forecast.length = n,
    refit.every = 5, refit.window = 'moving', moving.size = 2000,
    calculate.VaR = FALSE)
df = as.data.frame(roll)
f_sigma = as.xts(df[, 'Sigma', drop = FALSE])
spec = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE),
    variance.model = list(model = 'mcsGARCH'), distribution = 'std')
roll = ugarchroll(spec, data = R_i, DailyVar = f_sigma^2, forecast.length = 3000,
    refit.every = 390, refit.window = 'moving', moving.size = 3000,
    calculate.VaR = TRUE)
# Generate the 1% VaR report
report(roll)
## VaR Backtest Report
## ===========================================
## Model: mcsGARCH-std
## Backtest Length: 3000
## ==========================================
## alpha: 1%
## Expected Exceed: 30
## Actual VaR Exceed: 33
## Actual %: 1.1%
## 
## Unconditional Coverage (Kupiec)
## Null-Hypothesis: Correct Exceedances
## LR.uc Statistic: 0.294
## LR.uc Critical: 3.841
## LR.uc p-value: 0.588
## Reject Null: NO
## 
## Conditional Coverage (Christoffersen)
## Null-Hypothesis: Correct Exceedances and
##   Independence of Failures
## LR.cc Statistic: 1.028
## LR.cc Critical: 5.991
## LR.cc p-value: 0.598
## Reject Null: NO

Not bad at all. Who says GARCH models are no good?!

The VaRplot function has been adjusted to work nicely with intraday data as shown below. The spikes in the VaR observed are the result of the seasonal component around the opening of trading.

D = as.POSIXct(rownames(roll@forecast$VaR))
VaRplot(0.01, actual = xts(roll@forecast$VaR[, 3], D),
    VaR = xts(roll@forecast$VaR[, 1], D))

It is quite ‘easy’ to add additional GARCH flavors to the multiplicative model, such as eGARCH, GJR etc., and I might do so in due course, time permitting. Another possible direction for expansion would be to treat the diurnal effect separately for each day of the week. I estimate that this would not be too hard to implement, at the cost of a marginal slowdown in estimation as a result of the increased lookup time for the matching of times and days.

Finally, this model is not ‘plug-and-play’, requiring some thought in the use of time & dates, and the preparation of the intraday returns. The garbage-in garbage-out rule clearly applies.

Please remember that questions about this or other issues with **rugarch** should be addressed to the r-sig-finance mailing list.

Andersen, T. G., & Bollerslev, T. (1997). Intraday periodicity and volatility persistence in financial markets. Journal of Empirical Finance, 4(2), 115–158.

Bollerslev, T., & Ghysels, E. (1996). Periodic autoregressive conditional heteroscedasticity. Journal of Business & Economic Statistics, 14(2), 139–151.

Engle, R. F., & Sokalska, M. E. (2012). Forecasting intraday volatility in the US equity market. Multiplicative component GARCH. Journal of Financial Econometrics, 10(1), 54–83.

The main extractor method is no longer **as.data.frame**; instead, **sigma** and **fitted** now extract the conditional sigma (GARCH) and mean (ARFIMA) values from all objects. The old extractor methods are now mostly deprecated, with the exception of certain classes where it still makes sense to use them (i.e. uGARCHroll, uGARCHdistribution, uGARCHboot).

library(rugarch)
data(sp500ret)
fit = ugarchfit(ugarchspec(), sp500ret, out.sample = 10)
c(is(sigma(fit)), is(fitted(fit)), is(residuals(fit)))
## [1] 'xts' 'xts' 'xts'
plot(xts(fit@model$modeldata$data, fit@model$modeldata$index),
    auto.grid = FALSE, minor.ticks = FALSE, main = 'S&P500 Conditional Mean')
lines(fitted(fit), col = 2)
grid()

plot(xts(abs(fit@model$modeldata$data), fit@model$modeldata$index),
    auto.grid = FALSE, minor.ticks = FALSE, main = 'S&P500 Conditional Sigma',
    col = 'grey')
lines(sigma(fit), col = 'steelblue')
grid()

Apart from getting the nice xts charts, **rugarch** can now handle any type of time-formatted data (which can be coerced to xts) including intraday.

A key change has also been made to the output of the forecast class (uGARCHforecast) which merits particular attention. Because **rugarch** allows combining both rolling and unconditional forecasts, there is a rather challenging problem of how to meaningfully output the results with simple extractor methods such as **sigma** and **fitted**. The output in this case is an n.ahead by (n.roll+1) matrix, where the column headings are the *T+0* dates (since these will always exist within a forecast framework), whilst the row names are labelled *T+1*, *T+2*, …, *T+n.ahead*. The following example illustrates this, and shows some simple solutions for portraying n.ahead dates which are completely in the future (i.e. not in the available out of sample testing period).

f = ugarchforecast(fit, n.ahead = 25, n.roll = 9)
sf = sigma(f)
head(sf)
##     2009-01-15 2009-01-16 2009-01-20 2009-01-21 2009-01-22 2009-01-23 2009-01-26 2009-01-27 2009-01-28 2009-01-29
## T+1    0.02176    0.02078    0.02587    0.02730    0.02646    0.02519    0.02400    0.02301    0.02388    0.02487
## T+2    0.02171    0.02073    0.02580    0.02722    0.02638    0.02512    0.02393    0.02295    0.02382    0.02480
## T+3    0.02166    0.02069    0.02573    0.02714    0.02630    0.02505    0.02387    0.02289    0.02375    0.02473
## T+4    0.02160    0.02064    0.02565    0.02706    0.02623    0.02498    0.02380    0.02283    0.02369    0.02466
## T+5    0.02155    0.02059    0.02558    0.02697    0.02615    0.02491    0.02374    0.02277    0.02363    0.02459
## T+6    0.02150    0.02054    0.02550    0.02689    0.02607    0.02484    0.02368    0.02271    0.02356    0.02452

Therefore, the column headings are the *T+0* dates i.e. the date at which the forecast was made. There are 10 columns since the n.roll option is zero based. To create an xts representation of only the rolling output is very simple:

print(sf[1, ])
## 2009-01-15 2009-01-16 2009-01-20 2009-01-21 2009-01-22 2009-01-23 2009-01-26 2009-01-27 2009-01-28 2009-01-29
##    0.02176    0.02078    0.02587    0.02730    0.02646    0.02519    0.02400    0.02301    0.02388    0.02487
print(f.roll <- as.xts(sf[1, ]))
##               [,1]
## 2009-01-15 0.02176
## 2009-01-16 0.02078
## 2009-01-20 0.02587
## 2009-01-21 0.02730
## 2009-01-22 0.02646
## 2009-01-23 0.02519
## 2009-01-26 0.02400
## 2009-01-27 0.02301
## 2009-01-28 0.02388
## 2009-01-29 0.02487

This gives the *T+0* dates. If you want to generate the actual *T+1* rolling dates, use the **move** function on the *T+0* dates, which effectively moves the dates 1 period ahead and appends an extra day at the end (which is not in the sample date set, and is therefore a ‘generated’ date):

print(f.roll1 <- xts(sf[1, ], move(as.POSIXct(colnames(sf)), by = 1)))
##               [,1]
## 2009-01-16 0.02176
## 2009-01-20 0.02078
## 2009-01-21 0.02587
## 2009-01-22 0.02730
## 2009-01-23 0.02646
## 2009-01-26 0.02519
## 2009-01-27 0.02400
## 2009-01-28 0.02301
## 2009-01-29 0.02388
## 2009-01-30 0.02487
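The shift-and-append logic of **move** can be sketched as follows (Python for brevity; `move_dates` is my own hypothetical helper, and the assumption that the generated tail date skips weekends is inferred from the example output above, not from the package source):

```python
from datetime import date, timedelta

def move_dates(dates, by=1):
    # shift a sorted date vector `by` periods forward within itself, then
    # append `by` generated date(s) beyond the last one, skipping weekends
    out = list(dates[by:])
    nxt = dates[-1]
    for _ in range(by):
        nxt = nxt + timedelta(days=1)
        while nxt.weekday() >= 5:   # roll the generated tail past Sat/Sun
            nxt = nxt + timedelta(days=1)
        out.append(nxt)
    return out

d = [date(2009, 1, 15), date(2009, 1, 16), date(2009, 1, 20), date(2009, 1, 21),
     date(2009, 1, 22), date(2009, 1, 23), date(2009, 1, 26), date(2009, 1, 27),
     date(2009, 1, 28), date(2009, 1, 29)]
m = move_dates(d)
```

Applied to the *T+0* dates of the example, the first shifted date is 2009-01-16 and the appended generated date is 2009-01-30, matching the xts index shown above.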

For the n.ahead forecasts, I have included in the **rugarch** package another simple date function called **generatefwd** which can be used to generate future non-weekend dates. To use this, you will need to know that the model slot of any class object in **rugarch** which took a method with a data option will keep that data, its format and periodicity, which can be used as follows:

```r
args(generatefwd)
## function (T0, length.out = 1, by = 'days')
## NULL
print(fit@model$modeldata$period)
## Time difference of 1 days
i = 1
DT0 = generatefwd(T0 = as.POSIXct(colnames(sf)[i]), length.out = 25,
    by = fit@model$modeldata$period)
print(fT0 <- xts(sf[, i], DT0))
##               [,1]
## 2009-01-16 0.02176
## 2009-01-19 0.02171
## 2009-01-20 0.02166
## 2009-01-21 0.02160
## 2009-01-22 0.02155
## 2009-01-23 0.02150
## 2009-01-26 0.02145
## 2009-01-27 0.02139
## 2009-01-28 0.02134
## 2009-01-29 0.02129
## 2009-01-30 0.02124
## 2009-02-02 0.02119
## 2009-02-03 0.02114
## 2009-02-04 0.02109
## 2009-02-05 0.02104
## 2009-02-06 0.02099
## 2009-02-09 0.02094
## 2009-02-10 0.02089
## 2009-02-11 0.02084
## 2009-02-12 0.02079
## 2009-02-13 0.02075
## 2009-02-16 0.02070
## 2009-02-17 0.02065
## 2009-02-18 0.02060
## 2009-02-19 0.02055
```

The same applies to the fitted method. This concludes the part on the uGARCHforecast class and the changes which have been made. More details can always be found in the documentation (if you are wondering how to obtain documentation for an S4 class object such as uGARCHforecast, you need to append '-class' to the end and enclose the whole thing in quotes, e.g. help('uGARCHforecast-class')).
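For example, either of the following pulls up the class documentation in R:

```r
# two equivalent ways of accessing S4 class documentation
help('uGARCHforecast-class')
class?uGARCHforecast
```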

The **quantile** method extracts the conditional quantiles of a **rugarch** object, subject to an extra option (probs) as in the S3 class method in *stats*, while **pit** calculates and returns the probability integral transformation of a fitted, filtered or rolling object (objects guaranteed to have 'realized' data to work with).

```r
head(quantile(fit, c(0.01, 0.025, 0.05)))
##             q[0.01] q[0.025]  q[0.05]
## 1987-03-10 -0.02711 -0.02276 -0.01901
## 1987-03-11 -0.02673 -0.02247 -0.01881
## 1987-03-12 -0.02549 -0.02141 -0.01791
## 1987-03-13 -0.02449 -0.02058 -0.01722
## 1987-03-16 -0.02350 -0.01972 -0.01647
## 1987-03-17 -0.02271 -0.01903 -0.01586
head(pit(fit))
##               pit
## 1987-03-10 0.7581
## 1987-03-11 0.4251
## 1987-03-12 0.5973
## 1987-03-13 0.3227
## 1987-03-16 0.2729
## 1987-03-17 0.9175
```

These should prove particularly useful in functions which require the quantiles or PIT transformation, such as the risk (**VaRTest** and **VaRDurTest**) and misspecification tests (**BerkowitzTest** and **HLTest**).
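As a sketch of how these methods feed into the tests (using the object names from the earlier fit; treat this as illustrative rather than definitive):

```r
# hedged sketch: pass the 1% conditional quantile and the realized data to
# VaRTest, and the normal-transformed PIT values to BerkowitzTest
VaR = as.numeric(quantile(fit, 0.01))
actual = as.numeric(fit@model$modeldata$data[1:length(VaR)])
VaRTest(alpha = 0.01, actual = actual, VaR = VaR)
BerkowitzTest(qnorm(as.numeric(pit(fit))))
```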

rugarch exports a set of functions for working with certain measures of the conditional distributions included in the package. Some of these functions have recently been rewritten to take advantage of vectorization, whilst others have been rewritten to use analytical representations (e.g. the skewness and kurtosis measures for the sstd and jsu distributions). In the latest release, I have also included some functions which provide a graphical visualization of the different distributions with regard to their higher moment features, as demonstrated below:
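For instance, the exported moment functions can be called directly; the parameter values below are purely illustrative:

```r
# dskewness and dkurtosis return the higher moments of a distribution
# given its skew/shape parameters, and are vectorized over those parameters
dskewness('sstd', skew = seq(1, 2, by = 0.25), shape = 5)
dkurtosis('jsu', skew = 0.5, shape = 2)
```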

```r
distplot('nig')
## Warning: skew lower bound below admissible region...adjusting to
## distribution lower bound.
```

The **distplot** function, available for all skewed and/or shaped distributions, provides a 3D visualization of how the skew and shape parameters interact to determine the Skewness and Kurtosis. In cases where the distribution is only skewed or only shaped, a simpler 2D plot is created, as in the case of the *std* distribution:

```r
distplot('std')
```

Another interesting plot is that of a distribution's 'authorized domain', which shows the region of Skewness-Kurtosis for which a density exists. This is related to the Hamburger moment problem and the maximum attainable Skewness (S) given Kurtosis (K) (see Widder (1946)). The **skdomain** function returns the values needed to visualize these relationships, as demonstrated below:
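The maximal boundary itself follows from the classical moment inequality: any distribution with a finite fourth moment must satisfy

\[
K \geqslant {S^2} + 1
\]

so the maximum attainable Skewness for a given Kurtosis is \( S = \sqrt{K - 1} \).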

```r
# plot the first one
XYnig = skdomain('nig', legend = FALSE)
XYsstd = skdomain('sstd', plot = FALSE)
XYhyp = skdomain('ghyp', plot = FALSE, lambda = 1)
XYjsu = skdomain('jsu', plot = FALSE)
# the values returned are the bottom half of the domain (which is symmetric)
lines(XYjsu$Kurtosis, XYjsu$Skewness, col = 3, lty = 2)
lines(XYjsu$Kurtosis, -XYjsu$Skewness, col = 3, lty = 2)
lines(XYsstd$Kurtosis, XYsstd$Skewness, col = 4, lty = 3)
lines(XYsstd$Kurtosis, -XYsstd$Skewness, col = 4, lty = 3)
lines(XYhyp$Kurtosis, XYhyp$Skewness, col = 5, lty = 4)
lines(XYhyp$Kurtosis, -XYhyp$Skewness, col = 5, lty = 4)
legend('topleft', c('MAX', 'NIG', 'JSU', 'SSTD', 'HYP'),
    col = c(2, 'steelblue', 3, 4, 5), lty = c(1, 1, 2, 3, 4), bty = 'n')
```

From the plot, it is clear that the skew-student has the widest possible combination of skewness and kurtosis for values of kurtosis less than ~6, whilst the NIG has the widest combination for values greater than ~8. It is interesting to note that the Hyperbolic distribution is not defined for values of kurtosis greater than ~9.

The **reduce** function eliminates non-significant coefficients (subject to a p-value cut-off argument) from a model by fixing them to zero (in **rugarch** this is equivalent to eliminating them) and re-estimating the model with the remaining parameters, as the following example illustrates:

```r
spec = ugarchspec(mean.model = list(armaOrder = c(4, 4)),
    variance.model = list(garchOrder = c(2, 2)))
fit = ugarchfit(spec, sp500ret[1:1000, ])
round(fit@fit$robust.matcoef, 4)
##        Estimate Std. Error  t value Pr(>|t|)
## mu       0.0005     0.0004    1.250   0.2111
## ar1      0.2250     0.0164   13.677   0.0000
## ar2      1.6474     0.0175   94.046   0.0000
## ar3     -0.0661     0.0188   -3.517   0.0004
## ar4     -0.8323     0.0164  -50.636   0.0000
## ma1     -0.2328     0.0065  -35.825   0.0000
## ma2     -1.6836     0.0039 -430.423   0.0000
## ma3      0.0980     0.0098    9.953   0.0000
## ma4      0.8426     0.0075  112.333   0.0000
## omega    0.0000     0.0000    1.527   0.1267
## alpha1   0.2153     0.1265    1.702   0.0888
## alpha2   0.0000     0.0000    0.000   1.0000
## beta1    0.3107     0.1149    2.705   0.0068
## beta2    0.3954     0.0819    4.828   0.0000
# eliminate alpha2
fit = reduce(fit, pvalue = 0.1)
print(fit)
##
## *---------------------------------*
## *          GARCH Model Fit        *
## *---------------------------------*
##
## Conditional Variance Dynamics
## -----------------------------------
## GARCH Model  : sGARCH(2,2)
## Mean Model   : ARFIMA(4,0,4)
## Distribution : norm
##
## Optimal Parameters
## ------------------------------------
##        Estimate Std. Error   t value Pr(>|t|)
## mu      0.00000         NA        NA       NA
## ar1     0.23153   0.006583   35.1698 0.000000
## ar2     1.63558   0.018294   89.4050 0.000000
## ar3    -0.07264   0.018277   -3.9743 0.000071
## ar4    -0.82110   0.014031  -58.5206 0.000000
## ma1    -0.23883   0.009659  -24.7253 0.000000
## ma2    -1.68338   0.005401 -311.6821 0.000000
## ma3     0.11434   0.017035    6.7121 0.000000
## ma4     0.83198   0.013140   63.3179 0.000000
## omega   0.00000         NA        NA       NA
## alpha1  0.14532   0.008997   16.1530 0.000000
## alpha2  0.00000         NA        NA       NA
## beta1   0.26406   0.086397    3.0564 0.002240
## beta2   0.58961   0.088875    6.6342 0.000000
##
## Robust Standard Errors:
##        Estimate Std. Error  t value Pr(>|t|)
## mu      0.00000         NA       NA       NA
## ar1     0.23153   0.061687   3.7533 0.000175
## ar2     1.63558   0.093553  17.4829 0.000000
## ar3    -0.07264   0.024012  -3.0251 0.002485
## ar4    -0.82110   0.028395 -28.9168 0.000000
## ma1    -0.23883   0.037806  -6.3172 0.000000
## ma2    -1.68338   0.021478 -78.3771 0.000000
## ma3     0.11434   0.066989   1.7068 0.087851
## ma4     0.83198   0.052093  15.9711 0.000000
## omega   0.00000         NA       NA       NA
## alpha1  0.14532   0.057150   2.5428 0.010996
## alpha2  0.00000         NA       NA       NA
## beta1   0.26406   0.129725   2.0356 0.041793
## beta2   0.58961   0.103879   5.6759 0.000000
##
## LogLikelihood : 3091
```
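Since **reduce** re-estimates the model with the non-significant parameters fixed to zero, it is worth comparing the restricted and unrestricted fits. A minimal sketch, assuming the unrestricted fit was saved beforehand under the (hypothetical) name *fit0*:

```r
# hedged sketch: compare information criteria of the unrestricted fit
# ('fit0', saved before calling reduce) with the reduced fit
cbind(unrestricted = infocriteria(fit0), reduced = infocriteria(fit))
```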

Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79(2), 453-497.

Widder, D. V. (1946). The Laplace Transform. Princeton University Press, Princeton, New Jersey.