Archives for December 2013

A Review of Risk Parity

What is risk parity (RP)?

Simply put, it is a method of allocating equal risk shares to each asset in the portfolio. In more traditional allocation schemes, equity, being the riskiest asset (and hence providing the highest reward), has typically received the lion’s share. With RP, equalization of risk contribution means that equity and other similarly risky assets receive a reduced allocation and low risk assets such as government bonds an increased allocation. As a result, in order to achieve ‘equity-like’ total returns, leverage has typically been used in this context, possibly using a target risk level. Under certain circumstances, namely when all assets have the same risk-to-reward ratios, the RP allocation is equivalent to the tangent portfolio. In the case where only the risks are equal, then the RP allocation is generally equivalent to the equal weight (1/n) portfolio, whilst it coincides with the minimum risk portfolio when cross diversification is highest (though that may be a tricky concept to describe when going beyond the mean-variance paradigm).

Of course, like all portfolio problems, we are speaking of our expectations of future risk. And this is one of the key benefits, argue the proponents of RP, since variance is much more stable and easier to forecast than the returns. Therefore, lying somewhere between the minimum risk and optimal risk portfolios, making use of only the ‘risk’ rather than the reward in the calculation, and backed up by an apparently stellar performance resulting from the recent decline in US Treasury rates, one might wonder why portfolio allocation is still taught at university.

In this author’s opinion, RP is not a proper model of asset allocation, does not contain that key ingredient called an active forecast which is after all what managers are rewarded for producing, and is likely to create complacency because of its oversimplified approach to the problem of forecast uncertainty. In this article I provide 2 applications based on different datasets with different characteristics in order to highlight some issues with the RP approach. I examine different risk\(^1\) measures, within a simple parametric DGP framework for generating forecasts, and find that even with such a simple approach RP cannot outperform the optimal risk-reward strategy or even the minimum risk strategy in a well diversified universe.

Initial Formalities

Formally, consider the marginal contribution to risk (MCR) of each asset (\(i\) of \(n\)) given a risk measure \( \rho\left(x\right) \):
MC{R_i} = \frac{{\partial \rho \left( x \right)}}{{\partial {x_i}}}
which when multiplied by the asset’s share and summed leads to the total portfolio risk (TR):
TR = \sum\limits_{\forall i} {{x_i}MC{R_i}}
One way to solve the general RP problem was already suggested by Maillard, Roncalli and Teïletche (2010) as a squared error minimization type problem which can be easily solved with an SQP solver:
\mathop {{\text{min}}}\limits_{\mathbf{x}} {\text{ }}\sum\limits_{j = 1}^n {\sum\limits_{i = 1}^n {{{\left( {{x_i}MC{R_i} – {x_j}MC{R_j}} \right)}^2}} }\\
s.t. \\
\sum\limits_{\forall i} {{x_i} = b} \\
{\mathbf{x}} \geqslant 0 \\
where \( b \) is the budget constraint. It should be clear that MCR is just the gradient of the risk function. Since all NLP risk functions in the parma package have analytic gradients (see the vignette for details), then the extension of RP to other measures beyond variance is quite trivial. For the case of variance, a fast algorithm using Newton’s method proposed by Chaves, Hsu and Shakernia (2012) is also available and tests conducted indicate that it is upto 100x faster than the equivalent SQP optimization (though we are already talking about tenths of a second anyway).

Set 1: Low Diversity Universe

The dataset consists of weekly (Friday) returns of international equity ETFs for the period November-2000 to December 2013. It is a highly correlated dataset with low diversification possibilities in a long-only setup. As such, it serves to illustrate the similarities between the equally weighted and RP portfolios.

Figure 1 aptly illustrates the very high correlation of the dataset whilst Figure 2 provides an indication of the risk and risk-return profile of the dataset, though one should bear in mind that this is for the entire period and does not necessarily reflect the situation at any particular point in time.







The backtest uses a static normal copula to model the covariance, with first stage AR(1)-GARCH(1,1)-JSU dynamics. The choice of a static rather than dynamic correlation model was motivated by the size of the dataset which is quite small, whilst for the conditional first stage GARCH dynamics the JSU distribution was used to account for univariate higher moments. Finally, and again motivated by the size of the dataset, an empirical transformation was used (see Genest, Ghoudi and Rivest (1995) and the rmgarch vignette for more details). At each time T (a Friday), data from 1:T was used (hence an expanding window choice) for the modelling, after which the T+1 simulated forecast density (the scenario) was generated for use in the optimization model. For the Mean-Variance model, the covariance of the simulated T+1 forecast density was used, whilst for all other risk measures the actual scenario was used in the optimization.
Three models were evaluated: the minimum risk (MR), risk parity (RP) and optimal risk-reward (OPT) using fractional programming. Within those models, four risk measures were evaluated: mean-variance (EV), mean absolute deviation (MAD) and lower partial moments of orders 2 and 4 (LPM2 and LPM4) representing different aversions to below threshold losses\( ^{2} \).
The weights generated by each model where then used to buy the underlying assets at the close of the day after (i.e. Monday) and held until the following re-formation period (i.e. the next Monday). In this way, there was no contemporaneous signal-portfolio formation bias, which may be significant for weekly and higher frequency models. Trading costs were not included nor was price impact and other related costs.

Table 1 provides a summary of the results (using the equal weight as the benchmark), with the key points being:

  • Among the allocation models, there is a negligible difference between the different risk measures. This may represent the frequency of the dataset and/or the DGP used (normal copula).
  • The RP and equal weight portfolios appear very close, as might be expected from the very close ex-ante variance of the assets and high correlation.
  • The MR portfolio shows dismal performance, and this is in line with the little diversification potential of this dataset.
  • The OPT portfolio appears superior to the RP model on all measures. Statistical significance of this statement was not checked, and with the exception of the Sharpe ratio one is challenged to do so for a large number of measures even on moderately sized datasets.



Finally, Figure 3 provides the ‘eye-catching’ illustration of terminal wealth trajectories of the various allocation models under the variance risk measure.




Set 2: Higher Diversity Universe

This is a more typical dataset for active asset allocation based on a more diverse universe of asset classes, though we are still limited in this application by readily available instruments and history within an open-data paradigm. The set covers the period June-2006 to December-2013, and I have again used weekly (Friday) returns. The backtest follows the exact methodology of Set 1.

Figure 4 displays good diversification potential in terms of correlations, whilst Figure 5 provides an indication of the risk and risk-return profile of the dataset, with the same caveats as mentioned for Figure 2.







The results in Table 2 provide for some very interesting insights. The higher diversification potential of this dataset has resulted in an exact opposite ranking for the MR portfolio which now has the highest Sharpe ratio, though a much lower CAGR, and the OPT portfolio again beats the RP portfolio. However, the RP portfolio now comfortably outperforms the equal weight portfolio, in line with expectations, given that the risks in the unconditional dataset were not very close (unlike set 1).






It is certainly fashionable to publish about uncertainty of inputs in the portfolio literature. Michaud (1989) did it, DeMiguel, Garlappi and Uppal (2009) did it (see this blog for a critique), and so have the proponents of Risk Parity. I don’t see how an investment manager can justify the use of any of these methodologies since he is rewarded for the quality of his inputs (active bets), though they can and certainly should serve a role in sensitivity analysis. If everyone could just allocate resources without a forecast then we would not need investment/resource managers. Even with simple econometric based forecasts such as those generated from a ‘simple’ AR(1) model we still managed to beat the RP and EW portfolios. Perhaps a little more attention should be given to the modelling of the underlying dynamics and a little less to ‘fashionable’ trends in asset allocation and catchy phrases meant to provide yet another sales drive, particularly given the resulting highly levered bets on a certain asset class whose rally depends heavily on a printing press which is fast running out of steam.


If you want the code to replicate the results or the RP code for use with parma, contact me by email stating your real name and affiliation.


Charnes, A., & Cooper, W. W. (1962). Programming with linear fractional functionals. Naval Research logistics quarterly, 9(3-4), 181-186.

Chaves, D., Hsu, J., Li, F., & Shakernia, O. (2012). Efficient Algorithms for Computing Risk Parity Portfolio Weights. The Journal of Investing, 21(3), 150-163.

DeMiguel, V., Garlappi, L., & Uppal, R. (2009). Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy?. Review of Financial Studies, 22(5), 1915-1953.

Genest, C., Ghoudi, K., & Rivest, L. P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika, 82(3), 543-552.

Maillard, S., Roncalli, T., & Teïletche, J. (2010). The Properties of Equally Weighted Risk Contribution Portfolios. The Journal of Portfolio Management, 36(4), 60-70.

Michaud, R. O. (1989). The Markowitz optimization enigma: is’ optimized’optimal?. Financial Analysts Journal, 31-42.

Stoyanov, S. V., Rachev, S. T., & Fabozzi, F. J. (2007). Optimal financial portfolios. Applied Mathematical Finance, 14(5), 401-436.


\( ^{1} \) Risk, despite the numerous papers written on risk parity including the seminal one by Maillard, Roncalli and Teïletche (2010) (although the concept appears to have originated as far back as 1996 with Ray Dalio of Bridgewater Associates, with the actual term ‘Risk Parity’ coined by E.Qian of PanAgora in 2006) is certainly not equivalent to variance. It can be any number of measures deemed appropriate (see the parma vignette section for a discussion of risk and deviation measures), and this blog article shows how the parma package can be used to calculate RP under a number of different measures in addition to variance, including Mean Absolute Deviation (MAD) and Lower Partial Moments (LPM).
\( ^{2} \) The threshold was based on the portfolio mean which is a case with especially desirable properties as discussed in the parma vignette.

A note on the co-moments in the IFACD model

The Independent Factor Autoregressive Conditional Density (IFACD) model of Ghalanos, Rossi and Urga (2014) uniquely, in its class of parametric models, generates time varying higher co-moment forecasts, as a consequence of the ACD specification of the conditional density of the standardized innovations. In this short note I discuss in more detail the properties of the conditional co-moments of this model, certain interesting properties as relates to the higher moment CAPM and a fast algorithm for populating these very large flattened tensors.

Conditional Co-Moments

The conditional co-moments of \( \mathbf{r}_t \) of order 3 and 4 are represented as tensor matrices
\[ \label{eq1}
\mathbf{M}_{t}^3 = \mathbf{A} \mathbf{M}_{f,t}^3(\mathbf{A} \otimes \mathbf{A})’, \quad \\
\mathbf{M}_{t}^4 = \mathbf{A} \mathbf{M}_{f,t}^4(\mathbf{A} \otimes \mathbf{A} \otimes \mathbf{A})’
where \( \mathbf{M}_{f,t}^3 \) and \( \mathbf{M}_{f,t}^4 \) are the \( N \times N^2 \) conditional third comoment matrix and the \( N \times N^3 \) conditional fourth comoment matrix of the factors, respectively. \( \mathbf{M}_{f,t}^3 \) and \( \mathbf{M}_{f,t}^4 \), are defined as
\[ \begin{eqnarray}
\mathbf{M}_{f,t}^3 & =&
\mathbf{M}_{f,t}^4 & = &
where \( \mathbf{M}_{k,f,t}^3, k=1,\ldots,N \) and \( \mathbf{M}_{kl,f,t}^4, k,l=1,\ldots,N \) are the \( N\times N \) submatrices of \( \mathbf{M}_{f,t}^3 \) and \( \mathbf{M}_{f,t}^4 \), respectively, with elements
m_{ijk,f,t}^3 & = & E[f_{i,t}f_{j,t}f_{k,t}|\mathfrak{F}_{t-1}] \\
m_{ijkl,f,t}^4 & = & E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}].
Since the factors \( f_{it} \) can be decomposed as \( z_{it}\sqrt{h_{it}} \), and given the assumptions on \( z_{it} \), then \( E[f_{i,t}f_{j,t}f_{k,t}|\mathfrak{F}_{t-1}] = 0\ \). It is also true that for \( i \neq j\neq k \neq l \), then \( E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}] = 0 \) and when \( i=j \) and \( k=l \), then \( E[f_{i,t}f_{j,t}f_{k,t}f_{l,t}|\mathfrak{F}_{t-1}] = h_{it}h_{kt}. \)
Thus, under the assumption of mutual independence, all elements in the conditional co-moments matrices with at least 3 different indices are zero. Finally, we standardize the conditional co-moments to obtain conditional coskewness and cokurtosis of \( \mathbf{r}_t \)
\mathbf{S}_{ijk,t} = \frac{m_{ijk,t}^3}{({\sigma_{i,t}}{\sigma _{j,t}}{\sigma _{k,t}})}, \quad \\
\mathbf{K}_{ijkl,t} = \frac{m_{ijkl,t}^4}{({\sigma_{i,t}}{\sigma _{j,t}}{\sigma _{k,t}}{\sigma _{l,t}})},
where \( \mathbf{S}_{ijk,t} \) represents the coskewness between elements \( i,j,k \) of \( \mathbf{r}_t \), \( \sigma_{i,t} \) the standard deviation of \( \mathbf{r}_{i,t} \), and in the case of \( i=j=k \) represents the skewness of asset \( i \) at time \( t \), and similarly for the cokurtosis tensor \( \mathbf{K}_{ijkl,t} \)

Location of non-zero entries

The flattened tensors grow quite quickly in size as n (factors) and m (moment) become larger. Populating the factor matrices with the values from the ACD dynamics in a fast and efficient manner is key if this model is to be called ‘feasible’ and estimation ‘large-scale’. For the third co-moment matrix, this is very simply since the column based vectorized index of the location of non-zero entries is found to be \( ((1:n)-1)n^2 + ((1:n)-1)n + (1:n) \).
These represent the (column based vectorized) location in the third co-moment matrix of the factor unstandardized skewness estimated from the ACD model (and determined jointly by the skew and shape dynamics). In the case of the fourth co-moment matrix of the factors, the situation is slightly more involved since in addition to the entries \( \{i=j=k=l\} \) for which the column based vectorized location is found to be \( ((1:n)-1)n^3 + ((1:n)-1)n^2 + ((1:n)-1)n + (1:n) \), there are also the entries \( \{i=j,k=l\} \) to consider as discussed in the previous section. To this end, consider the \( n\times n^3 \) matrix of the flattened tensor \( M_{f,ijkl}^4 \), with the indices illustrated as in the \( 4\times4^3 \) matrix below:

\[ \small
{j:} & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & {…} & 4 & 4 \\
{k:} & 1 & 1 & 1 & 1 & 2 & 2 & 2 & 2 & {…} & 4 & 4 \\
{l:} & 1 & 2 & 3 & 4 & 1 & 2 & 3 & 4 & {…} & 3 & 4 \\
{i:1} & {\{ 1111\} } & {\{ 1112\} } & {\{ 1113\} } & {\{ 1114\} } & {\{ 1121\} } & {\{ 1122\} } & {\{ 1123\} } & {\{ 1124\} } & {…} & {\{ 1443\} } & {\{ 1444\} } \\
{i:2} & {\{ 2111\} } & {\{ 2112\} } & {\{ 2113\} } & {\{ 2114\} } & {\{ 2121\} } & {\{ 2122\} } & {\{ 2123\} } & {\{ 2124\} } & {…} & {\{ 2443\} } & {\{ 2444\} } \\
{i:3} & {\{ 3111\} } & {\{ 3112\} } & {\{ 3113\} } & {\{ 3114\} } & {\{ 3121\} } & {\{ 3122\} } & {\{ 3123\} } & {\{ 3124\} } & {…} & {\{ 3443\} } & {\{ 3444\} } \\
{i:4} & {\{ 3111\} } & {\{ 4112\} } & {\{ 4113\} } & {\{ 4114\} } & {\{ 3121\} } & {\{ 4122\} } & {\{ 4123\} } & {\{ 4124\} } & {…} & {\{ 4443\} } & {\{ 4444\} } \\
We are interested in finding the index location of all the pairs where \( \{i=j,k=l\} \) and their permutations, e.g. {1122},{1212},{2211},{2112},{1221},{2121}.
To do this, we first note the following:

  • There are \( \frac{n!}{2(n-2)!} \) unique pairs.
  • Each pair has 6 permutations.
  • The number of unique pairs whose difference is \( d \) is \( n-d \) e.g. for \( d=1 \),\( n=4 \), we have 3 unique pairs {2,1},{3,2} and {4,3}.
  • The first pair has columnwise vector based index \( v=n+2 \) e.g. for \( n=4 \), the first pair is {2112} in the example matrix above with columnwise vector based index of 6.
  • The \( p^{th} \) unique pair, representing the first in the set of pairs with \( n-d \) differences, has vector based index \( (p-1)+(n+1) \), which given the starting pair with index \( n+2 \) means that we can calculate the indices of each unique pair in the matrix given the previous unique pair’s position.
  • Within each \( p^{th} \) unique pair, there are 6 permutations. Starting from the index of the first pair in the series of 6, denoted \( v_1 \), the second permutation has index \( v_1+((n-1)n)d \), the third \( v_2+(n-1)d \), the fourth \( v_3+d(n+1)(n-1)^2 \), the fifth \( v_4+(n-1)d \) and the sixth \( v_5+((n-1)n)d \).
  • The first pair of the next set of pairs whose difference is \( n-d \) has and index which is equal to \( n^3+n^2+n+1 \) more than the previous pair.

This fast method for calculating the location of each entry and populating it accordingly is already implemented in the rmgarch package for the GO-GARCH (NIG/GH) model. In addition, in order to avoid memory problems when it comes time to perform the kronecker multiplications using the mixing matrix \( \mathbf{A} \) on the factor higher co-moments to arrive at the asset higher co-moments, the method of Buis amd Dyksen (1996) is used which is also implemented in the klin package.

The Higher Moment Time Varying Statistical Factor CAPM

A very interesting application made possible as a result of the model’s properties is the estimation of the (higher moment) time varying betas from the CAPM model. Consider a universe of \( n \) assets with returns \( \mathbf{r}_t \), with benchmark \( b \) whose return is \( r_{b,t} \) and determined by a linear combination based on a pre-specified weighting vector \( \mathbf{w}_t \):
r_{b,t} = \mathbf{w}_t’ \mathbf{r}_t
such that \( \mathbf{w}_t’\mathbf{1}=1 \). It could be the case that \( \mathbf{w}_t \) represents the weights of the benchmark index or some other pre-deteremined weighting scheme (e.g. equal weight). To calculate the CAPM betas with respect to the benchmark, we need not model the pairwise combination of assets-benchmark but instead only the assets which comprise the benchmark. The following formulae outline the steps:

  • Define the conditional covariance: \( M_t^2=AH_tA’ \), where \( {{\mathbf{H}}_{\mathbf{t}}} = E\left[ {{f_t}{{f’}_t}{{\left| \Im \right.}_{t – 1}}} \right], \in {\mathbb{R}^{N \times N}} \)
  • Define the conditional 3rd co-moment: \( {\mathbf{M}}_{\mathbf{t}}^3{\mathbf{ = AM}}_{{\mathbf{f,t}}}^{\mathbf{3}}\left( {{\mathbf{A’}} \otimes {\mathbf{A’}}} \right), \in {\mathbb{R}^{N \times {N^2}}} \)
  • Define the conditional 4th co-moment: \( {\mathbf{M}}_{\mathbf{t}}^4{\mathbf{ = AM}}_{{\mathbf{f,t}}}^4\left( {{\mathbf{A’}} \otimes {\mathbf{A’}} \otimes {\mathbf{A’}}} \right), \in {\mathbb{R}^{N \times {N^3}}} \)

The conditional beta Covariance:
{\beta _{i,t}} = \frac{{E\left[ {\left( {{R_{i,t}} – {{\bar R}_{i,t}}} \right)\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)} \right]}}
{{E\left[ {{{\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)}^2}} \right]}} = \frac{{{\mathbf{m}}_{i,t}^2{\mathbf{w}}}}
where \( {\mathbf{m}}_{i,t}^2: = ro{w_i}{\mathbf{M}}_{\mathbf{t}}^{\mathbf{2}}\left( {i = 1,…,N} \right) \).

The conditional beta Coskewness:
{s_{i,t}} = \frac{{E\left[ {\left( {{R_{i,t}} – {{\bar R}_{i,t}}} \right){{\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)}^2}} \right]}}
{{E\left[ {{{\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)}^3}} \right]}} = \frac{{{\mathbf{m}}_{i,t}^3\left( {{\mathbf{w}} \otimes {\mathbf{w}}} \right)}}
{{{\mathbf{w’M}}_{\mathbf{t}}^{\mathbf{3}}\left( {{\mathbf{w}} \otimes {\mathbf{w}}} \right)}}
where \( {\mathbf{m}}_{i,t}^3: = ro{w_i}{\mathbf{M}}_{\mathbf{t}}^3\left( {i = 1,…,N} \right) \).

The conditional beta Cokurtosis:
{{\text{k}}_{i,t}} = \frac{{E\left[ {\left( {{R_{i,t}} – {{\bar R}_{i,t}}} \right){{\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)}^3}} \right]}}
{{E\left[ {{{\left( {{R_{m,t}} – {{\bar R}_{m,t}}} \right)}^4}} \right]}} = \frac{{{\mathbf{m}}_{i,t}^4\left( {{\mathbf{w}} \otimes {\mathbf{w}} \otimes {\mathbf{w}}} \right)}}
{{{\mathbf{w’M}}_{\mathbf{t}}^4\left( {{\mathbf{w}} \otimes {\mathbf{w}} \otimes {\mathbf{w}}} \right)}}
where \( {\mathbf{m}}_{i,t}^4: = ro{w_i}{\mathbf{M}}_{\mathbf{t}}^4\left( {i = 1,…,N} \right) \)

Thus, considering that we can include dimensionality reduction at the PCA whitening stage (the n.comp argument in the rmgarch::fastica algorithm), the IFACD (and the non-time varying higher moments restricted GO-GARCH sub-model) provides a clear avenue for estimating a large scale statistical factor based time varying higher moment CAPM model. A demonstration is available here.

Fast weighted VaR calculation using the Cornish-Fisher Expansion

In the IFACD model, the conditional weighted density can be calculated using the inversion of the NIG/GH characteristic function via FFT as discussed in Ghalanos, Rossi and Urga (2014) and the rmgarch vignette (for the GO-GARCH model). Methods for working with the weighted density are already available in the rmgarch package including methods for the density, distribution and quantile (dfft, pfft and qfft) on estimated, forecast and simulated objects. An alternative avenue for the calculation of the weighted quantile is to use the Cornish-Fisher expansion which makes use of the conditional higher moments:
Va{R_{t,\alpha }} = {\mu _t} + {\sigma _t}\left( {\phi + (1 – {\phi ^2})\frac{{{S_t}}}
{6} + \left( {{\phi ^3} – 3\phi } \right)\frac{{{K_t}}}
{{24}} + \left( {5\phi – 2{\phi ^3}} \right)\frac{{S_t^2}}
{{36}}} \right)
where \( \phi={\Phi ^{ – 1}}\left( \alpha \right) \), represents the quantile of the standard normal distribution evaluated at the coverage level \( \alpha \), \( S_t \) the skewness at time \( t \) and \( K_t \) the excess kurtosis at time \( t \). The weighted moments \( (\mu_t,\sigma_t,S_t,K_t) \) can be calculated as follows:
\[ \begin{gathered}
\mu_t = \mathbf{w}_t’ \mathbf{M_{t}^1},\\
\sigma_{{t}}^2 = \mathbf{w}_t'{\mathbf{\Sigma}_{t}}\mathbf{w}_t, \\
{S_{t}} = \frac{{\mathbf{w}_t’\mathbf{M}_{_t}^3(\mathbf{w}_t \otimes \mathbf{w}_t)}}{{{{(\mathbf{w}_t'{\mathbf{\Sigma}_{t}}\mathbf{w}_t)}^{3/2}}}}, \\
{K_{t}} = \frac{{\mathbf{w}_t’\mathbf{M}_{_t}^4(\mathbf{w}_t \otimes \mathbf{w}_t \otimes \mathbf{w}_t)}}{{{{(\mathbf{w}_t'{\mathbf{\Sigma}_{t}}\mathbf{w}_t)}^2}}}, \\
where \( \mathbf{M}_{t}^1 \) is the conditional mean vector, \( \mathbf{M}_{t}^3 \) and \( \mathbf{M}_{t}^4 \) are the third and fourth co-moment matrices described in the previous section, and \( \mathbf{\Sigma}_{t} \) the conditional covariance (the notation \( \mathbf{M}_{t}^2 \) has also been used).
A demonstration is available here.


Buis, P. E., & Dyksen, W. R. (1996). Efficient vector and parallel manipulation of tensor products. ACM Transactions on Mathematical Software (TOMS), 22(1), 18-23.

Ghalanos, A., Rossi, E. & Urga G. (2014). Independent Factor Autoregressive Conditional Density model. Econometric Reviews (forthcoming).