NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Time Series Simulations and practice:


The workhorse function to simulate time series is arima.sim() (see here). However, its use of a convolution filter precludes a straightforward generation of time series with a given mean value \(\mu.\)

`mean` **affects** the intercept, but it is not the intercept:
> x <- arima.sim(model = list(order = c(0,0,1), ma=0.9), n = 10000, mean = -15)
> y <- arima.sim(model = list(ma=0.9), n = 10000, mean = -15)
> mean(x)
[1] -28.48915
> mean(y)
[1] -28.47643

> See the discussion of why the mean of an MA series is not, as might be expected, equal to \(\mu\) [here](https://stats.stackexchange.com/q/294748/67822).
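A quick check of where the \(-28.5\) above comes from: arima.sim() passes `mean` (and `sd`) to rnorm() as the innovation parameters (assuming the default rand.gen), so for an MA(1) with \(\theta = 0.9\) the series mean is \(\mu(1+\theta),\) not \(\mu.\) A minimal sketch:

set.seed(1)
mu <- -15; theta <- 0.9
y <- arima.sim(model = list(ma = theta), n = 10^5, mean = mu)
c(theoretical = mu * (1 + theta), simulated = mean(y))
# Both values are close to -28.5, matching the output above.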

`sd` is the sd of the innovations generated by rnorm()

For instance, in arima.sim(model = list(order = c(1,0,0), ar=0.9), n = 100) (same as arima.sim(model = list(ar=0.9), n = 100)), the simulation is for an AR(1) process with coefficient \(0.9.\)

The function from {forecast} to fit an existing data set, or, in the cases below, to recover the parameters of a simulation, is Arima(). To use it, it is important to remember the order argument (a minimal example follows the list):

`order = c(p,d,q)`:
`p` is the order (number of time lags) of the autoregressive model
`d` is the degree of differencing (the number of times the data have had past values subtracted)
`q` is the order of the moving-average model.
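
A minimal example (a sketch): fit a simulated ARMA(1,1), i.e. p = 1, d = 0, q = 1.

set.seed(1)
y <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 1000)
Arima(y, order = c(1,0,1), include.mean = TRUE)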

WHITE NOISE:

White noise is stationary. From here:

\[Y_t = \epsilon_t\] with \(\epsilon_t \sim N(0,\sigma^2)\). This is a stationary process.

library(cowplot) 
library(fpp2)  
library(forecast)
library(patchwork)
library(ggplot2)

set.seed(1)
n = 100
y = ts(rnorm(n))
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, ci.type="ma", linewidth=5) + ggtitle("ACF WHITE NOISE")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF WHITE NOISE")
p1 / ( p2 | p3)


RANDOM WALK:

RANDOM WALK WITH NO DRIFT:

From here.

This is the same as an AR(1) process with \(\rho = 1.\) The \(\rho = 1\) turns the equation for the AR process \(Y_t = \rho \, Y_{t-1} + \epsilon_t\) into the equation below.

\[\large Y_t = Y_{t-1}+\epsilon_t=\epsilon_t+\epsilon_{t-1}+\cdots+\epsilon_1\]

A random walk is not stationary because the variance is time-dependent: \[\text{var}[Y_t] = \text{var}[Y_0]+\sum_{i=1}^t \text{var}[\epsilon_i]= t\, \text{var}[\epsilon].\]

However, the expectation is zero: \(\mathbb E[Y_t]=0.\)

To make the process stationary, the time series has to undergo differencing. The resultant time series after differencing will be white noise.

set.seed(1)
n = 500
y = ts(cumsum(rnorm(n)))

p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF RANDOM WALK NO DRIFT")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF RANDOM WALK NO DRIFT")
p1 / ( p2 | p3)

dy <- diff(y, differences = 1)

p1 = autoplot(dy) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(dy, lag.max=15, linewidth=5) + ggtitle("ACF RANDOM WALK - DIFFERENCING")
p3 = ggPacf(dy, lag.max=15, linewidth=5) + ggtitle("PACF RANDOM WALK - DIFFERENCING")
p1 / ( p2 | p3)

  1. The first \(50\) lags yield statistically significant correlations. See here.

  2. From here. After differencing, the ACF and PACF are very similar. They are white noise.

The ACF at lag \(k\) is equivalent to the correlation between values \(k\) lags apart (i.e. \(x_{t+k}\) and \(x_t\)); the PACF is the same, except you first subtract out the projection of each of those onto the space spanned by \(x_{t+1}, \dots, x_{t+k-1}\). This projection has an expected value of zero with white noise data, so the theoretical relationship between ACF and PACF in that case is equivalence. However, real (or simulated) data are a bit messy, and the estimated ACF/PACF values are not exactly zero. But with a larger sample size, the magnitude of the projection drops to zero faster than the ACF does, so even though the estimated ACF and PACF both go toward zero, the PACF more quickly approximates the ACF.

What approximately equal ACF and PACF can tell you is that the data appear to be white noise. You can also infer this from the largely non-significant values; the few that are out of bounds are the product of randomness.
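
A quick numerical check of the above (a sketch, not from the original discussion): for a long white-noise series the estimated ACF and PACF at each lag are nearly equal, and both are near zero.

set.seed(1)
y <- rnorm(10^4)
acf_vals  <- drop(acf(y,  lag.max = 5, plot = FALSE)$acf)[-1]  # drop lag 0
pacf_vals <- drop(pacf(y, lag.max = 5, plot = FALSE)$acf)
round(cbind(ACF = acf_vals, PACF = pacf_vals), 4)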


RANDOM WALK WITH DRIFT:

From here:

A random walk (with drift) in the logs (geometric Brownian motion) is not a perfect description of stock prices, for a variety of reasons – but it’s not a terrible first approximation, and it’s the basis of the Black Scholes formula

From here:

\[\large Y_t = \alpha + Y_{t-1}+\epsilon_t\]

where \(\epsilon_t \sim N(0, \sigma^2).\)

The expectation is \(\mathbb E[Y_t] = \alpha \, t\) and the variance is \(\text{var}(Y_t) = t \, \sigma^2,\) since

\[\large Y_t = \alpha + Y_{t-1}+\epsilon_t= t\,\alpha + \epsilon_t+\epsilon_{t-1}+\cdots+\epsilon_1\]

From here, the resulting simulation will be a realization of a random walk with variance t*variance and mean t*drift, where t is the index:

set.seed(2024)
n = 500
drift = 2
variance = 1
y = ts(cumsum(rnorm(n, mean=drift, sd=sqrt(variance))))

p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF RANDOM WALK WITH DRIFT")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF RANDOM WALK WITH DRIFT")
p1 / ( p2 | p3)

dy <- diff(y, differences = 1)

p1 = autoplot(dy) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(dy, lag.max=15, linewidth=5) + ggtitle("ACF RANDOM WALK WITH DRIFT - differencing")
p3 = ggPacf(dy, lag.max=15, linewidth=5) + ggtitle("PACF RANDOM WALK WITH DRIFT - differencing")
p1 / ( p2 | p3)

After differencing it also returns white noise (with mean equal to the drift). See here.


AUTOREGRESSIVE PROCESS (AR):

From here and here:

For AR(1) processes the equation is:

\[Y_t = \rho Y_{t-1}+\epsilon_t\]

In general, for AR(p):

\[\large Y_t = \sum_{i=1}^p \rho_i\, Y_{t - i} + \epsilon_t\]

Notice that, unlike in MA models (see below), the mean is not specified and is assumed to be zero. If it is not zero (see here), under the assumption of stationarity of the time series, the mean has to be subtracted from both sides of the equation and from each term:

\[\large Y_t - \mu= \sum_{i=1}^p \rho_i\, \left(Y_{t - i}-\mu\right) + \epsilon_t\]

See also this answer. For instance in AR(2), subtracting the mean from both sides:

\[Y_t - \mu = \rho_1(Y_{t-1} - \mu) + \rho_2(Y_{t-2} - \mu) +\epsilon_t\]

Now moving \(\mu\) to the RHS:

\[Y_t = \mu -\rho_1\,\mu - \rho_2 \,\mu +\rho_1Y_{t-1}+ \rho_2 Y_{t-2}+ \epsilon_t\] and grouping \(c = \mu(1 - \rho_1 - \rho_2)\)

\[Y_t = c + \rho_1Y_{t-1}+ \rho_2 Y_{t-2}+ \epsilon_t\]
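
A quick numerical verification (a sketch, with illustrative values \(\rho_1=-0.5,\ \rho_2=0.3,\ \mu=50\)): shift a zero-mean AR(2) simulation by \(\mu,\) compare the fitted mean with \(\mu,\) and compute the implied constant \(c = \mu(1 - \rho_1 - \rho_2).\)

set.seed(2024)
n <- 10^4
rho1 <- -0.5; rho2 <- 0.3; mu <- 50
y <- mu + arima.sim(n = n, model = list(ar = c(rho1, rho2)))
Arima(y, order = c(2,0,0), include.mean = TRUE)  # fitted mean should be close to 50
mu * (1 - rho1 - rho2)                           # implied constant c = 60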

From here for AR(1),

\[\mathbb E[Y_t]=0\]

and from here:

\[\text{Var}(Y_t)= \frac{\sigma^2}{1-\rho^2}\]
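
A numerical check of the variance formula (a sketch, using \(\rho=0.9\) and unit-variance innovations):

set.seed(2024)
rho <- 0.9
y <- arima.sim(n = 10^5, model = list(ar = rho))
c(theoretical = 1 / (1 - rho^2), simulated = var(y))
# Both are around 5.26.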

From here for AR(1):

set.seed(2024)
n = 10^4
y = arima.sim(n=n,list(ar=c(0.15)), innov=rnorm(n))
Arima(y, order = c(1,0,0), include.mean = T)
## Series: y 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     mean
##       0.1363  -0.0081
## s.e.  0.0099   0.0116
## 
## sigma^2 = 1.001:  log likelihood = -14191.94
## AIC=28389.89   AICc=28389.89   BIC=28411.52
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(1)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(1)")
p1 / ( p2 | p3)

An example with positive \(\rho\) values using arima.sim(), from here:

set.seed(2024)
n = 10^4
y <- arima.sim(n=n, model=list(ar=c(0.1,0.8)))
Arima(y, order = c(2,0,0), include.mean = T)
## Series: y 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     mean
##       0.0945  0.7962  -0.0618
## s.e.  0.0060  0.0060   0.0914
## 
## sigma^2 = 1.002:  log likelihood = -14198.26
## AIC=28404.51   AICc=28404.52   BIC=28433.35
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(2)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(2)")
p1 / ( p2 | p3)

another example:

set.seed(2024)
n = 10^4
y <- arima.sim(model = list(order = c(1,0,0), ar=0.9), n = n)
Arima(y, order = c(1,0,0), include.mean = T)
## Series: y 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     mean
##       0.8938  -0.0619
## s.e.  0.0045   0.0941
## 
## sigma^2 = 0.9996:  log likelihood = -14187
## AIC=28380   AICc=28380.01   BIC=28401.64
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(1)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(1)")
p1 / ( p2 | p3)

This is the typical pure AR(1) pattern: exponential decay in the ACF and one single significant spike at lag 1 (positive or negative) in the PACF.

  1. The ACF shows geometric decay or damped sine-wave.
  2. The last significant lag in the PACF indicates the order of the AR model.
  3. A sinusoidal decay indicates either a seasonal or cyclic component.

Example of damped sine-wave:

library(tsibbledata)
y = as.ts(global_economy[global_economy$Code=="EGY",'Exports'])
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, length(y)))
p2 = ggAcf(y) + ggtitle('ACF')
p0 = ggAcf(y, linewidth=5) + ggtitle('Dampened Sinusoidal ACF') +  
        geom_path(linejoin = "round", linewidth=2, col='red') + 
        aes(y=ggAcf(y)$data$Freq, x=ggAcf(y)$data$lag) + 
        theme(axis.title.x=element_blank(), axis.title.y=element_blank()) +
        geom_point(y=ggAcf(y)$data$Freq, x=ggAcf(y)$data$lag, size=4)
p3 = ggPacf(y, linewidth=5) + ggtitle('PACF')
(p1 | p2) / ( p3 | p0)

The sinusoidal ACF is typical of cyclical or seasonal data. In this case it is cyclic, with a duration of 4 to 8 years.

On the PACF, the last significant spike is at lag \(4,\) so a plausible model would be AR(4). However, the best model is

ar <- auto.arima(y) # Autofit model
summary(ar)
## Series: y 
## ARIMA(2,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2      ma1     mean
##       1.6764  -0.8034  -0.6896  20.1790
## s.e.  0.1111   0.0928   0.1492   0.9142
## 
## sigma^2 = 8.046:  log likelihood = -141.57
## AIC=293.13   AICc=294.29   BIC=303.43
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.04651341 2.736964 2.173253 -2.355658 11.65043 0.8721343
##                     ACF1
## Training set -0.02185372
guess <- Arima(y, order = c(4,0,0), include.mean = T)
summary(guess)
## Series: y 
## ARIMA(4,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     mean
##       0.9861  -0.1715  0.1807  -0.3283  20.0986
## s.e.  0.1247   0.1865  0.1865   0.1273   1.0699
## 
## sigma^2 = 7.885:  log likelihood = -140.53
## AIC=293.05   AICc=294.7   BIC=305.41
## 
## Training set error measures:
##                       ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -0.02192829 2.684212 2.106488 -2.183598 11.33228 0.8453413
##                     ACF1
## Training set -0.03025037

The AICc criterion favors an ARIMA(2,0,1) with a mean by less than one point.

From here.

set.seed(2024)
n = 10^4
AR2.sim <- function(n, ar1, ar2, sd)
{
  # Manual AR(2) recursion: X_t = ar1*X_{t-1} + ar2*X_{t-2} + e_t,
  # starting from two zero initial values.
  Xt = c(0, 0)
  for (i in 2:n)
  {
    Xt[i+1] = ar1*Xt[i] + ar2*Xt[i-1] + rnorm(1, mean=0, sd=sd)
  }
  Xt
}
y = ts(AR2.sim(n, -0.5, 0.3, 1))
Arima(y, order = c(2,0,0), include.mean = T)
## Series: y 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##           ar1     ar2     mean
##       -0.5124  0.2878  -0.0054
## s.e.   0.0096  0.0096   0.0082
## 
## sigma^2 = 1:  log likelihood = -14191.4
## AIC=28390.8   AICc=28390.8   BIC=28419.64
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, linewidth=5, lag.max=15) + ggtitle("ACF AR(2)")
p3 = ggPacf(y, linewidth=5, lag.max=15) + ggtitle("PACF AR(2)")
p1 / ( p2 | p3)

and a second example:

set.seed(2024)
n = 10^4
y <- arima.sim(n=n, model=list(ar=c(-0.5,0.3)))
Arima(y, order = c(2,0,0), include.mean = T)
## Series: y 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##           ar1     ar2     mean
##       -0.5124  0.2874  -0.0055
## s.e.   0.0096  0.0096   0.0082
## 
## sigma^2 = 0.9991:  log likelihood = -14183.74
## AIC=28375.48   AICc=28375.49   BIC=28404.33
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(2)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(2)")
p1 / ( p2 | p3)

Simulating an AR(1) process with \(\mu \neq 0:\)

Based on this answer, if instead of a mean we want an intercept as in the model:

\[X_{t} = 5 + 0.5X_{t-1}+\epsilon_t\]

the mean \(\mu\) will actually have to be \(\mu = 10,\) because, according to the derivation above, upon subtracting the mean on both sides from \(X_t\) and \(X_{t-1},\) the constant becomes \(c = \mu(1-\rho).\) Therefore, \(\mu = c/(1-\rho) = 5/(1-0.5) = 10,\) so that the expression above is the same as

\[X_t - 10 = 0.5(X_{t-1} - 10) + \epsilon_t .\] Now, defining \(Y_t = X_t - 10,\)

\[Y_t = .5 Y_{t-1} + \epsilon_t\]

and we just add \(10\) at the end:

set.seed(2024)
n = 10^4
yt <- arima.sim(list(order=c(1,0,0), ar=.5), n=n)
y <- yt + 10 
Arima(y, order = c(1,0,0), include.mean = T)
## Series: y 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1    mean
##       0.4887  9.9860
## s.e.  0.0087  0.0196
## 
## sigma^2 = 1.001:  log likelihood = -14192.45
## AIC=28390.89   AICc=28390.9   BIC=28412.52
p1 = autoplot(yt) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(yt, lag.max=15, linewidth=5) + ggtitle("ACF AR(1)")
p3 = ggPacf(yt, lag.max=15, linewidth=5) + ggtitle("PACF AR(1)")
p1 / ( p2 | p3)

p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(1)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(1)")
p1 / ( p2 | p3)

However, using the code in here and here, it is much easier to avoid the convolution filter altogether and generate the time series with a recursive filter, as in:

# Toy example:
> x = rnorm(10)
> theta1 = 0.4
> theta2 = 0.7
> fil = c(theta1, theta2)
> filter(x, method = "recursive", filter = fil)
Time Series:
Start = 1 
End = 10 
Frequency = 1 
 [1] -0.5719836  0.7566848 -0.2348018  1.6904645  0.1553653  2.0795214  2.6251119  2.9932320
 [9]  3.5073989  3.9858098
> (t1 = x[1])
[1] -0.5719836
> (t2 = t1 * theta1 + x[2])
[1] 0.7566848
> (t3 = t1 * theta2 + t2 * theta1 + x[3])
[1] -0.2348018
> (t4 = t2 * theta2 + t3 * theta1 + x[4])
[1] 1.690465
set.seed(2024)
library(forecast)
n <- 10^4
theta1 <- 0.78
theta2 <- 0.1
mu <- 70 # Desired mean
sd <- 1 # Desired SD
y <- ts(mu + filter(x = c(0, rnorm(n, 0, sd)), filter = c(theta1, theta2), 
                    method = "recursive")[-1])
# Recovering the initial parameters:
Arima(y, order = c(2,0,0), include.mean = T)
## Series: y 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2     mean
##       0.7670  0.1068  69.9469
## s.e.  0.0099  0.0099   0.0792
## 
## sigma^2 = 1:  log likelihood = -14190.4
## AIC=28388.79   AICc=28388.8   BIC=28417.63
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(2)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(2)")
p1 / ( p2 | p3)


MOVING AVERAGE (MA):

From here:

For MA(q):

\[\large Y_t =\mu+ \sum_{i=1}^q \theta_i\, \epsilon_{t - i} + \epsilon_t\]

where \(\mu\) is the mean of the series.

For MA(1) models (with \(\mu = 0\)), \[\mathbb E[Y_t]=0\]

and \[\text{Var}(Y_t)=\sigma^2\left( 1 + \theta^2\right)\]
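
A numerical check of the MA(1) variance formula (a sketch, with \(\theta=0.9\) and unit-variance innovations):

set.seed(2024)
theta <- 0.9
y <- arima.sim(n = 10^5, model = list(ma = theta))
c(theoretical = 1 + theta^2, simulated = var(y))
# Both are around 1.81.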

From here for MA(1):

set.seed(2024)
n = 1000
y <- arima.sim(model=list(ma=0.9), n=n)
Arima(y, order = c(0,0,1), include.mean = T)
## Series: y 
## ARIMA(0,0,1) with non-zero mean 
## 
## Coefficients:
##          ma1    mean
##       0.9059  0.0086
## s.e.  0.0131  0.0592
## 
## sigma^2 = 0.9664:  log likelihood = -1401.71
## AIC=2809.42   AICc=2809.44   BIC=2824.14
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF MA(1)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF MA(1)")
p1 / ( p2 | p3)

  1. Significant lags equal to the order on ACF.
  2. Geometric decay of PACF.

The characteristic MA pattern on the PACF is alternating and diminishing over time.

As with AR processes, applying filter() can generate the time series without the need for arima.sim(), with the advantage of being able to plug in a mean. The code is from page 23 of this document.

Here is how it works:

# Toy example:
> x = rnorm(10)
> phi1 = 0.4
> phi2 = 0.7
> fil = c(phi1, phi2)
> filter(x, method='convolution', sides=1, filter=fil)
Time Series:
Start = 1 
End = 10 
Frequency = 1 
 [1]         NA  0.4732177  1.1772258 -0.2822853  0.3438599  0.4165922  0.7885656  1.1353353
 [9]  0.4015954 -1.0864283
> phi2 * x[1] + phi1 * x[2]
[1] 0.4732177
> phi2 * x[2] + phi1 * x[3]
[1] 1.177226
# Moving Average Model MA(2)
n <- 10^4
phi1 <- 0.78
phi2 <- 0.1
mu <- 70
sd <- 1
y <- ts(mu + filter(rnorm(n, 0, sd), method="convolution", sides=1, 
                    filter=c(1, phi1, phi2))[-(1:50)])
Arima(y, order = c(0,0,2), include.mean = T)
## Series: y 
## ARIMA(0,0,2) with non-zero mean 
## 
## Coefficients:
##          ma1     ma2     mean
##       0.7734  0.0967  69.9906
## s.e.  0.0100  0.0100   0.0188
## 
## sigma^2 = 1.004:  log likelihood = -14139.26
## AIC=28286.52   AICc=28286.53   BIC=28315.35
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF MA(2)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF MA(2)")
p1 / ( p2 | p3)


MIXED MODELS:

Here are a few advantages of using mixed AR and MA processes:

Flexibility in capturing complex dynamics: AR models capture the linear relationship between the current observation and a linear combination of past observations. MA models, on the other hand, capture the linear relationship between the current observation and the error terms of previous observations. By combining both AR and MA components, an ARMA or ARIMA model can capture a wider range of temporal patterns and potentially model complex dynamics more accurately.

Accommodation of different patterns: AR components are suitable for capturing trends and persistence in a time series, while MA components are effective in capturing short-term shocks and irregularities. By incorporating both components, ARMA or ARIMA models can accommodate different patterns, both short-term and long-term, in the data.

Improved model fitting: In some cases, using a mixed ARMA or ARIMA model with both AR and MA components can provide a better fit to the data compared to using only AR or MA models separately. By allowing for both autoregressive and moving average effects, the model can capture both the long-term dependencies and the short-term fluctuations in the data, leading to improved accuracy in fitting the observed values.

Efficient estimation: When both the AR and MA components are invertible, the parameters of the ARMA or ARIMA model can be estimated efficiently using various estimation techniques, such as maximum likelihood estimation. The invertibility condition ensures that the model is well-defined and ensures stability in the model estimation process.

ARMA:

The model is

\[y_t = c + \phi_1y_{t-1} + \cdots + \phi_py_{t-p} + \theta_1\epsilon_{t-1} + \cdots + \theta_q\epsilon_{t-q} + \epsilon_t \]

or in backshift notation,

\[y_t = c + \phi_1By_{t} + \cdots + \phi_pB^py_{t} + \theta_1B\epsilon_{t} + \cdots +\theta_qB^q \epsilon_t + \epsilon_t \]

moving the lagged \(y_t\) terms to the LHS:

\[(1 - \phi_1B - \cdots - \phi_pB^p)y_{t}=c+(1 + \theta_1B + \cdots +\theta_qB^q) \epsilon_t\]


ARIMA:

An ARIMA(1,1,1) model is

\[(1 - \phi_1B)\color{red}{(1-B)}y_t=c+(1+\theta_1B)\epsilon_t\]

where the part in red is the first difference.
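
A minimal simulation sketch (illustrative coefficients \(\phi_1=0.6,\ \theta_1=0.3\)): arima.sim() accepts a non-zero d in order, and the parameters can be recovered with Arima().

set.seed(2024)
n <- 10^4
y <- arima.sim(model = list(order = c(1,1,1), ar = 0.6, ma = 0.3), n = n)
Arima(y, order = c(1,1,1))  # no constant is estimated by default when d > 0 (see below)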

There are two ways of writing the ARIMA model in general (from here):

Intercept form:

\[(1 - \phi_1 B-\cdots -\phi_p B^p)y_t' =c +(1 + \theta_1B+ \cdots + \theta_qB^q)\epsilon_t\] where \[y_t' = (1-B)^dy_t\]

Mean form:

\[(1 - \phi_1 B-\cdots -\phi_p B^p)(y_t'-\mu) =(1 + \theta_1B+ \cdots + \theta_qB^q)\epsilon_t\]

where \(\mu\) is the mean of \(y_t'.\) Therefore,

\[c= \mu(1-\phi_1-\cdots-\phi_p)\]

From here:

\[(1-\phi_1B-\cdots-\phi_pB^p)(1-B)^dy_t=c+(1+\theta_1B+\cdots+\theta_qB^q)\epsilon_t\]

where \(c=\mu(1-\phi_1-\cdots-\phi_p)\) and \(\mu\) is the mean of \((1-B)^d y_t.\)

Thus, the inclusion of a constant in a non-stationary ARIMA model is equivalent to inducing a polynomial trend of order \(d\) in the forecast function. (If the constant is omitted, the forecast function includes a polynomial trend of order \(d-1\).) When \(d=0,\) we have the special case that \(\mu\) is the mean of \(y_t.\)

Including constants in ARIMA models using R

arima()

By default, the arima() command in R sets \(c=\mu=0\) when \(d>0\) and provides an estimate of \(\mu\) when \(d=0.\) The parameter \(\mu\) is called the ‘intercept’ in the R output. It will be close to the sample mean of the time series, but usually not identical to it, as the sample mean is not the maximum likelihood estimate when \(p+q>0.\)

The arima() command has an argument include.mean which only has an effect when \(d=0\) and is TRUE by default. Setting include.mean=FALSE will force \(\mu=0.\)

Arima()

The Arima() command from the forecast package provides more flexibility on the inclusion of a constant. It has an argument include.mean which has identical functionality to the corresponding argument for arima(). It also has an argument include.drift which allows \(\mu\neq 0\) when \(d=1.\) For \(d>1,\) no constant is allowed, as a quadratic or higher-order trend is particularly dangerous when forecasting. The parameter \(\mu\) is called the ‘drift’ in the R output when \(d=1.\)

There is also an argument include.constant which, if TRUE, will set include.mean=TRUE if \(d=0\) and include.drift=TRUE when \(d=1.\) If include.constant=FALSE, both include.mean and include.drift will be set to FALSE. If include.constant is used, any values supplied for include.mean and include.drift are ignored.

When \(d=0\) and include.drift=TRUE, the fitted model from Arima() is

\[(1-\phi_1B-\cdots-\phi_pB^p)(y_t-a-bt)=(1+\theta_1B+\cdots+\theta_qB^q)\epsilon_t\]

In this case, the R output will label \(a\) as the ‘intercept’ and \(b\) as the ‘drift’ coefficient.
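
A sketch tying this back to the random walk with drift simulated earlier (drift = 2): with \(d=1\) and include.drift=TRUE, Arima() reports the drift, which should be close to 2.

set.seed(2024)
n <- 500
y <- ts(cumsum(rnorm(n, mean = 2, sd = 1)))
Arima(y, order = c(0,1,0), include.drift = TRUE)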


From here:

n <- 10^4
y <- arima.sim(model = list(order = c(2, 0, 1), ar = c(1, -.9), ma = .8), n = n)
p1 = autoplot(y) + 
  geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF ARMA(2,1)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF ARMA(2,1)")
p1 / ( p2 | p3)

Arima(y, order = c(2,0,1), include.mean = T)
## Series: y 
## ARIMA(2,0,1) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ma1     mean
##       0.9957  -0.8961  0.7968  -0.0091
## s.e.  0.0045   0.0045  0.0063   0.0200
## 
## sigma^2 = 1.008:  log likelihood = -14228.22
## AIC=28466.45   AICc=28466.45   BIC=28502.5

ARIMA with SEASONALITY:

Find the explanation in here. The code is extracted from here

The seasonal ARIMA model incorporates both non-seasonal and seasonal factors in a multiplicative model. One shorthand notation for the model is

\[(p, d, q) \times (P, D, Q)S\]

with p = non-seasonal AR order, d = non-seasonal differencing, q = non-seasonal MA order, P = seasonal AR order, D = seasonal differencing, Q = seasonal MA order, and S = time span of repeating seasonal pattern. Without differencing operations, the model could be written more formally as

\[\Phi(B^S)\phi(B)(x_t-\mu)=\Theta(B^S)\theta(B)w_t\]

The non-seasonal components are:

\[\phi(B)=1-\phi_1B-\ldots-\phi_pB^p\] \[\theta(B) = 1+ \theta_1B+ \ldots + \theta_qB^q\]

The seasonal components are:

\[\Phi(B^S) = 1- \Phi_1 B^S - \ldots - \Phi_PB^{PS}\] \[\Theta(B^S) = 1 + \Theta_1 B^S + \ldots + \Theta_Q B^{QS}\]

m <- arima(ts(rnorm(100), freq=4), order=c(1,0,1), seasonal=c(1,1,1),
             fixed = c(phi=0.5, theta=-0.4, Phi=0.3, Theta=-0.2))
nsim <- 100
y <- simulate(m, nsim=nsim)
autoplot(forecast(y, h=20)) 

# Increasing the number of samples for a more accurate fitting:
nsim <- 10000
y <- simulate(m, nsim=nsim)
fit <- Arima(y, order=c(1,0,1), seasonal=c(1,1,1))
fit
## Series: y 
## ARIMA(1,0,1)(1,1,1)[4] 
## 
## Coefficients:
##          ar1      ma1    sar1     sma1
##       0.4549  -0.3538  0.3193  -0.2111
## s.e.  0.0867   0.0909  0.0892   0.0924
## 
## sigma^2 = 2.113:  log likelihood = -17921.95
## AIC=35853.89   AICc=35853.9   BIC=35889.94


NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.