The workhorse function to simulate time series is `arima.sim()` (see here). However, the application of a convolutional filter precludes a straightforward generation of time series with a given mean value \(\mu.\) The `mean` argument **affects** the intercept, but it is not the intercept:
> x <- arima.sim(model = list(order = c(0,0,1), ma=0.9), n = 10000, mean = -15)
> y <- arima.sim(model = list(ma=0.9), n = 10000, mean = -15)
> mean(x)
[1] -28.48915
> mean(y)
[1] -28.47643
See the discussion of why the mean is not, as could be expected, equal to \(\mu\) in an MA model [here](https://stats.stackexchange.com/q/294748/67822).
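As a quick numerical check (a sketch, assuming `mean` is simply forwarded to `rnorm()` as the innovation mean), the MA(1) series mean comes out as \(\mu(1+\theta)\):

```r
# mean is used as the innovation mean, so for MA(1): E[Y_t] = mu * (1 + theta)
set.seed(1)
y <- arima.sim(model = list(ma = 0.9), n = 10000, mean = -15)
mean(y)          # roughly -28.5, as in the output above
-15 * (1 + 0.9)  # the value implied by mu * (1 + theta)
```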
`sd` is the standard deviation of the innovations generated by `rnorm()`.
For instance, in `arima.sim(model = list(order = c(1,0,0), ar=0.9), n = 100)` (the same as `arima.sim(model = list(ar=0.9), n = 100)`), the simulation is for an AR(1) process with coefficient \(0.9.\)
The function from {forecast} used to fit an existing data set or, in the cases below, to recover the parameters of the simulation is `Arima()`. To use it, it is important to remember the order, `order = c(p,d,q)` (illustrated in the sketch below):

- `p` is the order (number of time lags) of the autoregressive model
- `d` is the degree of differencing (the number of times the data have had past values subtracted)
- `q` is the order of the moving-average model.
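As a quick sketch of the convention (the coefficients below are arbitrary illustration values):

```r
# order = c(p, d, q): here p = 2 AR lags, d = 1 difference, q = 1 MA lag
library(forecast)
set.seed(1)
y <- arima.sim(model = list(order = c(2, 1, 1), ar = c(0.5, 0.3), ma = 0.4),
               n = 2000)
Arima(y, order = c(2, 1, 1))  # should recover coefficients near 0.5, 0.3, 0.4
```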
White noise is stationary. From here:
\[Y_t = \varepsilon_t\] with \(\varepsilon_t \sim N(0,\sigma^2).\) This is a stationary process.
library(cowplot)
library(fpp2)
library(forecast)
library(patchwork)
library(ggplot2)
set.seed(1)
n = 500
y = ts(rnorm(n))
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, ci.type="ma", linewidth=5) + ggtitle("ACF WHITE NOISE")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF WHITE NOISE")
p1 / ( p2 | p3)
From here.
This is the same as an AR(1) process with \(\rho = 1.\) The \(\rho = 1\) turns the equation for the AR process \(Y_t = \rho \, Y_{t-1} + \epsilon_t\) into the equation below.
\[\large Y_t = Y_{t-1}+\epsilon_t=\epsilon_t+\epsilon_{t-1}+\cdots+\epsilon_1\]
A random walk is not stationary because the variance is time-dependent: \[\text{var}[Y_t] = \text{var}[Y_0]+\sum_{i=1}^t \text{var}[\epsilon_i]= t\, \text{var}[\epsilon].\]
However, the expectation is zero: \(\mathbb E[Y_t]=0.\)
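A quick empirical check of the time-dependent variance (a sketch added here, not part of the simulation below): simulate many independent walks and look at the variance across walks at a fixed time \(t\).

```r
# Var(Y_t) = t * Var(epsilon): with sigma^2 = 1 the cross-sectional variance
# at time t should be close to t
set.seed(1)
t_idx <- 200
walks <- replicate(2000, cumsum(rnorm(t_idx)))  # each column is one random walk
var(walks[t_idx, ])                             # close to 200
```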
To make the process stationary, the time series has to undergo differencing. The resulting time series after differencing is white noise.
set.seed(1)
n = 500
y = ts(cumsum(rnorm(n)))
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF RANDOM WALK NO DRIFT")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF RANDOM WALK NO DRIFT")
p1 / ( p2 | p3)
dy <- diff(y, differences = 1)
p1 = autoplot(dy) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(dy, lag.max=15, linewidth=5) + ggtitle("ACF RANDOM WALK - DIFFERENCING")
p3 = ggPacf(dy, lag.max=15, linewidth=5) + ggtitle("PACF RANDOM WALK - DIFFERENCING")
p1 / ( p2 | p3)
The first \(50\) lags yield statistically significant correlations. See here.
From here. After differencing, the ACF and PACF are very similar. They are white noise.
The ACF at lag \(k\) is the correlation between values \(k\) lags apart (i.e. \(x_{t+k}\) and \(x_t\)); the PACF is the same, except you first subtract out the projection of each of those onto the space spanned by the intermediate values \(x_{t+1}, \dots, x_{t+k-1}\). This projection has an expected value of zero with white noise data, so the theoretical relationship between ACF and PACF in that case is equivalence. However, real (or simulated) data are a bit messy, and the estimated ACF/PACF values are not exactly zero. But with larger sample size, the magnitude of the projection drops to zero faster than the ACF does, so even though both the estimated ACF and PACF go toward zero, the PACF approximates the ACF more and more closely.
What approximately equal ACF and PACF can tell you is that the data appear to be white noise. You can also infer this from the largely non-significant values; the few that are out of bounds are the product of randomness.
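A small numerical sketch of this point: for simulated white noise the estimated ACF and PACF are nearly identical lag by lag.

```r
# ACF and PACF of white noise, lags 1 to 5 (dropping lag 0 from the ACF)
set.seed(1)
wn <- rnorm(10^4)
round(drop(acf(wn,  lag.max = 5, plot = FALSE)$acf)[-1], 3)
round(drop(pacf(wn, lag.max = 5, plot = FALSE)$acf), 3)
```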
From here:
A random walk (with drift) in the logs (geometric Brownian motion) is not a perfect description of stock prices, for a variety of reasons, but it's not a terrible first approximation, and it's the basis of the Black-Scholes formula.
From here:
\[\large Y_t = \alpha + Y_{t-1}+\epsilon_t\]
where \(\epsilon_t \sim N(0, \sigma^2).\)
Here \(\mathbb E[Y_t] = \alpha \, t\) and \(\text{var}(Y_t) = t \, \sigma^2,\) since (taking \(Y_0 = 0\))
\[\large Y_t = \alpha + Y_{t-1}+\epsilon_t= \alpha\, t + \epsilon_t+\epsilon_{t-1}+\cdots+\epsilon_1.\]
From here: the resulting simulation will be a realization of a random walk with variance `t*variance` and mean `t*drift`, where `t` is the index.
set.seed(2024)
n = 500
drift = 2
variance = 1
y = ts(cumsum(rnorm(n, mean=drift, sd=sqrt(variance))))
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF RANDOM WALK WITH DRIFT")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF RANDOM WALK WITH DRIFT")
p1 / ( p2 | p3)
dy <- diff(y, differences = 1)
p1 = autoplot(dy) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(dy, lag.max=15, linewidth=5) + ggtitle("ACF RANDOM WALK WITH DRIFT - differencing")
p3 = ggPacf(dy, lag.max=15, linewidth=5) + ggtitle("PACF RANDOM WALK WITH DRIFT - differencing")
p1 / ( p2 | p3)
After differencing, it also returns white noise (centered at the drift value). See here.
For AR(1) processes the equation is:
\[Y_t = \rho Y_{t-1}+\epsilon_t\]
In general, for AR(p):
\[\large Y_t = \sum_{i=1}^p \rho_i\, Y_{t - i} + \epsilon_t\]
Notice that, unlike in MA models (see below), the mean is not specified and is assumed to be zero. If it is not zero (see here), under the assumption of stationarity of the time series, the mean has to be subtracted on both sides of the equation and from each term:
\[\large Y_t - \mu= \sum_{i=1}^p \rho_i\, \left(Y_{t - i}-\mu\right) + \epsilon_t\]
See also this answer. For instance in AR(2), subtracting the mean from both sides:
\[Y_t - \mu = \rho_1(Y_{t-1} - \mu) + \rho_2(Y_{t-2} - \mu) +\epsilon_t\]
Now moving \(\mu\) to the RHS:
\[Y_t = \mu -\rho_1\,\mu - \rho_2 \,\mu +\rho_1Y_{t-1}+ \rho_2 Y_{t-2}+ \epsilon_t\] and grouping \(c = \mu(1 - \rho_1 - \rho_2)\)
\[Y_t = c + \rho_1Y_{t-1}+ \rho_2 Y_{t-2}+ \epsilon_t\]
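A quick check of this relation on a simulated series (a sketch: `arima.sim()` returns a zero-mean series, so the mean is added afterwards; the coefficients and \(\mu\) are arbitrary illustration values):

```r
# c = mu * (1 - rho1 - rho2): the fitted mean should be near mu = 50,
# and the implied intercept c near 50 * (1 - 0.5 - 0.3) = 10
library(forecast)
set.seed(1)
mu <- 50; rho1 <- 0.5; rho2 <- 0.3
y   <- mu + arima.sim(model = list(ar = c(rho1, rho2)), n = 10^4)
fit <- Arima(y, order = c(2, 0, 0), include.mean = TRUE)
coef(fit)                  # ar1, ar2 and the mean term (close to 50)
mu * (1 - rho1 - rho2)     # the implied intercept c
```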
From here for AR(1),
\[\mathbb E[Y_t]=0\]
and from here:
\[\text{Var}(Y_t)= \frac{\sigma^2}{1-\rho^2}\]
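An empirical check of this variance (a sketch with an arbitrary \(\rho\)):

```r
# Var(Y_t) = sigma^2 / (1 - rho^2): with sigma^2 = 1 and rho = 0.9 this is ~5.26
set.seed(1)
rho <- 0.9
y   <- arima.sim(model = list(ar = rho), n = 10^5)
var(y)            # empirical variance
1 / (1 - rho^2)   # theoretical value
```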
From here for AR(1):
set.seed(2024)
n = 10^4
y = arima.sim(n=n,list(ar=c(0.15)), innov=rnorm(n))
Arima(y, order = c(1,0,0), include.mean = T)
## Series: y
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.1363 -0.0081
## s.e. 0.0099 0.0116
##
## sigma^2 = 1.001: log likelihood = -14191.94
## AIC=28389.89 AICc=28389.89 BIC=28411.52
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(1)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(1)")
p1 / ( p2 | p3)
An example with positive \(\rho\) values using `arima.sim()`, from here:
set.seed(2024)
n = 10^4
y <- arima.sim(n=n, model=list(ar=c(0.1,0.8)))
Arima(y, order = c(2,0,0), include.mean = T)
## Series: y
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.0945 0.7962 -0.0618
## s.e. 0.0060 0.0060 0.0914
##
## sigma^2 = 1.002: log likelihood = -14198.26
## AIC=28404.51 AICc=28404.52 BIC=28433.35
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(2)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(2)")
p1 / ( p2 | p3)
Another example:
set.seed(2024)
n = 10^4
y <- arima.sim(model = list(order = c(1,0,0), ar=0.9), n = n)
Arima(y, order = c(1,0,0), include.mean = T)
## Series: y
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.8938 -0.0619
## s.e. 0.0045 0.0941
##
## sigma^2 = 0.9996: log likelihood = -14187
## AIC=28380 AICc=28380.01 BIC=28401.64
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(1)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(1)")
p1 / ( p2 | p3)
This is the typical pure AR(1) pattern: exponential decay in the ACF, and a single significant spike (positive or negative) at lag 1 in the PACF.
Example of damped sine-wave:
library(tsibbledata)
y = as.ts(global_economy[global_economy$Code=="EGY",'Exports'])
par(mfrow=c(3,1))
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, length(y)))
p2 = ggAcf(y, main='ACF')
p0 = ggAcf(y, linewidth=5, main='Dampened Sinusoidal ACF') +
geom_path(linejoin = "round", linewidth=2, col='red') +
aes(y=ggAcf(y)$data$Freq, x=ggAcf(y)$data$lag) +
theme(axis.title.x=element_blank(), axis.title.y=element_blank()) +
geom_point(y=ggAcf(y)$data$Freq, x=ggAcf(y)$data$lag, size=4)
p3 = ggPacf(y, linewidth=5, main='PACF')
(p1 | p2) / ( p3 | p0)
The sinusoidal ACF is typical of cyclical or seasonal data. In this case the data are cyclic, with a period of roughly 4 to 8 years.
On the PACF, the last significant spike is at lag \(4,\) so a plausible model would be AR(4). However, the best model is:
ar <- auto.arima(y) # Autofit model
summary(ar)
## Series: y
## ARIMA(2,0,1) with non-zero mean
##
## Coefficients:
## ar1 ar2 ma1 mean
## 1.6764 -0.8034 -0.6896 20.1790
## s.e. 0.1111 0.0928 0.1492 0.9142
##
## sigma^2 = 8.046: log likelihood = -141.57
## AIC=293.13 AICc=294.29 BIC=303.43
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.04651341 2.736964 2.173253 -2.355658 11.65043 0.8721343
## ACF1
## Training set -0.02185372
guess <- Arima(y, order = c(4,0,0), include.mean = T)
summary(guess)
## Series: y
## ARIMA(4,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 mean
## 0.9861 -0.1715 0.1807 -0.3283 20.0986
## s.e. 0.1247 0.1865 0.1865 0.1273 1.0699
##
## sigma^2 = 7.885: log likelihood = -140.53
## AIC=293.05 AICc=294.7 BIC=305.41
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.02192829 2.684212 2.106488 -2.183598 11.33228 0.8453413
## ACF1
## Training set -0.03025037
The AICc criterion favors the ARIMA(2,0,1) with non-zero mean, but by less than one point.
From here.
set.seed(2024)
n = 10^4
# Simulate an AR(2) "by hand": X_t = ar1 * X_{t-1} + ar2 * X_{t-2} + e_t,
# starting from X_1 = X_2 = 0
AR2.sim <- function(n, ar1, ar2, sd)
{
  Xt = c(0, 0)
  for (i in 2:n)
  {
    Xt[i+1] = ar1*Xt[i] + ar2*Xt[i-1] + rnorm(1, mean=0, sd=sd)
  }
  Xt
}
y = ts(AR2.sim(n, -0.5, 0.3, 1))
Arima(y, order = c(2,0,0), include.mean = T)
## Series: y
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## -0.5124 0.2878 -0.0054
## s.e. 0.0096 0.0096 0.0082
##
## sigma^2 = 1: log likelihood = -14191.4
## AIC=28390.8 AICc=28390.8 BIC=28419.64
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, linewidth=5, lag.max=15) + ggtitle("ACF AR(2)")
p3 = ggPacf(y, linewidth=5, lag.max=15) + ggtitle("PACF AR(2)")
p1 / ( p2 | p3)
And a second example, this time with `arima.sim()`:
set.seed(2024)
n = 10^4
y <- arima.sim(n=n, model=list(ar=c(-0.5,0.3)))
Arima(y, order = c(2,0,0), include.mean = T)
## Series: y
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## -0.5124 0.2874 -0.0055
## s.e. 0.0096 0.0096 0.0082
##
## sigma^2 = 0.9991: log likelihood = -14183.74
## AIC=28375.48 AICc=28375.49 BIC=28404.33
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(2)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(2)")
p1 / ( p2 | p3)
Based on this answer, if instead of a mean we want an intercept as in the model:
\[X_{t} = 5 + 0.5X_{t-1}+\epsilon_t\]
the mean \(\mu\) will actually have to be \(\mu = 10,\) because, from the derivation above, subtracting the mean from \(X_t\) and \(X_{t-1}\) on both sides shows that the intercept is \(c = \mu(1-\rho).\) Therefore \(\mu = c/(1-\rho) = 5/(1-0.5) = 10,\) so that the expression above is the same as
\[X_t - 10 = 0.5(X_{t-1} - 10) + \epsilon_t .\] Now, defining \(Y_t = X_t - 10,\)
\[Y_t = .5 Y_{t-1} + \epsilon_t\]
and we just add \(10\) at the end:
set.seed(2024)
n = 10^4
yt <- arima.sim(list(order=c(1,0,0), ar=.5), n=n)
y <- yt + 10
Arima(y, order = c(1,0,0), include.mean = T)
## Series: y
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.4887 9.9860
## s.e. 0.0087 0.0196
##
## sigma^2 = 1.001: log likelihood = -14192.45
## AIC=28390.89 AICc=28390.9 BIC=28412.52
p1 = autoplot(yt) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(yt, lag.max=15, linewidth=5) + ggtitle("ACF AR(1)")
p3 = ggPacf(yt, lag.max=15, linewidth=5) + ggtitle("PACF AR(1)")
p1 / ( p2 | p3)
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(1)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(1)")
p1 / ( p2 | p3)
However, using the code in here and here, it is much easier to avoid the convolution filter altogether and generate the time series with a recursive filter, as in:
# Toy example:
> x = rnorm(10)
> theta1 = 0.4
> theta2 = 0.7
> fil = c(theta1, theta2)
> filter(x, "recursive", filter=fil)
Time Series:
Start = 1
End = 10
Frequency = 1
[1] -0.5719836 0.7566848 -0.2348018 1.6904645 0.1553653 2.0795214 2.6251119 2.9932320
[9] 3.5073989 3.9858098
> (t1 = x[1])
[1] -0.5719836
> (t2 = t1 * theta1 + x[2])
[1] 0.7566848
> (t3 = t1 * theta2 + t2 * theta1 + x[3])
[1] -0.2348018
> (t4 = t2 * theta2 + t3 * theta1 + x[4])
[1] 1.690465
set.seed(2024)
library(forecast)
n <- 10^4
theta1 <- 0.78
theta2 <- 0.1
mu <- 70 # Desired mean
sd <- 1 # Desired SD
y <- ts(mu + filter(x = c(0, rnorm(n, 0, sd)), filter = c(theta1,theta2),
method = "recursive")[-1])
# Recovering the initial parameters:
Arima(y, order = c(2,0,0), include.mean = T)
## Series: y
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.7670 0.1068 69.9469
## s.e. 0.0099 0.0099 0.0792
##
## sigma^2 = 1: log likelihood = -14190.4
## AIC=28388.79 AICc=28388.8 BIC=28417.63
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF AR(2)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF AR(2)")
p1 / ( p2 | p3)
From here:
For MA(q):
\[\large Y_t =\mu+ \sum_{i=1}^q \theta_i\, \epsilon_{t - i} + \epsilon_t\]
where \(\mu\) is the mean of the series.
For MA(1) models with \(\mu = 0,\) \[\mathbb E[Y_t]=0\]
and \[\text{Var}(Y_t)=\sigma^2\left( 1 + \theta^2\right)\]
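An empirical check of the MA(1) variance (a sketch using the same \(\theta = 0.9\) as below):

```r
# Var(Y_t) = sigma^2 * (1 + theta^2): with sigma^2 = 1 and theta = 0.9 this is 1.81
set.seed(1)
theta <- 0.9
y     <- arima.sim(model = list(ma = theta), n = 10^5)
var(y)          # empirical variance
1 + theta^2     # theoretical value
```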
From here for MA(1):
set.seed(2024)
n = 1000
y <- arima.sim(model=list(ma=0.9), n=n)
Arima(y, order = c(0,0,1), include.mean = T)
## Series: y
## ARIMA(0,0,1) with non-zero mean
##
## Coefficients:
## ma1 mean
## 0.9059 0.0086
## s.e. 0.0131 0.0592
##
## sigma^2 = 0.9664: log likelihood = -1401.71
## AIC=2809.42 AICc=2809.44 BIC=2824.14
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF MA(1)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF MA(1)")
p1 / ( p2 | p3)
The characteristic MA(1) pattern here is a single significant spike at lag 1 in the ACF, while the PACF alternates in sign and diminishes over time.
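A numerical sketch of the ACF cut-off for the same MA(1):

```r
# For an MA(q) the ACF should cut off after lag q: only lag 1 is large here,
# near theta / (1 + theta^2) = 0.9 / 1.81, about 0.5
set.seed(2024)
y <- arima.sim(model = list(ma = 0.9), n = 10^4)
round(drop(acf(y, lag.max = 4, plot = FALSE)$acf)[-1], 3)
```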
As with AR processes, applying `filter()` can generate the time series without need for `arima.sim()`, and with the advantage of being able to plug in a mean. The code is from page 23 of this document. Here is how it works:
# Toy example:
> x = rnorm(10)
> phi1 = 0.4
> phi2 = 0.7
> fil = c(phi1, phi2)
> filter(x, method='convolution', sides=1, filter=fil)
Time Series:
Start = 1
End = 10
Frequency = 1
[1] NA 0.4732177 1.1772258 -0.2822853 0.3438599 0.4165922 0.7885656 1.1353353
[9] 0.4015954 -1.0864283
> phi2 * x[1] + phi1 * x[2]
[1] 0.4732177
> phi2 * x[2] + phi1 * x[3]
[1] 1.177226
# Moving Average Model MA(2)
n <- 10^4
phi1 <- 0.78
phi2 <- 0.1
mu <- 70
sd <- 1
y <- ts(mu + filter(rnorm(n, 0, sd), method="convolution", sides=1, filter=c(1, phi1, phi2))[-(1:50)])
Arima(y, order = c(0,0,2), include.mean = T)
## Series: y
## ARIMA(0,0,2) with non-zero mean
##
## Coefficients:
## ma1 ma2 mean
## 0.7734 0.0967 69.9906
## s.e. 0.0100 0.0100 0.0188
##
## sigma^2 = 1.004: log likelihood = -14139.26
## AIC=28286.52 AICc=28286.53 BIC=28315.35
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF MA(2)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF MA(2)")
p1 / ( p2 | p3)
Here are a few advantages of using mixed AR and MA processes:
- **Flexibility in capturing complex dynamics**: AR models capture the linear relationship between the current observation and a linear combination of past observations. MA models, on the other hand, capture the linear relationship between the current observation and the error terms of previous observations. By combining both AR and MA components, an ARMA or ARIMA model can capture a wider range of temporal patterns and potentially model complex dynamics more accurately.
- **Accommodation of different patterns**: AR components are suitable for capturing trends and persistence in a time series, while MA components are effective in capturing short-term shocks and irregularities. By incorporating both components, ARMA or ARIMA models can accommodate different patterns, both short-term and long-term, in the data.
- **Improved model fitting**: In some cases, using a mixed ARMA or ARIMA model with both AR and MA components can provide a better fit to the data compared to using only AR or MA models separately. By allowing for both autoregressive and moving average effects, the model can capture both the long-term dependencies and the short-term fluctuations in the data, leading to improved accuracy in fitting the observed values.
- **Efficient estimation**: When the AR component is stationary and the MA component is invertible, the parameters of the ARMA or ARIMA model can be estimated efficiently using various estimation techniques, such as maximum likelihood estimation. These conditions ensure that the model is well-defined and that the estimation process is stable.
The model is
\[y_t = c + \phi_1y_{t-1} + \cdots + \phi_py_{t-p} + \theta_1\epsilon_{t-1} + \cdots + \theta_q\epsilon_{t-q} + \epsilon_t \]
or in backshift notation,
\[y_t = c + \phi_1By_{t} + \cdots + \phi_pB^py_{t} + \theta_1B\epsilon_{t} + \cdots +\theta_qB^q \epsilon_t + \epsilon_t \]
Moving the \(y_t\) terms to the LHS and factoring:
\[(1 - \phi_1B - \cdots - \phi_pB^p)y_{t}=c+(1 + \theta_1B + \cdots +\theta_qB^q) \epsilon_t\]
An ARIMA(1,1,1) model is
\[(1 - \phi_1B)\color{red}{(1-B)}y_t=c+(1+\theta_1B)\epsilon_t\]
where the part in red is the first difference.
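A sketch of this: since the factor in red is just the first difference, fitting an ARIMA(1,1,1) to \(y_t\) should give essentially the same AR and MA estimates as fitting an ARMA(1,1) to the differenced series (the coefficients below are arbitrary illustration values):

```r
# ARIMA(1,1,1) on y versus ARMA(1,1) on diff(y): nearly identical estimates
library(forecast)
set.seed(2024)
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.6, ma = 0.4), n = 10^4)
Arima(y, order = c(1, 1, 1))
Arima(diff(y), order = c(1, 0, 1), include.mean = FALSE)
```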
There are two ways of writing the ARIMA model in general (from here):
\[(1 - \phi_1 B-\cdots -\phi_p B^p)y_t' =c +(1 + \theta_1B+ \cdots + \theta_qB^q)\epsilon_t\] where \[y_t' = (1-B)^dy_t\]
\[(1 - \phi_1 B-\cdots -\phi_p B^p)(y_t'-\mu) =(1 + \theta_1B+ \cdots + \theta_qB^q)\epsilon_t\]
where \(\mu\) is the mean of \(y_t'.\) Therefore,
\[c= \mu(1-\phi_1-\cdots-\phi_p)\]
From here:
\[(1-\phi_1B-\cdots-\phi_pB^p)(1-B)^dy_t=c+(1+\theta_1B+\cdots+\theta_qB^q)\epsilon_t\]
where \(c=\mu(1-\phi_1-\cdots-\phi_p)\) and \(\mu\) is the mean of \((1-B)^d y_t.\)
Thus, the inclusion of a constant in a non-stationary ARIMA model is equivalent to inducing a polynomial trend of order \(d\) in the forecast function. (If the constant is omitted, the forecast function includes a polynomial trend of order \(d-1\).) When \(d=0,\) we have the special case that \(\mu\) is the mean of \(y_t.\)
Including constants in ARIMA models using R

`arima()`

By default, the `arima()` command in R sets \(c=\mu=0\) when \(d>0\) and provides an estimate of \(\mu\) when \(d=0.\) The parameter \(\mu\) is called the 'intercept' in the R output. It will be close to the sample mean of the time series, but usually not identical to it, as the sample mean is not the maximum likelihood estimate when \(p+q>0.\)

The `arima()` command has an argument `include.mean` which only has an effect when \(d=0\) and is `TRUE` by default. Setting `include.mean=FALSE` will force \(\mu=0.\)

`Arima()`

The `Arima()` command from the forecast package provides more flexibility on the inclusion of a constant. It has an argument `include.mean` which has identical functionality to the corresponding argument for `arima()`. It also has an argument `include.drift` which allows \(\mu\neq 0\) when \(d=1.\) For \(d>1,\) no constant is allowed, as a quadratic or higher order trend is particularly dangerous when forecasting. The parameter \(\mu\) is called the 'drift' in the R output when \(d=1.\)

There is also an argument `include.constant` which, if `TRUE`, will set `include.mean=TRUE` if \(d=0\) and `include.drift=TRUE` when \(d=1.\) If `include.constant=FALSE`, both `include.mean` and `include.drift` will be set to `FALSE`. If `include.constant` is used, the values of `include.mean=TRUE` and `include.drift=TRUE` are ignored.

When \(d=0\) and `include.drift=TRUE`, the fitted model from `Arima()` is

\[(1-\phi_1B-\cdots-\phi_pB^p)(y_t-a-bt)=(1+\theta_1B+\cdots+\theta_qB^q)\epsilon_t\]
In this case, the R output will label \(a\) as the ‘intercept’ and \(b\) as the ‘drift’ coefficient.
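As a sketch of the `include.drift` behaviour, refitting the random walk with drift simulated earlier (drift \(=2\)) with \(d=1\): the constant is then reported as 'drift'.

```r
# With d = 1 and include.drift = TRUE, Arima() reports the constant as 'drift'
library(forecast)
set.seed(2024)
n <- 500
y <- ts(cumsum(rnorm(n, mean = 2, sd = 1)))         # random walk with drift = 2
Arima(y, order = c(0, 1, 0), include.drift = TRUE)  # drift estimate near 2
```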
From here:
n <- 10^4
y <- arima.sim(model = list(order = c(2, 0, 1), ar = c(1, -.9), ma = .8), n = n)
p1 = autoplot(y) +
geom_point() + labs(x = "", y = "") + xlim(c(0, 100))
p2 = ggAcf(y, lag.max=15, linewidth=5) + ggtitle("ACF ARMA(2,1)")
p3 = ggPacf(y, lag.max=15, linewidth=5) + ggtitle("PACF ARMA(2,1)")
p1 / ( p2 | p3)
Arima(y, order = c(2,0,1), include.mean = T)
## Series: y
## ARIMA(2,0,1) with non-zero mean
##
## Coefficients:
## ar1 ar2 ma1 mean
## 0.9957 -0.8961 0.7968 -0.0091
## s.e. 0.0045 0.0045 0.0063 0.0200
##
## sigma^2 = 1.008: log likelihood = -14228.22
## AIC=28466.45 AICc=28466.45 BIC=28502.5
Find the explanation here. The code is extracted from here.
The seasonal ARIMA model incorporates both non-seasonal and seasonal factors in a multiplicative model. One shorthand notation for the model is
\[(p, d, q) \times (P, D, Q)S\]
with p = non-seasonal AR order, d = non-seasonal differencing, q = non-seasonal MA order, P = seasonal AR order, D = seasonal differencing, Q = seasonal MA order, and S = time span of repeating seasonal pattern. Without differencing operations, the model could be written more formally as
\[\Phi(B^S)\phi(B)(x_t-\mu)=\Theta(B^S)\theta(B)w_t\]
The non-seasonal components are:
\[\phi(B)=1-\phi_1B-\ldots-\phi_pB^p\] \[\theta(B) = 1+ \theta_1B+ \ldots + \theta_qB^q\]
The seasonal components are:
\[\Phi(B^S) = 1- \Phi_1 B^S - \ldots - \Phi_PB^{PS}\] \[\Theta(B^S) = 1 + \Theta_1 B^S + \ldots + \Theta_Q B^{QS}\]
m <- arima(ts(rnorm(100), freq=4), order=c(1,0,1), seasonal=c(1,1,1),
fixed = c(phi=0.5, theta=-0.4, Phi=0.3, Theta=-0.2))
nsim <- 100
y <- simulate(m, nsim=nsim)
autoplot(forecast(y, h=20))
# Increasing the number of samples for a more accurate fitting:
nsim <- 10000
y <- simulate(m, nsim=nsim)
fit <- Arima(y, order=c(1,0,1), seasonal=c(1,1,1))
fit
## Series: y
## ARIMA(1,0,1)(1,1,1)[4]
##
## Coefficients:
## ar1 ma1 sar1 sma1
## 0.4549 -0.3538 0.3193 -0.2111
## s.e. 0.0867 0.0909 0.0892 0.0924
##
## sigma^2 = 2.113: log likelihood = -17921.95
## AIC=35853.89 AICc=35853.9 BIC=35889.94
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.