Chapter 1 Basics

This section covers fundamental concepts in time series analysis. There seem to be three modeling paradigms: i) the base R framework using native ts, zoo, and xts objects to model with the forecast package, ii) the tidyverts framework using tsibble objects to model with the fable package, and iii) the tidymodels framework using tibble objects with the timetk package. The base R framework is clunky, so I avoid it. The tidymodels framework seems geared toward machine-learning workflows, which are still unfamiliar to me. So most of these notes use tidyverts.

library(tidyverse)
library(tsibble)   # extends tibble to time-series data.
library(feasts)    # feature extraction and statistics
library(fable)     # forecasting tsibbles.
library(tidymodels)
library(modeltime) # new time series forecasting framework.

1.1 Common Frameworks

Let’s review those three frameworks briefly. I’ll work with the fpp3::us_employment data set of US monthly employment. After filtering to 2001 and later, there is one row per month and series (225 months x 148 series).

us_employment_tibble <- 
  fpp3::us_employment %>% 
  as_tibble() %>% 
  mutate(Month = ym(Month)) %>%
  filter(year(Month) >= 2001)

glimpse(us_employment_tibble)
## Rows: 33,300
## Columns: 4
## $ Month     <date> 2001-01-01, 2001-02-01, 2001-03-01, 2001-04-01, 2001-05-01,…
## $ Series_ID <chr> "CEU0500000001", "CEU0500000001", "CEU0500000001", "CEU05000…
## $ Title     <chr> "Total Private", "Total Private", "Total Private", "Total Pr…
## $ Employed  <dbl> 109928, 110140, 110603, 110962, 111624, 112303, 111897, 1118…

Base R

ts is the base R time series class (from the built-in stats package). A ts object is essentially a matrix of observations indexed by a chronological identifier. Because it is a matrix, any descriptive attributes need to enter as numeric values, perhaps by one-hot encoding. A ts can have only one row per time observation, and the time series must be regularly spaced (no data gaps).

Define a ts object with ts(x, start, frequency), where frequency is the number of observations in the seasonal pattern: 7 for daily observations in a weekly cycle; 5 for weekday observations in a weekly cycle; 24 for hourly observations in a daily cycle; 24x7 for hourly observations in a weekly cycle; and so on. After the filter above, us_employment_tibble contains monthly observations starting with Jan 2001. Had the series started in Feb, you would specify start = c(2001, 2). I filtered the data to the Total Private series to get one row per time observation.

us_employment_ts <- 
  us_employment_tibble %>% 
  filter(Title == "Total Private") %>%
  arrange(Month) %>%
  select(Employed) %>%
  ts(start = c(2001, 1), frequency = 12)

glimpse(us_employment_ts)
##  Time-Series [1:225, 1] from 2001 to 2020: 109928 110140 110603 110962 111624 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr "Employed"

zoo and xts

zoo (Zeileis’s ordered observations) has functions similar to those in ts, but it also supports irregular time series. A zoo object contains an array of data values and an index attribute that records the ordering of the observations. zoo was introduced in 2004.
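Here is a minimal zoo sketch (hypothetical values, not the employment data) showing an irregular series, something ts cannot represent:

library(zoo)

# Irregularly spaced dates are fine because zoo stores an explicit index.
irregular_dates <- as.Date(c("2024-01-01", "2024-01-03", "2024-01-07"))
z <- zoo(c(100, 102, 101), order.by = irregular_dates)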

xts (extensible time series) extends zoo. xts objects are more flexible than ts objects while imposing reasonable constraints to make them truly time-based. An xts object is essentially a matrix of observations indexed by a time object. Create an xts object with xts(x, order.by) where order.by is a vector of dates/times to index the data. You can also add metadata to the xts object by declaring name-value pairs such as Title below.

library(xts)

us_employment_xts <-
  us_employment_tibble %>%
  filter(Title == "Total Private") %>%
  arrange(Month) %>%
  # Braces prevent %>% from also passing the whole tibble as the first argument.
  { xts(x = .$Employed, order.by = .$Month, Title = .$Title) }

glimpse(us_employment_xts)
## An xts object on 2001-01-01 / 2019-09-01 containing: 
##   Data:    double [225, 1]
##   Index:   Date [225] (TZ: "UTC")
##   xts Attributes:
##     $ Title: chr [1:225] "Total Private" "Total Private" "Total Private" "Total Private" ...

tidyverts

A tsibble, from the package of the same name, is a time-series tibble. Unlike ts, zoo, and xts objects, a tsibble keeps the time index as an ordinary data column, making heterogeneous data structures possible. For example, you can re-index a tsibble from monthly to yearly analysis, or include one or more features per time element and fit a linear regression.

A tsibble object is a tibble uniquely identified by key columns plus a date index column. This structure accommodates multiple series and attribute columns. The date index can be a Date, a period, etc. Express weekly time series with yearweek(), monthly time series with yearmonth(), and quarterly time series with yearquarter().

us_employment_tsibble <- us_employment_tibble %>% 
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(key = Title, index = Month)

us_employment_tsibble %>% filter(Title == "Total Private") %>% glimpse()
## Rows: 225
## Columns: 4
## Key: Title [1]
## $ Month     <mth> 2001 Jan, 2001 Feb, 2001 Mar, 2001 Apr, 2001 May, 2001 Jun, …
## $ Series_ID <chr> "CEU0500000001", "CEU0500000001", "CEU0500000001", "CEU05000…
## $ Title     <chr> "Total Private", "Total Private", "Total Private", "Total Pr…
## $ Employed  <dbl> 109928, 110140, 110603, 110962, 111624, 112303, 111897, 1118…

A tsibble behaves like a tibble, so you can use tidyverse verbs. The only thing that will trip you up is that tsibble objects are grouped by the index, so group_by() operations implicitly include the index. Use index_by() if you need to summarize at a new time level.

# Group by Quarter implicitly includes the Month index column. Don't do this.
us_employment_tsibble %>% 
  group_by(Quarter = yearquarter(Month), Title) %>% 
  summarize(Employed = sum(Employed, na.rm = TRUE)) %>%
  tail(3)

# Instead, change the index aggregation level with index_by() and group with
# either group_by() or group_by_key().
us_employment_tsibble %>%
  index_by(Quarter = ~ yearquarter(.)) %>%
  group_by_key() %>%
  summarise(Employed = sum(Employed, na.rm = TRUE)) %>%
  tail(3)

modeltime

The modeltime framework requires no special data object: it works with plain tibbles.
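For example, timetk's plotting functions (part of the modeltime ecosystem) operate directly on the tibble; a quick sketch:

# Pass the tibble, the date column, and the value column; no coercion needed.
us_employment_tibble %>%
  filter(Title == "Total Private") %>%
  timetk::plot_time_series(Month, Employed, .interactive = FALSE)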

1.2 Fitting Models

The workflow for an explanatory model is: fit the model, verify its assumptions, then summarize the parameters. For a predictive model it’s: compare candidate models with cross-validation, then predict new values.

Let’s continue working with the us_employment data set and use just the Total Private series. We’ll split the data into training and testing to support predictive modeling.

# Tidyverts (tv) framework with tsibbles.
tv_full <- us_employment_tibble %>% 
  filter(Title == "Total Private") %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(key = Title, index = Month)

tv_train <- tv_full %>% filter(year(Month) <= 2015)

tv_test <- tv_full %>% filter(year(Month) > 2015)

# Modeltime (mt) with tibbles
mt_split <-
  us_employment_tibble %>% 
  filter(Title == "Total Private") %>% 
  timetk::time_series_split(date_var = Month, initial = "15 years", assess = "45 months") 

Fit some simple benchmark models first. Some forecasting methods are extremely simple and surprisingly effective.

  • The mean method projects the historical average, \(\hat{y}_{T+h|T} = \bar{y}.\)
  • The naive method projects the last observation, \(\hat{y}_{T+h|T} = y_T.\)
  • The seasonal naive method projects the last observation from the same season, \(\hat{y}_{T+h|T} = y_{T+h-m(k+1)},\) where \(m\) is the seasonal period and \(k\) is the integer part of \((h-1)/m.\)
  • The drift method projects the straight line through the first and last observations, \(\hat{y}_{T+h|T} = y_T + h\left(\frac{y_T - y_1}{T-1}\right).\)

Modeltime doesn't appear to support the mean or drift methods.

# Tidyverts supports fitting multiple models at once.
tv_fit <- 
  tv_train %>%
  model(
    Mean = MEAN(Employed),
    Naive = NAIVE(Employed),
    SNaive = SNAIVE(Employed),
    Drift = RW(Employed ~ drift())
  )

# Modeltime follows the tidymodels style. 
mt_fit_naive <-
  modeltime::naive_reg() %>%
  set_engine("naive") %>%
  parsnip::fit(Employed ~ Month, data = rsample::training(mt_split))

mt_fit_snaive <-
  modeltime::naive_reg() %>%
  set_engine("snaive") %>%
  parsnip::fit(Employed ~ Month, data = rsample::training(mt_split))
## frequency = 12 observations per 1 year
mt_fit <- modeltime::modeltime_table(
  mt_fit_naive,
  mt_fit_snaive
)

The autoplot() and autolayer() functions take a lot of the headache out of plotting the results, especially since forecast() tucks away the prediction intervals in a distribution list column. Nevertheless, I like extracting the results manually.
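For reference, the autoplot() route is a one-liner (not run):

# autoplot() layers the forecasts and their intervals over the training data.
# tv_fit %>% forecast(new_data = tv_test) %>% autoplot(tv_train, level = 80)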

# modeltime
mt_fc <- mt_fit %>%
  modeltime::modeltime_calibrate(testing(mt_split)) %>%
  modeltime::modeltime_forecast(
    new_data = testing(mt_split), 
    actual_data = us_employment_tibble %>% filter(Title == "Total Private")
  )

#  (not run)
# mt_fc %>% modeltime::plot_modeltime_forecast(.interactive = FALSE)

# tidyverts
tv_fc <- tv_fit %>%
  forecast(new_data = tv_test) %>%
  hilo(80) %>%
  mutate(
    lpi = map_dbl(`80%`, ~.$lower),
    upi = map_dbl(`80%`, ~.$upper)
  )

tv_fit %>%
  augment() %>%
  ggplot(aes(x = Month)) +
  geom_line(aes(y = Employed)) +
  geom_line(data = tv_fc, aes(y = .mean, color = .model)) +
  geom_ribbon(data = tv_fc, aes(ymin = lpi, ymax = upi, fill = .model), alpha = 0.2) +
  labs(title = "Forecast with 80% prediction interval")

1.3 Evaluating Fit

Evaluate the model fit with residual diagnostics.[1] broom::augment() adds three columns to the fitted model's data: .fitted, .resid, and .innov. .innov is the residual from the transformed data (if there was no transformation, it equals .resid).

Innovation residuals should be independent random variables normally distributed with mean zero and constant variance (the normality and variance conditions are only required for inference and prediction intervals). Happily, feasts has just what you need.

tv_fit %>% 
  select("Naive") %>% 
  feasts::gg_tsresiduals() +
  labs(title = "Residuals Analysis")

The autocorrelation plot tests the independence assumption, the histogram tests normality, and the residuals-vs-time plot tests the mean-zero and constant-variance assumptions. You can carry out a portmanteau test of the autocorrelation assumption. Two common tests are the Box-Pierce and the Ljung-Box. These tests assess a whole set of autocorrelations at once, without testing any single correlation, somewhat like an ANOVA test. The Ljung-Box test statistic is a weighted sum of squared \(k\)-lagged autocorrelations, \(r_k^2\),

\[Q^* = T(T+2) \sum_{k=1}^l(T-k)^{-1}r_k^2.\]

The test statistic has a \(\chi^2\) distribution with \(l - K\) degrees of freedom, where \(K\) is the number of parameters in the model. Use \(l = 10\) for non-seasonal data and \(l = 2m\) for seasonal data with seasonal period \(m\). If your model has no explanatory variables, \(K = 0.\) Reject the no-autocorrelation (i.e., white noise) hypothesis if p < .05.

tv_fit %>% 
  broom::augment() %>% 
  features(.var = .innov, features = ljung_box, lag = 10, dof = 0)
## # A tibble: 4 × 4
##   Title         .model lb_stat  lb_pvalue
##   <chr>         <chr>    <dbl>      <dbl>
## 1 Total Private Drift     43.3 0.00000442
## 2 Total Private Mean    1085.  0         
## 3 Total Private Naive     43.3 0.00000442
## 4 Total Private SNaive  1170.  0

1.4 Model Selection

Evaluate the forecast accuracy with the test data (aka the “hold-out set” or “out-of-sample data”). The forecast error is the difference between the observed and forecast value, \(e_{T+h} = y_{T+h} - \hat{y}_{T+h|T}.\) Forecast errors differ from model residuals in that they come from the test data set and that forecasts are usually multi-step, so prior forecast values feed into later ones.

The major accuracy benchmarks are:

  • MAE. Mean absolute error, \(mean(|e_t|)\)
  • RMSE. Root mean squared error, \(\sqrt{mean(e_t^2)}\)
  • MAPE. Mean absolute percentage error, \(mean(|e_t / y_t|) \times 100\)
  • MASE. Mean absolute scaled error, \(MAE/Q\), where \(Q\) is a scaling constant calculated as the mean absolute one-period change in the outcome variable over the training data (the in-sample MAE of a one-step naive forecast).

The MAE and RMSE are on the same scale as the data, so they are only useful for comparing models fitted to the same series. MAPE is unitless, but it is undefined when \(y_t = 0\) and it assumes a meaningful zero (ratio data). MASE is most useful for comparing series measured in different units.
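To make the definitions concrete, here are hand-rolled versions (a sketch; fabletools computes these for you, and its MASE uses a seasonal scaling constant for seasonal models):

# e = forecast errors on the test set; y = test-set actuals;
# y_train = training actuals used for the MASE scaling constant Q.
mae  <- function(e) mean(abs(e), na.rm = TRUE)
rmse <- function(e) sqrt(mean(e^2, na.rm = TRUE))
mape <- function(e, y) mean(abs(e / y), na.rm = TRUE) * 100
mase <- function(e, y_train) mae(e) / mean(abs(diff(y_train)), na.rm = TRUE)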

Use accuracy() to evaluate a model. Note that MASE comes back NaN below, presumably because the scaling constant \(Q\) is computed from the training period, which is missing when only tv_test is passed as data.

tv_fit %>% 
  fabletools::forecast(new_data = tv_test) %>%
  fabletools::accuracy(data = tv_test) %>%
  select(.model, RMSE, MAE, MAPE, MASE)
## # A tibble: 4 × 5
##   .model   RMSE    MAE  MAPE  MASE
##   <chr>   <dbl>  <dbl> <dbl> <dbl>
## 1 Drift   2888.  2439.  1.93   NaN
## 2 Mean   13018. 12713. 10.1    NaN
## 3 Naive   4514.  3831.  3.02   NaN
## 4 SNaive  5961.  5435.  4.31   NaN

Time series cross-validation is a better way to evaluate a model. It breaks the data set into multiple training sets by setting cutoffs at varying points, with the test set one forecast horizon ahead of each cutoff. stretch_tsibble() builds these training sets: the first slice contains the first .init rows, each subsequent slice adds .step more rows, and a new .id column keys the slices. Ordinary cross-validation repeatedly fits a model to the data set with some rows left out. Since model() fits a separate model per key value, creating this long tsibble effectively accomplishes the same thing. Note the fundamental difference, though: time series CV does not leave out single values from points along the series. Each training slice starts at the beginning and runs uninterrupted to a varying end point, leaving out everything after it.
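To see what stretch_tsibble() produces, here is a toy example with made-up values:

# Five annual observations become three overlapping slices keyed by .id:
# .id 1 holds 2001-2003, .id 2 holds 2001-2004, .id 3 holds 2001-2005.
tsibble(Year = 2001:2005, y = rnorm(5), index = Year) %>%
  stretch_tsibble(.init = 3, .step = 1)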

us_employment_tsibble %>%
  filter(Title == "Total Private") %>%
  stretch_tsibble(.init = 3, .step = 1) %>%
  # Fit a model for each key
  model(
    Mean = MEAN(Employed),
    Naive = NAIVE(Employed),
    SNaive = SNAIVE(Employed),
    Drift = RW(Employed ~ drift())
  ) %>%
  fabletools::forecast(h = 12) %>%
  fabletools::accuracy(data = tv_test)
## # A tibble: 4 × 11
##   .model Title         .type     ME   RMSE    MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr>         <chr>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drift  Total Private Test    712.  1749.  1420. 0.561  1.14 0.631 0.775 0.825
## 2 Mean   Total Private Test  11845. 12043. 11845. 9.43   9.43 5.27  5.33  0.822
## 3 Naive  Total Private Test   1169.  2023.  1685. 0.925  1.34 0.749 0.896 0.795
## 4 SNaive Total Private Test   2267.  2274.  2267. 1.81   1.81 1.01  1.01  0.641

  [1] Residuals and errors are not the same thing. A residual is the difference between the observed and fitted value in the training data set. A forecast error is the difference between the observed and forecast value in the test data set.