Chapter 5 Time Series Objects
5.1 Time Series Objects
Every object we manipulate in R is characterized by a specific structure. Objects’ structures vary depending on the type of object: a list, a matrix, or a data.frame, are different objects with different structures. Every structure has its own manipulation methods. For instance, it can be accessed and analyzed by using different functions and strings of code.
In R there are many different types of object. To get an overview you can refer to this handbook, to the R manual, or the chapter 5 of this free online book. In this course we are going to learn more about the data structures we have to deal with when conducting time series analysis in R, that is, the structure of time series objects and data sets.
Time series data sets in R can be represented by different objects. Specific libraries (coherent collections of functions) can give different structures to time series data sets.
There are many R libraries for handling and working with time series objects. Some of them are more general and other are useful to perform very specific analysis. On this page you can find a comprehensive list of the R libraries for time series analysis.
For now, we just need to know that different libraries can create time series objects with different structures which can be manipulated through different functions. This means that not all the objects can be analyzed with all the functions, as well that there is not always compatibility between R libraries. Many functions have been developed with reference to specific libraries and objects, or require a particular object structure. As a consequence, creating a time series object with a certain structure or a certain library can imply that we can use some functions and perform only a certain type of analysis. In other words, specific type of objects could introduce specific constrains to data analysis (and visualization), so it could be wise to plan in advance the necessary analyses, so as to select the necessary libraries and data structure.
We now introduce three types of objects that are commonly used to store and analyze time series data:
- The data.frame (in base R)
- The ts object (in base R)
- The xts object (created through the library xts).
We analyze the structures of these objects and their strengths and limitations. In the next chapter, we’ll also learn the methods available to visualize them.
5.1.1 Time Series as Data Frames
Data frames (data.frame) are the most common data set structure in R. A data.frame is simply a table cases by variables (each row is a case and each column is a variable).
To see an example of data.frame containing time series data, we can upload a data set containing the number of news articles mentioning the keyword “elections” published by USA news media. I retrieved this data set from MediaCloud, a free and open source platform for studying media ecosystem that tracks millions of stories published online. You can download the data set at this link.
We can upload the .csv file by using the function read.csv. The main argument of the read.csv function is the path of the file.
<- read.csv("./data/elections-stories-over-time-20210111144254.csv") elections_news
By using the function class we can see that this is a data.frame.
class(elections_news)
## [1] "data.frame"
We can check the first few rows of the data.frame by using the function head, which shows the first few rows of the data set, so as to get an idea of the structure of this simple data.frame.
head(elections_news)
## date count total_count ratio
## 1 2015-01-01 373 25611 0.01456405
## 2 2015-01-02 387 31932 0.01211950
## 3 2015-01-03 289 24646 0.01172604
## 4 2015-01-04 322 25513 0.01262102
## 5 2015-01-05 567 39982 0.01418138
## 6 2015-01-06 626 42366 0.01477600
This data.frame contains time series data: the first column contains dates, and the other columns contain the values of the observations. We can also see that the data.frame seems to contain daily data, where each row corresponds to a specific day. The data frame also includes, in the column “count”, the number of news articles mentioning the keyword “elections”, in the column “total_count”, the total number of news articles on all the topics, and in the column “ratio” the proportion of news articles mentioning the keyword (count/total_count).
The function head (and tail) can be impractical with data.frame including a lot of columns, so it could be better to use the function str to check the structure of the data.frame.
str(elections_news)
## 'data.frame': 2192 obs. of 4 variables:
## $ date : chr "2015-01-01" "2015-01-02" "2015-01-03" "2015-01-04" ...
## $ count : int 373 387 289 322 567 626 507 521 531 346 ...
## $ total_count: int 25611 31932 24646 25513 39982 42366 45163 44928 44041 29093 ...
## $ ratio : num 0.0146 0.0121 0.0117 0.0126 0.0142 ...
As you can see in the output of the function str, the format of the column date is Factor. The format is, in this case, automatically attributed by R, but (as we have already said) it can be specified before importing the data.
Factor is an appropriate format for categorical variables, but R includes a specific format for dates and times. In this case we have just a date, so we can convert it to a variable of type date. We can change the format of the variable by using the function as.Date.
$date <- as.Date(elections_news$date) elections_news
str(elections_news)
## 'data.frame': 2192 obs. of 4 variables:
## $ date : Date, format: "2015-01-01" "2015-01-02" "2015-01-03" ...
## $ count : int 373 387 289 322 567 626 507 521 531 346 ...
## $ total_count: int 25611 31932 24646 25513 39982 42366 45163 44928 44041 29093 ...
## $ ratio : num 0.0146 0.0121 0.0117 0.0126 0.0142 ...
We can also perform the same operation with tidyverse, by using the function mutate.
library(tidyverse)
<- elections_news %>%
elections_news mutate(date = as.Date(date))
A data.frame is the common format for data sets, including time series data sets. We can do many things with data stored in this format, such as creating plots and performing various types of analysis. However, to handle time series in R there are more specific data formats.
5.1.2 Time Series as TS objects
The basic object created to handle time series in R is the object of class ts. The name stands for “Time Series”.
An example of ts object is already present in R under the name of “AirPassengers”, a time series data set in ts format. We can load this data set with the function data.
data(AirPassengers)
By applying the function class we can see that this is an object of class ts.
class(AirPassengers)
## [1] "ts"
AirPassengers is a small data set so we can print all the data set to see its structure, which is an example of the standard structure of a ts object.
AirPassengers
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
By calling the str function we get synthetic information on the object.
str(AirPassengers)
## Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
The AirPassengers data set is a univariate time series representing monthly totals of international airline passengers from 1949 to 1960. As every time series, it has a start date and an end date. It also has a frequency, which is the frequency at which the observations were taken.
All this characteristics differentiate a ts object from a data.frame. The structure of a data.frame lacks the start and the end date, and the frequency value.
The functions start, end, and frequency, can be applied to a ts object to check their values. We started by saying that some functions work with some objects but not with other types of objects. This is an example. These functions, indeed, work with ts objects just because they are part of their structures, and are arguments usually specified when this kind of object is created. They do not work if applied to a data.frame object, since a data.frame structure does not include the start and end date, nor the frequency of observations. It can be seen that the ts structure is much more specific for time series data.
start(AirPassengers)[1]
## [1] 1949
end(AirPassengers)[1]
## [1] 1960
frequency(AirPassengers)[1]
## [1] 12
Importantly, the frequency of a time series is assumed to be regular over time. This applies to time series in general, and not just to ts objects. In this case, the time series starts on January 1949 and ends on December 1969, and has monthly frequency. Monthly frequency is indicated in ts as “12”, meaning 12 months. Indeed, the reference unit of a ts object is a year. So, quarterly data, for instance, have frequency equal to 4.
To create a ts object is necessary to follow specific steps and use specific functions. To exemplify the process of creation of a ts object we take the example of the data contained in the AirPassengers data set, and store them in a data.frame (you don’t need to learn how to do that, just copy and paste the code). A data.frame is the data set format you will probably start with, so it can be useful to see how to create a ts object starting from a data.frame.
<- seq.Date(from = as.Date("1949-01-01"),
date to = as.Date("1960-12-01"), by="month")
<- as.vector(AirPassengers)
passengers
<- data.frame("Date" = date,
data_frame_format "Passengers" = passengers)
To create a “ts” time series object starting from a data.frame, we need:
- To specify which column contains the observations. In this case, the column name is “Passengers”.
- We then need to specify the start and end date, which in this case are in the format year/month, but can be just years in case of yearly data. The ts format for the start/end date is the following: c=(YEAR, MONTH). The c represents the concatenate function, and it concatenates the year and the month in a single vector.
- Finally, we indicate the frequency of the time series observations. The frequency is specified based on the time period of a year, so in this case we have a frequency equal to 12, because we have monthly observation, meaning that we have 12 observation per year.
<- ts(data = data_frame_format$Passengers,
ts_format start=c(1949, 01),
end=c(1960, 12),
frequency = 12)
ts_format
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
class(ts_format)
## [1] "ts"
It’s important to notice that yearly, quarterly, and monthly data work fine with the ts structure, but more fine grained data create complications and are not totally suitable for a ts structure.
This is due to the fact that time series objects require the frequency of observations to be regular and in ts the observations have to be regular with reference to a year. Unfortunately, a time series that spans over many years cannot be composed of a constant number of days, since the number of days will be sometimes 365 and other time 366, in case of leap years. This is a limitation of ts objects. However, when dealing with monthly data or data with frequency lower than one month (such as quarterly data), ts works great.
5.1.3 Time Series as XTS/ZOO objects
Time series can be stored in object of class xts/zoo. This class of objects is created with the library xts, which is related and an extension of the package zoo (another package to deal with time series data). As other libraries, it requires to be installed and loaded.
# install.packages("xts")
library(xts)
The xts object is more flexible that the ts one.
We can create an xts time series by starting from the data.frame we have just created. Similarly to what required by ts, we need to specify:
- the column of the data.frame (or the vector) containing the data;
- the column of the data.frame (or the vector) containing the dates/times (which has to be in a date/time format);
- the frequency of observations.
We can use the data.frame already created with the AirPassengers data to create a new xts object.
<- xts(x = data_frame_format$Passengers,
xts_format order.by = data_frame_format$Date,
frequency = 12)
class(xts_format)
## [1] "xts" "zoo"
Also the structure of this object, like the ts one, includes the range of dates of the time series, with it starting and ending date.
str(xts_format)
## An 'xts' object on 1949-01-01/1960-12-01 containing:
## Data: num [1:144, 1] 112 118 132 129 121 135 148 148 136 119 ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## NULL
head(xts_format)
## [,1]
## 1949-01-01 112
## 1949-02-01 118
## 1949-03-01 132
## 1949-04-01 129
## 1949-05-01 121
## 1949-06-01 135
5.2 Exercise
- Download the data set Austrian_Local_Media, a data set including metrics from Facebook pages of Austrian Local Media (monthly observations from January 2015 to December 2020 on quantity of posts and interactions)
- Set the “beginning_of_interval” to the appropriate “Date” format
- Create a ts object, called “ts_object_1”, using as data values the post_count values
- Create a xts object, called “xts_object_1”, using as data values the post_count values
- Use the Tidyverse function “mutate” to create a new column with the number of interaction by post (total_interactions/post_count) and call it “interaction_ratio”
- Create a ts object, called “ts_object_2”, using as data values the interaction_ratio values
- Create a xts object, called “xts_object_2”, using as data values the interaction_ratio values
- Try to plot these objects, by using the command plot.ts(ts_object_1), plot.ts(ts_object_2), plot.xts(xts_object_1), plot.xts(xts_object_2)