Mediation Analysis

Author

Dr. Nicola Righetti, PhD

Learning goals

  • What is the Simple Mediation Model
  • How to estimate a Simple Mediation Model using the PROCESS software (in R)
  • How to interpret the output of the estimation of the Simple Mediation Model using the PROCESS software
  • How to take into account confounding variables and the causal order, measure effect size, and work with multiple \(X\) and \(Y\) variables.
  • When and how to estimate models with more than one mediator (the parallel and multiple mediator models), and how to interpret them.
  • How to estimate and interpret models with a multicategorical antecedent variable.

In this unit we’ll learn the elements of mediation analysis, with particular attention to the most basic (but also most popular) mediation model, consisting of a causal antecedent variable linked to a single consequent variable through an intermediary variable, or mediator. This is called the simple mediation model.

Download and load the PROCESS software

  • Download from Moodle (folder “Scripts”) the PROCESS software for R. The software is written by the author of the book we are using during this course, Andrew F. Hayes. The software is also available on https://www.PROCESSmacro.org/download.html, where you can find the version for R and other statistical software. In this course we’ll use the R version (several examples in the book are from the SPSS version).
  • Create a folder “PROCESS_R” in your R project directory, and save the process.R file there.
  • Run the code below to load the process functions
source("PROCESS_R/process.R")

********************* PROCESS for R Version 4.0.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
 
PROCESS is now ready for use.
Copyright 2021 by Andrew F. Hayes ALL RIGHTS RESERVED
 

Download and load the data set

  • Download the pmi data set and save it in your “data” folder, in the R project directory.
  • Load the data by using the read.csv function.
pmi_data <- read.csv("data/pmi.csv")

Case study description

Participants read a news story describing global economic conditions leading to a possible sugar shortage and an increase in price.

Researchers told them that the story was about to be published:

  • on the front page of a newspaper (front-page condition, cond = 0)
  • or inside an economic supplement (back-page condition, cond = 1).

This is the independent variable (\(X\)), a dichotomous variable. In this case it expresses an experimentally manipulated condition.

Participants then answered a series of questions:

  • about their intention to buy sugar (DV = reported reaction, reaction)
  • and questions aimed at assessing (1) their perception of the media’s influence on the general public’s intention to buy sugar (presumed media influence, pmi), and (2) how important the topic was to the global economic crisis (perceived importance of the topic, import).

Research question: does the location of the article affect behavioral intentions indirectly through presumed media influence?

The question is about a mechanism at work that leads to different behavioral intentions after being told about the different locations of the article. The appropriate statistical model is a mediated model. This is the kind of model we use to answer questions about mechanisms.

The simple mediation model

Estimating a mediation model involves estimating three coefficients (a, b, and c') and three derived effects (total, indirect, direct), by fitting three different regression models.

  • The total effect (c) is just the coefficient we would find by fitting a simple regression of \(Y\) on \(X\) (i.e. \(\hat Y = i + cX\))
  • The indirect effect (ab) is the effect of \(X\) on \(Y\) through the mediator \(M\). It is obtained by multiplying \(a \times b\)
  • The direct effect (c’) is the partial effect of \(X\) on \(Y\) after controlling for \(M\).

In practice, a simple mediation model decomposes the total effect of \(X\) on \(Y\) into a direct and an indirect effect.

When \(Y\) and \(M\) are estimated using OLS regression, that is, analyzed as continuous variables using the OLS criterion for maximizing model fit (or when using a maximum-likelihood-based model for continuous outcomes in place of OLS), the following equivalence holds between the total effect and the direct and indirect effects.

TOTAL EFFECT = DIRECT EFFECT + INDIRECT EFFECT \[c = ab + c'\]

This simple equation does not apply when using other modeling methods that deviate from what is known in the generalized linear modeling literature as an “identity link” in the model.

The three regression models that make up the simple mediation model are:

\[0) \; \hat Y = i_1 + cX\] \[1) \; \hat M = i_2 + aX\] \[2) \; \hat Y = i_3 + c'X + bM\]
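
To make this concrete, here is a minimal sketch that fits these three regressions with lm() on the pmi data loaded above; under OLS, the total effect c equals c' + ab, and the coefficients match those reported by PROCESS below.

# Minimal sketch: the three regressions of the simple mediation model,
# fit "by hand" with lm() on the pmi data loaded above.
m_total <- lm(reaction ~ cond, data = pmi_data)        # Y-hat = i1 + c*X
m_med   <- lm(pmi ~ cond, data = pmi_data)             # M-hat = i2 + a*X
m_full  <- lm(reaction ~ cond + pmi, data = pmi_data)  # Y-hat = i3 + c'*X + b*M

c_total <- coef(m_total)["cond"]  # total effect c
a       <- coef(m_med)["cond"]    # a path
b       <- coef(m_full)["pmi"]    # b path
c_prime <- coef(m_full)["cond"]   # direct effect c'

c_total           # total effect
c_prime + a * b   # direct + indirect: identical to c under OLS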

Estimation with PROCESS

Using OLS regression, we use the process function to estimate equations 1 and 2 above, and thereby obtain a, b, and c′, along with standard regression statistics such as R-squared (\(R^2\)) for each equation. It also creates a section of output containing the direct and indirect effects of \(X\).

The function requires you to specify:

  • The data set
  • The dependent variable \(y\)
  • The independent variable \(x\)
  • The mediator variable \(m\)

Moreover, we specify:

  • The total parameter, which we set to 1 (total = 1). It tells the software to also report the total effect of \(X\) on \(Y\) (which we called c).
  • model = 4, the PROCESS model number used for mediation models such as this one.
  • normal = 1, which requests the normal theory test for the indirect effect (discussed below).
  • progress = 0, which suppresses the display of the bootstrap progress bar.
  • A seed with an arbitrary number. This ensures replicability of the results, since the estimation involves random number generation (bootstrapping): the seed guarantees that the same random sequences are generated if you repeat the estimation.

process(pmi_data, y = "reaction", x = "cond", m = "pmi", 
        total = 1, normal = 1, model = 4, progress=0,
        seed=31216)

********************* PROCESS for R Version 4.0.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
                
Model : 4       
    Y : reaction
    X : cond    
    M : pmi     

Sample size: 123

Custom seed: 31216


*********************************************************************** 
Outcome Variable: pmi

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.1808    0.0327    1.7026    4.0878    1.0000  121.0000    0.0454

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    5.3769    0.1618   33.2222    0.0000    5.0565    5.6973
cond        0.4765    0.2357    2.0218    0.0454    0.0099    0.9431

*********************************************************************** 
Outcome Variable: reaction

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.4538    0.2059    1.9404   15.5571    2.0000  120.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    0.5269    0.5497    0.9585    0.3397   -0.5615    1.6152
cond        0.2544    0.2558    0.9943    0.3221   -0.2522    0.7609
pmi         0.5064    0.0970    5.2185    0.0000    0.3143    0.6986

************************ TOTAL EFFECT MODEL *************************** 
Outcome Variable: reaction

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.1603    0.0257    2.3610    3.1897    1.0000  121.0000    0.0766

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    3.2500    0.1906   17.0525    0.0000    2.8727    3.6273
cond        0.4957    0.2775    1.7860    0.0766   -0.0538    1.0452

*********************************************************************** 
Bootstrapping in progress. Please wait.

************ TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y ************

Total effect of X on Y:
     effect        se         t         p      LLCI      ULCI
     0.4957    0.2775    1.7860    0.0766   -0.0538    1.0452

Direct effect of X on Y:
     effect        se         t         p      LLCI      ULCI
     0.2544    0.2558    0.9943    0.3221   -0.2522    0.7609

Indirect effect(s) of X on Y:
       Effect    BootSE  BootLLCI  BootULCI
pmi    0.2413    0.1322    0.0143    0.5254

Normal theory test for indirect effect(s):
       Effect        se         Z         p
pmi    0.2413    0.1300    1.8559    0.0635

******************** ANALYSIS NOTES AND ERRORS ************************ 

Level of confidence for all confidence intervals in output: 95

Number of bootstraps for percentile bootstrap confidence intervals: 5000

Interpretation of coefficients

  • Two people who differ by one unit in \(X\) (front-page condition vs. back-page condition) are estimated to differ by 0.4957 units (95% CI [-0.0538, 1.0452]) on average in their intention to buy sugar (total effect).

  • They differ by 0.2413 units (95% CI [0.0143, 0.5254]) on average as a result of the indirect effect of page manipulation on buying intention through presumed media influence.

  • The rest of the difference, 0.2544 units (95% CI [-0.2522, 0.7609]), is the direct effect of page manipulation on reported reactions, which is independent of the effects of page manipulation on reaction through PMI.

Inference: test based on normal theory and bootstrapping

When it comes to statistical inference, and thus to the “statistical significance” of the coefficients, the output reports p-values and confidence intervals.

Inferential methods have different characteristics for the direct and the indirect effect: inference for the direct effect (and the total effect) is simple and non-controversial, while inference for the indirect effect is more complex.

Regarding the direct effect, inference can be framed in terms of a p-value (through hypothesis testing) or a confidence interval. In the example above, the direct effect of the experimental condition, independent of the mediator, amounts to 0.2544 units (95% CI [-0.2522, 0.7609], p = 0.3221). Here both the p-value and the confidence interval show that the effect is not statistically significant: the p-value is above the conventional threshold of 0.05, and the confidence interval includes zero (meaning a true effect of zero cannot be ruled out).

Regarding the indirect effect, there are two possibilities:

  • Calculating the p-value through a hypothesis test, relying on the assumption of normality of the sampling distribution.
  • Using a particular technique called bootstrapping.

The bottom part of the process output reports both the indirect effect(s) of X on Y, in confidence interval form, and the normal theory test for indirect effect(s), in p-value form.

Based on statistical research, Hayes recommends, in general, avoiding the normal theory test for the indirect effect and instead interpreting the confidence intervals obtained through bootstrapping. This approach performs better in many empirical circumstances and has more power to detect an effect. We can even skip the normal theory test altogether by removing normal = 1 from the process call.

Bootstrapping and the normal theory test do not necessarily lead to the same conclusion. For instance, in the example, the normal theory test of the indirect effect is not statistically significant (p = 0.0635 > 0.05), while the bootstrap confidence interval indicates a statistically significant effect (it does not include zero: 95% CI [0.0143, 0.5254]).

Bootstrapping

Confidence intervals are often calculated from standard errors, relying on the Central Limit Theorem and theoretical assumptions about the sampling distribution. Here, instead, they are calculated through bootstrapping, a procedure based on resampling with replacement. In this case too, it is of fundamental importance that the sample is a miniature representation of the population of interest.

Resampling with replacement creates new samples by randomly drawing cases from the initial sample, where the same case can be drawn more than once (random sampling with replacement):

Suppose case 1 in the original sample is “Joe.” Joe happened to be contacted for participation in the study and provided data to the effort. In the resampling process, Joe functions as a stand-in for himself and anyone else like him in the pool of potential research participants, as defined by Joe’s measurements on the variables in the model. The original sampling could have sampled several Joes or none, depending in part on the luck of the draw.

In bootstrapping, a large number of samples is drawn from the original sample (e.g. 5,000, the default in PROCESS; 10,000 is also commonly used), using the random sampling with replacement just described.

Next, the indirect effect (the statistic of interest in this case) is calculated based on each sample.

Finally, the confidence interval is found by sorting the resulting distribution of indirect effects into percentiles and taking the value corresponding to the 2.5th percentile (lower bound of the 95% confidence interval) and the 97.5th percentile (upper bound of the 95% confidence interval).
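
A conceptual sketch of this percentile bootstrap, written in base R for the pmi example, is shown below. PROCESS does all of this internally; the interval obtained here will not exactly match the PROCESS output because the random resamples differ.

# Conceptual sketch of the percentile bootstrap for the indirect effect a*b.
set.seed(31216)
n_boot  <- 5000
ab_boot <- numeric(n_boot)

for (i in seq_len(n_boot)) {
  idx <- sample(nrow(pmi_data), replace = TRUE)   # resample rows with replacement
  d   <- pmi_data[idx, ]
  a_i <- coef(lm(pmi ~ cond, data = d))["cond"]
  b_i <- coef(lm(reaction ~ cond + pmi, data = d))["pmi"]
  ab_boot[i] <- a_i * b_i                         # indirect effect in this resample
}

quantile(ab_boot, c(0.025, 0.975))  # 95% percentile bootstrap confidence interval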

A difference between confidence intervals calculated using standard errors (relying on the “theoretical” sampling distribution), and those calculated through bootstrapping, is that the former is always symmetrical (the upper and lower bound of the confidence interval are equidistant from the point estimate), while the latter can be asymmetrical, depending on the shape of the distribution.

Bootstrapping is particularly useful relative to the normal theory approach in smaller samples, because it is in smaller samples that non-normality of the sampling distribution of ab is likely to be most severe, the large-sample asymptotics of the normal theory approach are hardest to trust, and the power advantages of bootstrapping are most pronounced.

Example with continuous X

In the previous example the independent variable (\(X\)) was a dichotomous variable. In that case it expressed an experimentally manipulated condition, but a dichotomous \(X\) can also be used in a mediation model in non-experimental studies.

In other cases, the independent variable is continuous. No modifications to the mathematics or procedures described so far are needed to estimate the effects, and their interpretation otherwise remains unchanged.

Download the estress data from Moodle, save the file into your “data” folder and load it.

estress <- read.csv("data/estress.csv")

The data set includes information on 262 entrepreneurs who responded to an online survey about recent performance of their business as well as their emotional and cognitive reactions to the economic climate.

Researchers proposed that economic stress (\(X\), variable estress) leads to a desire to disengage from entrepreneurial activities (\(Y\), variable withdraw) as a result of the depressed affect (\(M\), variable affect) that such stress produces.

More specifically, the experience of stress results in feelings of despondency and hopelessness, and the more such feelings of depressed affect result, the greater the desire to withdraw from one’s role as a small-business owner to pursue other vocational activities.

So depressed affect was hypothesized as a mediator of the effect of economic stress on withdrawal intentions.

process(estress, y = "withdraw", x = "estress", m = "affect", 
        total = 1, normal = 1, model = 4, progress=0,
        seed=100770)

********************* PROCESS for R Version 4.0.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
                
Model : 4       
    Y : withdraw
    X : estress 
    M : affect  

Sample size: 262

Custom seed: 100770


*********************************************************************** 
Outcome Variable: affect

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.3401    0.1156    0.4650   33.9988    1.0000  260.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    0.7994    0.1433    5.5777    0.0000    0.5172    1.0816
estress     0.1729    0.0296    5.8308    0.0000    0.1145    0.2313

*********************************************************************** 
Outcome Variable: withdraw

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.4247    0.1804    1.2841   28.4946    2.0000  259.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    1.4471    0.2520    5.7420    0.0000    0.9508    1.9433
estress    -0.0768    0.0524   -1.4667    0.1437   -0.1800    0.0263
affect      0.7691    0.1031    7.4627    0.0000    0.5662    0.9721

************************ TOTAL EFFECT MODEL *************************** 
Outcome Variable: withdraw

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.0641    0.0041    1.5543    1.0718    1.0000  260.0000    0.3015

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    2.0619    0.2620    7.8691    0.0000    1.5459    2.5778
estress     0.0561    0.0542    1.0353    0.3015   -0.0506    0.1629

*********************************************************************** 
Bootstrapping in progress. Please wait.

************ TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y ************

Total effect of X on Y:
     effect        se         t         p      LLCI      ULCI
     0.0561    0.0542    1.0353    0.3015   -0.0506    0.1629

Direct effect of X on Y:
     effect        se         t         p      LLCI      ULCI
    -0.0768    0.0524   -1.4667    0.1437   -0.1800    0.0263

Indirect effect(s) of X on Y:
          Effect    BootSE  BootLLCI  BootULCI
affect    0.1330    0.0333    0.0713    0.2033

Normal theory test for indirect effect(s):
          Effect        se         Z         p
affect    0.1330    0.0291    4.5693    0.0000

******************** ANALYSIS NOTES AND ERRORS ************************ 

Level of confidence for all confidence intervals in output: 95

Number of bootstraps for percentile bootstrap confidence intervals: 5000

Reporting the estimated coefficients:

Two entrepreneurs who differ by one unit in their economic stress are estimated to differ by 0.133 (95% CI [0.0713, 0.2033]) units in their reported intentions to withdraw from their business as a result of the tendency for those under relatively more economic stress to feel more depressed affect (because a is positive), which in turn translates into greater withdrawal intentions (because b is positive).

This indirect effect is statistically different from zero, as revealed by a 95% bootstrap confidence interval that is entirely above zero (95% CI [0.0713, 0.2033]).

Statistical controls

If two variables \(M\) (e.g., how much children watch television) and \(Y\) (e.g., being overweight) are confounded due to their association with some variable \(C\) (e.g., how much parents encourage a healthy lifestyle in their children), then the association between \(M\) and \(Y\) should not exist among people who are equal on \(C\) (e.g., whose parents are equally encouraging of a healthy lifestyle), no matter how much they differ on \(M\) (e.g., how much they watch television).

In regression, if we add a variable \(C\), we get the partial coefficient of \(M\), which we interpret as the weight of \(M\) holding all the other variables in the model constant (including \(C\)):

  • Confounding association due to \(C\) can be ruled out by including \(C\) as a predictor.
  • Adding \(C\) to the models of \(M\) and \(Y\) will also remove \(C\) as a confounding threat to a causal claim about the association between \(X\) and \(M\) and \(X\) and \(Y\) as well as between \(M\) and \(Y\).

More than one confounding variable can be included in the model.
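
A small simulated illustration of this logic (C, M, and Y below are generated data, not variables from the course data sets): when C drives both M and Y, the apparent M–Y association largely disappears once C is held constant.

# Simulated illustration: C drives both M and Y; M has no causal effect on Y.
set.seed(1)
n <- 500
C <- rnorm(n)                 # confounder (e.g., parental encouragement)
M <- 0.8 * C + rnorm(n)       # M depends only on C
Y <- 0.8 * C + rnorm(n)       # Y depends only on C

coef(lm(Y ~ M))["M"]          # sizeable association when C is ignored
coef(lm(Y ~ M + C))["M"]      # close to zero once C is controlled for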

We can illustrate this using the estress data set and research problem. In this research project nothing was manipulated, and potential confounds abound. For example, the indirect effect may be a manifestation of nothing other than individual differences, such as perceptions of one’s own confidence and skill in managing a business. People who feel relatively more confident in their abilities may tend to feel relatively less stress in general, and may be less prone to feeling negative and down about their business under any circumstances.

If so, then statistically controlling for such an individual difference when assessing the indirect effect of economic stress should weaken or eliminate it.

That is, among people equal in their confidence, there should be no evidence of an indirect effect of economic stress on withdrawal intentions through depressed affect, because this variable has been taken out of the process that, by this reasoning, induces spurious association between \(X\) and \(M\) and between \(M\) and \(Y\). But if the indirect effect persists even when holding confidence constant, a causal claim remains viable.

The data set includes a measure of entrepreneurial self-efficacy (ese).

Those relatively high in entrepreneurial self-efficacy reported feeling relatively less economic stress (\(r = -0.158\); \(p = 0.010\)), relatively less business-related depressed affect (\(r = -0.246\); \(p < 0.001\)), and relatively weaker intentions to withdraw from entrepreneurship (\(r = -0.243\); \(p < 0.001\)). So spurious or epiphenomenal associations are plausible alternative explanations for at least some of the relationships observed between economic stress, depressed affect, and withdrawal intentions.

To illustrate that more than a single variable can be used as a statistical control, we also include other statistical controls, namely sex of the participant (sex in the data, 0 = female, 1 = male) and length of time in the business, in years (tenure in the data) as predictors.

The only difference between the code to fit a mediation model with covariates and the code we already saw is the addition of the covariates, listed in a vector (using the c function) and passed to the cov parameter.

By default, any variable in the covariate list is included as an additional antecedent variable in the model of each consequent.

process(estress, y = "withdraw", x = "estress", m = "affect", 
        cov = c("ese", "sex", "tenure"),
        total = 1, model = 4, progress=0,
        seed=100770)

********************* PROCESS for R Version 4.0.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
                
Model : 4       
    Y : withdraw
    X : estress 
    M : affect  

Covariates: 
       ese sex tenure

Sample size: 262

Custom seed: 100770


*********************************************************************** 
Outcome Variable: affect

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.4039    0.1631    0.4452   12.5231    4.0000  257.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    1.7855    0.3077    5.8033    0.0000    1.1796    2.3914
estress     0.1593    0.0297    5.3612    0.0000    0.1008    0.2179
ese        -0.1549    0.0444   -3.4892    0.0006   -0.2423   -0.0675
sex         0.0148    0.0857    0.1726    0.8631   -0.1540    0.1836
tenure     -0.0108    0.0063   -1.7227    0.0861   -0.0232    0.0016

*********************************************************************** 
Outcome Variable: withdraw

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.4539    0.2060    1.2586   13.2824    5.0000  256.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    2.7461    0.5502    4.9913    0.0000    1.6626    3.8295
estress    -0.0935    0.0527   -1.7751    0.0771   -0.1973    0.0102
affect      0.7071    0.1049    6.7420    0.0000    0.5006    0.9137
ese        -0.2121    0.0764   -2.7769    0.0059   -0.3625   -0.0617
sex         0.1274    0.1441    0.8838    0.3776   -0.1565    0.4112
tenure     -0.0021    0.0106   -0.1940    0.8463   -0.0230    0.0189

************************ TOTAL EFFECT MODEL *************************** 
Outcome Variable: withdraw

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.2550    0.0650    1.4763    4.4667    4.0000  257.0000    0.0017

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    4.0087    0.5603    7.1548    0.0000    2.9053    5.1120
estress     0.0191    0.0541    0.3535    0.7240   -0.0874    0.1257
ese        -0.3216    0.0808   -3.9789    0.0001   -0.4808   -0.1624
sex         0.1379    0.1561    0.8831    0.3780   -0.1695    0.4453
tenure     -0.0097    0.0115   -0.8491    0.3966   -0.0323    0.0128

*********************************************************************** 
Bootstrapping in progress. Please wait.

************ TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y ************

Total effect of X on Y:
     effect        se         t         p      LLCI      ULCI
     0.0191    0.0541    0.3535    0.7240   -0.0874    0.1257

Direct effect of X on Y:
     effect        se         t         p      LLCI      ULCI
    -0.0935    0.0527   -1.7751    0.0771   -0.1973    0.0102

Indirect effect(s) of X on Y:
          Effect    BootSE  BootLLCI  BootULCI
affect    0.1127    0.0292    0.0594    0.1734

******************** ANALYSIS NOTES AND ERRORS ************************ 

Level of confidence for all confidence intervals in output: 95

Number of bootstraps for percentile bootstrap confidence intervals: 5000

Comparing the output for the model controlling for sex, tenure, and entrepreneurial self-efficacy to the output excluding these controls, it can be seen that substantively, nothing has really changed.

However, this kind of analysis may be done to see how sensitive the results of a comparable analysis without such controls are to alternative explanations involving the controlled variables, or because it is known a priori, or from preliminary analyses, that certain variables may be producing spurious associations between key variables in the causal system.

Ruling out spurious association as alternative explanations is an important part of any causal argument that includes associations that are only correlational in nature. We’ll see more about that after the exercise.

Exercise

Read the following paper and replicate the study 1 analysis using the provided data. Carefully observe how the data, the measures, the method and the analysis are presented in the paper.

Thomas, M. F., Binder, A., & Matthes, J. (2022). The agony of partner choice: The effect of excessive partner availability on fear of being single, self-esteem, and partner choice overload. Computers in Human Behavior, 126, 106977.

Follow these steps:

  1. Check the fitted mediation model (Figure 1) and take note of the variables in the model.

  2. Read about the participants in the experiment (section 2.2) and focus your attention on the measures (section 2.3) to understand how the variables were measured. One variable was measured by a single item in the survey; the other two were measured by two and three items, respectively. Check the variables in the data set and identify the items necessary to construct the three variables in the model.

  3. In this case, the final variables are obtained by averaging their items. The data set already includes the final variables. Identify them.

  4. Continue reading the Measures section (2.3) and take note of the control variables used in the model. Identify the variables in the data set. In this case, the control variable “Dummy_Female” is used in the model in place of “gender”.

  5. Take note of people excluded from the analysis and the reason why. You also need to remove these subjects from the data set before running the analysis. You can use the dplyr function filter.

  6. Explore the variables to learn about the coding, using the haven function print_labels (e.g. print_labels(dat$RsStatus)).

  7. Fit the mediation model using PROCESS.

# Variables ----------------

# Independent variable (X): DAuse
# Mediator (M): Perceived Partner Availability (ppa)
# Dependent variable (Y): Fear of Being Single (fbs)
# Control variables: age, gender, RsStatus

# Seventeen people were excluded from analysis ----------------
# because they had not disclosed their relationship status (RsStatus=3)

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.1     ✔ purrr   1.0.1
✔ tibble  3.1.8     ✔ dplyr   1.1.0
✔ tidyr   1.3.0     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
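# Note: `dat` is assumed to be the study 1 data set provided with the paper,
# loaded beforehand (e.g. with haven::read_sav(), since value labels are
# inspected with print_labels() above).
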
dat_final <- dat %>%
  filter(RsStatus != 3)

# Estimating the mediation model with Process  ----------------

process(dat_final, y = "FOBS", x = "DAuse", m = "PartnerAvail", 
        cov = c("age", "Dummy_Female", "RsStatus"),
        total = 1, normal = 1, model = 4, progress=0,
        boot = 10000, 
        decimals = 9.2,
        seed = 31216)

********************* PROCESS for R Version 4.0.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
                    
Model : 4           
    Y : FOBS        
    X : DAuse       
    M : PartnerAvail

Covariates: 
       age Dummy_Female RsStatus

Sample size: 650

Custom seed: 31216


*********************************************************************** 
Outcome Variable: PartnerAvail

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
       0.28      0.08      1.25     13.81      4.00    645.00      0.00

Model: 
                 coeff        se         t         p      LLCI      ULCI
constant          2.59      0.25     10.38      0.00      2.10      3.07
DAuse             0.15      0.03      4.52      0.00      0.08      0.21
age               0.00      0.00      0.62      0.54     -0.00      0.01
Dummy_Female     -0.44      0.09     -4.89      0.00     -0.62     -0.26
RsStatus          0.12      0.10      1.24      0.21     -0.07      0.31

*********************************************************************** 
Outcome Variable: FOBS

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
       0.34      0.11      1.43     16.63      5.00    644.00      0.00

Model: 
                 coeff        se         t         p      LLCI      ULCI
constant          2.33      0.29      8.12      0.00      1.77      2.90
DAuse             0.14      0.04      3.80      0.00      0.07      0.20
PartnerAvail      0.11      0.04      2.66      0.01      0.03      0.19
age              -0.02      0.00     -5.50      0.00     -0.03     -0.01
Dummy_Female     -0.17      0.10     -1.77      0.08     -0.37      0.02
RsStatus          0.46      0.10      4.49      0.00      0.26      0.66

************************ TOTAL EFFECT MODEL *************************** 
Outcome Variable: FOBS

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
       0.32      0.10      1.44     18.84      4.00    645.00      0.00

Model: 
                 coeff        se         t         p      LLCI      ULCI
constant          2.62      0.27      9.81      0.00      2.10      3.15
DAuse             0.15      0.04      4.32      0.00      0.08      0.22
age              -0.02      0.00     -5.41      0.00     -0.03     -0.01
Dummy_Female     -0.22      0.10     -2.30      0.02     -0.41     -0.03
RsStatus          0.47      0.10      4.60      0.00      0.27      0.68

*********************************************************************** 
Bootstrapping in progress. Please wait.

************ TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y ************

Total effect of X on Y:
     effect        se         t         p      LLCI      ULCI
       0.15      0.04      4.32      0.00      0.08      0.22

Direct effect of X on Y:
     effect        se         t         p      LLCI      ULCI
       0.14      0.04      3.80      0.00      0.07      0.20

Indirect effect(s) of X on Y:
                Effect    BootSE  BootLLCI  BootULCI
PartnerAvail      0.02      0.01      0.00      0.03

Normal theory test for indirect effect(s):
                Effect        se         Z         p
PartnerAvail      0.02      0.01      2.25      0.02

******************** ANALYSIS NOTES AND ERRORS ************************ 

Level of confidence for all confidence intervals in output: 95

Number of bootstraps for percentile bootstrap confidence intervals: 10000
# Create the variables ourselves ----------------

# Mediator (M): Perceived Partner Availability (ppa)
# two items: Partner_Availabilty1 and Partner_Availabilty2

dat_final <- dat_final %>%
  mutate(ppa = (Partner_Availabilty1 + Partner_Availabilty2)/2)

# Create the dependent variable (Y) ----------------
# Fear of Being Single (fbs)

dat_final <- dat_final %>%
  mutate(fbs = (FearBeingSingle1 + FearBeingSingle2 + FearBeingSingle3)/3)

# Estimating the mediation model with Process  ----------------

process(dat_final, y = "fbs", x = "DAuse", m = "ppa", 
        cov = c("age", "Dummy_Female", "RsStatus"),
        total = 1, normal = 1, model = 4, progress=0,
        boot = 10000, 
        decimals = 9.2,
        seed=31216)

********************* PROCESS for R Version 4.0.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
             
Model : 4    
    Y : fbs  
    X : DAuse
    M : ppa  

Covariates: 
       age Dummy_Female RsStatus

Sample size: 650

Custom seed: 31216


*********************************************************************** 
Outcome Variable: ppa

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
       0.28      0.08      1.25     13.81      4.00    645.00      0.00

Model: 
                 coeff        se         t         p      LLCI      ULCI
constant          2.59      0.25     10.38      0.00      2.10      3.07
DAuse             0.15      0.03      4.52      0.00      0.08      0.21
age               0.00      0.00      0.62      0.54     -0.00      0.01
Dummy_Female     -0.44      0.09     -4.89      0.00     -0.62     -0.26
RsStatus          0.12      0.10      1.24      0.21     -0.07      0.31

*********************************************************************** 
Outcome Variable: fbs

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
       0.34      0.11      1.43     16.63      5.00    644.00      0.00

Model: 
                 coeff        se         t         p      LLCI      ULCI
constant          2.33      0.29      8.12      0.00      1.77      2.90
DAuse             0.14      0.04      3.80      0.00      0.07      0.20
ppa               0.11      0.04      2.66      0.01      0.03      0.19
age              -0.02      0.00     -5.50      0.00     -0.03     -0.01
Dummy_Female     -0.17      0.10     -1.77      0.08     -0.37      0.02
RsStatus          0.46      0.10      4.49      0.00      0.26      0.66

************************ TOTAL EFFECT MODEL *************************** 
Outcome Variable: fbs

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
       0.32      0.10      1.44     18.84      4.00    645.00      0.00

Model: 
                 coeff        se         t         p      LLCI      ULCI
constant          2.62      0.27      9.81      0.00      2.10      3.15
DAuse             0.15      0.04      4.32      0.00      0.08      0.22
age              -0.02      0.00     -5.41      0.00     -0.03     -0.01
Dummy_Female     -0.22      0.10     -2.30      0.02     -0.41     -0.03
RsStatus          0.47      0.10      4.60      0.00      0.27      0.68

*********************************************************************** 
Bootstrapping in progress. Please wait.

************ TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y ************

Total effect of X on Y:
     effect        se         t         p      LLCI      ULCI
       0.15      0.04      4.32      0.00      0.08      0.22

Direct effect of X on Y:
     effect        se         t         p      LLCI      ULCI
       0.14      0.04      3.80      0.00      0.07      0.20

Indirect effect(s) of X on Y:
       Effect    BootSE  BootLLCI  BootULCI
ppa      0.02      0.01      0.00      0.03

Normal theory test for indirect effect(s):
       Effect        se         Z         p
ppa      0.02      0.01      2.25      0.02

******************** ANALYSIS NOTES AND ERRORS ************************ 

Level of confidence for all confidence intervals in output: 95

Number of bootstraps for percentile bootstrap confidence intervals: 10000

Confounding variables, causal order, effect sizes, and multiple Xs and Ys

Learning outcomes

  • Learn to rule out alternative explanations in mediation models with reference to
    1. confounding variables and
    2. alternative causal explanations
  • Learn the meaning of “effect size” and learn how to measure it
  • Learn how to work with multiple \(X\) and \(Y\) variables, and the potential pitfalls thereof

Confounding variables

An apparent causal association between \(X\) and \(M\) (\(X \rightarrow M\)) may actually be due to some other variable that \(X\) is affecting, so there is a risk of coming away with the mistaken conclusion that \(X\) affects \(Y\) indirectly through \(M\) when in fact the other variable is the mechanism through which \(X\) exerts its indirect effect. We call this phenomenon epiphenomenal association.

That is, the association between \(M\) and \(Y\) may be an epiphenomenon of the fact that \(X\) affects some other variable not in the model, and that other variable affects \(Y\); but because \(M\) is correlated with that other variable, it appears that \(M\) is the variable through which \(X\)’s effect on \(Y\) is carried.

Other times the association between variables is spurious. For example, the fact that children who watch relatively more television are more likely to be overweight does not imply with certainty that excessive television viewing causes weight problems. Perhaps parents who don’t encourage a healthy lifestyle are more likely to purchase and feed their children less healthy food that is high in fat and calories and are also less likely to encourage their children to play sports, exercise, or engage in other behaviors that are better for their bodies than just watching television. So it isn’t necessarily the excessive television viewing causing the weight problems.

When \(X\) is not experimentally manipulated, then things get even worse. Absent random assignment to values of \(X\), all of the associations in a mediation model are susceptible to confounding and epiphenomenal association, not just the association between \(M\) and \(Y\).

Causal order

Mediation is a causal process, and among the criteria for claiming that an association is cause–effect is establishing that the cause precedes the effect in time. Experimental manipulation and random assignment to \(X\) all but guarantee that \(X\) precedes \(M\) and \(Y\) in a mediation model.

This is because random assignment largely ensures that the groups that define \(X\) are, on average, equal on \(M\) and \(Y\) at the beginning of the study. Any differences observed on \(M\) and \(Y\) following random assignment must have occurred after the assignment of cases to groups.

But random assignment does not ensure that \(M\) precedes \(Y\) in time. Who is to say that the direction of causal flow runs from \(X\) to \(M\) to \(Y\)? Perhaps the true causal sequence is \(X\) to \(Y\) to \(M\).

For example, in the presumed media influence study, it could be argued that if people believe they should take action (\(Y\)) in response to the article about a possible sugar shortage in the country (\(X\)), they then project that decision onto the public at large (\(M\)) as a form of rationalization for their own beliefs and chosen course of action (rather than deciding to take action (\(Y\)) in response to the article (\(X\)) because they think others will do the same).

In the estress study, where depressed affect (\(M\)) was construed as a mediator of the effect of economic stress (\(X\)) on withdrawal intentions (\(Y\)) (\(X→M→Y\)), an alternative proposal is that people who begin to ponder giving up on a business (\(Y\)) start putting less time into the enterprise, which in time hurts the profit margin so that economic stresses mount (\(X\)), and the owner begins to start getting depressed about having to abandon his or her business (\(M\)) (\(Y→X→M\)).

In non-experimental studies, any sequence of causal ordering of \(X\), \(M\), and \(Y\) must be entertained as a potential candidate for the direction of causal flow. Hopefully, strong theory or logical impossibility or implausibility precludes some of these. Sometimes certain alternative directions of causal flow are so implausible that they can be discounted without difficulty.

Approach

In an attempt to entertain alternative directions of causal flow, one procedure some investigators employ is to estimate a mediation model corresponding to the alternative explanation, to see whether the direct and indirect effects are consistent with what that alternative ordering predicts.

For instance, when economic stress was specified as the mediator of the effect of withdrawal intentions on depressed affect, there was no evidence of such a process at work, as a bootstrap confidence interval for the indirect effect contained zero.

Similarly, when this procedure was applied to the presumed media influence study by treating presumed media influence as the final outcome and intentions to buy sugar as the mediator, the results were not consistent with this alternative direction of causal flow.

Effect size

The direct and indirect effects in a mediation model are scaled in the metrics of \(X\) and \(Y\). Depending on the measurement scales, the absolute size of the direct and indirect effects says nothing about whether the effects are large or small in a practical or theoretical sense. Measures of effect size are used to gauge how large or small an effect is.

Standardization is a way to make values independent of the specific measurement scales. For instance, Pearson’s correlation coefficient is a standardized measure of association (a standardized covariance). In a similar way, we can use standardization to create two measures of effect size that apply to the direct, indirect, and total effects in a mediation model: the partially standardized effect and the completely standardized effect.
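
As a quick check of this idea with the estress data loaded earlier, the correlation between two variables equals the covariance of their standardized versions:

# Pearson's r is the covariance of the standardized (z-scored) variables.
with(estress, cor(estress, withdraw))
with(estress, cov(scale(estress), scale(withdraw)))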

Partially standardized effect

The partially standardized effect is calculated by dividing the direct (c’) and indirect (ab) effects by the standard deviation of the \(Y\) variable:

\[c'_{ps} = \frac{c'}{SD_Y}\]

\[ab_{ps} = \frac{ab}{SD_Y}\]

For instance, in the estress study, the partially standardized indirect effect is 0.107, meaning that two entrepreneurs who differ by one unit on X (economic stress) differ by about one-tenth of a standard deviation (0.107) in their intentions to withdraw from entrepreneurship as a result of the effect of stress on depressed affect.
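
A minimal sketch of this computation with lm() on the estress data loaded earlier (no covariates in the model):

# Partially standardized indirect effect for the estress example (no covariates).
a  <- coef(lm(affect ~ estress, data = estress))["estress"]
b  <- coef(lm(withdraw ~ estress + affect, data = estress))["affect"]
ab <- a * b                          # unstandardized indirect effect (about 0.133)

ab_ps <- ab / sd(estress$withdraw)   # partially standardized indirect effect
ab_ps                                # about 0.107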

Completely standardized effect

The partially standardized effect is still expressed in the original \(X\) scale (“two entrepreneurs who differ by one unit on \(X\)… differ by about one-tenth (0.107) of a standard deviation…”).

The completely standardized effect standardizes both the \(X\) and the \(Y\):

\[c'_{cs} = \frac{SD_X (c')}{SD_Y} = SD_X (c'_{ps})\]

\[ab_{cs} = \frac{SD_X (ab)}{SD_Y} = SD_X (ab_{ps})\]

These two measures are identical to the direct and indirect effects when those effects are calculated using standardized regression coefficients, or when standardized \(X\), \(M\), and \(Y\) variables are used in the model rather than \(X\), \(M\), and \(Y\) in their original metric.

In the estress study, a one standard deviation change in economic stress leads to a change of 0.152 standard deviations in withdrawal intentions, as a result of the effect of stress on affect, which in turn influences withdrawal intentions.
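
Continuing the lm() sketch from the previous subsection, rescaling by the standard deviation of \(X\) yields the completely standardized indirect effect:

# Completely standardized indirect effect: rescale the partially standardized
# effect by the SD of X (equivalent to refitting with standardized variables).
ab_cs <- ab_ps * sd(estress$estress)
ab_cs                                # about 0.152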

The completely standardized effect is generally not meaningful if \(X\) is a dichotomous variable, thus, in this case, it is not recommended.

Effect size in PROCESS

To calculate the partially and completely standardized effect sizes, just add effsize = 1. The process software also generates bootstrap confidence intervals for the partially and completely standardized indirect effects.

process(estress, y = "withdraw", x = "estress", m = "affect", 
        cov = c("ese", "sex", "tenure"),
        total = 1, model = 4, progress = 0,
        effsize = 1,
        seed=100770)

********************* PROCESS for R Version 4.0.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
                
Model : 4       
    Y : withdraw
    X : estress 
    M : affect  

Covariates: 
       ese sex tenure

Sample size: 262

Custom seed: 100770


*********************************************************************** 
Outcome Variable: affect

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.4039    0.1631    0.4452   12.5231    4.0000  257.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    1.7855    0.3077    5.8033    0.0000    1.1796    2.3914
estress     0.1593    0.0297    5.3612    0.0000    0.1008    0.2179
ese        -0.1549    0.0444   -3.4892    0.0006   -0.2423   -0.0675
sex         0.0148    0.0857    0.1726    0.8631   -0.1540    0.1836
tenure     -0.0108    0.0063   -1.7227    0.0861   -0.0232    0.0016

*********************************************************************** 
Outcome Variable: withdraw

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.4539    0.2060    1.2586   13.2824    5.0000  256.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    2.7461    0.5502    4.9913    0.0000    1.6626    3.8295
estress    -0.0935    0.0527   -1.7751    0.0771   -0.1973    0.0102
affect      0.7071    0.1049    6.7420    0.0000    0.5006    0.9137
ese        -0.2121    0.0764   -2.7769    0.0059   -0.3625   -0.0617
sex         0.1274    0.1441    0.8838    0.3776   -0.1565    0.4112
tenure     -0.0021    0.0106   -0.1940    0.8463   -0.0230    0.0189

************************ TOTAL EFFECT MODEL *************************** 
Outcome Variable: withdraw

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.2550    0.0650    1.4763    4.4667    4.0000  257.0000    0.0017

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    4.0087    0.5603    7.1548    0.0000    2.9053    5.1120
estress     0.0191    0.0541    0.3535    0.7240   -0.0874    0.1257
ese        -0.3216    0.0808   -3.9789    0.0001   -0.4808   -0.1624
sex         0.1379    0.1561    0.8831    0.3780   -0.1695    0.4453
tenure     -0.0097    0.0115   -0.8491    0.3966   -0.0323    0.0128

*********************************************************************** 
Bootstrapping in progress. Please wait.

************ TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y ************

Total effect of X on Y:
     effect        se         t         p      LLCI      ULCI      c_cs
     0.0191    0.0541    0.3535    0.7240   -0.0874    0.1257    0.0218

Direct effect of X on Y:
     effect        se         t         p      LLCI      ULCI     c'_cs
    -0.0935    0.0527   -1.7751    0.0771   -0.1973    0.0102   -0.1068

Indirect effect(s) of X on Y:
          Effect    BootSE  BootLLCI  BootULCI
affect    0.1127    0.0292    0.0594    0.1734

Completely standardized indirect effect(s) of X on Y:
          Effect    BootSE  BootLLCI  BootULCI
affect    0.1286    0.0330    0.0668    0.1968

******************** ANALYSIS NOTES AND ERRORS ************************ 

Level of confidence for all confidence intervals in output: 95

Number of bootstraps for percentile bootstrap confidence intervals: 5000

Multiple X variables

Researchers sometimes propose that several causal agents (\(X\) variables) simultaneously transmit their effects on the same outcome through the same mediator(s).

The computation of the direct and indirect effects in models with multiple \(X\) variables requires no modification to the procedure discussed thus far.

The process software cannot fit models with more than one \(X\) variable. To estimate such a model, you have to fit more than one model, specifying the other \(X\) variables as covariates in the cov parameter.

For instance, consider a mediator model with one mediator (MED), one outcome (DV), and three \(X\) variables (IV1, IV2, and IV3). To estimate the effects of IV1, IV2, and IV3 on DV directly and indirectly through MED:

# data_set, DV, MED, IV1, IV2, and IV3 are placeholder names
process(data_set, y = "DV", x = "IV1", m = "MED", 
        cov = c("IV2", "IV3"), model = 4, seed = 10)
process(data_set, y = "DV", x = "IV2", m = "MED", 
        cov = c("IV1", "IV3"), model = 4, seed = 10)
process(data_set, y = "DV", x = "IV3", m = "MED", 
        cov = c("IV1", "IV2"), model = 4, seed = 10)

As with any other predictors in a regression model, one should be aware of the multicollinearity problem. The stronger the associations between the variables in the model, the greater the potential of such a problem.

The danger in including multiple \(X\)s in a mediation model, as when including statistical controls, is the possibility that highly correlated \(X\)s will cancel out each other’s effects. This is a standard concern in linear models involving correlated predictors.

As a result, one could find that when included as the sole \(X\), each variable exerts a direct and/or indirect effect on \(Y\) through \(M\), but when considered together, none of them appears to have any effect at all.

Multiple Y variables

In the same way it is possible to include several \(Y\) variables, when investigators are interested in the direct and indirect effects of some putative causal antecedent on several different outcome variables.

For example, some researchers estimated the direct and indirect effects of neuroticism (\(X\)) on anxiety symptoms (\(Y_1\)), depression symptoms (\(Y_2\)), and sleep difficulties (\(Y_3\)), with worry (\(M_1\)) and rumination (\(M_2\)) specified as mediators of neuroticism’s effect.

A close examination of this model shows that it is really just \(k\) mediation models with a common \(X\) and common mediator(s).

PROCESS can be used to estimate the paths in a model such as in Figure 4.6 by running the process function \(k\) times, substituting one \(Y\) variable for another at each run and seeding the random number generator with the same seed for bootstrapping, as in the sketch below.
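
A hedged sketch of this strategy, using hypothetical names (my_data, Y1–Y3, X, M1, and M2 stand in for the actual data set, outcome variables, antecedent, and mediators):

# Hypothetical sketch: one process() run per outcome variable, with the same
# antecedent and mediators, and a common seed so the bootstrap samples match.
for (dv in c("Y1", "Y2", "Y3")) {
  process(my_data, y = dv, x = "X", m = c("M1", "M2"),
          model = 4, seed = 10)
}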

More than one mediator

Learning outcomes

  • Understand the difference between parallel and serial multiple mediation models
  • Learn to use process to fit parallel and serial multiple mediation models
  • Learn to read the output of the model

Parallel and Serial Mediator models

Models with more than one mediator allow a variable’s effect to be transmitted to another through multiple mediators (mechanisms) simultaneously.

Two forms of multiple mediator models are introduced:

  • Mediation models where mediators operate in parallel, without affecting one another
  • Mediation models where mediators operate in serial, with mediators linked together in a causal chain

The simple mediation model is frequently used by researchers, but it often oversimplifies the kinds of phenomena that researchers study. There are several reasons to include more than one mediator:

  • By including more than one mediator in a model simultaneously, it is possible to pit theories against each other by statistically comparing indirect effects that represent different theoretical mechanisms
  • A mediator could be related to an outcome due to epiphenomenality, and not because it causally influences the outcome
  • A specific causal relation may itself be mediated (i.e. decomposed into a direct and an indirect component)

The parallel multiple mediator model

In a parallel multiple mediator model, the antecedent variable \(X\) is modeled as influencing consequent \(Y\) directly as well as indirectly through two or more mediators, with the condition that no mediator causally influences another.

Comparing theories

For example, some researchers simultaneously examined three potential mediators of the effectiveness of a 30-session, 1-year experimental weight loss intervention among middle-aged women:

  • Emotional eating (e.g., eating to placate a negative mood)
  • Restrained eating (e.g., not eating after feeling full)
  • Perceived barriers to exercise

They found that relative to women randomly assigned to a control weight-loss program, those who experienced the experimental method did lose more weight over the year.

The mediation analysis suggested that:

  • The intervention reduced frequency of emotional eating and increased restraint while eating, which in turn resulted in greater weight loss.
  • Independent of these two mechanisms, there was no evidence that the intervention influenced weight loss by changing perceived barriers to exercise.

Epiphenomenality

Establishing an indirect effect of \(X\) on \(Y\) through \(M\) through a simple mediation analysis does not imply that \(M\) is the only mechanism at work linking \(X\) to \(Y\).

The indirect effect could also be due to an epiphenomenal association between the \(M\) in a simple mediation model and the “true” mediator or mediators causally between \(X\) and \(Y\).

For instance, in the Presumed Media Influence Study (pmi data set), any variable correlated with the mediator presumed media influence (\(M\)) and also affected by the experimental manipulation of article location (\(X\)) could be the actual mediator transmitting the effect of location on intentions to buy sugar.

The authors recognized this and so had the foresight to measure a variable related to another possible mechanism: perceived issue importance (import variable).

Perhaps people infer, from where an article is published in the newspaper, the extent to which the issue is something worthy of attention, and thereby potentially something one should think about and perhaps act upon.

So they measured people’s beliefs about how important the potential sugar shortage was (import) using two questions that were aggregated to form a perceived importance measure.

Issue importance (import) is actually correlated (r = 0.282; p < 0.01) with presumed media influence (pmi), so the epiphenomenal explanation for the pmi mediator is plausible.

To clarify the role of the two possible mediators, we can fit a parallel multiple mediator model with two mediators, by using the same code used for the simple mediation model, just adding more than one variable following the parameter m.

Another difference is that we add the parameter contrast = 1 to conduct a test of differences between specific indirect effects with bootstrapping.

Also, the results include the total indirect effect (the indirect effect summed across all mediators), and the indirect effect of each mediator.

process(pmi_data, y = "reaction", x = "cond", m = c("import", "pmi"), 
        model = 4, total = 1, contrast = 1, effsize = 1, progress = 0,
        seed = 10)

********************* PROCESS for R Version 4.0.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
                
Model : 4       
    Y : reaction
    X : cond    
   M1 : import  
   M2 : pmi     

Sample size: 123

Custom seed: 10


*********************************************************************** 
Outcome Variable: import

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.1809    0.0327    2.9411    4.0942    1.0000  121.0000    0.0452

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    3.9077    0.2127   18.3704    0.0000    3.4866    4.3288
cond        0.6268    0.3098    2.0234    0.0452    0.0135    1.2401

*********************************************************************** 
Outcome Variable: pmi

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.1808    0.0327    1.7026    4.0878    1.0000  121.0000    0.0454

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    5.3769    0.1618   33.2222    0.0000    5.0565    5.6973
cond        0.4765    0.2357    2.0218    0.0454    0.0099    0.9431

*********************************************************************** 
Outcome Variable: reaction

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.5702    0.3251    1.6628   19.1118    3.0000  119.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant   -0.1498    0.5298   -0.2828    0.7778   -1.1989    0.8993
cond        0.1034    0.2391    0.4324    0.6662   -0.3701    0.5768
import      0.3244    0.0707    4.5857    0.0000    0.1843    0.4645
pmi         0.3965    0.0930    4.2645    0.0000    0.2124    0.5806

************************ TOTAL EFFECT MODEL *************************** 
Outcome Variable: reaction

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.1603    0.0257    2.3610    3.1897    1.0000  121.0000    0.0766

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    3.2500    0.1906   17.0525    0.0000    2.8727    3.6273
cond        0.4957    0.2775    1.7860    0.0766   -0.0538    1.0452

*********************************************************************** 
Bootstrapping in progress. Please wait.

************ TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y ************

Total effect of X on Y:
     effect        se         t         p      LLCI      ULCI      c_ps
     0.4957    0.2775    1.7860    0.0766   -0.0538    1.0452    0.3197

Direct effect of X on Y:
     effect        se         t         p      LLCI      ULCI     c'_ps
     0.1034    0.2391    0.4324    0.6662   -0.3701    0.5768    0.0667

Indirect effect(s) of X on Y:
          Effect    BootSE  BootLLCI  BootULCI
TOTAL     0.3923    0.1636    0.0896    0.7329
import    0.2033    0.1163    0.0024    0.4684
pmi       0.1890    0.1042    0.0066    0.4175
(C1)      0.0144    0.1483   -0.2681    0.3141

Partially standardized indirect effect(s) of X on Y:
          Effect    BootSE  BootLLCI  BootULCI
TOTAL     0.2530    0.1032    0.0581    0.4652
import    0.1312    0.0740    0.0016    0.2975
pmi       0.1219    0.0667    0.0045    0.2646
(C1)      0.0093    0.0960   -0.1758    0.2001

Specific indirect effect contrast definition(s):
(C1)     import   minus    pmi

******************** ANALYSIS NOTES AND ERRORS ************************ 

Level of confidence for all confidence intervals in output: 95

Number of bootstraps for percentile bootstrap confidence intervals: 5000

Interpretation

The interpretation of the total and direct effects does not change, while the indirect effect section now reports the total indirect effect and the specific indirect effect of each mediator.

Also, since we used contrast = 1, there is a test of the statistical significance of the difference between the specific indirect effects. In this case the output also includes a description of the contrast tested.

Here, the tested difference is import minus pmi. The difference is positive (0.0144), meaning that, relative to the interior-page condition, those assigned to the front-page condition have stronger intentions to buy sugar (by 0.0144 units) through the mediation of import than through the mediation of pmi. The difference is not statistically significant, though, since the bootstrap confidence interval includes zero.

In the output you can also read the standardized effects, if you request them (effsize = 1). Notice that in this case only the partially standardized indirect effect is reported: the \(X\) of this data set is dichotomous, and completely standardized effects are not easy to interpret in that case. When \(X\) is measured on a numerical continuum, process can produce both partially and completely standardized measures of the total, direct, and indirect effects in mediation models.
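To see where these estimates come from, each specific indirect effect is the product of an a path (from the model for that mediator) and the corresponding b path (from the model for reaction). A minimal hand check in R, using the rounded coefficients from the output above (the object names a1, b1, etc. are just illustrative):

# a paths (cond -> mediator) and b paths (mediator -> reaction), rounded
a1 <- 0.6268; b1 <- 0.3244   # import
a2 <- 0.4765; b2 <- 0.3965   # pmi
a1 * b1             # specific indirect effect through import, ~0.2033
a2 * b2             # specific indirect effect through pmi, ~0.1890
a1 * b1 + a2 * b2   # total indirect effect, ~0.3923
a1 * b1 - a2 * b2   # contrast (C1), ~0.0144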

The serial multiple mediator model

A distinguishing feature of the parallel mediator model is the assumption that no mediator causally influences another. In practice, mediators will be correlated, but this model specifies that they are not causally so.

The serial multiple mediator model investigates a mechanism in which \(X\) causes \(M_1\), which in turn causes \(M_2\), and so forth, concluding with \(Y\) as the final consequent.
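With two mediators in serial, the model consists of three equations; a sketch in the usual notation, where \(d_{21}\) denotes the effect of \(M_1\) on \(M_2\):

\[M_1 = i_{M_1} + a_1 X + e_{M_1}\] \[M_2 = i_{M_2} + a_2 X + d_{21} M_1 + e_{M_2}\] \[Y = i_Y + c'X + b_1 M_1 + b_2 M_2 + e_Y\]

This yields three specific indirect effects: \(a_1 b_1\) (through \(M_1\) only), \(a_2 b_2\) (through \(M_2\) only), and \(a_1 d_{21} b_2\) (through \(M_1\) and \(M_2\) in serial).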

Example

Some researchers compared the anxiety of residents living in a low-income housing development located in a middle-class neighborhood to a matched group who applied to live in the housing development but remained on the waiting list.

They argued that life in a middle-class housing development (\(X\)) would reduce (\(M_1\)) exposure to neighborhood disorder (e.g., crime, homeless people, drugs and drug use, violence) relative to those living elsewhere, which would in turn reduce (\(M_2\)) the number of stressful life experiences, which in turn would translate into (\(Y\)) fewer anxiety symptoms.

Considering the Presumed Media Influence study data set, it is plausible that the perceived issue importance (\(M_1\)) influences people’s beliefs about how others are going to be influenced by the media (\(M_2\)): “This article will be published on the front page, so it must be important, and people will take notice of such an important matter and act by buying sugar to stock up. Therefore, I should go out and buy sugar before supplies are all gone!”

Estimation with PROCESS

To fit a serial multiple mediator model:

  • Use model = 6 instead of model = 4.
  • Pay attention to the order of the variables listed after the parameter m: unlike model 4, where the order is ignored, when model 6 is specified the order matters! The order of the variables in the list of mediators is taken literally as the causal sequence, with the first mediator in the list causally prior to the second, and so forth.
  • You can add contrast = 1 if you also want to test differences between specific indirect effects.
process(pmi_data, y = "reaction", x = "cond", m = c("import", "pmi"), 
        model = 6, total = 1, contrast = 1, boot = 10000, progress = 0,
        seed = 031216)

********************* PROCESS for R Version 4.0.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
                
Model : 6       
    Y : reaction
    X : cond    
   M1 : import  
   M2 : pmi     

Sample size: 123

Custom seed: 31216


*********************************************************************** 
Outcome Variable: import

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.1809    0.0327    2.9411    4.0942    1.0000  121.0000    0.0452

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    3.9077    0.2127   18.3704    0.0000    3.4866    4.3288
cond        0.6268    0.3098    2.0234    0.0452    0.0135    1.2401

*********************************************************************** 
Outcome Variable: pmi

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.3114    0.0970    1.6027    6.4428    2.0000  120.0000    0.0022

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    4.6104    0.3057   15.0836    0.0000    4.0053    5.2156
cond        0.3536    0.2325    1.5207    0.1310   -0.1068    0.8139
import      0.1961    0.0671    2.9228    0.0041    0.0633    0.3290

*********************************************************************** 
Outcome Variable: reaction

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.5702    0.3251    1.6628   19.1118    3.0000  119.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant   -0.1498    0.5298   -0.2828    0.7778   -1.1989    0.8993
cond        0.1034    0.2391    0.4324    0.6662   -0.3701    0.5768
import      0.3244    0.0707    4.5857    0.0000    0.1843    0.4645
pmi         0.3965    0.0930    4.2645    0.0000    0.2124    0.5806

************************ TOTAL EFFECT MODEL *************************** 
Outcome Variable: reaction

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.1603    0.0257    2.3610    3.1897    1.0000  121.0000    0.0766

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    3.2500    0.1906   17.0525    0.0000    2.8727    3.6273
cond        0.4957    0.2775    1.7860    0.0766   -0.0538    1.0452

*********************************************************************** 
Bootstrapping in progress. Please wait.

************ TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y ************

Total effect of X on Y:
     effect        se         t         p      LLCI      ULCI
     0.4957    0.2775    1.7860    0.0766   -0.0538    1.0452

Direct effect of X on Y:
     effect        se         t         p      LLCI      ULCI
     0.1034    0.2391    0.4324    0.6662   -0.3701    0.5768

Indirect effect(s) of X on Y:
         Effect    BootSE  BootLLCI  BootULCI
TOTAL    0.3923    0.1660    0.0876    0.7389
Ind1     0.2033    0.1146    0.0049    0.4519
Ind2     0.1402    0.1005   -0.0456    0.3549
Ind3     0.0488    0.0353   -0.0002    0.1354
(C1)     0.0631    0.1563   -0.2375    0.3788
(C2)     0.1546    0.0987   -0.0012    0.3787
(C3)     0.0915    0.1082   -0.1231    0.3164

Specific indirect effect contrast definition(s):
(C1)     Ind1   minus    Ind2
(C2)     Ind1   minus    Ind3
(C3)     Ind2   minus    Ind3

Indirect effect key:
Ind1 cond    ->    import    ->    reaction              
Ind2 cond    ->    pmi    ->    reaction              
Ind3 cond    ->    import    ->    pmi    ->    reaction

******************** ANALYSIS NOTES AND ERRORS ************************ 

Level of confidence for all confidence intervals in output: 95

Number of bootstraps for percentile bootstrap confidence intervals: 10000

Interpretation

Regarding the interpretation:

  • The first part of the output has the same structure as before, with a separate regression model for each equation of the overall mediation model.

  • The table on the total effect can also be interpreted as usual (the total effect of \(X\) on \(Y\) without considering anything other than \(X\) and \(Y\)).

  • The direct effect is interpreted as before (the effect of \(X\) on \(Y\) holding constant the mediators and any other variables).

Instead, the table Indirect effect(s) of \(X\) on \(Y\) is a bit different. For this model with two mediators in serial, there are three indirect effects (Ind1, Ind2, Ind3). Their meaning is explained by the table Indirect effect key: each one represents a specific path through the variables. The different indirect effects can be interpreted with reference to this legend.

  • In this case: Ind1 is the indirect effect of article location (\(X\)) on reactions (\(Y\)) through perceived importance (import) of the sugar shortage. It is positive and significant.

    • It means that those told the article would appear on the front page (cond) perceived the sugar shortage as more important (more, because the coefficient of cond is positive (0.6268) in the regression table where the Outcome Variable is import).

    • Also, this increased importance was associated with an increased intention to buy sugar (because the coefficient is positive, equal to 0.3244, in the regression table with Outcome Variable: reaction).

This indirect effect is specific to import and independent from the other mediators (in this case pmi, presumed media influence).

  • The second specific indirect effect (Ind2) is the indirect effect of article location on reactions through only presumed media influence, and is not statistically significant.

  • The third indirect effect (Ind3) is the specific indirect effect of article location on reactions through perceived importance (import) and presumed media influence (pmi) in serial and is not statistically significant.
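As in the parallel case, these specific indirect effects are products of the regression coefficients reported above. A minimal hand check in R with the rounded coefficients (object names are illustrative only):

a1  <- 0.6268   # cond -> import
a2  <- 0.3536   # cond -> pmi (adjusted for import)
d21 <- 0.1961   # import -> pmi
b1  <- 0.3244   # import -> reaction
b2  <- 0.3965   # pmi -> reaction
a1 * b1         # Ind1, ~0.2033
a2 * b2         # Ind2, ~0.1402
a1 * d21 * b2   # Ind3, ~0.0488 (tiny rounding differences from the output are expected)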

The output also includes tests of the differences between the indirect effects (C1, C2, and C3), since we used contrast = 1 in the code. To interpret these differences, read the legend Specific indirect effect contrast definition(s).

With more than two mediators, a model can be a blend of parallel and serial mediation processes. With process you can also fit these kinds of models: see Appendix A of the book (pp. 584-612) for a visual presentation of the pre-programmed models. Models 80, 81, and 82 are blends of parallel and serial mediation processes.

Mediation with a multicategorical antecedent

Learning Outcomes

  • Understand the characteristic of a multicategorical variable (and the difference between dichotomous and continuous variables)
  • Learn to use process to fit a mediation model with a multicategorical X variable
  • Learn to interpret the output of the model

Mediation with a multicategorical antecedent

So far we have discussed models with a dichotomous or continuous \(X\). The interpretation of these models is, at the most basic level, based on a one-unit change in \(X\).

However, there are also cases where \(X\) is a multicategorical variable. Multicategorical variables are variables that can take on three or more categorical values. The values of a categorical variable are also called modalities or categories.

For instance, we may have a multicategorical \(X\) variable with three modalities corresponding to three experimental conditions: information in mainstream newspaper (cond=0), information in alternative newspaper (cond=1), and information on social media (cond=2).

The interpretation, in this case, is a bit different from that of dichotomous or continuous variables.

Before considering multicategorical variables, let’s recap the interpretation of models with dichotomous and continuous \(X\).

Recap

Interpreting models with a continuous X

Continuous variables are quantitative variables, resulting from a measurement process, and can take on several values.

Considering, for example, the estress data set, we found that the indirect effect of the economic stress condition (\(X\)) on the intention to withdraw (\(Y\)) through the mediator of depressed affect (\(M\)), was 0.1330.

How can we interpret this coefficient?

Two businesspeople who differ by one unit in their economic stress are estimated to differ by 0.133 units in their reported intentions to withdraw from their business (…as a result of the tendency for those under relatively more economic stress to feel more depressed affect, which in turn translates into greater withdrawal intentions). Or, equivalently: a one-unit increase in economic stress is associated with an increase of 0.133 units, on average, in withdrawal intentions.

Interpreting models with a dichotomous X

A dichotomous \(X\) variable can take on just two values, which are usually coded as 0 and 1.

For instance, in the pmi data set there are only two (experimental) conditions (cond): people were told the article was going to be published in the middle of an economic supplement of the newspaper (cond = 0), or on the front page (cond = 1).

Using the variable “perceived media influence” (pmi) as a mediator, we found that the indirect effect of \(X\) on \(Y\) was equal to 0.2413 (the indirect effect quantifies the effect of \(X\) on \(Y\) through a mediator \(M\)). How do you interpret this coefficient?

Relative to those assigned to the interior page condition (cond=0), those who read an article they were told was to be published in the front page of the newspaper (cond=1) were, on average, 0.241 units higher in their likelihood of buying sugar (…as a result of the effect of the location of the article on presumed media influence which, in turn, putatively affected people’s intentions to buy sugar).

Notice that with dichotomous variables, we don’t interpret the results with reference to a one unit increase in \(X\), but relative to those assigned to the condition cond=0.

The group coded with 0 is the reference group and the coefficients in the model are interpreted with reference to this group. The coefficients express differences with the reference group.

We said: Relative to those assigned to the interior page condition (cond=0) those who read an article they were told was to be published in the front page of the newspaper (cond=1) were, on average, 0.241 units higher in their likelihood of buying sugar.

That is to say: \(Y_{COND_1} = Y_{COND_0} + 0.241\)

Let’s make another example, with a dichotomous variable where MALE=0 and FEMALE=1, and a simple regression model as follows:

\[Y = 0.5X + e\]

Let’s say \(Y\) is “intuitiveness”.

How do we interpret this equation?

In abstract terms, we can interpret \(Y = 0.5X + e\) by saying that for a one-unit increase in X, there is half a unit (0.5) increase, on average, in \(Y\) (plus some error \(e\)).

But since \(X\) is dichotomous, a one-unit increase in a dichotomous variable means switching from one modality to the other, in this case from MALES (0) to FEMALES (1).

In other terms, we can interpret the coefficients of the regression model with reference to MALE=0 and say: compared to MALES, we would expect FEMALES, on average, to be half a point higher in intuitiveness.
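A small simulated illustration (the data here are hypothetical, generated only to make the point): with a 0/1 predictor, the regression slope is exactly the difference between the two group means.

# hypothetical data: 0 = MALE, 1 = FEMALE; true group difference of 0.5
set.seed(1)
sex <- rep(c(0, 1), each = 50)
intuitiveness <- 3 + 0.5 * sex + rnorm(100)
coef(lm(intuitiveness ~ sex))["sex"]    # estimated slope
diff(tapply(intuitiveness, sex, mean))  # difference in group means (same value)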

Mediation with a multicategorical antecedent

Similarly to dichotomous variables, we interpret a multicategorical variable in relation to a reference group.

Just as we interpret the results for a dichotomous \(X\) relative to the category coded as \(0\) (e.g.: “relative to those assigned to the interior page condition (cond=0), those who read an article they were told was to be published on the front page of the newspaper (cond=1)…”), we interpret a multicategorical \(X\) relative to one of its modalities, used as a reference group.

Thus, when working with a multicategorical variable, we compare two or more groups of cases to the one we use as the reference.

We can select the group we want as the reference group. Often, we choose the reference group coherently with our research focus, so as to obtain meaningful results.

Example

To give an example of a model with a multicategorical \(X\), we use the protest data set.

In this study, 129 participants, all of whom were female, received a written account of the fate of a female attorney (Catherine) who lost a promotion to a less qualified male as a result of discriminatory actions of the senior partners.

After reading this story, which was the same in all conditions, the participants were given a description of how Catherine responded to this sexual discrimination. Participants were randomly assigned to one of three conditions:

  • Those assigned to the no protest condition (coded protest=0) learned that, though very disappointed by the decision, Catherine decided not to take any action against this discrimination and continued working at the firm.
  • Those assigned to the individual protest condition (coded protest=1) were told that Catherine approached the partners to protest the decision, giving various explanations as to why the decision was unfair that revolved around her, such as that she was more qualified for the job and that it would hurt her career.
  • Those randomly assigned to the collective protest condition (protest=2) were told Catherine protested and framed her argument around how the firm had treated women in the past, that women are just as qualified as men, and that they should be treated equally.

Following this manipulation of Catherine’s response to the discrimination, the participants responded to a set of questions measuring how appropriate they perceived her response was for this situation. Higher scores on this variable (respappr in the data file) reflect a stronger perception of appropriateness of the response.

Finally, the participants were asked to respond to six questions evaluating Catherine. Their responses were aggregated into a measure of liking, such that participants with higher scores liked her relatively more (liking in the data file).

protest <- read.csv("data/protest.csv")

To tell the process function that \(X\) is multicategorical (the \(X\) variable can contain up to nine modalities), use the mcx option (for multicategorical \(X\)), with an argument following an equals sign telling how to code the groups. To perform our analysis with a standard coding procedure, we use mcx=1 (“indicator coding”).

process(protest, y = "liking", x = "protest", m = "respappr",
        mcx = 1, total = 1, model = 4, progress = 0,
        seed = 30217)

********************* PROCESS for R Version 4.0.1 ********************* 
 
           Written by Andrew F. Hayes, Ph.D.  www.afhayes.com              
   Documentation available in Hayes (2022). www.guilford.com/p/hayes3   
 
*********************************************************************** 
                
Model : 4       
    Y : liking  
    X : protest 
    M : respappr

Sample size: 129

Custom seed: 30217

Coding of categorical X variable for analysis: 
    protest        X1        X2
     0.0000    0.0000    0.0000
     1.0000    1.0000    0.0000
     2.0000    0.0000    1.0000

*********************************************************************** 
Outcome Variable: respappr

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.5106    0.2607    1.3649   22.2190    2.0000  126.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    3.8841    0.1825   21.2881    0.0000    3.5231    4.2452
X1          1.2612    0.2550    4.9456    0.0000    0.7565    1.7659
X2          1.6103    0.2522    6.3842    0.0000    1.1111    2.1095

*********************************************************************** 
Outcome Variable: liking

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.5031    0.2531    0.8427   14.1225    3.0000  125.0000    0.0000

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    3.7103    0.3074   12.0711    0.0000    3.1020    4.3187
X1         -0.0037    0.2190   -0.0169    0.9865   -0.4371    0.4297
X2         -0.2202    0.2280   -0.9658    0.3360   -0.6715    0.2310
respappr    0.4119    0.0700    5.8844    0.0000    0.2734    0.5504

************************ TOTAL EFFECT MODEL *************************** 
Outcome Variable: liking

Model Summary: 
          R      R-sq       MSE         F       df1       df2         p
     0.2151    0.0463    1.0676    3.0552    2.0000  126.0000    0.0506

Model: 
             coeff        se         t         p      LLCI      ULCI
constant    5.3102    0.1614   32.9083    0.0000    4.9909    5.6296
X1          0.5158    0.2255    2.2870    0.0239    0.0695    0.9621
X2          0.4431    0.2231    1.9863    0.0492    0.0016    0.8845

*********************************************************************** 
Bootstrapping in progress. Please wait.

************ TOTAL, DIRECT, AND INDIRECT EFFECTS OF X ON Y ************

Relative total effects of X on Y:
      effect        se         t         p      LLCI      ULCI
X1    0.5158    0.2255    2.2870    0.0239    0.0695    0.9621
X2    0.4431    0.2231    1.9863    0.0492    0.0016    0.8845

Omnibus test of total effect of X on Y:
    R2-chng         F       df1       df2         p
     0.0463    3.0552    2.0000  126.0000    0.0506
----------

Relative direct effects of X on Y:
      effect        se         t         p      LLCI      ULCI
X1   -0.0037    0.2190   -0.0169    0.9865   -0.4371    0.4297
X2   -0.2202    0.2280   -0.9658    0.3360   -0.6715    0.2310

Omnibus test of direct effect of X on Y:
    R2-chng         F       df1       df2         p
     0.0087    0.7286    2.0000  125.0000    0.4846

----------

Relative indirect effects of X on Y:

protest    ->    respappr    ->    liking

      Effect    BootSE  BootLLCI  BootULCI
X1    0.5195    0.1518    0.2471    0.8442
X2    0.6633    0.1677    0.3680    1.0141

******************** ANALYSIS NOTES AND ERRORS ************************ 

Level of confidence for all confidence intervals in output: 95

Number of bootstraps for percentile bootstrap confidence intervals: 5000

The group coded with protest=0 (the no protest group) is specified as the reference group. With indicator coding (mcx=1), the process function uses the group with the numerically smallest code on the variable specified as \(X\) as the reference group.

If you would rather have a different group as the reference group, you have to recode \(X\) so that the desired reference group is coded with the numerically smallest value.
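For instance, a minimal sketch of such a recoding (the variable name protest_recoded is just an illustrative choice) that would make the collective protest group (originally protest=2) the reference group:

# give collective protest the smallest code so process treats it as the reference
protest$protest_recoded <- ifelse(protest$protest == 2, 0,
                                  ifelse(protest$protest == 0, 1, 2))
# then call process() with x = "protest_recoded" instead of x = "protest"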

Process automatically transforms the multicategorical variable with three modalities into two indicator variables, \(X_1\) (protest=1, individual protest) and \(X_2\) (protest=2, collective protest), whose coefficients can be interpreted with reference to the reference group (protest=0, no protest).

The output reports a legend with the coding of the multicategorical \(X\) variable used for the analysis.


Interpretation of the Total Effect

Let’s start by focusing on the total effect model and write its equation based on the estimated coefficients.

\[Y_{liking} = 5.3102 + 0.5158(X_1) + 0.4431(X_2)\]

The total effect model captures the total effect of \(X\) on \(Y\) without taking into account any other variable. With a multicategorical \(X\), it reproduces the average value of \(Y\) for each modality of \(X\).

tapply(protest$liking, protest$protest, mean)
       0        1        2 
5.310244 5.826047 5.753333 

In particular, the intercept (5.3102) is the average value of \(Y\) when all the \(Xs\) are equal to zero. But when all the \(Xs\) are equal to zero we have the average \(Y\) of the reference group (check the legend with the coding).

We also said that we interpret the other coefficients with reference to the reference group, and indeed the other coefficients are expressed with reference to the intercept. Remember that when a modality of \(X\) is “on” it is equal to 1, while the others are equal to 0 (check the legend with the coding again).

Total model: \[Y_{liking} = 5.3102 + 0.5158(X_1) + 0.4431(X_2)\]

\[\bar{Y}_{protest=0} = 5.3102 + 0.5158(0) + 0.4431(0) = 5.3102\] \[\bar{Y}_{protest=1} = 5.3102 + 0.5158(1) + 0.4431(0) = 5.8260\] \[\bar{Y}_{protest=2} = 5.3102 + 0.5158(0) + 0.4431(1) = 5.7533\]
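Since, with indicator coding, the total effect model is just an ordinary regression of \(Y\) on the two indicator variables, the same coefficients can be reproduced with lm as a quick check:

# intercept = mean of the reference group; the two coefficients are the
# differences of the other two groups from it
summary(lm(liking ~ factor(protest), data = protest))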

From the Omnibus test of total effect of X on Y we can deduce that the total effect of \(X\) on \(Y\) (thus irrespective of whether that effect is direct or indirect) is (almost) significant:

\[R^2 = 0.046; F(2; 126) = 3.055; p = 0.051\]

The results suggest that her response to the discrimination (\(X\)) did influence how she was perceived (\(Y\)). (Note: strictly speaking, the p-value is just above the 0.05 threshold; I am following Hayes’s interpretation here.)

Interpretation of the Direct Effect

The direct effect is the effect of \(X\) on \(Y\) holding the mediator (and possibly other covariates) constant, where “constant” means that the mediator is held at its average value (\(M_{respappr} = 4.866279\)).

mean(protest$respappr)
[1] 4.866279

The resulting estimates of \(Y\) are also called adjusted means, that is, estimates of \(Y\) for each group when the covariates are set to their sample means. In a mediation model, the mediator mathematically functions like a covariate, with \(X\)’s effect on \(Y\) estimated after controlling for \(M\).

The coefficients for the direct effect model can be found in the model for the \(Y\) outcome variable, in this case liking: \[Y_{liking} = 3.7103 − 0.0037(X_1) − 0.2202(X_2) + 0.4119(M)\]

Plugging the average value of \(M\) into the equation: \[Y_{liking} = 3.7103 − 0.0037(X_1) − 0.2202(X_2) + 0.4119(4.866279)\]

To derive the adjusted mean of \(Y\) for each modality of the \(X\) variable, we proceed in the same way as before, substituting the indicator variables with 0 and 1 (NP = no protest, protest=0; IP = individual protest, protest=1; CP = collective protest, protest=2):

\[Y_{NP} = 3.7103 − 0.0037(0) − 0.2202(0) + 0.4119(4.8663) = 5.7147\] \[Y_{IP} = 3.7103 − 0.0037(1) − 0.2202(0) + 0.4119(4.8663) = 5.7110\] \[Y_{CP} = 3.7103 − 0.0037(0) − 0.2202(1) + 0.4119(4.8663) = 5.4945\]

There is no need to calculate these values by hand if you just want the relative direct effects of \(X\) on \(Y\) (bottom part of the output). They express the direct effect in terms of differences from the reference group (DI = direct effect), also reporting the statistical significance of each difference (not significant, in this case):

\(X1_{DI} = Y_{IP} − Y_{NP} = 5.7110 − 5.7147 = −0.0037\) \(X2_{DI} = Y_{CP} − Y_{NP} = 5.4945 − 5.7147 = −0.2202\)
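These hand calculations can also be reproduced in R with the rounded coefficients reported above (a sketch; the object names are illustrative, and tiny rounding discrepancies with the output are expected):

m_bar <- mean(protest$respappr)            # ~4.8663
b0 <- 3.7103; c1p <- -0.0037; c2p <- -0.2202; b <- 0.4119
adj_NP <- b0 + c1p*0 + c2p*0 + b*m_bar     # adjusted mean, no protest (~5.7147)
adj_IP <- b0 + c1p*1 + c2p*0 + b*m_bar     # adjusted mean, individual protest (~5.7110)
adj_CP <- b0 + c1p*0 + c2p*1 + b*m_bar     # adjusted mean, collective protest (~5.4945)
c(adj_IP - adj_NP, adj_CP - adj_NP)        # relative direct effects: -0.0037, -0.2202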

The Omnibus test of direct effect of \(X\) on \(Y\) reports the statistical significance of the overall direct effect of \(X\) on \(Y\): the different types of protest do not directly impact the liking of Catherine when we control for the effect of the mediator (by keeping it constant).

The model for liking also provides the estimate for respappr (\(b\)), the effect of the mediator (perceived response appropriateness) on the liking of Catherine among participants told the same thing about her behavior. In other words, this coefficient is the effect of \(M\) on \(Y\) holding \(X\) constant.

Among two people told the same thing about Catherine’s response, the person who perceived her behavior as one unit higher in appropriateness liked her 0.4119 units more. The more appropriate Catherine’s behavior was perceived as being for the situation, the more she was liked.

Interpretation of the Indirect Effect

Finally, in the last part of the output there is the estimated effect of \(X\) on \(Y\) through \(M\) (“Relative indirect effects of X on Y”).

The coefficients of \(X_1\) (individual protest) and \(X_2\) (collective protest) can be interpreted in relation to the reference group (no protest):

\[X_1 = X_0 + 0.5195\] \[X_2 = X_0 + 0.6633\]

It means that relative to not protesting at all, protesting with an individualistic focus enhanced the likability of Catherine by 0.520 units, because individually protesting was seen as more appropriate than not protesting, and this translated into a more positive evaluation.

Likewise, collectively protesting enhanced the likability of Catherine by 0.663 units relative to not protesting, as collectively protesting was seen as more appropriate than not protesting, and this translated into a more positive evaluation of her.
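Here too, each relative indirect effect is the product of the relative effect of that indicator on the mediator (from the respappr model) and the effect of the mediator on \(Y\) (from the liking model); a quick hand check with the rounded coefficients:

1.2612 * 0.4119   # relative indirect effect for X1 (individual protest), ~0.5195
1.6103 * 0.4119   # relative indirect effect for X2 (collective protest), ~0.6633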

Reporting results

You can report the results of the model by using a table like the following (D1 and D2 are the variables we called X1 and X2):

Alternative coding method

Process implements alternative coding methods to the one just seen.

Using the same example, it may be worthwhile to examine the effect of protesting, whether collectively or individually, relative to not protesting at all, as well as the effect of protesting individually relative to protesting collectively.

There is a way of representing the three groups with a coding system that provides precisely this information; it goes by the name of Helmert coding. In the Hayes book (chapter 2, on multicategorical antecedents) you can find more information about it. A sketch of the corresponding process call follows.
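As a sketch only (assumption: in the R version of PROCESS, Helmert coding of a multicategorical \(X\) is requested with mcx = 3; check the PROCESS documentation before relying on this), the call would look like:

# same model as before, but requesting Helmert coding of the multicategorical X
# (mcx = 3 is an assumption here; verify against the PROCESS documentation)
process(protest, y = "liking", x = "protest", m = "respappr",
        mcx = 3, total = 1, model = 4, progress = 0, seed = 30217)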

Final observations

Collinearity

In a multiple mediator model, a specific indirect effect quantifies the influence of \(X\) on \(Y\) through a particular mediator while holding constant other mediators.

This is useful when the mediators are somewhat correlated. But when the correlation between mediators becomes too large, the usual problems with collinearity in regression models begin to take hold and muddle the results.

Collinearity between predictors increases sampling variance in estimates of their partial relationships with an outcome, and such sampling variance will propagate throughout the estimates of indirect effects and increase the width of confidence intervals (or the p-values).

Including correlated mediators in the model allows you to disentangle spurious and epiphenomenal associations from potential causal associations, but this comes at the cost of greater sampling variance and reduced power for tests of indirect effects, especially when the sample size is small. Reduced power means that it is harder to detect effects that really exist, i.e. a truly non-zero indirect effect may appear non-significant.

Total indirect effect

In a parallel multiple mediator model with several mediators, it is possible for the total indirect effect to be non-significant even if one or more of the specific indirect effects are significant. For instance:

  • In a model with several mediators, only one may actually be transmitting \(X\)’s effect on \(Y\). The inclusion of a bunch of potential mediators in the model reduces power for tests of indirect effects (see the discussion of collinearity above).

  • Since the total indirect effect is the sum of all specific indirect effects (as the quick check below illustrates), if those indirect effects differ in sign but are of similar magnitude, their sum may very well be zero or nearly so.
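As a quick check with the output shown earlier, the TOTAL row is indeed the sum of the specific indirect effects in both models:

0.2033 + 0.1890            # parallel model: import + pmi = ~0.3923
0.2033 + 0.1402 + 0.0488   # serial model: Ind1 + Ind2 + Ind3 = ~0.3923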

There are a few ways out of these apparently paradoxical inconsistencies:

  • Acknowledge the uncertainty inherent in our estimates as communicated through confidence intervals. The fact that a confidence interval for an effect contains zero does not mean the effect is zero. It means that zero is in the realm of possibility, or that one cannot say with certainty what the direction of the effect is.
  • Discount the relevance of the total indirect effect when interpreting the results. In some situations, the total indirect effect will have little substantive or theoretical value.

Summary

In this unit we learned:

  • What the Simple Mediation Model is.
  • How to estimate a Simple Mediation Model using the PROCESS software in R.
  • How to interpret the output of the estimation of the Simple Mediation Model using the process software, with dichotomous and continuous \(X\) variables.
  • How to rule out alternative explanations with reference to:
    1. Confounding variables;
    2. Alternative causal explanations.
  • The meaning of effect size and how to measure it through partially and completely standardized effects.
  • How to fit models with multiple \(X\) and \(Y\) variables.
  • The difference between parallel and serial multiple mediation models.
  • How to use process to fit parallel and serial multiple mediation models, and how to interpret them.
  • The characteristics of a multicategorical variable (and the difference between dichotomous and continuous variables).
  • How to use PROCESS in R to fit a mediation model with a multicategorical \(X\) variable, and how to interpret the output of the model.