Cross Temporal Meta-Analysis Ignores the Nesting of Studies within Time

A Re-Analysis of Visontay, Mewton, Sunderland, and Slade (2020)

Visontay et al. (2020) CTMA

Recently, a study by Visontay, Mewton, Sunderland, and Slade (2020) published in Drug and Alcohol Dependence used cross-temporal meta-analysis (CTMA) to estimate changes in young adults’ harmful alcohol consumption between 1989 and 2015. Although compelling, the results of this study rest upon the application of CTMA, which has long been criticized for its reliance on a variety of tenuous assumptions (see Donnellan, Trzesniewski, & Robins, 2009; Trzesniewski, Donnellan, & Robins, 2008). For example, CTMA is based on ecological correlations (see Rudolph & Zacher, 2017), a long-known statistical misspecification (Robinson, 1950). More recently, Rudolph, Costanza, Wright, and Zacher (2019) demonstrated through Monte Carlo simulations that CTMA can misestimate population effect sizes by as much as eight times their true value.

Nesting and Non-Independence in CTMA

Beyond these established issues, another concern that is rarely discussed in the CTMA literature is the non-independence of studies over time (cf. Rudolph et al., 2019). Specifically, because CTMA collects studies from multiple time points and treats the year of data collection as a predictor (e.g., of young adults’ harmful alcohol consumption; Visontay et al., 2020), multiple studies may be “nested” within any given year of data collection. For example, such nesting would be present if data from “Study 1” and “Study 2” of the same outcome were both collected in the year 2020 and were both included in a CTMA model. This matters for a variety of reasons. For example, because time is treated as a substantive variable in CTMA models, the statistical significance of time-based parameters is assumed to reflect changing attitudes, values, and behaviors (e.g., young adults’ harmful alcohol consumption). If such parameters are sensitive to the nesting of studies within time, CTMA models could reach incorrect conclusions about the extent to which such changes have occurred.

Moreover, statistically speaking, this nesting results in non-independence, which is an issue because CTMA is based on a weighted least squares (WLS) model. Like other linear models, WLS models assume that each modeled unit (i.e., year of data collection) provides a unique piece of statistical information, unrelated to the information provided by other modeled units in the sample. The non-independence resulting from nesting violates this assumption and often produces downwardly biased (i.e., overly liberal) estimates of the standard errors associated with parameter estimates, which can in turn lead to incorrect statistical conclusions. The assumption of independence is violated when multiple studies representing any given year of data collection are present but not appropriately accounted for in one’s CTMA model. For example, if we are to believe that there are broad changes over time that can be captured by CTMA, it is likely that studies coming from the same year of data collection are more alike than studies coming from different years of data collection. To demonstrate why this is an issue, a re-analysis of the data presented in Table 1 of Visontay et al. (2020) is warranted.
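
The consequence of ignoring nesting can be illustrated with a small simulation. The following is only a sketch under assumed conditions (14 clusters, 5 observations per cluster, an intraclass correlation of .50, and a true slope of zero); none of these values come from Visontay et al. (2020), and all object names are hypothetical. It compares the standard error that an ordinary linear model reports with the actual sampling variability of the slope.

```r
# Hypothetical simulation: the predictor varies only between clusters (like
# year of data collection) and observations in a cluster share a random effect.
set.seed(1)
n_clusters <- 14    # assumed number of clusters (e.g., years)
n_per      <- 5     # assumed number of studies per cluster
n_reps     <- 1000  # number of simulated data sets

cluster <- rep(seq_len(n_clusters), each = n_per)
x       <- cluster  # cluster-level predictor

slopes   <- numeric(n_reps)
naive_se <- numeric(n_reps)
for (r in seq_len(n_reps)) {
  u   <- rnorm(n_clusters)[cluster]   # shared cluster effect (ICC = .50)
  y   <- u + rnorm(length(cluster))   # true slope is zero
  fit <- summary(lm(y ~ x))$coefficients
  slopes[r]   <- fit["x", "Estimate"]
  naive_se[r] <- fit["x", "Std. Error"]
}

empirical_sd  <- sd(slopes)      # actual sampling variability of the slope
mean_naive_se <- mean(naive_se)  # what lm() reports on average
# Share of simulated data sets in which the (null) slope is "significant"
false_pos <- mean(abs(slopes / naive_se) > qt(0.975, n_clusters * n_per - 2))

c(empirical_sd = empirical_sd,
  mean_naive_se = mean_naive_se,
  false_positive_rate = false_pos)
```

Under these assumptions, the model-based standard error should come out smaller than the empirical standard deviation of the slope, and the false positive rate should exceed the nominal 5%.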

Re-Analysis of Visontay et al. (2020)

Data and code to reproduce these results are available via the Open Science Framework and are also outlined below. One way to assess the presence of nesting in CTMA data is to consider the intraclass correlation (\(ICC_1\)), which indexes the proportion of variability in an outcome that occurs between units. Thus, 1.00 – \(ICC_1\) indexes the proportion of within-unit variability in said outcome. As an example, if \(ICC_1\) = .50, then 1.00 – .50 = .50, suggesting that .50 × 100% = 50% of the variance in the outcome occurs between units, whereas the remaining 50% occurs within units. “Units” in this sense can be variously defined, for example, as year of data collection. Deriving the \(ICC_1\) value from Visontay et al. (2020) Table 1, treating year of data collection as a grouping variable, suggests that \(ICC_1\) = .43, which means that (1.00 – .43) × 100% = 57% (57.44% before rounding) of the variance observed in young adults’ harmful alcohol consumption between 1989 and 2015 occurs within year of data collection, and 43% occurs between years. This means that there is appreciable nesting of studies within year of data collection, which the CTMA model ignores by not accounting for the dependence between each study and its year of data collection.
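
To make the \(ICC_1\) logic concrete, the sketch below computes it from the mean squares of a one-way ANOVA using the common estimator (MSB − MSW) ÷ (MSB + (k − 1) × MSW), where k is the average group size. The toy data are hypothetical and unweighted, so this illustrates the idea rather than reproducing the weighted analysis reported in this post.

```r
# Hypothetical toy data: six studies nested in three years
# (these are NOT the Visontay et al. data).
toy <- data.frame(
  year       = factor(c(2000, 2000, 2005, 2005, 2010, 2010)),
  mean_score = c(9.0, 8.0, 7.5, 6.0, 5.5, 5.0)
)

# One-way ANOVA with year as the grouping factor
aov_tab    <- summary(aov(mean_score ~ year, data = toy))[[1]]
ms_between <- aov_tab[1, "Mean Sq"]  # between-year mean square
ms_within  <- aov_tab[2, "Mean Sq"]  # within-year (residual) mean square

# Average number of studies per year
k <- nrow(toy) / nlevels(toy$year)

# ICC(1): share of variance attributable to differences between years
icc1 <- (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Share of variance that remains within years, as a percentage
within_pct <- (1 - icc1) * 100

c(icc1 = icc1, within_pct = within_pct)
```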

There are multiple types of statistical models that can account for non-independence and higher-level grouping structures, for example, mixed effects models (e.g., Snijders & Bosker, 1999). One simple-to-implement approach to (at least partially) account for this nesting in a CTMA framework is to use robust variance estimation (RVE; Pustejovsky & Tipton, 2018) to specify cluster robust standard errors for the WLS model. This approach yields the same parameter estimates as the standard WLS CTMA model, but it adjusts the standard errors of those estimates to account, in part, for the non-independence associated with the nesting of studies within year of data collection.1
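
The mechanics of a cluster robust standard error can be sketched by hand. The example below computes the simplest "sandwich" variant (often called CR0) for a WLS model on hypothetical toy data. It is a simplification: the re-analysis in this post uses the CR2 small-sample correction with Satterthwaite degrees of freedom from the clubSandwich package, which this sketch does not implement.

```r
# Hypothetical toy data: eight studies nested in four years
# (year_c = years since an arbitrary origin; w = regression weights).
toy <- data.frame(
  year_c = c(0, 0, 5, 5, 10, 10, 15, 15),
  y      = c(9.1, 8.4, 7.9, 8.8, 6.5, 7.2, 6.9, 5.8),
  w      = c(10, 20, 15, 5, 30, 10, 25, 20)
)

fit <- lm(y ~ year_c, data = toy, weights = w)

X <- model.matrix(fit)
e <- residuals(fit)  # raw (unweighted) residuals
w <- toy$w

# "Bread" of the sandwich: (X'WX)^-1
bread <- solve(t(X) %*% (w * X))

# "Meat": sum over clusters of the outer product of each cluster's
# weighted score contribution X_g' W_g e_g
meat <- matrix(0, ncol(X), ncol(X))
for (g in unique(toy$year_c)) {
  idx     <- toy$year_c == g
  score_g <- t(X[idx, , drop = FALSE]) %*% (w[idx] * e[idx])
  meat    <- meat + score_g %*% t(score_g)
}

vcov_cr0  <- bread %*% meat %*% bread
se_robust <- sqrt(diag(vcov_cr0))

# Point estimates are unchanged; only the standard errors differ
cbind(estimate = coef(fit),
      se_model = sqrt(diag(vcov(fit))),
      se_cr0   = se_robust)
```

Because only the variance estimate changes, the coefficient column is identical to what `lm()` reports, which is what makes the "normal" and "robust" models easy to compare.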

Re-analyzing the data from Table 1 of Visontay et al. (2020) with cluster robust standard errors, the parameter estimate for year of data collection (\(B\) = -.13, \(SE_{robust}\) = 0.058, \(p\) = .09, 95% CI: -.30 to .04) is not statistically significant at \(p\) < .05. The results of a more conventional mixed effects model with a random intercept for year of data collection are also reported below; importantly, the same conclusion is drawn from both analyses, namely that there is a non-significant effect of year of data collection (\(B\) = -.112, \(SE\) = 0.060, \(p\) = .062 by a normal approximation, Wald 95% CI: -.230 to .006; the Satterthwaite-based test in the output below gives \(p\) = .086).
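
The p-value for the mixed effects estimate depends on the reference distribution used. As a rough check, the sketch below recomputes it from the estimate, standard error, and Satterthwaite degrees of freedom printed in the mixed effects output in this post, once with a normal approximation and once with a t distribution. The object names are illustrative only.

```r
# Values taken from the mixed effects output reported in this post
est <- -0.11224   # estimate for year of data collection
se  <-  0.06013   # its standard error
df  <- 12.55011   # Satterthwaite degrees of freedom

stat <- est / se

p_normal <- 2 * pnorm(-abs(stat))        # normal (z) approximation
p_satt   <- 2 * pt(-abs(stat), df = df)  # t reference distribution

# Wald-type 95% confidence interval based on the normal approximation
ci_wald <- est + c(-1, 1) * qnorm(0.975) * se

round(c(p_normal = p_normal, p_satt = p_satt), 3)
round(ci_wald, 3)
```

Either way the estimate does not reach significance at \(p\) < .05. The t-based p-value is the larger of the two because it reflects the limited number of years in the data.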

Thus, the effect reported by Visontay et al. (2020), and taken as evidence for declines in alcohol consumption in young adults between 1989 and 2015, is not detected when the nesting of studies within year of data collection is taken into account in the estimate of the standard error for this parameter. The standard error reported by Visontay et al. (2020) (\(SE\) = 0.052) is 10.34% smaller (i.e., more liberal) than the robust estimate (\(SE_{robust}\) = 0.058) [(1.00 – 0.052 ÷ 0.058) × 100% = 10.34%]. Thus, this re-analysis with a more appropriate model suggests that the findings presented by Visontay et al. (2020) are likely an artefact of ignoring this nesting, and that the conclusions drawn from the standard CTMA model are sensitive to this oversight. It is therefore necessary to urge caution in applying the findings of Visontay et al. (2020) to understanding estimated changes in young adults’ harmful alcohol consumption over time.
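
The size of the difference between the two standard errors depends on which one is used as the baseline. The short calculation below, using the two values reported above and illustrative object names, shows both versions.

```r
se_wls    <- 0.052  # standard error reported by Visontay et al. (2020)
se_robust <- 0.058  # cluster robust standard error from the re-analysis

# Conventional SE expressed as a shortfall relative to the robust SE
pct_smaller <- (1 - se_wls / se_robust) * 100

# Robust SE expressed as an increase relative to the conventional SE
pct_larger <- (se_robust / se_wls - 1) * 100

round(c(pct_smaller = pct_smaller, pct_larger = pct_larger), 2)
# -> 10.34 and 11.54
```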


The re-analysis presented here calls into question the fundamental conclusion of Visontay et al. (2020, p. 2) that “…harmful alcohol consumption in young adults may have declined between 1989 and 2015.” Using cluster robust estimates of standard errors that account for the nesting of studies within year of data collection suggests that there was not a statistically appreciable change in young adults’ harmful alcohol consumption between 1989 and 2015. Importantly, the problem of ignoring nesting and non-independence in CTMA is not exclusive to the work of Visontay et al. (2020). Indeed, every CTMA to date has failed to account for this issue, calling into question a relatively large body of literature that is based on this methodology. This observation, coupled with the existing critiques of this methodology (see Rudolph et al., 2019), means that researchers should interpret the results of CTMAs with a high degree of scrutiny and consider possible sources of non-independence when doing so.


References

Donnellan, M. B., Trzesniewski, K. H., & Robins, R. W. (2009). An emerging epidemic of narcissism or much ado about nothing? Journal of Research in Personality, 43(3), 498–501.

Pustejovsky, J. E., & Tipton, E. (2018). Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics, 36(4), 672–683.

Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15(3), 351–357.

Rudolph, C. W., Costanza, D. P., Wright, C., & Zacher, H. (2019). Cross-temporal meta-analysis: A conceptual and empirical critique. Journal of Business and Psychology, 1–18.

Rudolph, C. W., & Zacher, H. (2017). Considering generations from a lifespan developmental perspective. Work, Aging and Retirement, 3(2), 113–129.

Snijders, T., & Bosker, R. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). Thousand Oaks, CA: Sage Publications.

Trzesniewski, K. H., Donnellan, M. B., & Robins, R. W. (2008). Is “generation me” really more narcissistic than previous generations? Journal of Personality, 76(4), 903–918.

Visontay, R., Mewton, L., Sunderland, M., Prior, K., & Slade, T. (2020). Changes over time in young adults’ harmful alcohol consumption: A cross-temporal meta-analysis using the AUDIT. Drug and Alcohol Dependence, 108172, doi:

Data & Code Underlying These Analyses

Load Required Packages

# Packages inferred from the functions called below; the original chunk was not preserved
library(dplyr)        # %>% and select()
library(tibble)       # tribble()
library(multilevel)   # ICC1() (assumed source of this function)
library(clubSandwich) # coef_test() and conf_int()
library(lmerTest)     # lmer() with Satterthwaite t-tests


Read in Data [Table 1 of Visontay et al. 2020]

data <- tibble::tribble(
                      ~Study_Year, ~Year_Data_Collected, ~Sample_Type,        ~Country,    ~N, ~Mean,  ~SD,     ~w,
          "Fleming et al. (1991)",                1989L,          "U", "United States",  989L,     9,  5.8,   29.4,
                "Clements (1998)",                1996L,          "U", "United States",  306L,   3.9,  3.8,  21.19,
                "Lennings (1998)",                1996L,          "U",     "Australia",  183L,   7.5,  5.9,   5.26,
            "Kypri et al. (2002)",                2000L,          "U",   "New Zealand", 1480L,  8.92, 6.82,  31.82,
      "Stahlbrandt et al. (2007)",                2000L,          "U",        "Sweden",  556L,   9.8,    5,  22.24,
        "Andersson et al. (2007)",                2003L,          "U",        "Sweden", 2032L,  7.29, 4.65,  93.98,
      "Kills Small et al. (2007)",                2005L,          "U", "United States",   88L,  9.24, 6.18,    2.3,
          "O’Brien et al. (2010)",                2005L,          "U",     "Australia", 1028L, 10.13, 6.59,  23.67,
         "Blomeyer et al. (2013)",                2006L,         "CS",       "Germany",  268L,  4.55, 4.26,  14.77,
                  "Zverev (2008)",                2006L,          "U",        "Malawi",  787L,   8.3,  8.1,     12,
          "Hallett et al. (2012)",                2007L,          "U",     "Australia", 7237L,   7.4, 6.43, 175.04,
     "Young and de Klerk (2008)a",                2007L,          "U",  "South Africa", 2049L,  8.94,  7.2,  39.53,
          "Balodis et al. (2010)",                2008L,          "U",        "Canada",   90L,  7.66, 5.42,   3.06,
        "Young and Mayson (2010)",                2008L,          "U",  "South Africa",  318L,  8.23, 6.57,   7.37,
     "Young and de Klerk (2008)b",                2008L,          "U",  "South Africa", 1119L,  8.84,  6.9,   23.5,
           "Moreno et al. (2012)",                2009L,          "U", "United States",  224L,   5.8,  4.9,   9.33,
          "Olthuis et al. (2011)",                2009L,          "U", "United States", 1555L,   6.1,  5.9,  44.67,
           "Prat and Adan (2011)",                2009L,          "U",         "Spain",  517L,  4.53,  3.8,   35.8,
                  "Claros (2010)",                2010L,          "U", "United States",  199L,  4.67, 4.38,  10.37,
           "Ridout et al. (2012)",                2010L,          "U",     "Australia",  158L,  9.43, 6.05,   4.32,
          "Kreusch et al. (2013)",                2011L,          "U",       "Belgium",   61L,  8.93, 5.37,   2.16,
        "MacKillop et al. (2013)",                2011L,          "U", "United States",  354L,  6.57, 5.23,  12.94,
            "Young et al. (2013)",                2011L,          "U", "United States",  200L,  5.33, 5.05,   7.84,
             "Choi et al. (2015)",                2013L,          "U",   "South Korea",  448L, 13.34, 7.99,   7.02,
           "Snipes et al. (2015)",                2013L,          "U", "United States",  751L,  4.83, 5.39,  25.85,
    "Whitney and Froiland (2015)",                2013L,          "U", "United States",   62L,  8.55, 6.11,   1.66,
         "Baranger et al. (2016)",                2014L,          "U", "United States",  727L,  4.85, 3.76,  51.42,
        "Lindgren et al. (2016)a",                2014L,          "U", "United States",  360L,  5.22, 4.89,  15.06,
              "Lindgren (2016) b",                2014L,          "U", "United States",  450L,  5.37, 4.76,  19.86,
       "Marczinski et al. (2016)",                2014L,          "U", "United States",  146L,  6.19, 4.61,   6.87,
     "Brunelle and Hopley (2017)",                2015L,          "U",        "Canada",  175L,   7.3, 5.11,    6.7,
  "Pereira-Morales et al. (2017)",                2015L,          "U",      "Colombia",  274L,   4.7,  4.3,  14.82
)

Table of Counts of Studies within Year (Demonstrating Nesting)

data %>%
  dplyr::select(Year_Data_Collected) %>%
  table()
## .
## 1989 1996 2000 2003 2005 2006 2007 2008 2009 2010 2011 2013 2014 2015 
##    1    2    2    1    2    2    2    3    3    2    3    3    4    2

Compute \(ICC_1\) for Year of Data Collection

ICC_MOD<-aov(Mean~as.factor(Year_Data_Collected), data, weights=w)

ICC_MOD_ICC1<-ICC1(ICC_MOD) %>% print() 
## [1] 0.4255725
#57.44% of the variance occurs within year
(1-ICC_MOD_ICC1)*100
## [1] 57.44275

Confirm CTMA Results

M1<-data %>%
  lm(Mean~Year_Data_Collected, data=., weights=w) 

M1 %>% summary()
## Call:
## lm(formula = Mean ~ Year_Data_Collected, data = ., weights = w)
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.9299  -3.4399   0.4035   4.4644  18.9651 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)  
## (Intercept)         274.34826  103.91499   2.640   0.0130 *
## Year_Data_Collected  -0.13322    0.05179  -2.573   0.0153 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 8.187 on 30 degrees of freedom
## Multiple R-squared:  0.1807, Adjusted R-squared:  0.1534 
## F-statistic: 6.618 on 1 and 30 DF,  p-value: 0.01529

Apply CTMA with Cluster Robust SEs via clubSandwich

# Tests
coef_test(M1, vcov = "CR2", cluster = data$Year_Data_Collected, coefs = "All") 
##                  Coef Estimate       SE d.f. p-val (Satt) Sig.
## 1         (Intercept)  274.348 116.5335  3.6       0.0855    .
## 2 Year_Data_Collected   -0.133   0.0581  3.6       0.0908    .
# Confidence Intervals
conf_int(M1, vcov = "CR2", cluster = data$Year_Data_Collected, coefs = "All")
##                  Coef Estimate       SE Lower 95% CI Upper 95% CI
## 1         (Intercept)  274.348 116.5335      -64.002     612.6987
## 2 Year_Data_Collected   -0.133   0.0581       -0.302       0.0353

Mixed Effects Model

M2<-data %>% 
  lmer(Mean~Year_Data_Collected+(1|Year_Data_Collected), data=., weights=w)

M2 %>% summary()
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Mean ~ Year_Data_Collected + (1 | Year_Data_Collected)
##    Data: .
## Weights: w
## REML criterion at convergence: 142
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.9878 -0.3231  0.1334  0.5292  2.6307 
## Random effects:
##  Groups              Name        Variance Std.Dev.
##  Year_Data_Collected (Intercept)  1.013   1.007   
##  Residual                        48.026   6.930   
## Number of obs: 32, groups:  Year_Data_Collected, 14
## Fixed effects:
##                      Estimate Std. Error        df t value Pr(>|t|)  
## (Intercept)         232.19474  120.63658  12.55822   1.925   0.0772 .
## Year_Data_Collected  -0.11224    0.06013  12.55011  -1.867   0.0855 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Correlation of Fixed Effects:
##             (Intr)
## Yr_Dt_Cllct -1.000

  1. I chose to use RVE here because it adjusts standard errors (SEs) and leaves the parameter estimates the same. For the purposes of this example, this makes comparisons between “normal” and “robust” CTMA models easier. However, two caveats about the application of cluster robust standard errors and RVE bear some mention here (many thanks to James Pustejovsky for pointing these out!). First, tests based on RVE tend to be less powerful than tests based on model-based standard errors, and owing to the relatively small sample sizes considered here, it is possible that this test is not optimally calibrated in terms of type-I error rates. Second, RVE provides a rather broad correction to SEs and will account for any unmodeled error within clusters. Thus, the larger standard errors in the RVE model do not necessarily imply that there is clustering at the level of study year; they could instead reflect other sources of unmodeled between-sample heterogeneity. Still, the triangulation of these findings across both RVE and mixed effects models should give us pause about the conclusions drawn from the CTMA model.

Cort W. Rudolph
Associate Professor of Industrial & Organizational Psychology