STRUCTURAL EQUATION MODELING: A MULTIDISCIPLINARY 96Expanding the Bayesian structural equation, multilevel and mixture models to logit,negative-binomial, and nominal variablesTihomir Asparouhov and Bengt MuthénMplusABSTRACTKEYWORDSRecent work on the Polya-Gamma distribution provides a breakthrough for the Bayesian modeling oflogit, count, and nominal variables. We describe how the methodology is incorporated in the Mplusmodeling framework and illustrate it with several examples: logistic latent growth models, multilevel IRT,multilevel time-series models for count data, multilevel nominal regression, and nominal factor analysis.Polya-Gamma distribution;Bayesian estimation;Negative-binomial SEM;Nominal SEMIntroductionThe Bayesian estimation of the structural equation modelingframework implemented in Mplus is described in Asparouhovand Muthén (2010a). The framework includes mixture modelsas well as multilevel models. The observed endogenous vari ables can be continuous normally distributed outcomes as wellas categorical outcomes based on the probit link function. Inthis note we describe how the Mplus framework is extended toinclude these new types of variables: count variables based onthe Poisson and the negative-binomial distributions, nominalvariables as well as binary variables based on the logit linkfunction. This expanded framework is similar to the ML esti mation framework implemented in Mplus, see Muthén andAsparouhov (2007). The model is largely unchanged, but wecan now utilize the more efficient Bayesian estimation.Numerical integration is typically required in the ML estima tion of latent variable models with non-normal outcomes.Therefore, the number of latent variables and random effectsin the model cannot exceed 3 or 4 because the computationalspeed grows exponentially with the number of latent variables.In the Bayesian estimation, the computational speed growslinearly with the number of latent variables, and generally, anunlimited number of latent variables can be used withoutsubstantially increasing the computational time.Categorical variables in the Asparouhov and Muthén(2010a) framework, based on the probit link function, utilizethe conceptualization of an underlying latent variable. For eachcategorical variable Y, there is a latent variable Y that is cutaccording to certain threshold parameters to obtain the cate gorized observed variable Y. The existence of such an under lying latent variable is immediately provided by the probit linkfunction. The latent variable construct is very important in theBayesian estimation. Any structural model that can be formu lated for Y, can also be formulated as a linear model for Y . Forthis linear model, all conditional distributions (for structuralparameters, latent factors, random effects, and missing data) inthe MCMC estimation are explicit. Provided with conjugatepriors, the MCMC estimation is fast and efficient.For other types of variables such as count, logit-categorical,or nominal, a conceptualization of such underlying latentvariable had not been found until very recently. Fox’s (2010)work on item response modeling, for example, in the absenceof underlying latent variables, utilizes the Metropolis-Hastingsalgorithm as a part of the MCMC algorithm. Such an approachtends to be more computationally demanding, less efficientthan explicit conditional distributions, more difficult to imple ment in a generalized framework, and more prone to slowmixing estimation. Groundbreaking recent work by Pillowand Scott (2012) and Polson et al. (2013) yields a underlyinglatent variable methodology for negative-binomial and Poissoncount variables, logit-binary variables and nominal variables.The approach utilizes the Polya-Gamma (PG) distribution andis uniquely suited for structural, multilevel and mixture mod els. With the PG method, regression parameters with conjugatenormal priors can be updated in the MCMC estimation fromexplicit conditional normal distributions. Similarly, normallydistributed latent variables or random effects (random slopes,loadings or intercepts) can be updated from explicit condi tional normal distributions. These explicit conditional distri butions produce highly efficient MCMC estimation. Kim et al.(2018) compare the Polya-Gamma based estimation method toother Bayesian methods in the context of structural equationmodels with logit-binary variables and found that the PolyaGamma method yields superior performance.For Mixture models, the categorical latent class variablesare generally modeled as nominal variables. Using the PGmethodology, we can now also extend the Mplus Bayesianmixture framework to include latent class variable regressionon other variables. Within the MCMC estimation, where thelatent class variable is imputed in each iteration, the latentclass regression is simply a nominal variable regression and thePG methodology applies.CONTACT Tihomir [email protected] Stoner Ave. Los Angeles, CA 90066.We are indebted to Mårten Schultzberg for sharing his explorations of the Polya-Gamma methodology and to Alberto Maydeu-Olivares for helpful discussions on thenominal factor analysis model. 2021 Taylor & Francis Group, LLC
2ASPAROUHOV AND MUTHÉNMissing data on count/nominal/categorical variables aretypically easy to resolve in likelihood based methods (Bayesand ML). However, missing data on the predictors/covariatesof such variables is not. In the ML estimation for example, suchmissing data lead to additional dimensions of numerical inte gration. Even for a simple logit regression model, the dimen sions of numerical integration could easily become substantialdepending on the amount of missing data in the covariates. Inthis situation the PG methodology can be utilized as well. Themissing values can be imputed in the MCMC estimation fromexplicit conditional distributions. This also extends to themixture modeling situation where the latent class variable hasmissing predictors. Furthermore, the method can be applied tothe three-stage estimation, see Asparouhov and Muthén(2014a), where the final step in the estimation is conductedwith the Bayes estimator instead of the ML estimator. Anillustration of that approach is provided in Asparouhov andMuthén (2020a).There is one drawback of the PG methodology that makesthe Bayesian estimation slower, however. If categorical data aremodeled with the probit link function, the underlying normalvariable has a conditional distribution that varies linearly withthe model covariates. The model parameters remain the sameacross individuals, which makes matrix manipulations effi cient. This does not apply to the PG methodology. The under lying variable for count/nominal/logit variables hasa conditional distribution that varies across observations.Matrix manipulation must be performed separately for eachobservation. This affects the computational speed of the esti mation but it does not affect the generality of the methodologyor the mixing efficiency.In the next section we describe the PG methodology andhow it is implemented in Mplus 8.5. We then illustrate theBayesian estimation with several simulation studies.Bayesian estimation for models with logit, count andnegative-binomial variablesWe begin by providing a formal definition for the PG distribu tion. A random variable W has a Polya-Gamma distributionwith parameters b and c, i.e. W,PGðb; cÞ, if W is obtained asthe following infinite sumW¼11 X2π2 k¼1 ðkGk20:5Þ þ ðc ð2πÞÞ2(1)where Gk are independent gamma random variables with dis tribution Gaðb; 1Þ. It is not possible to simplify the aboveinfinite sum, but in most cases the sum can be approximatedas a finite sum. For example, the first 500 terms of the abovesum provide a good approximation for most values of b and c.The PGðb; cÞ distribution takes only positive values and hasmeanEðWÞ ¼ ðb ð2cÞÞ tanhðc 2Þ(2)and varianceVarðWÞ ¼ ðb ð4c3 ÞÞðsinhðcÞcÞ ðcoshðc 2ÞÞ2 :(3)For large values of b and c ( 200) the PG distribution can beapproximated by a normal distribution. Also, for large valuesof c, the variance of the distribution goes to 0 and so thedistribution becomes approximately equal to a constant. Notealso that the parameter b is always positive, while the parameterc can be positive and negative and the PG distribution isindependent of the sign of c.To be able to use this distribution in the MCMC estimation,an efficient method for generating PG random variables isneeded. Multiple such methods have been proposed inWindle et al. (2014). Mplus utilizes four of these methods:finite sum approximation, normal approximation, saddlepoint approximation, and Devroye’s approximation.Depending on the parameters b and c, one of the four methodsis used. Such a hybrid approach is designed to find an optimalcompromise between the speed of the computation and thequality of the approximation.Next we describe how the PG distribution is used to facil itate the Bayesian estimation for logistic, negative-binomial,and nominal regressions. Note that the PG distribution is notused to model a dependent variable. It is used only to constructa normally distributed underlying latent variable from anobserved logit, count, or nominal variable.Logistic regressionConsider the logistic regression for a binary outcome Y (withoutcome values of 0 and 1) given by the equationPðY ¼ 0Þ ¼11 þ ExpðβXÞ(4)where X represents a set of covariates and β represents a set ofparameters that are to be estimated.The underlying variable Y in this case is constructed in twostep. In the first step we generateW,PGð1; βXÞ:(5)In the second step we compute Y as follows:Y 0:5Y ¼ pffiffiffiffiffi :W(6)The logistic regression model (4) for Y implies the followinglinear regression model for Y pffiffiffiffiffiY ¼ β W X þ ε;(7)where ε is a standard normal random variable. This linearregression can be used to estimate the regression coefficientsβ. Note that the predictor variables in the linear regression arepffiffiffiffiffiW X rather than the original covariates X used in the logisticregression. Once the underlying latent variable Y is generated,the logistic regression equation for Y is replaced by the linearregression for Y . Thus any structural/multivariate/multilevelmodel involving the logistic regression for Y is transformedinto a structural/multivariate/multilevel model for Y . Suchmodels are then estimated as in Asparouhov and Muthén(2010a).
STRUCTURAL EQUATION MODELING: A MULTIDISCIPLINARY JOURNALTo be more specific, the MCMC estimation proceeds asfollows. In each MCMC iteration, we generate W and com pute Y . We then update any structural parameters, randomeffects, latent variables and missing data, using the linearmodel for Y . Note again that the covariates in the logisticpffiffiffiffiffiequation are changed to W X. If those same covariates areused as predictors in another equation, they are unchangedfor that other equation or are changed according toa different PG deviate. Also note that if the covariates Xmust be updated due to missing values or if the covariatespffiffiffiffiffiinclude latent variables, the scale coefficient W is thenattached to the regression parameters, i.e., in the Gibbssampler step that updates X, the coefficients β in the logisticpffiffiffiffiffiregression are replaced with the coefficients β W in thelinear regression of Y .In the above description, we did not use the intercept in thelogistic regression. The intercept is a special case of a regressionparameter for a covariate that is the constant 1. In the Mplusimplementation, however, the intercept is actually used and iscalled a threshold τ. In line with the probit regression, theactual Mplus parameterization for the logistic regression is asfollowsPðY ¼ 0Þ ¼Expðτ βXÞ1¼:1 þ Expðτ βXÞ 1 þ Expð τ þ βXÞNegative-binomial regressionFirst, we discuss the properties of the negative-binomial dis tribution and then we will extend that discussion to the nega tive-binomial regression model.Suppose that a variable Y has a negative-binomial distribu tion. The probability distribution function for Y is given by kþr 1(9)PðY ¼ kÞ ¼pr ð1 pÞk ;r 1where r and p are the parameters of the distribution. Theinterpretation of this distribution is that Y represents thenumber of successes in a sequence of Bernoulli (binary) trialsbefore the occurrence of the r th failure, where the parameterp represents the failure probability. Such an interpretationrequires that r is an integer parameter but in fact the negativebinomial distribution is defined for any positive value r usingthe notation that kþr 1r 1 ¼kYΓðk þ rÞr¼Γðk þ 1ÞΓðrÞ i¼11þi;i(10)where Γ is the Gamma function. The mean of Y is(8)Thus the logistic regression intercept is equivalent to thethreshold parameter in Mplus but is with the opposite sign.The PG methodology does not extend to nonbinary orderedpolytomous variables with the logit link function. The Mplusimplementation allows for the simultaneous modeling of binaryvariables with the logit link function and nonbinary orderedpolytomous variables with the probit link function. To use suchmodeling the option LINK PROBIT LOGIT must be specifiedin the ANALYSIS command. The natural extension of the PGmethodology to nonbinary categorical variables leads to nom inal variables rather than ordinal. We discuss the PG methodol ogy for nominal variable further below.In a multivariate model where a binary variables Y is used togenerate an underlying latent variables Y , a natural questionarises regarding the possible correlation between Y and othervariables in the model. Such correlation can easily be esti mated, however, the correlation does not naturally translateinto some kind of an association between the observed binaryvariables Y and the other variables. In the Mplus implementa tion, such correlations are not pursued at this time. The maintool for correlating binary variables in this logit based model ing framework is to use normally