class: center, middle, inverse, title-slide # More efficient SEM via R? π ### Dr Ewan Carr
@ewancarr
### Department of Biostatistics & Health Informatics
### SEMantics
16
th
March 2021 --- class: middle, center <style type="text/css"> .code-bg-green .remark-code, .code-bg-red .remark-code * { background-color:#FFFDE0!important; .cod } .tiny{font-size: 50%} .big{font-size: 140%} .very-small{font-size: 60%} </style> .big[ <code> ewancarr.github.io/semantics-mplusautomation </code> ] --- # Three minute version β 1. `MplusAutomation` is an R package that allows us to programatically interact with Mplus without leaving R π 2. We can create input files (`.inp`), run them in Mplus, and extract the outputs. 3. This is great for **reproducibility.** ### This talk **Why?** The joys of Mplus π€¦, and the problem we're trying to solve. **How?** What is `MplusAutomation`? How does it work? * Formatting your data for Mplus * Generating input files * Extracting output **Going further** π RMarkdown, `purrr` π, parallel processing, `lavaan`... ??? Who is this for? Anyone interested in fitting SEM models R users, future R users, Mplus users Obviously using R, but will hopefully be approachable to all --- class: middle, center ![:scale 85%](images/tweet.png) --- class: middle, center .pull-left[ <br><br><br><br> # Do we still need this? ] .pull-right[.center[![](images/hug_annotated.png)]] --- class: middle, center, inverse ## What is Mplus? <br> <br> ## What is is the problem we're trying to solve? --- class: spaced # The joys of using Mplus π’ .pull-left[ 1. Write an input file. 2. Click `Run`. 3. Review the output file, all 20 pages... π 4. Scroll down to the section of output you actually need. 5. Keep scrolling... 6. Copy the required outputs into a spreadsheet? 7. Reformat by hand. 8. Copy into your manuscript. ] .pull-right[ .center[![](images/mplus.png) .content-box-blue[ To reproduce your analysis, just repeat steps 1 to 8. π ] ]] --- .content-box-blue[ This is bad enough for a single input file π. But some applications require many input files. * Simulation studies π * Latent class analysis π’ * Tools that build on Mplus π ] -- ![](images/mplus_trees.png) --- class: middle #### [Discernment of Mediator and Outcome Measurement in the PACE trial](https://www.medrxiv.org/content/10.1101/2021.01.25.21250436v1) .pull-left[ .small[ *Ewan Carr, Silia Vitoratou, Trudie Chalder, and Kimberley Goldsmith* ] <br><br><br><br><br><br><br><br> ![](images/pace_medrxiv.png) .tiny[<https://www.medrxiv.org/content/10.1101/2021.01.25.21250436v1>] ] .pull-right[.center[ ![:scale 150%](images/pace.png) ]] --- class: center ### π Reasons to avoid point-and-click analyses π« -- .center[ .pull-left[ ### 1. Efficiency ![](images/xkcd_worth_the_time.png) .tiny[<https://xkcd.com/1205/>] ] .pull-right[ ### 2. Reproducibility ![](images/vadar_redrawn.png) ] ] --- class: inverse, center, middle # What is MplusAutomation? --- `MplusAutomation` is an R package that talks to Mplus. .content-box-yellow[.small[ MplusAutomation is a package for R that **facilitates complex latent variable analyses in Mplus** involving comparisons among many models and parameters. More specifically, MplusAutomation provides tools to accomplish **3 objectives**: 1. To **create and manage Mplus syntax** for groups of related models; 2. To **automate the estimation** of many models; 3. And to **extract**, aggregate, and compare fit statistics, parameter estimates, and ancillary model outputs. ]] .pull-left[ .small[ It's been around since 2010, regularly updated. To get started: * Read the [paper](https://www.tandfonline.com/doi/full/10.1080/10705511.2017.1402334) and [vignette](https://cran.r-project.org/web/packages/MplusAutomation/vignettes/Vignette.pdf) π * Take a look at the [GitHub page](https://github.com/michaelhallquist/MplusAutomation) and [forum](https://groups.google.com/forum/#!forum/mplusautomation) π * Keep listening! π If you use it, cite it! .very-small[See `citation("MplusAutomation")`.] ]] .pull-right[ ![](images/mplusautomation_journal.png) ] --- ## How does it work? ![](images/flowchart.png) --- ## Show me! π€ .large[ ```r library(MplusAutomation) library(texreg) # Specify the model cfa <- " DATA: FILE = data/ex5.1.dat; VARIABLE: NAMES = y1-y6; USEVAR = y1-y3; MODEL: f1 BY y1-y3; " # Create the input file writeLines(cfa, "cfa.inp") # Run in Mplus runModels("cfa.inp") # Read the output file fit <- readModels("cfa.out") ``` ] --- .pull-left[ ```r htmlreg(fit, summaries = c("Observations", "CFI", "SRMR")) ``` ] .pull-right[ <table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;color: #000000;border-top: 2px solid #000000;"> <caption>CFA, single factor</caption> <thead> <tr> <th style="padding-left: 5px;padding-right: 5px;"> </th> <th style="padding-left: 5px;padding-right: 5px;">input</th> </tr> </thead> <tbody> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Y1<-F1</td> <td style="padding-left: 5px;padding-right: 5px;">1.00 [ 1.00; 1.00]</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Y2<-F1</td> <td style="padding-left: 5px;padding-right: 5px;">1.12 [ 0.93; 1.32]</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Y3<-F1</td> <td style="padding-left: 5px;padding-right: 5px;">1.02 [ 0.85; 1.19]</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Y1<-Intercepts</td> <td style="padding-left: 5px;padding-right: 5px;">-0.02 [-0.15; 0.10]</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Y2<-Intercepts</td> <td style="padding-left: 5px;padding-right: 5px;">0.03 [-0.10; 0.15]</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Y3<-Intercepts</td> <td style="padding-left: 5px;padding-right: 5px;">0.04 [-0.09; 0.16]</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">F1<->F1</td> <td style="padding-left: 5px;padding-right: 5px;">0.91 [ 0.66; 1.15]</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Y1<->Y1</td> <td style="padding-left: 5px;padding-right: 5px;">1.06 [ 0.87; 1.25]</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Y2<->Y2</td> <td style="padding-left: 5px;padding-right: 5px;">0.80 [ 0.60; 0.99]</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Y3<->Y3</td> <td style="padding-left: 5px;padding-right: 5px;">1.01 [ 0.82; 1.20]</td> </tr> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Observations</td> <td style="padding-left: 5px;padding-right: 5px;">500</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">CFI</td> <td style="padding-left: 5px;padding-right: 5px;">1.00</td> </tr> <tr style="border-bottom: 2px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">SRMR</td> <td style="padding-left: 5px;padding-right: 5px;">0.00</td> </tr> </tbody> </table> ] --- ## Again! π€© ```r map(`2:6`, ~ str_glue("DATA: FILE = ex7.3.dat; VARIABLE: NAMES = u1-u4 x1-x10; USEVARIABLES = u1-u4; CATEGORICAL = u1-u4; CLASSES = c (`{.x}`); ANALYSIS: TYPE = MIXTURE;") %>% write_lines(., str_glue("lca/`{.x}`class.inp"))) runModels("lca") fit <- readModels("lca") ``` .small[ ```r map_dfr(fit, "summaries") %>% select(NLatentClasses, Observations, AIC, aBIC, LL, Entropy) %>% kable() ``` | NLatentClasses| Observations| AIC| aBIC| LL| Entropy| |--------------:|------------:|--------:|--------:|--------:|-------:| | 2| 500| 1948.488| 1957.853| -965.244| 0.904| | 3| 500| 1953.115| 1967.682| -962.557| 0.810| | 4| 500| 1962.883| 1982.653| -962.441| 0.745| | 5| 500| 1972.883| 1997.856| -962.441| 0.579| | 6| 500| 1982.883| 2013.059| -962.441| 0.562| ] --- class: spaced The rest of this talk... ### Getting started 1. Getting data into Mplus 2. Generating input files 4. Running models 3. Extracting and formatting output ### Going further 4. Running models in parallel with `furrr` 5. Summarising many models with `map` 5. What about `lavaan`? ### Q&A --- # A brief interlude into R (Sorry). * `MplusAutomation` is great, but where possible I want to use generic functions. (I'd rather .deep-orange[learn something once] that can be applied everywhere). * Much of what I'll talk about today builds on two R functions: `str_glue` and `map`. .pull-left[ .center[ ![:scale 200px](images/stringr.png) ]] .pull-right[ .center[ ![:scale 200px](images/purrr.png) ]] --- # 𧡠`str_glue` `str_glue` does [string interpolation](https://en.wikipedia.org/wiki/String_interpolation). ```r x <- 23 str_glue("The number is {x}.") ``` ``` ## The number is 23. ``` -- ```r today <- ymd("2021-03-16") str_glue("Today is {wday(today, label = TRUE, abbr = FALSE)}.") ``` ``` ## Today is Tuesday. ``` -- We can also provide a list (or vector) of inputs: ```r str_glue("This is number {1:3}") ``` ``` ## This is number 1 ## This is number 2 ## This is number 3 ``` --- # πΎ `map` `map` applies a function to a list. ```r map_dbl(1:6, sqrt) ``` ``` ## [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 ``` -- <br> You can use an existing function (e.g. `sqrt`) or a temporary function: ```r map_dbl(1:10, `~ .x^2`) map_dbl(1:10, `function(x) { x^2 }`) # These two lines are equivalent ``` ``` ## [1] 1 4 9 16 25 36 49 64 81 100 ``` --- # πΆββοΈ `walk` If you don't need the outputs, use `walk`: ```r walk(list_of_things, some_function) ``` Useful where your function carries out some action (e.g. running a model in Mplus) but where you don't need to store the result of this action. <br><br><br><br><br><br><br><br><br> .right[Back to `MplusAutomation`...] --- class: middle, center ![](images/stages_1.png) --- # πΎ Getting data into Mplus .pull-left[Mplus requires your data to be in a specific format. I have .teal[no idea] what this is. π€· ] .pull-right[ .middle[ ```r DATA: FILE = `ex3.1.dat`; VARIABLE: NAMES = y1 x1 x3; MODEL: y1 ON x1 x3; ``` ]] -- <br> In the past, I've used `stata2mplus`. Now, `prepareMplusData`: ```r # Load some data data(HolzingerSwineford1939) # Reformat for Mplus prepareMplusData(HolzingerSwineford1939, "hs1939.dat") ``` -- There are a few options (see `?prepareMplusData` for details) β e.g. to drop variables or only write data on changes. ```r # Importing a Stata dataset df <- read_dta("stata_dataset.dta") prepareMplusData(df, "df.dat") ``` --- π‘ `prepareMplusData` will also generate input files: ```r inp <- prepareMplusData(HolzingerSwineford1939, "hs1939.dat") ``` ``` ## ## Factor: school ## Conversion: ## level number ## Grant-White 1 ## Pasteur 2 ## TITLE: Your title goes here ## DATA: FILE = "hs1939.dat"; ## VARIABLE: ## NAMES = id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6 x7 x8 x9; ## MISSING=.; ``` ```r print(inp) ``` ``` ## [1] "TITLE: Your title goes here\n" ## [2] "DATA: FILE = \"hs1939.dat\";\n" ## [3] "VARIABLE: \n" ## [4] "NAMES = id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6 x7 x8 x9; \n" ## [5] "MISSING=.;\n" ``` --- class: middle, center ![](images/stages_2.png) --- # Generating input files `MplusAutomation` provides the `createModels` function. I prefer to use generic R functions instead (e.g. `str_glue` and `map`). -- ### Example 1: A static input file ```r # Write your input file inp <- " TITLE: this is an example of a simple linear regression for a continuous observed dependent variable with two covariates DATA: FILE IS ex3.1.dat; VARIABLE: NAMES ARE y1 x1 x3; MODEL: y1 ON x1 x3;" # Save it writeLines(inp, "ex3_1.inp") # Run it runModels("ex3_1.inp") ``` --- ### Example 2: A dynamic input file What if we wanted to repeat this model for several outcomes? ```r inp <- "DATA: FILE IS ex3.1.dat; VARIABLE: NAMES ARE y1 y2 y3 x; MODEL: `y1` ON x;" ``` -- We can use `str_glue` and `walk` for this: ```r walk(c("y1", "y2", "y3"), function(x) { # Generate the input file for this outcome input_file <- str_glue("DATA: FILE IS ex3.1.dat; VARIABLE: NAMES ARE y1 y2 y3 x; MODEL: `{x}` ON x;") # Save writeLines(input_file, paste0(`x`, ".inp")) }) ``` -- .pull-left[ `y1.inp` .code-bg-green[ ```bash DATA: FILE IS ex3.1.dat; VARIABLE: NAMES ARE y1 y2 y3 x; MODEL: y1 ON x; ``` ]] .pull-right[ `y2.inp` .code-bg-green[ ```bash DATA: FILE IS ex3.1.dat; VARIABLE: NAMES ARE y1 y2 y3 x; MODEL: y2 ON x; ``` ]] --- class: spacing ### Example 3: Another dynamic example Suppose we wanted to fit a latent class model with between 2 and 8 classes. ```r n_classes <- 2:8 ``` ``` ## 2 3 4 5 6 7 8 ``` ```r walk(`n_classes`, function(x) { input_file <- str_glue("DATA: FILE = ex4.4.dat; VARIABLE: NAMES = y1-y8; CLASSES = c(`{x}`); ANALYSIS: TYPE = MIXTURE;") writeLines(input_file, str_glue("lca_`{x}`.inp")) }) ``` -- .content-box-blue[ This process can be summarised as: 1. Make a **list of inputs** (e.g. dependent variables, numbers of classes) 2. Update the model using **placeholders**. This can be as simple π or complex π¦ as you need. ] --- Suppose we want to fit this latent class model to 100 datasets π±. ```r datasets <- dir_ls("data", glob = "d*.dat") n_classes <- 2:8 ``` And maybe, consider two predictors of class membership, `x1` and `x2`. ```r predictors <- c("x1", "x2") ``` -- We can produce list containing **all combinations** of these input parameters with `cross`: ```r all_combinations <- cross(list(d = datasets, nc = n_classes, p = predictors)) ``` This produces a list with 7 `\(\times\)` 2 `\(\times\)` 100 = 1400 items. --- .pull-left[ Here's the first item: ```r all_combinations[[1]] ``` ``` ## $d ## data/dataset1.dat ## ## $nc ## [1] 2 ## ## $p ## [1] "x1" ``` And the 90<sup><small>th</small></sup> item: ```r all_combinations[[90]] ``` ``` ## $d ## data/ex7.3.dat ## ## $nc ## [1] 7 ## ## $p ## [1] "x2" ``` ] .pull-right[ Here's the second item: ```r all_combinations[[2]] ``` ``` ## $d ## data/dataset2.dat ## ## $nc ## [1] 2 ## ## $p ## [1] "x1" ``` <br> .content-box-green[ Note that the parameters inside each item are labelled: `d` refers to the dataset `nc` refers to the number of classes `p` refers to the predictors ] ] --- We can then use these in our placeholders: ```r walk(all_combinations, function(x) { input_file <- str_glue(" DATA: FILE = {`x$d`}; VARIABLE: NAMES = u1-u4 x1 x2; CLASSES = c ({`x$nc`}); ANALYSIS: TYPE = MIXTURE; ALGORITHM = INTEGRATION; MODEL: %OVERALL% c ON {`x$p`}; %c#1% [u1$1-u4$1]; %c#2% [u1$1-u4$1]; OUTPUT: TECH1 TECH8;") # Generate a filename and save fn <- str_glue("{path_ext_remove(path_file(x$d))}_{nc}_{p}.inp") writeLines(input_file, fn) } ``` This will generate 1400 input files, e.g. ``` dataset1_2_x1.inp dataset1_3_x1.inp dataset1_4_x1.inp dataset1_5_x1.inp ... ``` --- class: middle, center ![](images/stages_3.png) --- # Run models with `runModels` .pull-left[ **1)** You can run a single model: ```r runModels("cfa.inp") ``` <br> **3)** You can print outputs: ```r runModels("cfa.inp", showOutput=TRUE) ``` ] .pull-right[ **2)** Or a folder of models: ```r runModels("a_folder_of_models") ``` <br><br> ``` ## ## Running model: cfa.inp ## System command: "mplus" "cfa.inp" ## ## Mplus VERSION 8.4 (Linux) ## MUTHEN & MUTHEN ## ## Running input file 'cfa.inp'... ## ## Beginning Time: 18:57:53 ## Ending Time: 18:57:53 ## Elapsed Time: 00:00:00 ## ## Output saved in 'cfa.out'. ``` ] --- A few points: * `runModels` will try and guess where Mplus is installed. * You can optionally specify `filefilter` to run a subset models. If you're fitting many models, `walk` may be more robust: ```r # Create a list of input files to_run <- dir_ls("models", glob = "*.inp") # Pass each file to runModels walk(to_run, runModels) ``` (This also brings other benefits, like easy parallelization, as we'll see later). --- class: center, middle ![](images/stages_4.png) --- # Parse outputs with .small[`readModels`] .center[ ![](images/from_to.png) ] --- ``` Mplus VERSION 8.4 (Linux) MUTHEN & MUTHEN 03/16/2021 03:00 PM INPUT INSTRUCTIONS DATA: FILE = ../data/ex5.1.dat; VARIABLE: NAMES = y1-y6; USEVAR = y1-y3; MODEL: f1 BY y1-y3; INPUT READING TERMINATED NORMALLY SUMMARY OF ANALYSIS Number of groups 1 Number of observations 500 Number of dependent variables 3 Number of independent variables 0 Number of continuous latent variables 1 Observed dependent variables Continuous Y1 Y2 Y3 Continuous latent variables F1 ``` --- ``` Estimator ML Information matrix OBSERVED Maximum number of iterations 1000 Convergence criterion 0.500D-04 Maximum number of steepest descent iterations 20 Input data file(s) ../data/ex5.1.dat Input data format FREE UNIVARIATE SAMPLE STATISTICS UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS Variable/ Mean/ Skewness/ Minimum/ % with Percentiles Sample Size Variance Kurtosis Maximum Min/Max 20%/60% 40%/80% Median Y1 -0.022 -0.050 -3.958 0.20% -1.236 -0.395 -0.039 500.000 1.971 -0.363 3.588 0.20% 0.328 1.240 Y2 0.026 -0.139 -5.193 0.20% -1.096 -0.321 0.044 500.000 1.949 0.107 3.703 0.20% 0.385 1.211 Y3 0.035 0.169 -3.907 0.20% -1.138 -0.398 -0.060 500.000 1.953 -0.135 4.159 0.20% 0.303 1.173 THE MODEL ESTIMATION TERMINATED NORMALLY ``` --- ``` MODEL FIT INFORMATION Number of Free Parameters 9 Loglikelihood H0 Value -2450.381 H1 Value -2450.381 Information Criteria Akaike (AIC) 4918.763 Bayesian (BIC) 4956.694 Sample-Size Adjusted BIC 4928.128 (n* = (n + 2) / 24) Chi-Square Test of Model Fit Value 0.000 Degrees of Freedom 0 P-Value 0.0000 RMSEA (Root Mean Square Error Of Approximation) Estimate 0.000 90 Percent C.I. 0.000 0.000 Probability RMSEA <= .05 0.000 CFI/TLI CFI 1.000 TLI 1.000 Chi-Square Test of Model Fit for the Baseline Model Value 363.624 Degrees of Freedom 3 P-Value 0.0000 SRMR (Standardized Root Mean Square Residual) Value 0.000 ``` --- ``` MODEL RESULTS Two-Tailed Estimate S.E. Est./S.E. P-Value F1 BY Y1 1.000 0.000 999.000 999.000 Y2 1.125 0.099 11.378 0.000 Y3 1.020 0.089 11.474 0.000 Intercepts Y1 -0.022 0.063 -0.354 0.723 Y2 0.026 0.062 0.410 0.682 Y3 0.035 0.062 0.555 0.579 Variances F1 0.908 0.125 7.257 0.000 Residual Variances Y1 1.063 0.096 11.114 0.000 Y2 0.799 0.100 8.002 0.000 Y3 1.009 0.095 10.586 0.000 ``` --- ``` QUALITY OF NUMERICAL RESULTS Condition Number for the Information Matrix 0.595E-01 (ratio of smallest to largest eigenvalue) Beginning Time: 12:05:35 Ending Time: 12:05:35 Elapsed Time: 00:00:00 MUTHEN & MUTHEN 3463 Stoner Ave. Los Angeles, CA 90066 Tel: (310) 391-9971 Fax: (310) 391-8971 Web: www.StatModel.com Support: Support@StatModel.com Copyright (c) 1998-2019 Muthen & Muthen ``` --- ## `readModels` `readModels` can take a single output file: ```r output <- readModels("input_files/ex6.11.out") ``` Or an entire folder: ```r all_outputs <- readModels("input_files") ``` We get back a list containing different elements of the output: ```r names(output) ``` ``` ## [1] "input" "warnings" "errors" ## [4] "data_summary" "sampstat" "covariance_coverage" ## [7] "summaries" "invariance_testing" "parameters" ## [10] "class_counts" "indirect" "residuals" ## [13] "tech1" "tech3" "tech4" ## [16] "tech7" "tech8" "tech9" ## [19] "tech10" "tech12" "tech15" ## [22] "fac_score_stats" "gh5" ``` --- This can be a little daunting: ```r str(output) ``` ``` ## List of 23 ## $ input :List of 4 ## ..$ title : chr "this is an example of a piecewise growth model for a continuous outcome" ## ..$ data :List of 1 ## .. ..$ file: chr "ex6.11.dat" ## ..$ variable:List of 1 ## .. ..$ names: chr "y1-y5" ## ..$ model : chr [1:5] "\ti s1 | y1@0 y2@1 y3@2 y4@2 y5@2;" " \ti s2 | y1@0 y2@0 y3@0 y4@1 y5@2;" "" "" ... ## ..- attr(*, "class")= chr [1:2] "mplus.inp" "list" ## ..- attr(*, "start.line")= int 6 ## ..- attr(*, "end.line")= int 15 ## $ warnings : list() ## ..- attr(*, "class")= chr [1:2] "mplus.warnings" "list" ## $ errors : list() ## ..- attr(*, "class")= chr [1:2] "mplus.errors" "list" ## $ data_summary : list() ## ..- attr(*, "class")= chr [1:2] "mplus.data_summary" "list" ## $ sampstat :List of 1 ## ..$ univariate.sample.statistics: num [1:5, 1:14] 500 500 500 500 500 ... ## .. ..- attr(*, "dimnames")=List of 2 ## .. .. ..$ : chr [1:5] "Y1" "Y2" "Y3" "Y4" ... ## .. .. ..$ : chr [1:14] "Sample Size" "Mean" "Variance" "Skewness" ... ## ..- attr(*, "class")= chr [1:2] "mplus.sampstat" "list" ## $ covariance_coverage: list() ## $ summaries :Classes 'mplus.summaries' and 'data.frame': 1 obs. of 31 variables: ## ..$ Mplus.version : chr "8.4" ## ..$ Title : chr "this is an example of a piecewise growth model for a continuous outcome" ## ..$ AnalysisType : chr "GENERAL" ## ..$ DataType : chr "INDIVIDUAL" ## ..$ Estimator : chr "ML" ## ..$ Observations : num 500 ## ..$ NGroups : num 1 ## ..$ NDependentVars : num 5 ## ..$ NIndependentVars : num 0 ## ..$ NContinuousLatentVars: num 3 ## ..$ Parameters : num 14 ## ..$ ChiSqM_Value : num 5.24 ## ..$ ChiSqM_DF : num 6 ## ..$ ChiSqM_PValue : num 0.513 ## ..$ ChiSqBaseline_Value : num 1588 ## ..$ ChiSqBaseline_DF : num 10 ## ..$ ChiSqBaseline_PValue : num 0 ## ..$ LL : num -3706 ## ..$ UnrestrictedLL : num -3704 ## ..$ CFI : num 1 ## ..$ TLI : num 1 ## ..$ AIC : num 7440 ## ..$ BIC : num 7499 ## ..$ aBIC : num 7455 ## ..$ RMSEA_Estimate : num 0 ## ..$ RMSEA_90CI_LB : num 0 ## ..$ RMSEA_90CI_UB : num 0.054 ## ..$ RMSEA_pLT05 : num 0.929 ## ..$ SRMR : num 0.01 ## ..$ AICC : num 7441 ## ..$ Filename : chr "ex6.11.out" ## ..- attr(*, "filename")= chr "ex6.11.out" ## $ invariance_testing : list() ## $ parameters :List of 1 ## ..$ unstandardized:Classes 'mplus.params' and 'data.frame': 34 obs. of 6 variables: ## .. ..$ paramHeader: chr [1:34] "I.|" "I.|" "I.|" "I.|" ... ## .. ..$ param : chr [1:34] "Y1" "Y2" "Y3" "Y4" ... ## .. ..$ est : num [1:34] 1 1 1 1 1 0 1 2 2 2 ... ## .. ..$ se : num [1:34] 0 0 0 0 0 0 0 0 0 0 ... ## .. ..$ est_se : num [1:34] 999 999 999 999 999 999 999 999 999 999 ... ## .. ..$ pval : num [1:34] 999 999 999 999 999 999 999 999 999 999 ... ## .. ..- attr(*, "filename")= chr "input_files/ex6.11.out" ## $ class_counts : list() ## $ indirect : list() ## $ residuals : list() ## $ tech1 : list() ## $ tech3 : list() ## $ tech4 : list() ## $ tech7 : list() ## $ tech8 :List of 1 ## ..$ psr:Classes 'mplus.psr.data.frame' and 'data.frame': 0 obs. of 0 variables ## ..- attr(*, "class")= chr [1:2] "mplus.tech8" "list" ## $ tech9 : list() ## ..- attr(*, "class")= chr [1:2] "mplus.tech9" "list" ## $ tech10 : list() ## $ tech12 : list() ## ..- attr(*, "class")= chr [1:2] "mplus.tech12" "list" ## $ tech15 :List of 1 ## ..$ conditional.probabilities: chr(0) ## ..- attr(*, "class")= chr [1:2] "mplus.tech15" "list" ## $ fac_score_stats : list() ## ..- attr(*, "class")= chr [1:2] "mplus.facscorestats" "list" ## $ gh5 : list() ## - attr(*, "class")= chr [1:2] "mplus.model" "list" ## - attr(*, "filename")= chr "input_files/ex6.11.out" ``` --- But note that each section is labelled: ```r names(output) ``` ``` ## [1] "input" "warnings" "errors" ## [4] "data_summary" "sampstat" "covariance_coverage" ## [7] "summaries" "invariance_testing" "parameters" ## [10] "class_counts" "indirect" "residuals" ## [13] "tech1" "tech3" "tech4" ## [16] "tech7" "tech8" "tech9" ## [19] "tech10" "tech12" "tech15" ## [22] "fac_score_stats" "gh5" ``` So we can explore each in turn. Here's the sample statistics: ```r output$sampstat ``` ``` ## $univariate.sample.statistics ## Sample Size Mean Variance Skewness Kurtosis Minimum Maximum %Min %Max ## Y1 500 0.443 1.367 0.035 -0.359 -2.639 3.613 0.2 0.2 ## Y2 500 1.584 1.732 0.070 -0.150 -2.277 5.758 0.2 0.2 ## Y3 500 2.595 2.318 0.027 -0.225 -1.458 7.146 0.2 0.2 ## Y4 500 4.535 2.491 -0.045 -0.004 -0.255 9.106 0.2 0.2 ## Y5 500 6.535 3.269 0.026 0.182 1.538 12.294 0.2 0.2 ## 20% 40% Median 60% 80% ## Y1 -0.566 0.132 0.373 0.725 1.483 ## Y2 0.509 1.213 1.520 1.895 2.748 ## Y3 1.188 2.200 2.583 2.999 3.956 ## Y4 3.272 4.051 4.462 4.827 5.904 ## Y5 5.000 6.175 6.626 7.042 7.931 ## ## attr(,"class") ## [1] "mplus.sampstat" "list" ``` --- Here are the model parameters: ```r output$parameters ``` ``` ## $unstandardized ## paramHeader param est se est_se pval ## 1 I.| Y1 1.000 0.000 999.000 999.000 ## 2 I.| Y2 1.000 0.000 999.000 999.000 ## 3 I.| Y3 1.000 0.000 999.000 999.000 ## 4 I.| Y4 1.000 0.000 999.000 999.000 ## 5 I.| Y5 1.000 0.000 999.000 999.000 ## 6 S1.| Y1 0.000 0.000 999.000 999.000 ## 7 S1.| Y2 1.000 0.000 999.000 999.000 ## 8 S1.| Y3 2.000 0.000 999.000 999.000 ## 9 S1.| Y4 2.000 0.000 999.000 999.000 ## 10 S1.| Y5 2.000 0.000 999.000 999.000 ## 11 S2.| Y1 0.000 0.000 999.000 999.000 ## 12 S2.| Y2 0.000 0.000 999.000 999.000 ## 13 S2.| Y3 0.000 0.000 999.000 999.000 ## 14 S2.| Y4 1.000 0.000 999.000 999.000 ## 15 S2.| Y5 2.000 0.000 999.000 999.000 ## 16 S1.WITH I -0.029 0.049 -0.589 0.556 ## 17 S2.WITH I 0.059 0.036 1.642 0.101 ## 18 S2.WITH S1 -0.031 0.026 -1.184 0.237 ## 19 Means I 0.462 0.052 8.955 0.000 ## 20 Means S1 1.071 0.030 35.945 0.000 ## 21 Means S2 1.957 0.030 64.314 0.000 ## 22 Intercepts Y1 0.000 0.000 999.000 999.000 ## 23 Intercepts Y2 0.000 0.000 999.000 999.000 ## 24 Intercepts Y3 0.000 0.000 999.000 999.000 ## 25 Intercepts Y4 0.000 0.000 999.000 999.000 ## 26 Intercepts Y5 0.000 0.000 999.000 999.000 ## 27 Variances I 0.985 0.100 9.827 0.000 ## 28 Variances S1 0.240 0.037 6.408 0.000 ## 29 Variances S2 0.219 0.042 5.265 0.000 ## 30 Residual.Variances Y1 0.394 0.077 5.156 0.000 ## 31 Residual.Variances Y2 0.525 0.043 12.185 0.000 ## 32 Residual.Variances Y3 0.501 0.068 7.354 0.000 ## 33 Residual.Variances Y4 0.483 0.048 10.089 0.000 ## 34 Residual.Variances Y5 0.559 0.103 5.424 0.000 ``` --- `summaries` is a data frame containing model fit indices: ```r names(output$summaries) ``` ``` ## [1] "Mplus.version" "Title" "AnalysisType" ## [4] "DataType" "Estimator" "Observations" ## [7] "NGroups" "NDependentVars" "NIndependentVars" ## [10] "NContinuousLatentVars" "Parameters" "ChiSqM_Value" ## [13] "ChiSqM_DF" "ChiSqM_PValue" "ChiSqBaseline_Value" ## [16] "ChiSqBaseline_DF" "ChiSqBaseline_PValue" "LL" ## [19] "UnrestrictedLL" "CFI" "TLI" ## [22] "AIC" "BIC" "aBIC" ## [25] "RMSEA_Estimate" "RMSEA_90CI_LB" "RMSEA_90CI_UB" ## [28] "RMSEA_pLT05" "SRMR" "AICC" ## [31] "Filename" ``` We can access individual items with `$`: .pull-left[ ```r output$summaries$Observations ``` ``` ## [1] 500 ``` ```r output$summaries$AIC ``` ``` ## [1] 7440.342 ``` ] .pull-right[ ```r output$summaries$ChiSqM_Value ``` ``` ## [1] 5.244 ``` ```r output$summaries$Filename ``` ``` ## [1] "ex6.11.out" ``` ] --- You can use these estimates directly in RMarkdown documents. For example, this text: <code> ``` The model included \`r fit$Observations` participants. The AIC and BIC were \`r fit$AIC` and \`r fit$BIC`, respectively. ``` </code> Would become: <code> The model included 500 participants. The AIC and BIC were 7440.342 and 7499.346, respectively. </code> --- ## Using `texreg` to produce tables The `texreg` package can take Mplus output from `readModels` and produce regression tables: .pull-left[ ```r library(texreg) cfa <- readModels("cfa.out") screenreg(cfa, single.row = TRUE, ci.force = TRUE, ci.test = NA, stars = 0) ``` <br> <br> Use `screenreg` for console output, `texreg` for LaTeX, `htmlreg` for HTML. Read more [here](https://cran.r-project.org/web/packages/texreg/vignettes/texreg.pdf). ] .pull-right[ ``` ## Reading model: cfa.out ``` ``` ## ## ==================================== ## input ## ------------------------------------ ## Y1<-F1 1.00 [ 1.00; 1.00] ## Y2<-F1 1.12 [ 0.93; 1.32] ## Y3<-F1 1.02 [ 0.85; 1.19] ## Y1<-Intercepts -0.02 [-0.15; 0.10] ## Y2<-Intercepts 0.03 [-0.10; 0.15] ## Y3<-Intercepts 0.04 [-0.09; 0.16] ## F1<->F1 0.91 [ 0.66; 1.15] ## Y1<->Y1 1.06 [ 0.87; 1.25] ## Y2<->Y2 0.80 [ 0.60; 0.99] ## Y3<->Y3 1.01 [ 0.82; 1.20] ## ==================================== ``` ] --- class: middle, center, inverse # Going further... --- # Running models in parallel Mplus can make use of multiple CPU cores with the `PROCESSORS` option. But sometimes we'd rather run **many instances of Mplus in parallel** (rather than a single instance using multiple cores). For example, a model that doesn't benefit from increasing `PROCESSORS` but that we need to run 1000 times. For this, we can use `furrr`: .pull-left[ ```r install.packages("furrr") library(furrr) ``` `furrr` provides parallel equivalents to `map` and `walk`, all starting with the `future_` prefix: * `future_map` * `future_walk` ] .pull-right[ .center[ ![:scale 200px](images/furrr.png) ]] --- ## Example using `furrr` We saw earlier how we could use `walk` and `runModels` to run all input files in a directory: ```r # Create a list containing the paths of the input files all_models <- dir_ls("models", glob = "*.inp") # Repeatedly run Mplus for each input file in the list walk(all_models, runMplus) ``` This will happen sequentially π€. To run this in parallel π: ```r # Load library, specify number of CPU cores to use library(furrr) plan(multisession, workers = 24) # Run in parallel future_walk(all_models, runMplus) ``` --- ## Summarising many models with `map` `readModels` returns a list containing different part of an output file. For a single model, we can access the required statistics directly: ```r output$summaries$AIC ``` To summarise many models, `map` can help. -- ```r # First, run models in Mplus for 2-8 latent classes: walk(dir_ls("input_files", regexp = "lca_.*inp"), runModels) ``` ```r # Then import each output file into R: fit <- readModels("input_files", filefilter = "lca_.*") ``` ``` ## Reading model: input_files/lca_2.out ## Reading model: input_files/lca_3.out ## Reading model: input_files/lca_4.out ## Reading model: input_files/lca_5.out ## Reading model: input_files/lca_6.out ## Reading model: input_files/lca_7.out ## Reading model: input_files/lca_8.out ``` --- We now have a list (`fit`) containing each model. For example, here's the first model: ```r fit[[1]] ``` ``` ## Estimated using MLR ## Number of obs: 2000, number of (free) parameters: 25 ## ## Fit Indices: ## ## CFI = NA, TLI = NA, SRMR = NA ## RMSEA = NA, 90% CI [NA, NA], p < .05 = NA ## AIC = 49889.395, BIC = 50029.417 ## NULL ``` We can access specific elements manually: ```r fit[[1]]$summaries$AIC ``` ``` ## [1] 49889.39 ``` But this doesn't scale well. --- Instead, we can use `map`: ```r model_fit <- map_dfr(fit, ~ select(.x$summaries, Classes = NLatentClasses, AIC, aBIC, LL)) ``` .small[π€ For each model, select the "summaries" element, and from this, select the required columns. Then combine all the models into single data frame.] .pull-left[.small[ ```r ggplot(model_fit, aes(x = Classes, y = aBIC)) + geom_point() + geom_line() ``` ![](index_files/figure-html/unnamed-chunk-60-1.png)<!-- --> ]] .pull-right[.small[ ```r kable(model_fit) ``` | Classes| AIC| aBIC| LL| |-------:|--------:|--------:|---------:| | 2| 49889.39| 49949.99| -24919.70| | 3| 47975.19| 48057.60| -23953.60| | 4| 46222.40| 46326.63| -23068.20| | 5| 45661.36| 45787.40| -22778.68| | 6| 45229.21| 45377.07| -22553.61| | 7| 44959.39| 45129.06| -22409.70| | 8| 44763.12| 44954.61| -22302.56| ]] --- # What about `lavaan`? .pull-left[ 1. Sometimes, you just need Mplus. 2. Most of what we've covered (`str_glue`, `map`, `walk`) can be applied to `lavaan`. `tidy` from the `broom` package takes the place of `readModels`. ] .pull-right[ ![](images/elephant.jpg) ] --- .small[ ```r library(lavaan) library(broom) mediators <- c("M1", "M2", "M3") fit <- map(mediators, function(m) { model <- str_glue("# direct effect Y ~ c*X # mediator {m} ~ a*X Y ~ b*{m} # indirect effect (a*b) ab := a*b # total effect total := c + (a*b)") return(sem(model, data = dat)) }) names(fit) <- mediators ``` ] .pull-left[ .small[ ```r fit %>% map_dfr(tidy, .id = "mediator") %>% filter(label %in% c("ab", "total")) %>% select(mediator, label, estimate, std.error) %>% kable() ``` ]] .pull-right[ .small[ |mediator |label | estimate| std.error| |:--------|:-----|---------:|---------:| |M1 |ab | 0.1362914| 0.0830406| |M1 |total | 0.4829034| 0.1446811| |M2 |ab | 0.1928951| 0.0913675| |M2 |total | 0.4829034| 0.1446811| |M3 |ab | 0.3325815| 0.0996009| |M3 |total | 0.4829034| 0.1446811| ]] --- class: inverse, middle, center # Wrapping up... --- # Summary .large[ 1. Removing manual steps from our analyses is good for efficiency and reproducibility. 2. MplusAutomation offers tools to fit models in Mplus and import the fitted model into R. .small[(They've done the hard work so you don't have to π).] 3. This can be combined with `purrr` to flexibly generate, fit, and summarise Mplus models. 4. This workflow can be applied to `lavann` (and any other packages where models are specified in plain text). ] --- ## Things we didn't have time to cover .large[ 1. A complete example in RMarkdown 2. What happens if `readModels` doesn't extract what you want? 3. Running Mplus on a computing cluster, like Rosalind ] --- class: inverse, center, middle # Thanks for listening π <br> <br> # Questions? π --- class:middle, center *Slides created with [xaringan](https://slides.yihui.org/xaringan/).* <br> *emojis supplied by [`library(emo)`](https://github.com/hadley/emo).*