More efficient SEM via R? 🎉

class: center, middle, inverse, title-slide

# More efficient SEM via R? 🎉
### Dr Ewan Carr @ewancarr
### Department of Biostatistics & Health Informatics 
### SEMantics 16th March 2021

---

class: middle, center

.tiny{font-size: 50%}
.big{font-size: 140%}
.very-small{font-size: 60%}
</style>

.big[
<code>
ewancarr.github.io/semantics-mplusautomation
</code>
]

---

# Three minute version ⌚

1. `MplusAutomation` is an R package that allows us to programatically interact
   with Mplus without leaving R 🙌

2. We can create input files (`.inp`), run them in Mplus, and extract the
   outputs.

3. This is great for  **reproducibility.**

### This talk

**Why?** The joys of Mplus 🤦, and the problem we're trying to solve.

**How?** What is `MplusAutomation`? How does it work?
* Formatting your data for Mplus
* Generating input files
* Extracting output

**Going further** 🚀 RMarkdown, `purrr` 🐈, parallel processing, `lavaan`...

???

Who is this for?  
Anyone interested in fitting SEM models  
R users, future R users, Mplus users  
Obviously using R, but will hopefully be approachable to all

---
class: middle, center

![:scale 85%](images/tweet.png)

---
class: middle, center

.pull-left[
 
# Do we still need this? 
]
.pull-right[.center[![](images/hug_annotated.png)]]

---
class: middle, center, inverse

## What is Mplus?
 
 
## What is is the problem we're trying to solve?

---
class: spaced
# The joys of using Mplus 😢

.pull-left[
1. Write an input file.

2. Click `Run`.

3. Review the output file, all 20 pages... 🙈

4. Scroll down to the section of output you actually need.

5. Keep scrolling...

6. Copy the required outputs into a spreadsheet?

7. Reformat by hand.

8. Copy into your manuscript.

]

.pull-right[
.center[![](images/mplus.png)

.content-box-blue[
To reproduce your analysis, just repeat steps 1 to 8. 😆
]

]]

---

.content-box-blue[
This is bad enough for a single input file 😒. But some
applications require many input files.

* Simulation studies 😞
* Latent class analysis  😢
* Tools that build on Mplus 😭
]

![](images/mplus_trees.png)

---
class: middle

#### [Discernment of Mediator and Outcome Measurement in the PACE trial](https://www.medrxiv.org/content/10.1101/2021.01.25.21250436v1)

.pull-left[

.small[
*Ewan Carr, Silia Vitoratou, Trudie Chalder, and Kimberley Goldsmith*
]

![](images/pace_medrxiv.png)
.tiny[<https://www.medrxiv.org/content/10.1101/2021.01.25.21250436v1>]
]

.pull-right[.center[

![:scale 150%](images/pace.png) ]]

---
class: center
### 🐭 Reasons to avoid point-and-click analyses 🚫

.center[
.pull-left[
### 1. Efficiency 
![](images/xkcd_worth_the_time.png)
.tiny[<https://xkcd.com/1205/>]
]

.pull-right[
### 2. Reproducibility 
![](images/vadar_redrawn.png)
]
]

---
class: inverse, center, middle

# What is MplusAutomation?

---

`MplusAutomation` is an R package that talks to Mplus.

.content-box-yellow[.small[
MplusAutomation is a package for R that **facilitates complex latent variable
analyses in Mplus** involving comparisons among many models and parameters.

More specifically, MplusAutomation provides tools to accomplish **3 objectives**:

1. To **create and manage Mplus syntax** for groups of related models;
2. To **automate the estimation** of many models;
3. And to **extract**, aggregate, and compare fit statistics, parameter
   estimates, and ancillary model outputs.

]]

.pull-left[

.small[
It's been around since 2010, regularly updated. To get started:

* Read the [paper](https://www.tandfonline.com/doi/full/10.1080/10705511.2017.1402334) and [vignette](https://cran.r-project.org/web/packages/MplusAutomation/vignettes/Vignette.pdf) 📗
* Take a look at the [GitHub page](https://github.com/michaelhallquist/MplusAutomation) and [forum](https://groups.google.com/forum/#!forum/mplusautomation) 👀 
* Keep listening! 👂

If you use it, cite it!   
.very-small[See `citation("MplusAutomation")`.]
]]

.pull-right[

![](images/mplusautomation_journal.png)

]

---
## How does it work?

![](images/flowchart.png)

---
## Show me! 🤔

.large[

```r
library(MplusAutomation)
library(texreg)

# Specify the model
cfa <- "
 DATA: FILE = data/ex5.1.dat;
 VARIABLE: NAMES = y1-y6;
 USEVAR = y1-y3;
 MODEL: f1 BY y1-y3;
 "

# Create the input file
writeLines(cfa, "cfa.inp")

# Run in Mplus
runModels("cfa.inp")

# Read the output file
fit <- readModels("cfa.out")
```
]

---

.pull-left[

```r
htmlreg(fit,
  summaries = c("Observations",
                "CFI",
                "SRMR"))
```
]

.pull-right[
<table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;color: #000000;border-top: 2px solid #000000;">
<caption>CFA, single factor</caption>
<thead>
<tr>
<th style="padding-left: 5px;padding-right: 5px;">&nbsp;</th>
<th style="padding-left: 5px;padding-right: 5px;">input</th>
</tr>
</thead>
<tbody>
<tr style="border-top: 1px solid #000000;">
<td style="padding-left: 5px;padding-right: 5px;">Y1<-F1</td>
<td style="padding-left: 5px;padding-right: 5px;">1.00 [ 1.00; 1.00]</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Y2<-F1</td>
<td style="padding-left: 5px;padding-right: 5px;">1.12 [ 0.93; 1.32]</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Y3<-F1</td>
<td style="padding-left: 5px;padding-right: 5px;">1.02 [ 0.85; 1.19]</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Y1<-Intercepts</td>
<td style="padding-left: 5px;padding-right: 5px;">-0.02 [-0.15; 0.10]</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Y2<-Intercepts</td>
<td style="padding-left: 5px;padding-right: 5px;">0.03 [-0.10; 0.15]</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Y3<-Intercepts</td>
<td style="padding-left: 5px;padding-right: 5px;">0.04 [-0.09; 0.16]</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">F1<->F1</td>
<td style="padding-left: 5px;padding-right: 5px;">0.91 [ 0.66; 1.15]</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Y1<->Y1</td>
<td style="padding-left: 5px;padding-right: 5px;">1.06 [ 0.87; 1.25]</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Y2<->Y2</td>
<td style="padding-left: 5px;padding-right: 5px;">0.80 [ 0.60; 0.99]</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">Y3<->Y3</td>
<td style="padding-left: 5px;padding-right: 5px;">1.01 [ 0.82; 1.20]</td>
</tr>
<tr style="border-top: 1px solid #000000;">
<td style="padding-left: 5px;padding-right: 5px;">Observations</td>
<td style="padding-left: 5px;padding-right: 5px;">500</td>
</tr>
<tr>
<td style="padding-left: 5px;padding-right: 5px;">CFI</td>
<td style="padding-left: 5px;padding-right: 5px;">1.00</td>
</tr>
<tr style="border-bottom: 2px solid #000000;">
<td style="padding-left: 5px;padding-right: 5px;">SRMR</td>
<td style="padding-left: 5px;padding-right: 5px;">0.00</td>
</tr>
</tbody>
</table>
]
 
---
## Again! 🤩

```r
map(`2:6`,
  ~ str_glue("DATA:      FILE         = ex7.3.dat;
              VARIABLE:  NAMES        = u1-u4 x1-x10;
                         USEVARIABLES = u1-u4;
                         CATEGORICAL  = u1-u4;
                         CLASSES     = c (`{.x}`);
                 ANALYSIS: TYPE = MIXTURE;") %>%
  write_lines(., str_glue("lca/`{.x}`class.inp")))

runModels("lca")
fit <- readModels("lca")
```

.small[

```r
map_dfr(fit, "summaries") %>%
    select(NLatentClasses, Observations, AIC, aBIC, LL, Entropy) %>%
    kable()
```

| NLatentClasses| Observations|      AIC|     aBIC|       LL| Entropy|
|--------------:|------------:|--------:|--------:|--------:|-------:|
|              2|          500| 1948.488| 1957.853| -965.244|   0.904|
|              3|          500| 1953.115| 1967.682| -962.557|   0.810|
|              4|          500| 1962.883| 1982.653| -962.441|   0.745|
|              5|          500| 1972.883| 1997.856| -962.441|   0.579|
|              6|          500| 1982.883| 2013.059| -962.441|   0.562|
]

---
class: spaced
The rest of this talk...

### Getting started

1. Getting data into Mplus
2. Generating input files
4. Running models 
3. Extracting and formatting output

### Going further

4. Running models in parallel with `furrr`
5. Summarising many models with `map`
5. What about `lavaan`?

### Q&A

---
# A brief interlude into R

(Sorry).

* `MplusAutomation` is great, but where possible I want to use generic
  functions. (I'd rather .deep-orange[learn something once] that can be applied
  everywhere).

* Much of what I'll talk about today builds on two R functions: `str_glue` and `map`.

.pull-left[ .center[ ![:scale 200px](images/stringr.png) ]]
.pull-right[ .center[ ![:scale 200px](images/purrr.png) ]]

---
# 🧵 `str_glue`

`str_glue` does [string interpolation](https://en.wikipedia.org/wiki/String_interpolation).

```r
x <- 23

str_glue("The number is {x}.")
```

```
## The number is 23.
```

```r
today <- ymd("2021-03-16")

str_glue("Today is {wday(today, label = TRUE, abbr = FALSE)}.")
```

```
## Today is Tuesday.
```

We can also provide a list (or vector) of inputs:

```r
str_glue("This is number {1:3}")
```

```
## This is number 1
## This is number 2
## This is number 3
```

---
# 🗾 `map`

`map` applies a function to a list.

```r
map_dbl(1:6, sqrt)
```

```
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490
```

You can use an existing function (e.g. `sqrt`) or a temporary function:

```r
map_dbl(1:10, `~ .x^2`)
map_dbl(1:10, `function(x) { x^2 }`)    # These two lines are equivalent
```

```
##  [1]   1   4   9  16  25  36  49  64  81 100
```

---

# 🚶‍♀️ `walk`

If you don't need the outputs, use `walk`:

```r
walk(list_of_things, some_function)
```

Useful where your function carries out some action (e.g. running a model in
Mplus) but where you don't need to store the result of this action.

.right[Back to `MplusAutomation`...]

---
class: middle, center

![](images/stages_1.png)

---
# 💾 Getting data into Mplus

.pull-left[Mplus requires your data to be in a specific format.

I have .teal[no idea] what this is.  🤷
]

.pull-right[
.middle[

```r
DATA:        FILE = `ex3.1.dat`;
VARIABLE:    NAMES = y1 x1 x3;
MODEL:       y1 ON x1 x3;
```
]]

In the past, I've used `stata2mplus`. Now, `prepareMplusData`:

```r
# Load some data
data(HolzingerSwineford1939)

# Reformat for Mplus
prepareMplusData(HolzingerSwineford1939, "hs1939.dat")
```

There are a few options (see `?prepareMplusData` for details) — e.g. to
drop variables or only write data on changes.

```r
# Importing a Stata dataset
df <- read_dta("stata_dataset.dta")
prepareMplusData(df, "df.dat")
```

---

💡 `prepareMplusData` will also generate input files:

```r
inp <- prepareMplusData(HolzingerSwineford1939, "hs1939.dat")
```

```
## 
## Factor: school 
## Conversion:
##        level number
##  Grant-White      1
##      Pasteur      2
## TITLE: Your title goes here
## DATA: FILE = "hs1939.dat";
## VARIABLE: 
## NAMES = id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6 x7 x8 x9; 
## MISSING=.;
```

```r
print(inp)
```

```
## [1] "TITLE: Your title goes here\n"                                         
## [2] "DATA: FILE = \"hs1939.dat\";\n"                                        
## [3] "VARIABLE: \n"                                                          
## [4] "NAMES = id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6 x7 x8 x9; \n"
## [5] "MISSING=.;\n"
```

---
class: middle, center

![](images/stages_2.png)

---

# Generating input files

`MplusAutomation` provides the `createModels` function. I prefer to use generic
R functions instead (e.g. `str_glue` and `map`).

### Example 1: A static input file

```r
# Write your input file
inp <- "
TITLE: this is an example of a simple linear
 regression for a continuous observed
 dependent variable with two covariates
DATA: FILE IS ex3.1.dat;
VARIABLE: NAMES ARE y1 x1 x3;
MODEL: y1 ON x1 x3;"

# Save it
writeLines(inp, "ex3_1.inp")

# Run it
runModels("ex3_1.inp")
```

---
### Example 2: A dynamic input file

What if we wanted to repeat this model for several outcomes?

```r
inp <- "DATA: FILE IS ex3.1.dat;
 VARIABLE: NAMES ARE y1 y2 y3 x;
 MODEL: `y1` ON x;"
```

We can use `str_glue` and `walk` for this:

```r
walk(c("y1", "y2", "y3"), function(x) {
 # Generate the input file for this outcome
 input_file <- str_glue("DATA: FILE IS ex3.1.dat;
 VARIABLE: NAMES ARE y1 y2 y3 x;
 MODEL: `{x}` ON x;")
 # Save
 writeLines(input_file, paste0(`x`, ".inp")) })
```

.pull-left[
`y1.inp`
.code-bg-green[
```bash
DATA:     FILE IS ex3.1.dat;
VARIABLE: NAMES ARE y1 y2 y3 x;
MODEL:    y1 ON x;
```
]]

.pull-right[
`y2.inp`
.code-bg-green[
```bash
DATA:     FILE IS ex3.1.dat;
VARIABLE: NAMES ARE y1 y2 y3 x;
MODEL:    y2 ON x;
```
]]

---
class: spacing
### Example 3: Another dynamic example

Suppose we wanted to fit a latent class model with between 2 and 8 classes.

```r
n_classes <- 2:8
```

```
## 2 3 4 5 6 7 8
```

```r
walk(`n_classes`, function(x) {
 input_file <- str_glue("DATA: FILE = ex4.4.dat;
 VARIABLE: NAMES = y1-y8;
 CLASSES = c(`{x}`);
 ANALYSIS: TYPE = MIXTURE;")
 writeLines(input_file, str_glue("lca_`{x}`.inp")) })
```

.content-box-blue[
This process can be summarised as:

1. Make a **list of inputs** (e.g. dependent variables, numbers of classes)  
2. Update the model using **placeholders**.

This can be as simple 😃 or complex 😦 as you need.
]

---

Suppose we want to fit this latent class model to 100 datasets 😱.

```r
datasets <- dir_ls("data", glob = "d*.dat")
n_classes <- 2:8 
```

And maybe, consider two predictors of class membership, `x1` and `x2`.

```r
predictors <- c("x1", "x2")
```
--

We can produce list containing **all combinations** of these input parameters with `cross`:

```r
all_combinations <- cross(list(d = datasets,
 nc = n_classes,
 p = predictors))
```

This produces a list with 7 `$\times$` 2 `$\times$` 100 = 1400 items.

---

.pull-left[

Here's the first item:

```r
all_combinations[[1]]
```

```
## $d
## data/dataset1.dat
## 
## $nc
## [1] 2
## 
## $p
## [1] "x1"
```

And the 90th item:

```r
all_combinations[[90]]
```

```
## $d
## data/ex7.3.dat
## 
## $nc
## [1] 7
## 
## $p
## [1] "x2"
```
]

.pull-right[
Here's the second item:

```r
all_combinations[[2]]
```

```
## $d
## data/dataset2.dat
## 
## $nc
## [1] 2
## 
## $p
## [1] "x1"
```

.content-box-green[
Note that the parameters inside each item are
labelled:

`d` refers to the dataset

`nc` refers to the number of classes

`p` refers to the predictors
]
]

---

We can then use these in our placeholders:

```r
walk(all_combinations, function(x) {
 input_file <- str_glue("
 DATA: FILE = {`x$d`};
 VARIABLE: NAMES = u1-u4 x1 x2;
 CLASSES = c ({`x$nc`});
 ANALYSIS: TYPE = MIXTURE;
 ALGORITHM = INTEGRATION;
 MODEL: %OVERALL%
 c ON {`x$p`};
 %c#1%
 [u1$1-u4$1];
 %c#2%
 [u1$1-u4$1];
 OUTPUT: TECH1 TECH8;")
 # Generate a filename and save
 fn <- str_glue("{path_ext_remove(path_file(x$d))}_{nc}_{p}.inp")
 writeLines(input_file, fn)
}
```

This will generate 1400 input files, e.g.

```
dataset1_2_x1.inp
dataset1_3_x1.inp
dataset1_4_x1.inp
dataset1_5_x1.inp
...
```

---
class: middle, center

![](images/stages_3.png)

---

# Run models with `runModels`

.pull-left[
**1)** You can run a single model:

```r
runModels("cfa.inp")
```

**3)** You can print outputs:

```r
runModels("cfa.inp",
          showOutput=TRUE)
```
]

.pull-right[

**2)** Or a folder of models:

```r
runModels("a_folder_of_models")
```

```
##
## Running model: cfa.inp
## System command: "mplus" "cfa.inp"
## 
## Mplus VERSION 8.4 (Linux)
## MUTHEN & MUTHEN
## 
## Running input file 'cfa.inp'...
## 
## Beginning Time: 18:57:53
## Ending Time: 18:57:53
## Elapsed Time: 00:00:00
## 
## Output saved in 'cfa.out'.
```
]

---

A few points:

* `runModels` will try and guess where Mplus is installed. 
* You can optionally specify `filefilter` to run a subset models.

If you're fitting many models, `walk` may be more robust:

```r
# Create a list of input files
to_run <- dir_ls("models", glob = "*.inp")

# Pass each file to runModels
walk(to_run, runModels)
```

(This also brings other benefits, like easy parallelization, as we'll see
later).

---
class: center, middle

![](images/stages_4.png)

---
# Parse outputs with .small[`readModels`]

.center[
![](images/from_to.png)
]

---

```
Mplus VERSION 8.4 (Linux)
MUTHEN & MUTHEN
03/16/2021  03:00 PM

INPUT INSTRUCTIONS

DATA:   FILE  = ../data/ex5.1.dat;
          VARIABLE:  NAMES = y1-y6;
                     USEVAR = y1-y3;
          MODEL:     f1 BY y1-y3;

INPUT READING TERMINATED NORMALLY

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         500

Number of dependent variables                                    3
Number of independent variables                                  0
Number of continuous latent variables                            1

Observed dependent variables

Continuous
   Y1          Y2          Y3

Continuous latent variables
   F1
```

---

```
Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20

Input data file(s)
  ../data/ex5.1.dat

Input data format  FREE

UNIVARIATE SAMPLE STATISTICS

UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS

Variable/         Mean/     Skewness/   Minimum/ % with                Percentiles
        Sample Size      Variance    Kurtosis    Maximum  Min/Max      20%/60%    40%/80%    Median

Y1                   -0.022      -0.050      -3.958    0.20%      -1.236     -0.395     -0.039
             500.000       1.971      -0.363       3.588    0.20%       0.328      1.240
     Y2                    0.026      -0.139      -5.193    0.20%      -1.096     -0.321      0.044
             500.000       1.949       0.107       3.703    0.20%       0.385      1.211
     Y3                    0.035       0.169      -3.907    0.20%      -1.138     -0.398     -0.060
             500.000       1.953      -0.135       4.159    0.20%       0.303      1.173

THE MODEL ESTIMATION TERMINATED NORMALLY
```

---

```
MODEL FIT INFORMATION

Number of Free Parameters                        9

Loglikelihood

H0 Value                       -2450.381
          H1 Value                       -2450.381

Information Criteria

Akaike (AIC)                    4918.763
          Bayesian (BIC)                  4956.694
          Sample-Size Adjusted BIC        4928.128
            (n* = (n + 2) / 24)

Chi-Square Test of Model Fit

Value                              0.000
          Degrees of Freedom                     0
          P-Value                           0.0000

RMSEA (Root Mean Square Error Of Approximation)

Estimate 0.000
 90 Percent C.I. 0.000 0.000
 Probability RMSEA <= .05 0.000

CFI/TLI

CFI                                1.000
          TLI                                1.000

Chi-Square Test of Model Fit for the Baseline Model

Value                            363.624
          Degrees of Freedom                     3
          P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)

Value                              0.000
```

---

```
MODEL RESULTS

Two-Tailed
                    Estimate       S.E.  Est./S.E.    P-Value

F1       BY
    Y1                 1.000      0.000    999.000    999.000
    Y2                 1.125      0.099     11.378      0.000
    Y3                 1.020      0.089     11.474      0.000

Intercepts
    Y1                -0.022      0.063     -0.354      0.723
    Y2                 0.026      0.062      0.410      0.682
    Y3                 0.035      0.062      0.555      0.579

Variances
    F1                 0.908      0.125      7.257      0.000

Residual Variances
    Y1                 1.063      0.096     11.114      0.000
    Y2                 0.799      0.100      8.002      0.000
    Y3                 1.009      0.095     10.586      0.000
```

---

```
QUALITY OF NUMERICAL RESULTS

Condition Number for the Information Matrix              0.595E-01
       (ratio of smallest to largest eigenvalue)

Beginning Time:  12:05:35
        Ending Time:  12:05:35
       Elapsed Time:  00:00:00

MUTHEN & MUTHEN
3463 Stoner Ave.
Los Angeles, CA  90066

Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support: Support@StatModel.com

---

## `readModels`

`readModels` can take a single output file:

```r
output <- readModels("input_files/ex6.11.out")
```

Or an entire folder:

```r
all_outputs <- readModels("input_files")
```

We get back a list containing different elements of the output:

```r
names(output)
```

```
##  [1] "input"               "warnings"            "errors"             
##  [4] "data_summary"        "sampstat"            "covariance_coverage"
##  [7] "summaries"           "invariance_testing"  "parameters"         
## [10] "class_counts"        "indirect"            "residuals"          
## [13] "tech1"               "tech3"               "tech4"              
## [16] "tech7"               "tech8"               "tech9"              
## [19] "tech10"              "tech12"              "tech15"             
## [22] "fac_score_stats"     "gh5"
```

---

This can be a little daunting:

```r
str(output)
```

```
## List of 23
##  $ input              :List of 4
##   ..$ title   : chr "this is an example of a piecewise growth model for a continuous outcome"
##   ..$ data    :List of 1
##   .. ..$ file: chr "ex6.11.dat"
##   ..$ variable:List of 1
##   .. ..$ names: chr "y1-y5"
##   ..$ model   : chr [1:5] "\ti s1 | y1@0 y2@1 y3@2 y4@2 y5@2;" "  \ti s2 | y1@0 y2@0 y3@0 y4@1 y5@2;" "" "" ...
##   ..- attr(*, "class")= chr [1:2] "mplus.inp" "list"
##   ..- attr(*, "start.line")= int 6
##   ..- attr(*, "end.line")= int 15
##  $ warnings           : list()
##   ..- attr(*, "class")= chr [1:2] "mplus.warnings" "list"
##  $ errors             : list()
##   ..- attr(*, "class")= chr [1:2] "mplus.errors" "list"
##  $ data_summary       : list()
##   ..- attr(*, "class")= chr [1:2] "mplus.data_summary" "list"
##  $ sampstat           :List of 1
##   ..$ univariate.sample.statistics: num [1:5, 1:14] 500 500 500 500 500 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:5] "Y1" "Y2" "Y3" "Y4" ...
##   .. .. ..$ : chr [1:14] "Sample Size" "Mean" "Variance" "Skewness" ...
##   ..- attr(*, "class")= chr [1:2] "mplus.sampstat" "list"
##  $ covariance_coverage: list()
##  $ summaries          :Classes 'mplus.summaries' and 'data.frame':	1 obs. of  31 variables:
##   ..$ Mplus.version        : chr "8.4"
##   ..$ Title                : chr "this is an example of a piecewise growth model for a continuous outcome"
##   ..$ AnalysisType         : chr "GENERAL"
##   ..$ DataType             : chr "INDIVIDUAL"
##   ..$ Estimator            : chr "ML"
##   ..$ Observations         : num 500
##   ..$ NGroups              : num 1
##   ..$ NDependentVars       : num 5
##   ..$ NIndependentVars     : num 0
##   ..$ NContinuousLatentVars: num 3
##   ..$ Parameters           : num 14
##   ..$ ChiSqM_Value         : num 5.24
##   ..$ ChiSqM_DF            : num 6
##   ..$ ChiSqM_PValue        : num 0.513
##   ..$ ChiSqBaseline_Value  : num 1588
##   ..$ ChiSqBaseline_DF     : num 10
##   ..$ ChiSqBaseline_PValue : num 0
##   ..$ LL                   : num -3706
##   ..$ UnrestrictedLL       : num -3704
##   ..$ CFI                  : num 1
##   ..$ TLI                  : num 1
##   ..$ AIC                  : num 7440
##   ..$ BIC                  : num 7499
##   ..$ aBIC                 : num 7455
##   ..$ RMSEA_Estimate       : num 0
##   ..$ RMSEA_90CI_LB        : num 0
##   ..$ RMSEA_90CI_UB        : num 0.054
##   ..$ RMSEA_pLT05          : num 0.929
##   ..$ SRMR                 : num 0.01
##   ..$ AICC                 : num 7441
##   ..$ Filename             : chr "ex6.11.out"
##   ..- attr(*, "filename")= chr "ex6.11.out"
##  $ invariance_testing : list()
##  $ parameters         :List of 1
##   ..$ unstandardized:Classes 'mplus.params' and 'data.frame':	34 obs. of  6 variables:
##   .. ..$ paramHeader: chr [1:34] "I.|" "I.|" "I.|" "I.|" ...
##   .. ..$ param      : chr [1:34] "Y1" "Y2" "Y3" "Y4" ...
##   .. ..$ est        : num [1:34] 1 1 1 1 1 0 1 2 2 2 ...
##   .. ..$ se         : num [1:34] 0 0 0 0 0 0 0 0 0 0 ...
##   .. ..$ est_se     : num [1:34] 999 999 999 999 999 999 999 999 999 999 ...
##   .. ..$ pval       : num [1:34] 999 999 999 999 999 999 999 999 999 999 ...
##   .. ..- attr(*, "filename")= chr "input_files/ex6.11.out"
##  $ class_counts       : list()
##  $ indirect           : list()
##  $ residuals          : list()
##  $ tech1              : list()
##  $ tech3              : list()
##  $ tech4              : list()
##  $ tech7              : list()
##  $ tech8              :List of 1
##   ..$ psr:Classes 'mplus.psr.data.frame' and 'data.frame':	0 obs. of  0 variables
##   ..- attr(*, "class")= chr [1:2] "mplus.tech8" "list"
##  $ tech9              : list()
##   ..- attr(*, "class")= chr [1:2] "mplus.tech9" "list"
##  $ tech10             : list()
##  $ tech12             : list()
##   ..- attr(*, "class")= chr [1:2] "mplus.tech12" "list"
##  $ tech15             :List of 1
##   ..$ conditional.probabilities: chr(0) 
##   ..- attr(*, "class")= chr [1:2] "mplus.tech15" "list"
##  $ fac_score_stats    : list()
##   ..- attr(*, "class")= chr [1:2] "mplus.facscorestats" "list"
##  $ gh5                : list()
##  - attr(*, "class")= chr [1:2] "mplus.model" "list"
##  - attr(*, "filename")= chr "input_files/ex6.11.out"
```

---

But note that each section is labelled:

```r
names(output)
```

So we can explore each in turn. Here's the sample statistics:

```r
output$sampstat
```

```
## $univariate.sample.statistics
##    Sample Size  Mean Variance Skewness Kurtosis Minimum Maximum %Min %Max
## Y1         500 0.443    1.367    0.035   -0.359  -2.639   3.613  0.2  0.2
## Y2         500 1.584    1.732    0.070   -0.150  -2.277   5.758  0.2  0.2
## Y3         500 2.595    2.318    0.027   -0.225  -1.458   7.146  0.2  0.2
## Y4         500 4.535    2.491   -0.045   -0.004  -0.255   9.106  0.2  0.2
## Y5         500 6.535    3.269    0.026    0.182   1.538  12.294  0.2  0.2
##       20%   40% Median   60%   80%
## Y1 -0.566 0.132  0.373 0.725 1.483
## Y2  0.509 1.213  1.520 1.895 2.748
## Y3  1.188 2.200  2.583 2.999 3.956
## Y4  3.272 4.051  4.462 4.827 5.904
## Y5  5.000 6.175  6.626 7.042 7.931
## 
## attr(,"class")
## [1] "mplus.sampstat" "list"
```

---

Here are the model parameters:

```r
output$parameters
```

```
## $unstandardized
##           paramHeader param    est    se  est_se    pval
## 1                 I.|    Y1  1.000 0.000 999.000 999.000
## 2                 I.|    Y2  1.000 0.000 999.000 999.000
## 3                 I.|    Y3  1.000 0.000 999.000 999.000
## 4                 I.|    Y4  1.000 0.000 999.000 999.000
## 5                 I.|    Y5  1.000 0.000 999.000 999.000
## 6                S1.|    Y1  0.000 0.000 999.000 999.000
## 7                S1.|    Y2  1.000 0.000 999.000 999.000
## 8                S1.|    Y3  2.000 0.000 999.000 999.000
## 9                S1.|    Y4  2.000 0.000 999.000 999.000
## 10               S1.|    Y5  2.000 0.000 999.000 999.000
## 11               S2.|    Y1  0.000 0.000 999.000 999.000
## 12               S2.|    Y2  0.000 0.000 999.000 999.000
## 13               S2.|    Y3  0.000 0.000 999.000 999.000
## 14               S2.|    Y4  1.000 0.000 999.000 999.000
## 15               S2.|    Y5  2.000 0.000 999.000 999.000
## 16            S1.WITH     I -0.029 0.049  -0.589   0.556
## 17            S2.WITH     I  0.059 0.036   1.642   0.101
## 18            S2.WITH    S1 -0.031 0.026  -1.184   0.237
## 19              Means     I  0.462 0.052   8.955   0.000
## 20              Means    S1  1.071 0.030  35.945   0.000
## 21              Means    S2  1.957 0.030  64.314   0.000
## 22         Intercepts    Y1  0.000 0.000 999.000 999.000
## 23         Intercepts    Y2  0.000 0.000 999.000 999.000
## 24         Intercepts    Y3  0.000 0.000 999.000 999.000
## 25         Intercepts    Y4  0.000 0.000 999.000 999.000
## 26         Intercepts    Y5  0.000 0.000 999.000 999.000
## 27          Variances     I  0.985 0.100   9.827   0.000
## 28          Variances    S1  0.240 0.037   6.408   0.000
## 29          Variances    S2  0.219 0.042   5.265   0.000
## 30 Residual.Variances    Y1  0.394 0.077   5.156   0.000
## 31 Residual.Variances    Y2  0.525 0.043  12.185   0.000
## 32 Residual.Variances    Y3  0.501 0.068   7.354   0.000
## 33 Residual.Variances    Y4  0.483 0.048  10.089   0.000
## 34 Residual.Variances    Y5  0.559 0.103   5.424   0.000
```

---

`summaries` is a data frame containing model fit indices:

```r
names(output$summaries)
```

```
##  [1] "Mplus.version"         "Title"                 "AnalysisType"         
##  [4] "DataType"              "Estimator"             "Observations"         
##  [7] "NGroups"               "NDependentVars"        "NIndependentVars"     
## [10] "NContinuousLatentVars" "Parameters"            "ChiSqM_Value"         
## [13] "ChiSqM_DF"             "ChiSqM_PValue"         "ChiSqBaseline_Value"  
## [16] "ChiSqBaseline_DF"      "ChiSqBaseline_PValue"  "LL"                   
## [19] "UnrestrictedLL"        "CFI"                   "TLI"                  
## [22] "AIC"                   "BIC"                   "aBIC"                 
## [25] "RMSEA_Estimate"        "RMSEA_90CI_LB"         "RMSEA_90CI_UB"        
## [28] "RMSEA_pLT05"           "SRMR"                  "AICC"                 
## [31] "Filename"
```

We can access individual items with `$`:

.pull-left[

```r
output$summaries$Observations
```

```
## [1] 500
```

```r
output$summaries$AIC
```

```
## [1] 7440.342
```

]
.pull-right[

```r
output$summaries$ChiSqM_Value
```

```
## [1] 5.244
```

```r
output$summaries$Filename
```

```
## [1] "ex6.11.out"
```
]

---

You can use these estimates directly in RMarkdown documents.

For example, this text:

<code>
```
 The model included \`r fit$Observations` participants.
 The AIC and BIC were \`r fit$AIC` and \`r fit$BIC`, respectively. 
```
</code>

Would become:

<code>
 The model included 500 participants. The AIC and BIC were
 7440.342 and 7499.346, respectively.
</code>

---

## Using `texreg` to produce tables

The `texreg` package can take Mplus output from `readModels` and produce regression tables:

.pull-left[

```r
library(texreg)
cfa <- readModels("cfa.out")
screenreg(cfa,
 single.row = TRUE,
 ci.force = TRUE,
 ci.test = NA,
 stars = 0)
```

Use `screenreg` for console output, `texreg` for LaTeX, `htmlreg` for HTML.

Read more [here](https://cran.r-project.org/web/packages/texreg/vignettes/texreg.pdf).

]

.pull-right[

```
## Reading model:  cfa.out
```

```
## 
## ====================================
## input 
## ------------------------------------
## Y1<-F1 1.00 [ 1.00; 1.00]
## Y2<-F1 1.12 [ 0.93; 1.32]
## Y3<-F1 1.02 [ 0.85; 1.19]
## Y1<-Intercepts -0.02 [-0.15; 0.10]
## Y2<-Intercepts 0.03 [-0.10; 0.15]
## Y3<-Intercepts 0.04 [-0.09; 0.16]
## F1<->F1 0.91 [ 0.66; 1.15]
## Y1<->Y1 1.06 [ 0.87; 1.25]
## Y2<->Y2 0.80 [ 0.60; 0.99]
## Y3<->Y3 1.01 [ 0.82; 1.20]
## ====================================
```

]

---
class: middle, center, inverse

# Going further...

---

# Running models in parallel

Mplus can make use of multiple CPU cores with the `PROCESSORS` option.

But sometimes we'd rather run **many instances of Mplus in parallel** (rather
than a single instance using multiple cores).

For example, a model that doesn't benefit from increasing `PROCESSORS` but that
we need to run 1000 times.

For this, we can use `furrr`:

.pull-left[

```r
install.packages("furrr")
library(furrr)
```

`furrr` provides parallel equivalents to `map` and `walk`, all starting with the `future_` prefix:

* `future_map`
* `future_walk`

]

.pull-right[ .center[ ![:scale 200px](images/furrr.png) ]]

---

## Example using `furrr`

We saw earlier how we could use `walk` and `runModels` to run all input files
in a directory:

```r
# Create a list containing the paths of the input files
all_models <- dir_ls("models", glob = "*.inp")

# Repeatedly run Mplus for each input file in the list
walk(all_models, runMplus)
```

This will happen sequentially 💤. To run this in parallel 🚀:

```r
# Load library, specify number of CPU cores to use
library(furrr)
plan(multisession, workers = 24)

# Run in parallel
future_walk(all_models, runMplus)
```

---

## Summarising many models with `map`

`readModels` returns a list containing different part of an output
file. For a single model, we can access the required statistics directly:

```r
output$summaries$AIC
```

To summarise many models, `map` can help.

```r
# First, run models in Mplus for 2-8 latent classes:
walk(dir_ls("input_files", regexp = "lca_.*inp"), runModels)
```

```r
# Then import each output file into R:
fit <- readModels("input_files", filefilter = "lca_.*")
```

```
## Reading model:  input_files/lca_2.out 
## Reading model:  input_files/lca_3.out 
## Reading model:  input_files/lca_4.out 
## Reading model:  input_files/lca_5.out 
## Reading model:  input_files/lca_6.out 
## Reading model:  input_files/lca_7.out 
## Reading model:  input_files/lca_8.out
```

---

We now have a list (`fit`) containing each model. For example, here's the first
model:

```r
fit[[1]]
```

```
## Estimated using MLR 
## Number of obs: 2000, number of (free) parameters: 25 
## 
## Fit Indices: 
## 
## CFI = NA, TLI = NA, SRMR = NA 
## RMSEA = NA, 90% CI [NA, NA], p < .05 = NA 
## AIC = 49889.395, BIC = 50029.417 
## NULL
```

We can access specific elements manually:

```r
fit[[1]]$summaries$AIC
```

```
## [1] 49889.39
```

But this doesn't scale well.

---

Instead, we can use `map`:

```r
model_fit <- map_dfr(fit,
 ~ select(.x$summaries,
 Classes = NLatentClasses,
 AIC, aBIC, LL))
```

.small[🤔 For each model, select the "summaries" element, and from this, select the
required columns. Then combine all the models into single data frame.]

.pull-left[.small[

```r
ggplot(model_fit, 
       aes(x = Classes, y = aBIC)) +
    geom_point() +
    geom_line()
```

![](index_files/figure-html/unnamed-chunk-60-1.png)
]]

.pull-right[.small[

```r
kable(model_fit)
```

| Classes|      AIC|     aBIC|        LL|
|-------:|--------:|--------:|---------:|
|       2| 49889.39| 49949.99| -24919.70|
|       3| 47975.19| 48057.60| -23953.60|
|       4| 46222.40| 46326.63| -23068.20|
|       5| 45661.36| 45787.40| -22778.68|
|       6| 45229.21| 45377.07| -22553.61|
|       7| 44959.39| 45129.06| -22409.70|
|       8| 44763.12| 44954.61| -22302.56|
]]

---
# What about `lavaan`?

.pull-left[
1. Sometimes, you just need Mplus.

2. Most of what we've covered (`str_glue`, `map`, `walk`) can be applied to
   `lavaan`.

`tidy` from the `broom` package takes the place of `readModels`.
]

.pull-right[
![](images/elephant.jpg)
]

---

.small[

```r
library(lavaan)
library(broom)

mediators <- c("M1", "M2", "M3")
fit <- map(mediators, function(m) {
 model <- str_glue("# direct effect
 Y ~ c*X
 # mediator
 {m} ~ a*X
 Y ~ b*{m}
 # indirect effect (a*b)
 ab := a*b
 # total effect
 total := c + (a*b)")
 return(sem(model, data = dat)) })
names(fit) <- mediators
```
]

.pull-left[
.small[

```r
fit %>%
    map_dfr(tidy,
            .id = "mediator") %>%
    filter(label %in% c("ab",
                        "total")) %>%
    select(mediator,
           label,
           estimate,
           std.error) %>%
    kable()
```
]]

.pull-right[
.small[

|mediator |label |  estimate| std.error|
|:--------|:-----|---------:|---------:|
|M1       |ab    | 0.1362914| 0.0830406|
|M1       |total | 0.4829034| 0.1446811|
|M2       |ab    | 0.1928951| 0.0913675|
|M2       |total | 0.4829034| 0.1446811|
|M3       |ab    | 0.3325815| 0.0996009|
|M3       |total | 0.4829034| 0.1446811|
]]

---
class: inverse, middle, center

# Wrapping up...

---
# Summary

.large[
1. Removing manual steps from our analyses is good for efficiency and
   reproducibility.

2. MplusAutomation offers tools to fit models in Mplus and 
   import the fitted model into R.

.small[(They've done the hard work so you don't have to 🙏).]

3. This can be combined with `purrr` to flexibly generate, fit, and summarise
   Mplus models.

4. This workflow can be applied to `lavann` (and any other packages where
   models are specified in plain text).
]

---
## Things we didn't have time to cover

.large[
1. A complete example in RMarkdown

2. What happens if `readModels` doesn't extract what you want?

3. Running Mplus on a computing cluster, like Rosalind
]

---
class: inverse, center, middle

# Thanks for listening 😃
 
 
# Questions? 🙋

---
class:middle, center

*Slides created with [xaringan](https://slides.yihui.org/xaringan/).*

*emojis supplied by [`library(emo)`](https://github.com/hadley/emo).*