Reproducible workflows in R

Practical

Author: Dr Ewan Carr
Affiliation: Department of Biostatistics & Health Informatics, King’s College London
Published: July 25, 2025

Welcome

This practical will bring together the steps covered in the lecture by:

  1. Building a reproducible R pipeline
  2. Committing your project to GitHub
  3. Restoring it from GitHub

With the Women’s EURO 2025 final approaching this weekend, we’ll build a reproducible pipeline to predict England’s chance of winning the final against Spain, using information on their performance over the past year.

You will:

  • Build models to predict England’s chance of winning their next match.
  • Simulate the likely outcome of the final.
  • Use here for file paths and renv to lock package versions.
  • Create a Git repository, version your code with git, and push it to GitHub.
  • Make the project fully reproducible and shareable.

Setup

Software

To complete this practical, you’ll need:

  • R (≥ 4.3)
  • RStudio
  • Git, used for version control; or
  • GitHub Desktop, a user-friendly way to interact with Git and GitHub.

GitHub Desktop includes Git, so you don’t need to install Git separately if you use it. However, we recommend installing Git directly to ensure compatibility with RStudio’s Git features.

💡 If you’re new to the command line, use GitHub Desktop

GitHub Desktop provides a simple interface for common tasks like committing, pushing, and pulling code, with no terminal required. It’s a good starting point if you’re unfamiliar with the command line.

Installing Git

macOS

Git is often pre-installed on macOS. You can check by typing in the terminal:

git --version

If it’s not installed, run:

xcode-select --install

Windows

You can download Git for Windows from https://git-scm.com. Run the installer and accept the default options, in particular:

“Use Git from the command line and also from 3rd-party software”

On either platform, you can optionally install GitHub Desktop, a graphical interface for Git and GitHub, by following the instructions on the GitHub Desktop website.

Authenticating with GitHub

To push and pull from GitHub, you need to authenticate. There are two options:

Option 1: SSH key

  • Recommended for longer-term use.
  • Requires access to a command line.

  1. Generate a key (if you don’t have one):

    ssh-keygen -t ed25519 -C "your_email@example.com"

  2. Copy the public key to your clipboard.

    On macOS:

    pbcopy < ~/.ssh/id_ed25519.pub

    On Windows (for example, in Git Bash):

    cat ~/.ssh/id_ed25519.pub | clip

  3. Add it to GitHub.

See GitHub’s documentation on adding an SSH key if you get stuck.

Option 2: GitHub Desktop

  • A good place to start if you’re new to Git.
  • When you first log in via GitHub Desktop, it stores a token securely.
  • You can clone, commit, push, and pull without dealing with passwords or keys.

1 Create a project in RStudio

Create a new RStudio Project and give it an appropriate name (e.g., euros-prediction):

File → New Project → New Directory

2 Download the scripts and datasets

Right-click the links below to download the required scripts and dataset:

Put these inside your project folder and recreate the structure shown below:

1-data/
├── raw/
│   └── fixtures.csv
└── clean/
2-scripts/
├── 01-clean.R
├── 02-analysis.R
└── 03-plot.R
outputs/

3 Initialise renv and install the required packages

In the R console, type:

# Install the renv package. You only need to do this once, not for each project.
install.packages("renv")
# Initialise renv for the current project.
renv::init()
# Install packages required for this project.
renv::install()
# Save the current state into renv.lock.
renv::snapshot()

Open the renv.lock lockfile to understand its contents.
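
If you prefer to inspect a lockfile from R rather than in a text editor, the sketch below may help. It assumes renv 1.0.0 or later, which provides renv::lockfile_read(). The lockfile here is a tiny made-up example written to a temporary file; the R and package versions in it are illustrative only, and your own renv.lock will list different packages.

```r
# A tiny, made-up lockfile written to a temporary file for illustration.
# Your real renv.lock will contain many more packages.
lock_path <- tempfile(fileext = ".lock")
writeLines(c(
  '{',
  '  "R": {',
  '    "Version": "4.3.2",',
  '    "Repositories": [',
  '      { "Name": "CRAN", "URL": "https://cloud.r-project.org" }',
  '    ]',
  '  },',
  '  "Packages": {',
  '    "here": {',
  '      "Package": "here",',
  '      "Version": "1.0.1",',
  '      "Source": "Repository",',
  '      "Repository": "CRAN"',
  '    }',
  '  }',
  '}'
), lock_path)

# Read the lockfile into an R list (assumes renv >= 1.0.0).
lock <- renv::lockfile_read(lock_path)

lock$R$Version              # R version recorded when the snapshot was taken
names(lock$Packages)        # packages recorded in the lockfile
lock$Packages$here$Version  # recorded version of one package
```

In your project, you would pass "renv.lock" instead of the temporary path.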

4 Run the three scripts

Once packages are installed and renv is initialised, you’re ready to run the analysis.

Run 01-clean.R in RStudio. This script:

  1. Imports the fixtures.csv dataset from the 1-data/raw folder.
  2. Performs data cleaning, adding venue, Elo ratings, rest days, and form.
  3. Saves a cleaned dataset to 1-data/clean/fixtures.rds.
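
The contents of 01-clean.R are not shown here, so the following is only a rough sketch of how two of the derived variables, rest days and form, might be computed in base R. The column names (date, goals_for, goals_against), the three-match window, and the five fixtures are all made up for illustration; the real fixtures.csv and script may differ.

```r
# Made-up fixtures for illustration; in the practical these would be read
# from 1-data/raw/fixtures.csv, whose real column names may differ.
fixtures <- data.frame(
  date          = as.Date(c("2025-01-01", "2025-01-05", "2025-01-15",
                            "2025-01-18", "2025-02-01")),
  goals_for     = c(2, 0, 1, 3, 1),
  goals_against = c(1, 0, 2, 0, 1)
)
fixtures <- fixtures[order(fixtures$date), ]

# Rest days: gap in days since the previous match (NA for the first match).
fixtures$rest_days <- c(NA, as.numeric(diff(fixtures$date)))

# Points per match: 3 for a win, 1 for a draw, 0 for a loss.
fixtures$points <- ifelse(fixtures$goals_for > fixtures$goals_against, 3,
                   ifelse(fixtures$goals_for == fixtures$goals_against, 1, 0))

# Form: mean points over the previous (up to) three matches, excluding the
# current match so that the predictor only uses past information.
fixtures$form <- NA_real_
for (i in seq_len(nrow(fixtures))[-1]) {
  previous <- max(1, i - 3):(i - 1)
  fixtures$form[i] <- mean(fixtures$points[previous])
}

# Save the cleaned data; a temporary folder stands in for 1-data/clean/.
out_path <- file.path(tempdir(), "fixtures.rds")
saveRDS(fixtures, out_path)
```

Saving to .rds rather than .csv keeps column types such as dates intact when the next script reads the file back with readRDS().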

Run 02-analysis.R in RStudio. This script:

  1. Fits two Poisson regression models to predict:
    • England’s goals
    • Opponent’s goals
  2. Simulates 1000 match results
  3. Saves the probabilities to outputs/results.rds
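
To make these steps concrete, here is a minimal sketch of the general approach in base R. It is not the code in 02-analysis.R: the training data are randomly generated, and the predictors (elo_diff, home) and the covariates for the match being predicted are hypothetical. It also assumes the two teams’ goal counts are independent, which is a simplification.

```r
set.seed(2025)  # fix the seed so the sketch gives the same numbers each run

# Randomly generated training data standing in for the cleaned fixtures.
n_matches <- 40
train <- data.frame(
  elo_diff = rnorm(n_matches, mean = 0, sd = 100),  # hypothetical rating gap
  home     = rbinom(n_matches, 1, 0.5)              # 1 = played at home
)
train$goals_for <- rpois(
  n_matches, exp(0.3 + 0.003 * train$elo_diff + 0.2 * train$home)
)
train$goals_against <- rpois(
  n_matches, exp(0.1 - 0.003 * train$elo_diff - 0.1 * train$home)
)

# Two separate Poisson regressions, one for each side's goal count.
fit_for     <- glm(goals_for ~ elo_diff + home, family = poisson, data = train)
fit_against <- glm(goals_against ~ elo_diff + home, family = poisson, data = train)

# Hypothetical covariates for the match to be predicted.
next_match     <- data.frame(elo_diff = -50, home = 0)
lambda_for     <- predict(fit_for, newdata = next_match, type = "response")
lambda_against <- predict(fit_against, newdata = next_match, type = "response")

# Simulate 1000 scorelines and convert them to outcome proportions.
n_sims      <- 1000
sim_for     <- rpois(n_sims, lambda_for)
sim_against <- rpois(n_sims, lambda_against)
outcome <- ifelse(sim_for > sim_against, "win",
           ifelse(sim_for == sim_against, "draw", "loss"))
probs <- table(factor(outcome, levels = c("win", "draw", "loss"))) / n_sims
probs
```

Using type = "response" returns the expected number of goals directly, rather than its logarithm, which is what rpois() needs as its rate.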

Run 03-plot.R in RStudio. This script:

  1. Loads the saved probabilities (outputs/results.rds).
  2. Creates a bar chart of predicted win/draw/loss.
  3. Saves the figure to outputs/prediction_plot.png.
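
As a rough illustration of this last step, the sketch below draws a bar chart with base R graphics and saves it as a PNG. The plotting approach used in 03-plot.R is not stated here, so treat this as one possible way of doing it. The probabilities are placeholder values, and a temporary folder stands in for outputs/.

```r
# Placeholder probabilities; in the practical these are loaded from
# outputs/results.rds.
probs <- c(win = 0.35, draw = 0.25, loss = 0.40)

# A temporary folder stands in for the project's outputs/ folder.
plot_path <- file.path(tempdir(), "prediction_plot.png")

png(plot_path, width = 800, height = 600)  # open a PNG graphics device
barplot(probs, ylim = c(0, 1), ylab = "Probability",
        main = "Predicted outcome (illustrative values)")
invisible(dev.off())                       # close the device to write the file
```

The file is only written to disk once dev.off() is called, so forgetting it is a common cause of empty or missing figures.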

5 Putting it all together

Create a new script run.R at the project root with the contents:

run.R
library(here)
source(here("2-scripts", "01-clean.R"))
source(here("2-scripts", "02-analysis.R"))
source(here("2-scripts", "03-plot.R"))

This script uses source to run the three scripts sequentially, avoiding the need to run them separately.

Run the run.R script either by clicking Run or by typing at the console:

source("run.R")
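
If you later want run.R to report its progress and fail early when a script is missing, something like the following sketch could work. The helper run_pipeline() is hypothetical, not part of the practical, and the three scripts in the demo are one-line placeholders written to a temporary folder. It also runs each script in its own environment, which assumes the scripts pass results to each other through saved files rather than through objects in memory.

```r
# Run a set of scripts in order, stopping with a clear message if one is missing.
run_pipeline <- function(script_dir, scripts) {
  for (script in scripts) {
    path <- file.path(script_dir, script)
    if (!file.exists(path)) {
      stop("Missing script: ", path)
    }
    message("Running ", script)
    source(path, local = new.env())  # fresh environment for each script
  }
  invisible(scripts)
}

# Demo: three placeholder scripts that each append their name to a log file.
demo_dir <- file.path(tempdir(), "demo-scripts")
dir.create(demo_dir, showWarnings = FALSE)
demo_dir <- normalizePath(demo_dir, winslash = "/")
log_file <- file.path(demo_dir, "log.txt")
unlink(log_file)  # start from an empty log if the demo is re-run

steps <- c("01-clean.R", "02-analysis.R", "03-plot.R")
for (step in steps) {
  writeLines(
    sprintf('cat("%s\\n", file = "%s", append = TRUE)', step, log_file),
    file.path(demo_dir, step)
  )
}

run_pipeline(demo_dir, steps)
readLines(log_file)  # the scripts ran in the order given
```

In the project itself you would call run_pipeline(here("2-scripts"), steps) so that the paths resolve relative to the project root.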

6 Initialise the Git repository

We’ve now set up our project, initialised renv, and created a run.R script that automates our data cleaning and analysis.

In this section, we’ll initialise a new, empty Git repository. Git will allow us to track changes to our files over time and restore previous versions.

You can complete this section via the terminal or using a desktop application, such as GitHub Desktop.

Terminal

Open a terminal in the project root and type:

git init

  • You can do this from within RStudio in the ‘Terminal’ pane.
  • If you haven’t yet installed Git, see the instructions above.

GitHub Desktop

  1. Open GitHub Desktop.
  2. Go to File → Add Local Repository….
  3. Click Choose… and select your existing project folder.
  4. You should see the prompt:

“The directory does not appear to be a Git repository. Would you like to create a repository here instead?”

  5. Click “Create Repository”.

7 Commit files to the local repository

Having initialised the empty repository, we now need to add our files.

Terminal

# Add all files in the ‘1-data’ folder.
git add 1-data/*
# Add all files in the ‘2-scripts’ folder.
git add 2-scripts/*
# Add the run.R script.
git add run.R
# Add the renv.lock lockfile.
git add renv.lock
# Add your RStudio Project file; change the name as appropriate.
git add euros-prediction.Rproj
# Commit the new files with a short message (specified by -m).
git commit -m "Initial commit"

GitHub Desktop

  1. Open GitHub Desktop and select your repository.
  2. In the Changes tab, tick the checkboxes to stage the following:
    • All files in the 1-data/ folder
    • All files in the 2-scripts/ folder
    • The run.R file
    • The renv.lock file
  3. At the bottom left, enter a commit message: Initial commit
  4. Click Commit to main.

8 Create an empty repository on GitHub

  1. If you haven’t already, create an account on GitHub.com.

  2. Then, go to https://github.com/new and create a new, empty repository with an appropriate name (e.g., euros-prediction)

9 Connect our local repository to GitHub

We then need to connect our local repository with the one we just created on GitHub.

Terminal

In the terminal, you can do this by typing:

# Set the remote to point to the new repository on GitHub. Replace the URL with the corresponding URL for your repository.
git remote add origin git@github.com:username/repo.git
# ‘Push’ the local history to GitHub. If your local branch is not called main (e.g., master), use that name instead.
git push origin main

GitHub Desktop

  1. On GitHub.com, create a new (empty) repository.
  2. In GitHub Desktop, go to Repository → Repository Settings…
  3. Under Remote, click Add and enter:
    • Name: origin
    • URL: git@github.com:username/repo.git (replace with the URL of your new repository)
  4. Click Save.
  5. Back in the main window, click Push origin (top bar) to upload your commits to GitHub.

10 Reproducing your analysis from GitHub

We’ll now test that we can recreate our analysis from the online repository. This ensures your project can be reliably re-run on another computer or by another user.

The steps involved are:

  1. ‘Clone’ the existing repository from GitHub.
  2. Restore the renv environment.
  3. Run the run.R script to repeat the analysis.

Terminal

In the terminal:

# ‘Clone’ the repository on GitHub to a local folder. Replace username and your-repo as appropriate.
git clone https://github.com/username/your-repo.git

GitHub Desktop

  1. Go to File → Clone Repository….
  2. Choose your repository.

Then, whichever method you used, open the RStudio Project and type at the R console:

renv::restore()
source("run.R")
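
One thing to check when comparing a restored run with the original: the analysis involves random simulation, so two runs will only give identical probabilities if the random seed is fixed. Whether 02-analysis.R sets a seed is not stated here, so the sketch below simply illustrates the idea. The function simulate_outcomes() and the goal rates in it are hypothetical.

```r
# Simulate match outcomes from two hypothetical goal rates, with a fixed seed.
simulate_outcomes <- function(seed, n_sims = 1000,
                              lambda_for = 1.4, lambda_against = 1.1) {
  set.seed(seed)
  goals_for     <- rpois(n_sims, lambda_for)
  goals_against <- rpois(n_sims, lambda_against)
  c(win  = mean(goals_for > goals_against),
    draw = mean(goals_for == goals_against),
    loss = mean(goals_for < goals_against))
}

run_a <- simulate_outcomes(seed = 42)
run_b <- simulate_outcomes(seed = 42)

identical(run_a, run_b)  # TRUE: same seed, same simulated results
```

If your restored results differ slightly from the original, a missing set.seed() call is a likely explanation and does not necessarily mean the environment was restored incorrectly.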

11 Adding a ‘README’ file

  1. Write a README.md file[1] in your project folder. Briefly describe the analysis and the steps needed to reproduce it.
  2. Add and commit this file to your local repository.
  3. Push the changes to GitHub.

Once pushed, you can view the README.md on your repository page on GitHub.

Footnotes

  1. Read this if you’re not sure how to start.