NIHR Maudsley BRC
23 April 2026
Department of Biostatistics & Health Informatics
King’s College London
14:00–15:00 Concepts and demonstrations
15:00–16:00 Practice

Your answers to the survey…
43% have never used Git
83% use filenames to version their work
83% use R in their analysis workflow
52% use a Mac
45% want the fundamentals
42% want practical workflow help
39% mention GitHub and collaboration
55% main blocker is not knowing where to start
Git is software that tracks changes to your files.
analysis_final_v4_Ewan.R, THIS_ONE_WORKS.R
.git folder that stores the history.
You can use Git entirely locally, without GitHub.
Different interfaces, but doing the same thing underneath.
We can send changes to the remote and receive changes from it. These are called “pushing” and “pulling”.
1. Complete history
2. Collaboration
3. Open research
README, and history
Tracked vs. untracked
Git only tracks files you’ve added. Everything else stays out of the history until you add it.
Commit
A saved snapshot of your project — with a message, author, and timestamp.
Staging
Pick which changes go into the next commit — so each one is deliberate.
Git creates the hidden .git folder inside your project. Your files stay where they are.
Copy your project files in, then tell Git which ones to watch.
Git calls this staging. You are choosing which files should go into the next commit.
Your first commit gives the repository a starting point and a message explaining what it contains.
Edit a tracked file and save it on your computer. Git notices the change and waits for you to decide what to do next.
This is staging again. Choose which edits should go into the next commit.
Make another commit with a clear message. From that point on, the pattern is edit, choose, commit, repeat.
Continue working, and repeat the process as needed.
A commit gives you a saved version of your staged changes, together with its message, author, and timestamp.

We’ll work with a downloaded example/ project folder, which includes:
README.md
scripts/clean.R
data/admissions.csv
outputs/model_predictions.csv
Our goal is to:
data/admissions-example → Create Repository
example/ into the new admissions-example folder
data → Ignore all files in data/; repeat for outputs — GitHub Desktop writes .gitignore automatically
README.md, scripts/, .gitignore
scripts/clean.R, add a comment at the top, save
Initialise, add, and commit:
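A minimal sketch of these three steps on the command line. It assumes Git 2.28 or later (for `git init -b`) and builds a stand-in for the example/ folder in a temporary directory; the file contents, the identity, and the commit message are placeholders, not the workshop's own.

```shell
# Stand-in for the downloaded example/ project (file contents are placeholders).
cd "$(mktemp -d)"
mkdir -p scripts data outputs
echo "# Admissions example" > README.md
echo "# cleaning script" > scripts/clean.R
echo "id,age" > data/admissions.csv
echo "id,prediction" > outputs/model_predictions.csv

# Initialise: creates the hidden .git folder (-b needs Git 2.28 or later).
git init -b main

# Identity for this repository only; skip if Git is already configured.
git config user.name "Example User"
git config user.email "user@example.org"

# Ignore the data and output folders before anything is staged.
printf 'data/\noutputs/\n' > .gitignore

# Add: stage only the files that should be tracked.
git add README.md scripts/ .gitignore

# Commit: save the first snapshot with a message.
git commit -m "Add README, cleaning script, and .gitignore"
```

Running `git ls-files` afterwards lists only README.md, scripts/clean.R, and .gitignore; nothing under data/ or outputs/ has entered the history.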
Make a change to scripts/clean.R, then:
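One way the edit, choose, commit cycle might look on the command line. The first block only builds a throwaway repository so the sketch runs on its own; the script contents and commit messages are placeholders.

```shell
# Minimal stand-in repository with one committed script.
cd "$(mktemp -d)" && git init -b main
git config user.name "Example User"
git config user.email "user@example.org"
mkdir scripts && echo "x <- 1" > scripts/clean.R
git add scripts/clean.R && git commit -m "Add cleaning script"

# Edit the tracked file (here: append a comment).
echo "# TODO: check missing values" >> scripts/clean.R

git status --short          # lists the file as modified
git diff                    # shows exactly what changed
git add scripts/clean.R     # stage the change
git commit -m "Add comment to cleaning script"
git log --oneline           # the history now has two commits
```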
Use git init to initialise the repository, then open it in GitHub Desktop:
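A short sketch of that first step, assuming a project folder that already contains files (a temporary folder with a placeholder README stands in for it here). Unlike the File → New Repository route, `git init` on its own stages nothing.

```shell
# Stand-in for a project folder that already contains files.
cd "$(mktemp -d)"
echo "# Project notes" > README.md

# Creates .git inside the folder; existing files are left untouched.
git init

# Nothing is staged yet: README.md is listed as untracked (??).
git status --short
```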
Then in GitHub Desktop: File → Add Local Repository → choose the folder.
From there, review the Changes panel, ignore anything that shouldn’t be tracked, and make your first commit — exactly as we just did.
Warning
If you use File → New Repository inside a folder that already contains files, GitHub Desktop will stage everything automatically. That is usually not what you want.
The repository we just created only lives on our computer.
We can work entirely offline, committing changes and building up a history using Git.
However, to share it with others, we need to connect it to a remote repository.
We’re using GitHub, but there are several online hosting platforms for Git repositories (e.g. GitLab, Bitbucket).
The remote repository is a linked copy of your local repository that is hosted online (e.g. on GitHub).
Going online adds a shared copy of the repository. It does not replace the local one on your computer.
Tip
GitHub Desktop wraps this into a simple publish flow. The command line shows the same ideas as separate steps.
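The separate command-line steps can be sketched as follows. So that it runs offline, a local bare repository stands in for the hosted one; with GitHub, the path given to `git remote add` would be the URL of a repository you created on the website. Folder names, identity, and commit message are illustrative.

```shell
work="$(mktemp -d)"

# Stand-in for the hosted repository. With GitHub this would be an
# https:// or git@ URL for a repository created on the website.
git init --bare -b main "$work/remote.git"

# A local repository with one commit.
git init -b main "$work/admissions-example" && cd "$work/admissions-example"
git config user.name "Example User"
git config user.email "user@example.org"
echo "# Admissions example" > README.md
git add README.md && git commit -m "Add README"

# Step 1: tell the local repository where the remote lives.
git remote add origin "$work/remote.git"

# Step 2: push, and remember origin/main as the upstream for main.
git push -u origin main

# Later, pulling fetches and merges anything new from the remote.
git pull
```

After the first `git push -u`, plain `git push` and `git pull` are enough, because the local main branch remembers which remote branch it is linked to.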
Windows
File → Options → Accounts
Mac
GitHub Desktop → Settings → Accounts
Use the GitHub command line client: gh auth login
Or follow GitHub’s SSH setup guide
Full instructions are on the workshop setup page.
admissions-example, set it to Private
.gitignore, and licence unticked
README.md locally and save
If you can describe the change in one sentence, it’s probably a good commit.
A commit message should explain what changed and why.
Short summary line describing what changed
An optional longer description, explaining why it
changed and adding any important details.
The description should explain the problem being addressed, the approach taken, and any assumptions or trade-offs involved.
Add cross-validation to model pipeline
Implements 5-fold CV for logistic regression model. This replaces the previous single train/test split to reduce variance in performance estimates.
Check for missing values before model fitting
Introduces a check to stop execution if key variables contain missing values. This prevents silent failures and unexpected model behaviour.
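On the command line, one way to write a summary line plus a longer description is to pass `-m` twice: the first becomes the summary and the second a separate paragraph. The sketch below reuses the second example message; the repository and the file model.R are placeholders.

```shell
cd "$(mktemp -d)" && git init -b main
git config user.name "Example User"
git config user.email "user@example.org"
echo "check_missing <- TRUE" > model.R   # placeholder file
git add model.R

# First -m: summary line. Second -m: longer description.
git commit \
  -m "Check for missing values before model fitting" \
  -m "Introduces a check to stop execution if key variables contain missing values. This prevents silent failures and unexpected model behaviour."

git log -1 --format=%s   # summary line only
git log -1 --format=%b   # description only
```

Running `git commit` with no `-m` opens a text editor instead, which is often more comfortable for longer descriptions.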
Monday's work
Changes
Update
Final version
Fix
Fix, properly this time
.gitignore to ignore files you don’t want tracked
Create a file .gitignore in the project folder. Use this file to omit files you don’t want under version control. For example:
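A possible .gitignore for an R project, written from the shell and then checked with `git check-ignore`. The patterns are assumptions chosen to match the example project (data/, outputs/) plus two common R session files; adapt them to your own folders.

```shell
cd "$(mktemp -d)" && git init -b main
mkdir -p data outputs scripts
echo "id,age" > data/admissions.csv
echo "x <- 1" > scripts/clean.R

# Write the ignore rules (one pattern per line; # starts a comment).
cat > .gitignore <<'EOF'
# Data and generated outputs stay out of version control
data/
outputs/
# R session files
.Rhistory
.RData
EOF

# Ask Git which rule, if any, ignores a given path.
git check-ignore -v data/admissions.csv

# Ignored folders do not appear as untracked.
git status --short
```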
Never commit patient data.
Add any data folders to your .gitignore file before any data files are added.
Warning
Once sensitive data is committed to a public repository, deleting the file later is not enough.
Git history can still contain the earlier version, so pushing first and deleting later does not undo the exposure.
Researchers using UK Biobank data committed data folders locally, pushed them to GitHub, and then exposed them publicly.
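The following sketch illustrates why deleting a file later is not enough. It uses a throwaway repository and a fake one-line CSV in place of real data; the file name and commit messages are hypothetical.

```shell
cd "$(mktemp -d)" && git init -b main
git config user.name "Example User"
git config user.email "user@example.org"

# Commit a file by mistake (a fake record stands in for real data).
mkdir data && echo "id,diagnosis" > data/patients.csv
git add data/patients.csv && git commit -m "Add data"

# Delete the file in a later commit.
git rm --quiet data/patients.csv && git commit -m "Remove data"

git ls-files                         # nothing is tracked any more
git show HEAD~1:data/patients.csv    # but the earlier commit still has it
```

Anyone with a copy of the repository can run the last command, which is why the ignore rule has to be in place before the data arrives.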
A branch is a separate line of work within the same repository.
Until now, we’ve been thinking about a single history. This is usually called the main branch.
Git lets us have multiple branches:
On the command line, use git switch -c to create a new branch and move to it immediately. Here, I’ve named the branch add-figure-2:
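A minimal sketch, assuming Git 2.23 or later (when `git switch` was introduced). The repository, the file figure2.R, and the commit messages are placeholders.

```shell
cd "$(mktemp -d)" && git init -b main
git config user.name "Example User"
git config user.email "user@example.org"
echo "# Paper" > README.md && git add README.md && git commit -m "Add README"

# Create the branch and move to it in one step.
git switch -c add-figure-2
git branch --show-current      # prints the branch you are on

# A commit made now is recorded on add-figure-2, not on main.
echo "plot(1:10)" > figure2.R   # placeholder script
git add figure2.R && git commit -m "Add figure 2 script"
git log --oneline main..add-figure-2   # commits that main does not have
```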
On GitHub Desktop, choose Branch→New Branch…
Note that new commits now go on that branch, not main.

If it works, you can merge it back into the main branch. Merging means bringing the commits from that branch into the main line of the project.
If not, you can delete the branch and continue on main.
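Both outcomes can be sketched in one throwaway repository. The branch name risky-idea and all file contents are hypothetical; `-d` deletes a branch that has been merged, while `-D` forces deletion of one that has not.

```shell
cd "$(mktemp -d)" && git init -b main
git config user.name "Example User"
git config user.email "user@example.org"
echo "# Paper" > README.md && git add README.md && git commit -m "Add README"

# Work that succeeded: commit on a branch, then merge it into main.
git switch -c add-figure-2
echo "plot(1:10)" > figure2.R
git add figure2.R && git commit -m "Add figure 2 script"
git switch main
git merge add-figure-2         # main now contains the figure commit
git branch -d add-figure-2     # the merged branch is no longer needed

# Work that failed: delete the branch without merging.
git switch -c risky-idea       # hypothetical branch name
echo "stop('broken')" > idea.R
git add idea.R && git commit -m "Try an idea"
git switch main
git branch -D risky-idea       # -D forces deletion of unmerged work
```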
git switch -c add-readme-note
README.md, save, and commit the change
git switch main
git switch add-readme-note
Everyone has a local copy. Push and pull to the remote.
Never work directly on main. Create a feature branch for each piece of work.
cleaning-fix-missing-values
add-figure-2
update-inclusion-criteria
main stable
A pull request is a way of asking to merge a branch into main.
implement-xgboost-model)
main
A pull request allows others to review and approve your code, or suggest changes. Pull requests are also referred to as PRs.
When you start work
main
While working
When finished
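One possible command-line version of the three stages above. This is an assumption about a typical feature-branch routine, not a record of the original slide: a local bare repository stands in for the shared GitHub remote, and the file and commit message are placeholders.

```shell
work="$(mktemp -d)"
# Setup: a local bare repository stands in for the shared GitHub remote.
git init --bare -b main "$work/remote.git"
git init -b main "$work/project" && cd "$work/project"
git config user.name "Example User"
git config user.email "user@example.org"
echo "# Project" > README.md && git add README.md && git commit -m "Add README"
git remote add origin "$work/remote.git" && git push -u origin main

# When you start work: update main, then branch off it.
git switch main
git pull
git switch -c update-inclusion-criteria

# While working: commit in small steps and push the branch.
echo "min_age <- 18" > criteria.R     # placeholder change
git add criteria.R && git commit -m "Set minimum age for inclusion"
git push -u origin update-inclusion-criteria

# When finished: open a pull request on GitHub asking to merge the
# branch into main. After it is merged, return to main and pull.
git switch main
git pull
```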
A fork is your own copy of someone else’s repository — useful when you don’t have write access to the original.
We’ll try this in the collaborative exercise.
5-minute break
git config: three lines to run once
user.name / user.email
init.defaultBranch
main instead of the old master default when you create new repositories from the command line
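The three lines might look like this. The name and email are placeholders to replace with your own, and `--global` writes to your personal .gitconfig, so the settings apply to every repository on the computer.

```shell
# Name and email recorded in every commit you make (placeholders).
git config --global user.name "Your Name"
git config --global user.email "you@example.org"

# Name the first branch of new repositories main rather than master.
git config --global init.defaultBranch main

# Check what was stored.
git config --global --list
```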
GitHub Desktop usually sets the important parts for you. These commands are mainly for command-line users who want Git configured explicitly.
.gitconfig live?
| Platform | Location | How to open |
|---|---|---|
| Mac | ~/.gitconfig | open ~/.gitconfig or any text editor |
| Windows | C:\Users\YourName\.gitconfig | Notepad, VS Code, or notepad $HOME\.gitconfig in PowerShell |
Note
.gitconfig is a hidden file. To show it:
Mac: Cmd+Shift+. in Finder
Windows: View → Show → Hidden items
You can edit it directly in a text editor, or use git config --global commands. Both do the same thing.