library(here)   # builds file paths from the project root
library(readr)  # provides read_csv()

data <- read_csv(
  here("data", "raw", "messy-data.csv")
)
King's Open Research Summer School
July 25, 2025
My aim is to surface key tools and habits; you'll need to explore further on your own.
Reproducibility is necessary for open science, but not sufficient. Broader change in motivations, incentives, and culture is essential.
"Research can be open and reproducible and still completely and obviously wrong."
final_final_v3.csv
data/raw/YYYY-MM-DD
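A dated folder name such as data/raw/YYYY-MM-DD sorts chronologically as plain text and never needs a "final" suffix. A minimal sketch, assuming one folder per data download; the helper name raw_data_dir() is hypothetical:

```r
# Hypothetical helper: build (and optionally create) a dated raw-data folder.
# ISO 8601 dates (YYYY-MM-DD) sort correctly as plain text.
raw_data_dir <- function(date = Sys.Date(), root = ".", create = FALSE) {
  path <- file.path(root, "data", "raw", format(as.Date(date), "%Y-%m-%d"))
  if (create) dir.create(path, recursive = TRUE, showWarnings = FALSE)
  path
}

raw_data_dir("2025-07-25")
# "./data/raw/2025-07-25"
```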
├── data
│   ├── raw
│   └── clean
├── 1-cleaning
├── 2-analysis
│   ├── a-descriptives
│   ├── b-models
│   └── c-processing
├── 3-figures
├── 4-writing
└── README.txt
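Creating this layout by hand invites typos. A small base-R sketch can create it in one call; create_project_skeleton() is a hypothetical name, and it leaves existing folders and an existing README.txt untouched:

```r
# Hypothetical helper: create the folder layout above inside `root`.
# Folder names mirror the tree; adapt them to your own project.
create_project_skeleton <- function(root = ".") {
  dirs <- c(
    "data/raw", "data/clean",
    "1-cleaning",
    "2-analysis/a-descriptives", "2-analysis/b-models", "2-analysis/c-processing",
    "3-figures", "4-writing"
  )
  for (d in file.path(root, dirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)  # no-op if present
  }
  readme <- file.path(root, "README.txt")
  if (!file.exists(readme)) file.create(readme)  # never overwrite
  invisible(file.path(root, dirs))
}
```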
setwd() is banned.
Absolute file paths are brittle.
Relative file paths are portable.
This works well with RStudio projects.
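The difference in practice, as a base-R sketch. The absolute path is a made-up example of what not to write; with an RStudio project open, the working directory is the project root, so the relative path resolves the same way on every machine:

```r
# Brittle: tied to one user's machine and operating system (made-up path).
# setwd("C:/Users/alice/Documents/thesis")
# raw <- read.csv("C:/Users/alice/Documents/thesis/data/raw/messy-data.csv")

# Portable: relative to the project root. file.path() joins with "/",
# which R accepts on Windows, macOS and Linux alike.
raw_path <- file.path("data", "raw", "messy-data.csv")
raw_path
# "data/raw/messy-data.csv"

# Fail early with a clear message if R is not running from the project root.
if (!file.exists(raw_path)) {
  message("Not found: ", raw_path, " (is the working directory the project root?)")
}
```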
Write modular code: break tasks into functions or separate scripts (e.g., for cleaning, analysis, and visualisation).
Follow a style guide[1] and use a code formatter[2].
Document with inline comments and README.md files.
Use consistent file and variable naming conventions (e.g., snake_case).
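Modularity and naming conventions in one small example: each function does one job, is documented with a comment, and uses snake_case. Both helpers, to_snake_case() and standardise_names(), are hypothetical illustrations in base R:

```r
# Hypothetical helper: convert column names to snake_case.
to_snake_case <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # split camelCase words
  x <- gsub("[^A-Za-z0-9]+", "_", x)            # spaces/punctuation -> "_"
  x <- gsub("^_|_$", "", x)                     # trim leading/trailing "_"
  tolower(x)
}

# Hypothetical cleaning step: give a data frame consistent column names.
standardise_names <- function(df) {
  names(df) <- to_snake_case(names(df))
  df
}

to_snake_case(c("Participant ID", "reactionTime", "Age (years)"))
# "participant_id" "reaction_time" "age_years"
```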
Use git.
Initialise the repository:
Then on GitHub, create a new repository and connect the remote repository to your local one:
Push your changes to GitHub:
Then, repeat:
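A sketch of these steps driven from an R session, assuming git is installed and on your PATH (the usethis package's use_git() and use_github() are higher-level alternatives). run_git() is a hypothetical helper, the project folder is a temporary stand-in, and the GitHub URL is a placeholder, so the remote steps are left commented out:

```r
# Hypothetical helper: run `git <args>` inside `repo`; stop if git fails.
run_git <- function(repo, ...) {
  args <- c("-C", shQuote(repo), ...)
  status <- system2("git", args)
  if (!identical(as.integer(status), 0L)) {
    stop("git failed: git ", paste(args, collapse = " "))
  }
  invisible(status)
}

repo <- file.path(tempdir(), "my-project")  # stand-in for your project folder
dir.create(repo, showWarnings = FALSE)
writeLines("# My project", file.path(repo, "README.md"))

# Initialise the repository (once per project).
run_git(repo, "init")

# Stage and commit; repeat after every meaningful change.
# The -c flags set a one-off identity; normally you set it once with
# git config --global user.name / user.email.
run_git(repo, "add", "README.md")
run_git(repo, "-c", "user.name=Example", "-c", "user.email=example@example.com",
        "commit", "-m", shQuote("Add README"))

# Connect to an empty GitHub repository and push (placeholder URL):
# run_git(repo, "remote", "add", "origin", "https://github.com/USER/REPO.git")
# run_git(repo, "push", "-u", "origin", "HEAD")
```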
Create an account on GitHub.com
Download GitHub Desktop
Add some files. Press buttons, break things.
KEEP BACKUPS
README.md file
A good README.md helps others (and your future self) understand your project.
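A starting point can be generated from R. This is a sketch: write_readme_template() is hypothetical, the section headings are suggestions rather than a standard, and the function refuses to overwrite an existing README:

```r
# Hypothetical helper: write a README.md skeleton without overwriting one.
write_readme_template <- function(path = "README.md", title = "Project title") {
  if (file.exists(path)) {
    message(path, " already exists; leaving it unchanged.")
    return(invisible(FALSE))
  }
  writeLines(c(
    paste("#", title), "",
    "## Overview", "What question does this project answer?", "",
    "## Folder structure", "What lives in each folder (data/, 1-cleaning/, and so on).", "",
    "## How to reproduce", "Which scripts to run, in what order, with which software.", "",
    "## Data", "Where the raw data come from and any access restrictions.", "",
    "## Contact", "Who to ask about this project."
  ), path)
  invisible(TRUE)
}
```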
Use .gitignore effectively.
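A .gitignore file lists paths git should never track, such as R session clutter. A minimal base-R sketch; add_to_gitignore() is hypothetical (usethis::use_git_ignore() does a similar job), and it appends only patterns that are not already listed:

```r
# Hypothetical helper: add patterns to .gitignore, skipping duplicates.
add_to_gitignore <- function(patterns, path = ".gitignore") {
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character()
  new <- setdiff(patterns, existing)
  if (length(new) > 0) writeLines(c(existing, new), path)
  invisible(new)  # the patterns that were actually added
}

# Typical R clutter that should not be versioned:
# add_to_gitignore(c(".Rhistory", ".RData", ".Rproj.user/"))
```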
We need a way of capturing the state of your computing environment, such that you or others can recreate it later.
renv
renv is an R package to manage and reproduce the exact package versions used in a project.
renv
renv is an R package, so first install it (once):
Save the current state:
Then share renv.lock with collaborators (via Git).
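With renv itself, the calls are install.packages("renv") once, renv::init() to set up a project library, and renv::snapshot() to write renv.lock. To show what a lockfile captures, here is a toy base-R sketch, not a substitute for renv: record_package_versions() is hypothetical and writes a simple CSV, whereas the real renv.lock is JSON and also records the R version and where each package came from.

```r
# Toy illustration only; use renv::snapshot() for real projects.
# Records the installed version of each package in a two-column CSV.
record_package_versions <- function(pkgs, path = "package-versions.csv") {
  versions <- vapply(pkgs, function(p) as.character(utils::packageVersion(p)),
                     character(1), USE.NAMES = FALSE)
  lock <- data.frame(package = pkgs, version = versions,
                     stringsAsFactors = FALSE)
  utils::write.csv(lock, path, row.names = FALSE)
  invisible(lock)
}

# Example (writes package-versions.csv in the working directory):
# record_package_versions(c("stats", "utils"))
```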
renv lockfile
When re-initialising a project (e.g., on a new computer, or as a collaborator):
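With renv, the call here is renv::restore(), which reinstalls the package versions recorded in renv.lock. As a toy base-R companion (not part of renv), check_package_versions() is a hypothetical helper that compares a table of recorded versions with what is installed on the current machine:

```r
# Toy illustration only; renv::restore() does the real work.
# `lock` is a data frame with columns `package` and `version`.
check_package_versions <- function(lock) {
  pkgs <- as.character(lock$package)
  installed <- vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_  # package not installed at all
    }
  }, character(1), USE.NAMES = FALSE)
  recorded <- as.character(lock$version)
  data.frame(package = pkgs, recorded = recorded, installed = installed,
             matches = !is.na(installed) & installed == recorded,
             stringsAsFactors = FALSE)
}

# Read versions as text: read.csv() would otherwise turn "1.10" into 1.1.
# lock <- read.csv("package-versions.csv", colClasses = "character")
# check_package_versions(lock)
```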
R packages are just one part of your computing environment.
Containers package your entire computing environment so it runs consistently everywhere.
This typically involves Docker or Singularity.
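With Docker, the environment is described in a Dockerfile. A sketch, written from R to keep one language here, assuming the Rocker project's rocker/r-ver images (tagged by R version), a project already set up with renv, and a hypothetical run_all.R entry point; write_dockerfile() is also hypothetical. Add renv/library to a .dockerignore file so your local library is not copied into the image:

```r
# Hypothetical helper: write a minimal Dockerfile that rebuilds the
# project's R environment from renv.lock, pinned to an R version.
write_dockerfile <- function(path = "Dockerfile",
                             r_version = paste(R.version$major, R.version$minor, sep = ".")) {
  lines <- c(
    paste0("FROM rocker/r-ver:", r_version),        # same R version as yours
    "WORKDIR /project",
    "RUN R -e \"install.packages('renv')\"",
    "COPY renv.lock renv.lock",                     # recorded package versions
    "COPY .Rprofile .Rprofile",                     # activates renv on start-up
    "COPY renv/activate.R renv/activate.R",
    "RUN R -e \"renv::restore()\"",                 # install recorded versions
    "COPY . .",
    "CMD [\"Rscript\", \"run_all.R\"]"              # hypothetical entry point
  )
  writeLines(lines, path)
  invisible(lines)
}
```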
source()
Create a script that runs your other scripts:
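A minimal sketch of such a script, for example run_all.R at the project root. The script names are hypothetical stand-ins for your own, and each script runs in a fresh environment so that scripts depend only on the files earlier steps write, not on leftover objects:

```r
# run_all.R: rerun the whole project, in order, from the project root.
# Script names are hypothetical; list your own in execution order.
scripts <- c(
  file.path("1-cleaning", "clean.R"),
  file.path("2-analysis", "b-models", "models.R"),
  file.path("3-figures", "figures.R")
)

run_all <- function(scripts) {
  for (script in scripts) {
    message("Running ", script)
    # Fresh environment per script: no hidden dependence on leftover objects.
    source(script, local = new.env(parent = globalenv()))
  }
  invisible(scripts)
}

# run_all(scripts)
```

End the file with the run_all(scripts) call, then run it with Rscript run_all.R or source("run_all.R").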
Make is a tool that runs only the parts of your code that need updating, based on what's changed.
Makefile
A rule defines a target (clean.csv) with its dependencies (raw.csv, cleaning.R). If either changes, the code is run.
A second rule defines another target (plot.png), with two dependencies (clean.csv, analysis.R).
Once you've defined your Makefile, you can then run:
to re-run all necessary preceding steps.
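The core of Make's decision can be sketched in base R to show the logic; this is an illustration, not a replacement for Make. A target is rebuilt when it is missing or older than any of its dependencies. needs_rebuild() is a hypothetical name, and the file names follow the rules described above:

```r
# A target is out of date if it does not exist, or if any dependency was
# modified more recently than the target (the timestamp test Make applies).
needs_rebuild <- function(target, deps) {
  if (!file.exists(target)) return(TRUE)
  missing_deps <- deps[!file.exists(deps)]
  if (length(missing_deps) > 0) {
    stop("Missing dependency: ", paste(missing_deps, collapse = ", "))
  }
  any(file.mtime(deps) > file.mtime(target))
}

# Mirrors the first rule: clean.csv depends on raw.csv and cleaning.R.
# if (needs_rebuild("clean.csv", c("raw.csv", "cleaning.R"))) source("cleaning.R")
```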
targets
targets is an R package for building reproducible workflows by tracking and running R functions instead of files.
Like Make, but it works at the level of R objects and code, not just scripts and outputs.
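A minimal _targets.R sketch, assuming the targets package is installed and a raw data file exists at data/raw/raw.csv; clean_data() and plot_data() are hypothetical helpers, and value is a made-up column name. Each tar_target() names an R object and the code that produces it, and format = "file" tells targets to track a file's contents. Running targets::tar_make() builds the pipeline and skips targets whose code and upstream dependencies are unchanged:

```r
# _targets.R: a minimal pipeline (paths, helpers and columns are hypothetical).
library(targets)

# Hypothetical helper: read the raw file and drop incomplete rows.
clean_data <- function(path) {
  raw <- utils::read.csv(path)
  raw[stats::complete.cases(raw), , drop = FALSE]
}

# Hypothetical helper: save a histogram of a made-up `value` column.
plot_data <- function(data, path = "plot.png") {
  grDevices::png(path)
  on.exit(grDevices::dev.off())
  graphics::hist(data$value, main = "Distribution of value")
  path  # returning the path lets targets track the file
}

list(
  tar_target(raw_file, "data/raw/raw.csv", format = "file"),
  tar_target(clean, clean_data(raw_file)),
  tar_target(plot_file, plot_data(clean), format = "file")
)
```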
If you read one thing…
This is not transparency nor openness. What I mean is research has sufficient documentation and justification to reduce error and empower others to make up their own minds about its value. Research should be intelligible. Access is not sufficient. Research can be replicable without being reasonable or correct. Materials and data can be open without being intelligible, and they can be partly closed while still being comprehensible.
Thank you for listening.
Slides and practical materials