R for Health Data Research

Beginner, Intermediate and Advanced

Dr Ewan Carr

Department of Biostatistics & Health Informatics
King’s College London

Course materials 📚

All lectures and practicals can be found at the link below:

  • You should have received a password via email.

  • The materials are updated throughout the course and remain available for at least one month after the final session.

Submit your questions 🙋

Click the “Submit a question” link to send me questions during the week.

Pre-course videos and homework 📝

The sessions are spread over four weeks, allowing time for the material to sink in before moving on.

  • Some sessions may have a short video to watch before the session.

    These will be posted on the HDR UK Futures platform.

  • Some sessions contain extra ‘homework’ exercises. If you want more practice, have a go at these in your own time.

Learning a new programming language is hard.


Especially when you’re used to another way of working (e.g., Stata, SPSS or Python).


I tried to learn R five times before it stuck.

Expectations 💡

  • Expect confusion. It’s inevitable; you’re not doing anything wrong.
  • Mistakes are normal, even after many years of using R.
  • Allow time; things will take (much) longer at first.
  • It can help to use R for a specific project, before switching completely.

After 16 years of using R, I still:

  • Regularly make mistakes
  • Have to look things up and check the documentation
  • Spend forever tweaking code, and running endless tests, only to discover in the end that the entire problem was caused by a single comma out of place 😭

What this course is about ✅

  • Building confidence in using R for health data science
  • Learning to write, run, and understand R code
  • Working with complex, real-world data — importing, tidying, transforming, and summarising
  • Developing efficient, reproducible workflows in R
  • Moving toward modern, professional programming practice

What this course is not about ❌

  • Statistical modelling or inference
  • Hypothesis tests, regression, or prediction modelling
  • Detailed coverage of specific statistical packages

I’ll touch on these where relevant, but this course doesn’t teach advanced statistics.

This is a large, online class with a mix of lectures and practical sessions.


During the lectures 🗣️

  • Post questions in the chat.

    (You can message me directly or message the whole class.)

  • Listen or type along — whatever works best for you.

During the practicals 💻

  • Post questions in the chat or put a hand up.
  • You’re welcome to unmute and share your screen.
  • I’ll use messages in the chat to check progress.
  • I’ll go through the exercises at the end.

After the sessions 🕐

  • Review material and lecture recordings.
  • If you want, have a go at the homework.
  • Send me your questions via the form.

Every question is helpful

Someone else is almost certainly wondering the same thing.

Learning R is a journey

“If I take one more step, it’ll be the farthest away from home I’ve ever been.”

Coding with generative AI

Good uses ✅

  • Quick help with debugging
  • Explaining why code fails
  • Generating toy examples or simulated data

Caution ⚠️

  • When the output affects real-world decisions (e.g., patient care, service delivery)
  • When you don’t yet understand the problem
  • Confident-sounding but wrong answers
  • Outdated or odd R code (especially in fast-moving packages)

Code you can’t explain is code you can’t maintain.

Why I think we’ll still be coding in 5 years’ time

  • As with statistics, the hard part is communication and judgement: framing the problem and choosing the right approach (not typing)
  • Much of health data science is low-volume, high-importance work
  • Analyses need to be traceable and reproducible
  • Writing (and reading) code is part of the thinking
  • Writing code is fun