There have been several implementations of research compendia in the R ecosystem already: `rrtools`, `rcompendium`, `template`, `manuscriptPackage`, and `ProjectTemplate`. The idea of `use_rang()` is not to create a new format. Instead, `use_rang()` can be used to enhance your current research compendium. If you would like to create a new research compendium, you can either use any of the aforementioned formats, or use `create_turing()` to create a structure suggested by The Turing Way. However, the idea is that `use_rang()`, a `usethis`-style function, is a general function that works well with any structure. Just like `rang` in general, `use_rang()` is designed with interoperability in mind.
## Case 1: Create a Turing-style research compendium
`create_turing()` can be used to create a general research compendium structure. The function generates an example structure like this:
```
.
├── bibliography.bib
├── CITATION
├── code
│   ├── 00_preprocess.R
│   └── 01_visualization.R
├── data_clean
├── data_raw
│   └── penguins_raw.csv
├── figures
├── .here
├── inst
│   └── rang
│       └── update.R
├── Makefile
└── paper.Rmd
```
More information about this can be found in The Turing Way. But in general:

- Raw data should be in the directory `data_raw`.
- Scripts should be in the directory `code`, preferably named in the execution order.
- The scripts should generate intermediate data files in `data_clean` and figures in `figures`.
- After the code execution, a file for the manuscript is written with literate programming techniques. In this case, `paper.Rmd`.
The special part is `inst/rang/update.R`. Running this script does the following things:

- It scans the current directory for all R packages used.
- It creates the infrastructure for building the Docker image to run the code.
- It caches all R packages.
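Under the hood, these steps correspond to a short `rang` workflow. A minimal sketch of what such a script looks like (the actual `inst/rang/update.R` generated for you may differ in its arguments and defaults; the snapshot date here is purely illustrative):

```r
library(rang)
library(here)

## scan the current directory for all R packages used
pkgs <- as_pkgrefs(here::here())

## resolve the dependency graph at a snapshot date
rang <- resolve(pkgs, snapshot_date = "2023-01-01")

## write the Dockerfile and supporting files, and cache all R packages
dockerize(rang, output_dir = here::here(), cache = TRUE)
```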
As written in the file, you should edit this script to cater for your own needs. You might also need to run this multiple times during the project lifecycle. You can also use the included `Makefile` to carry out some of the tasks. For example, you can run `make update` to run `inst/rang/update.R`. We highly recommend using GNU Make.
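For illustration, the relevant `Makefile` targets might look roughly like this. This is a hypothetical sketch assembled from the commands mentioned in this document; the file generated by `create_turing()` may differ:

```makefile
handle=yourproject

update:
	Rscript inst/rang/update.R

build:
	docker build -t $(handle)img .

launch:
	docker run --rm --name "$(handle)container" -ti $(handle)img

bash:
	docker run --rm --name "$(handle)container" --entrypoint bash -ti $(handle)img
```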
The first step is to run `inst/rang/update.R`. You can either run it with `Rscript inst/rang/update.R` or `make update`. It will determine the snapshot date, scan the current directory for R dependencies, determine the dependency graph, generate a `Dockerfile`, and cache R packages.
After running it, you should have a `Dockerfile` at the root level. In `inst/rang`, you should have `rang.R` and `cache`. Now you can build the Docker image. We recommend using GNU Make: type `make build` (or `docker build -t yourprojectimg .`). Then launch the Docker container (`make launch` or `docker run --rm --name "yourprojectcontainer" -ti yourprojectimg`). Another idea is to launch a Bash shell (`make bash` or `docker run --rm --name "yourprojectcontainer" --entrypoint bash -ti yourprojectimg`).
Let’s assume you take this approach. Inside the container, you will find all your files. The container should have all the dependencies installed, so you can run all the scripts right away:

```sh
Rscript code/00_preprocess.R
Rscript code/01_visualization.R
Rscript -e "rmarkdown::render('paper.Rmd')"
```
You can copy any artefact generated inside the container from another shell instance:

```sh
docker cp yourprojectcontainer:/paper.pdf ./
```
## Case 2: Enhance an existing research compendium
Oser et al. shared their data as a zip file on OSF. You can obtain a copy using `osfr`:

```sh
Rscript -e "osfr::osf_download(osfr::osf_retrieve_file('https://osf.io/y7cg5'))"
unzip meta-analysis\ replication\ files.zip
cd meta-analysis
```
Suppose you want to use Apptainer to reproduce this research. At the root level of this compendium, run:

```sh
Rscript -e "rang::use_rang(apptainer = TRUE)"
```
This compendium is slightly trickier because we know that there is one undeclared GitHub package. You need to edit `inst/rang/update.R` yourself. In this case, you also want to fix the `snapshot_date`. Also, you know that “texlive” is not needed.
```r
pkgs <- as_pkgrefs(here::here())
pkgs[pkgs == "cran::dmetar"] <- "MathiasHarrer/dmetar"
rang <- resolve(pkgs,
                snapshot_date = "2021-08-11",
                verbose = TRUE)
apptainerize(rang, output_dir = here::here(), verbose = TRUE, cache = TRUE,
             post_installation_steps = c(recipes[["make"]], recipes[["clean"]]),
             insert_readme = FALSE,
             copy_all = TRUE,
             cran_mirror = cran_mirror)
```
You can also edit the `Makefile` to give the project a handle. Maybe “oser” is a good handle.

```makefile
handle=oser
.PHONY: update build launch bash daemon stop export
```
Similar to above, we first run `make build` to build the Apptainer image. As the handle is “oser”, it generates an Apptainer image called “oserimg.sif”.
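If you prefer not to use Make, the equivalent steps can presumably be done with Apptainer's standard CLI along these lines (a sketch; the definition file name `container.def` is the one `rang` writes, and the exact targets wrapped by the `Makefile` may differ):

```sh
apptainer build oserimg.sif container.def   # roughly what `make build` does
apptainer shell oserimg.sif                 # drop into an interactive shell inside the image
```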
Similar to above, you can now launch a Bash shell and render the R Markdown file:

```sh
make bash
Rscript -e "rmarkdown::render('README.Rmd', output_file = 'output.html')"
exit
```
Upon exit, you have “output.html” on your host machine. You don’t need to transfer the file from the container. Please note that this feature is handy but can also have a negative impact on reproducibility.
## What to share?
It is important to know that there are at least two levels of reproducibility: 1) whether your computational environment can be reproducibly reconstructed, and 2) whether your analysis is reproducible. The discussion of reproducibility usually conflates the two. We want to focus on the second goal.
If your goal is to ensure other researchers can have a compatible computational environment that can (re)run your code, The Turing Way recommends that one should share the research compendium and the container images, not just the recipes, e.g. `Dockerfile` or `container.def`. There are many moving parts during the reconstruction, e.g. whether the source Docker image is available and usable. As long as Docker or Apptainer supports the same image format (or allows upgrading the current format), sharing the images is the most future-proof method.
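For example, with Docker you can export the built image to a tarball and deposit it alongside the compendium using the standard `docker save` / `docker load` commands (the image name here is the hypothetical one used above):

```sh
# export the built image as a compressed tarball for sharing
docker save yourprojectimg | gzip > yourprojectimg.tar.gz

# a collaborator can restore it with:
# gunzip -c yourprojectimg.tar.gz | docker load
```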