R Studio for Research - all you need?

Here, I will show you my current workflow for writing research papers during my studies. It is an attempt at a coherent and simple setup that weaves together the literature-related and data-related aspects of a project. More specifically, it interconnects the initial literature review and note-taking (Notion), the easy creation of the paper’s references (Zotero) and all the necessary data analysis (R). Since we ultimately need to communicate our research results, these three threads are finally combined with the help of R Markdown and bookdown into a research paper and presentation slides.

Photo by Christopher Gower on Unsplash

R Studio

For data analysis, R is usually my language of choice. It has a large open-source community behind it and, with the tidyverse, a simple syntax for the most common data-wrangling tasks. In addition, ggplot2 is easy to use and produces beautiful graphs for your research paper.

Most importantly, R integrates seamlessly with the writing process thanks to R Markdown, knitr and additional libraries like bookdown. This is where R shows its real power for research. Let’s consider a typical issue in the writing process: during your final proofread you discover that you have a typo in the x-axis label of one of your plots. If you are writing your paper in “Word” and use R (or Stata, SPSS, Matlab …) for your analysis, you are facing some painful steps:

  1. Re-run your analysis script
  2. Adjust your plot’s code
  3. Export the new (hopefully correct) plot as an image file
  4. Copy and paste the image into Word
  5. Fiddle with Word until you get the formatting right
  6. (And then re-adjust the new plot, because it shifted your whole text around)

Here, R Markdown can help greatly. It allows you to weave your analysis code and the text of your paper together in one file. You decide to rename your x-axis label? No problem. Just change the name directly in the code of your plot, re-knit the document, done. Even better: if you write your own plotting functions, you can adjust, e.g., the color scheme of all of your plots by changing one line of code. Not only is this more coherent and less error-prone than doing it manually for every one of your graphics, it is also much faster and more flexible. Of course, essential writing features like headers, footnotes and citations are supported in R Markdown. Furthermore, R Studio 1.4 introduced a “Visual Editor” that makes writing less code-heavy and friendlier for beginners. For more information on the capabilities of R Markdown, visit its website.
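To illustrate the idea of your own plotting functions, here is a minimal sketch of what an R Markdown code chunk could look like, assuming ggplot2 is installed. The palette `project_colours` and the helper `style_plot()` are hypothetical names chosen for this example, and the built-in `mtcars` data only stands in for your own data.

```r
library(ggplot2)

# Hypothetical project-wide palette: edit these values once to recolour every plot
project_colours <- c("#1b9e77", "#d95f02", "#7570b3")

# Hypothetical helper that applies the shared colours and theme to any ggplot
style_plot <- function(p) {
  p +
    scale_colour_manual(values = project_colours) +
    theme_minimal(base_size = 11)
}

# Example plot with mtcars as a stand-in for real data;
# fixing an axis-label typo means editing labs() and re-knitting
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

style_plot(p)
```

Because every figure passes through `style_plot()`, a change to `project_colours` would reach all plots the next time the document is knit.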

To make writing in R even more pleasant, I use bookdown (essentially an extension of R Markdown that even allows you to write whole books) together with a template derived from Lucy D’Agostino McGowan’s fantastic blog post “One year to dissertate”. The great thing about this template is that its formatting is based on a LaTeX preamble, so you can meet any font size, line spacing and reference style requirements simply by adapting the preamble.

McGowan’s template can be found here, and my derivation of it here (it mainly adds an easy way to also generate LaTeX slides).

Zotero

I collect my literature library in Zotero, open-source software that lets you easily build up a database of research papers. While many other software packages can do this, Zotero integrates directly with R Studio: you can cite any paper filed away in your Zotero database straight from your R Markdown document. Neat! Furthermore, Zotero has an active community that creates many plug-ins with handy features (we will use one of these plug-ins below).

I prefer to import papers into Zotero via their DOI. In 90% of cases this is enough to generate a complete reference for the respective paper (authors, title, journal name, year, …).

Notion

For creative note-taking I use Notion. At its core, it is just a note-taking app with a gorgeous UI. Its advantage over other note-taking software is that it lets you quickly link ideas and make connections between the content you collect. Check out Notion here. Give it some time to get used to it; it is good.

Okay, so why use Notion in particular for this research workflow? As you might have guessed, it integrates well with the software mentioned above, namely Zotero. The very handy community plug-in “Notero” syncs your Zotero database with a designated page in Notion (make sure to visit their GitHub page; they also provide a great Notion template for this!).

Why is this useful? Because every reference you have collected in Zotero is also tracked in Notion. This lets you easily profit from Notion’s creative advantages when taking notes on your literature: connecting research papers with each other, summarising their findings, automatically comparing papers in tables, creating your own Kanban board to track your reading progress or inserting a paper’s graphs into your notes. The possibilities are endless. Thanks to “Notero”, you will never miss (or duplicate) an entry, since there is automatically one note-taking page for every paper in your database. This structure compares favourably with key points scribbled on printouts and hastily typed notes in some .txt file on your desktop.

The Big Picture

Below, I have tried to visualise the general idea behind this setup. Notice how the existing literature and your data are the “only” two external inputs. Your intellectual work (mainly) takes place in Notion, while your data analysis is (of course) conducted in R. The core of the whole mechanism is R Markdown, as it ties all these preprocessing steps together into one paper.pdf (or slides.pdf for a presentation).

Flowchart of the project’s setup. Software in green, inputs in yellow.

Some Other Things I Learned

  • Search for literature in bulk, then download the papers and pass their DOIs to Zotero. It helps to come up with a file-naming convention; I use “{first author}_{year}”. Similarly, for the respective BibTeX key I use {first author}{year}. This helps you not lose track of a file and its citation.
  • Take notes in Notion and make use of the provided property fields in the “advanced template”, so that you can see information at a glance in tables. Link papers to each other if they refer to each other or if you see a connection between them.
  • If your computer has a “dictation” feature, use it to dictate your first draft. I find it easier to just speak and “dump” my thoughts onto paper rather than struggle to find the “right” words. Editing is easier than writing from scratch, so get something on paper first.
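The naming convention from the first tip can be captured in a tiny helper, so that file names and BibTeX keys never drift apart. This is a minimal base-R sketch: `make_names()` is a hypothetical function, and stripping every non-letter from the surname is my own assumption about how to handle names containing spaces, hyphens or apostrophes.

```r
# Hypothetical helper implementing the convention
# file: "{first author}_{year}.pdf", BibTeX key: "{first author}{year}"
make_names <- function(first_author, year) {
  # Drop spaces, hyphens and apostrophes so the name is safe in files and keys
  author <- gsub("[^[:alpha:]]", "", first_author)
  list(
    file = paste0(author, "_", year, ".pdf"),
    key  = paste0(author, year)
  )
}

make_names("Smith", 2020)             # list(file = "Smith_2020.pdf", key = "Smith2020")
make_names("van der Berg", 2018)$key  # "vanderBerg2018"
```

Since `gsub()` and `paste0()` are vectorised, the same helper would also work on whole columns of authors and years, e.g. when renaming a batch of downloaded PDFs.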

Obviously, this might not work for everyone. Still, I hope it provides some inspiration for your own workflow. If you find room for improvement (and I am sure you will), feel free to let me know!

Trial and Error

This post is part of the Trial and Error series. In this collection of blog posts, we focus on workflows to make your work in data science more elegant and simple.
Author

Finn

Posted on

2022-03-05

Updated on

2022-03-09

Licensed under