An Introduction to R and RStudio

Setting up R for reproducible research

April 2025

What is R?

  • R is a powerful open-source programming language suited to mathematical and statistical computing, analysis and graphics.

  • RStudio provides a convenient integrated development environment (IDE) or interface for using R.

    • Download RStudio (Desktop Open Source Edition) from posit.co.
  • Posit Cloud (formerly RStudio Cloud) provides R and RStudio via a web browser. You can create a free account, but free usage time is limited and you must pay for usage beyond the limit. The cloud version may be an option if you have difficulty setting up the software on your own computer. Remember to copy any files you make in Posit Cloud to your own computer so that you do not risk losing your work. Posit Cloud can be found at posit.cloud.

  • Both R and RStudio (Desktop Open Source Edition) are free.

  • We will use R via RStudio in my courses to gather, describe, visualise and analyse financial data. We use Quarto to create reproducible research and analysis reports. In Portfolio Management, we analyse securities risk and returns, construct investment portfolios, and model investment signals. In Financial Data Workshop and Financial Econometrics (Graduate), we build econometric models of securities returns and analyse corporate finance data. In my seminar, students conduct their research and write their graduation paper following reproducible research principles.

  • For my courses, please install R and RStudio on your laptop and bring it to the classes in which we are scheduled to work on data analysis. Please ensure you can connect to the internet via the university Wi-Fi or other means.

  • This guide shows screenshots for R/RStudio on macOS. The Windows and Linux versions of R/RStudio are similar.

Opening RStudio for the first time

  • After you have installed R and RStudio, open RStudio. It should look something like this.

Set a default working directory for RStudio

  • From the Tools menu, open RStudio’s Global Options.

  • From “General”, set the default working directory to your documents or another convenient folder on your hard drive. You can also keep your R files in some cloud services such as Dropbox; however, Google Drive and Microsoft OneDrive can cause problems.

  • The working directory tells R where to look for your files. The default directory you select will be where R looks for a file when you are not working in a “project”. Next we will create a project.

    • I set my default working directory to the syncd_r_data folder on my Dropbox.
  • After you have set the default working directory, quit and reopen RStudio.

Projects in RStudio

  • A project allows you to keep data, code and documents for a particular task together. Projects help you to keep your files organised.

  • A project also allows you to customise the setup of R for the task or analysis you are performing.

  • You may have several different projects and customise the way R is set up differently for each project.

    • I create a project for each course that I teach and each research project that I’m working on.
    • If you use R for several classes, your seminar or other purposes, you can create a project for each of these.
  • Each project will have its own folder, and separate files to save the data in your environment, your history of code executed in the console, and the way you have R configured.

  • You can see that at present, I do not have a project open as the menu title appears as “Project: (none)”. When you are working in a project, the title of the menu will include the project name.

  • Let’s make a new project from the Projects menu at the top-right of the screen. The projects menu allows you to create a new project, open an existing project, and contains a list of recent projects.

  • To create a new project in a new folder, click “New Directory”.

  • Select “New Project”.

  • Give your new project a name.

    • I suggest naming your project after the class in which you are using R or the research you will work on.
    • Your project working directory will be created within your default working directory unless you specify a different location.
  • I use the name “ExampleProject” and I will create the project working directory within my default working directory called “syncd_r_data”.

  • Click “Create Project”.

  • Now that you have created your new project, your RStudio should look similar to the image below.

  • Notice that now the project menu displays the name of your project, the file path shows the path to your project working directory, and the project file (ExampleProject.Rproj for me) is shown within your project working directory.

  • When you work in your project, RStudio will look for your files within your (project) working directory.

  • The project file (ExampleProject.Rproj for me) can be used to open the project.

  • Check the file path to your working directory using the code:
getwd()
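
  • As a rough illustration of how the working directory is used, the sketch below builds the full path that R would resolve for a file name given without a folder. The file name example_data.csv is hypothetical and is used only for illustration.

```r
# Where is R currently looking for files?
wd <- getwd()

# A file name given without a folder is resolved relative to the working directory.
# "example_data.csv" is a made-up name used only for illustration.
example_file <- "example_data.csv"
full_path <- file.path(wd, example_file)
print(full_path)

# Does the file exist in the working directory? (FALSE unless you have created it.)
print(file.exists(example_file))

# List any RStudio project files (.Rproj) in the working directory.
print(list.files(pattern = "\\.Rproj$"))
```

  • If you run this while your project is open, the last line should list your project file.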

  • Set the “Project Options” for your new project from the project menu at the top-right of your screen.

  • Set the three options to “Yes”.

  • Setting the options to “Yes” will ensure your data and history of executed code will be saved when you quit RStudio, and restored when you open your project the next time.

  • Let’s test this by creating some objects. You will see the objects appear in your environment.
a <- 2
b <- 3
c <- a + b
  • Quit and reopen RStudio, making sure you open your new project. If RStudio does not open in your new project, you can open your project from the projects menu.

  • You can see that RStudio has created two hidden files:

    • .RData holds the objects that are loaded in your environment.
    • .Rhistory holds the history of the code you have executed in the console.
  • You can see these files in the files pane of RStudio but usually not in your operating system’s file browser.
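
  • The saving and restoring that RStudio does with .RData can be imitated by hand. The sketch below is only an illustration: it assumes a temporary file as a stand-in for the project’s real .RData file, so nothing in your project is overwritten.

```r
# Create the same objects as in the test above.
a <- 2
b <- 3
c <- a + b

# Save them to a temporary .RData file (a stand-in for the project's .RData).
tmp_file <- tempfile(fileext = ".RData")
save(a, b, c, file = tmp_file)

# Remove the objects, then restore them from the file.
rm(a, b, c)
load(tmp_file)
print(c)

# Hidden files such as .RData and .Rhistory are listed when all.files = TRUE.
print(list.files(all.files = TRUE))
```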

  • The objects you created have been restored to your environment pane. You can toggle the appearance of the environment pane between list and grid view. I prefer grid.

  • The history of code that you executed in the console can be found in the history pane.

A very brief tour of the RStudio IDE

  • R is an object-oriented language.

    • This means everything in R is considered an object. Each type of object has certain attributes and interacts with other objects in certain ways.
    • For example, a matrix of data is an object that can interact with other objects, such as a scalar or another matrix, through arithmetic. A function is an object that transforms data stored as an object, such as a data frame, in some predefined way.
    • Objects can interact with each other to change their attributes, or create other objects.
    • Some important types of objects are: vectors, lists, arrays, matrices, tables, data frames, functions.
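
  • A minimal sketch of some of the object types mentioned above follows. The names v, l, A, df and double_it are arbitrary and chosen only for illustration.

```r
# A numeric vector, a list, a matrix and a data frame.
v  <- c(1.5, 2.0, 3.5)
l  <- list(name = "x", values = v)
A  <- matrix(1:4, nrow = 2)            # filled column by column
df <- data.frame(id = 1:3, value = v)

# A function is also an object; this one doubles its input.
double_it <- function(x) 2 * x

# Objects interact through arithmetic and function calls.
B <- 2 * A          # scalar times matrix
C <- A %*% A        # matrix product
df$doubled <- double_it(df$value)

print(is.matrix(A))
print(is.data.frame(df))
print(is.function(double_it))
print(C)
print(df)
```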
  • The RStudio IDE has four panes.

  • Pane 1 is the source pane where you can open and work on various types of code files or documents.

    • I have opened a new R code file (Untitled1.R) from the New File menu indicated in the top left of the screen.
    • You can also display objects in the source pane. In the screenshot below, I create a matrix called A in the console. The matrix A appears in the environment pane. Clicking on the symbol highlighted in the Environment pane opens our matrix A in the Source pane.
  • Pane 2 is the console where you can enter and execute code.

  • Pane 3 contains the environment and history tabs. The environment shows all objects that are loaded for use. The history tab shows the code you have executed in the console.

  • Pane 4 contains a number of tabs showing your folders and files, plots you have drawn, packages, and a help window.

Packages

  • Packages extend the functionality of R.

  • Packages contain functions, data and documents. Packages are designed to perform certain specialised types of data manipulation, data description, analysis, estimation, visualisation, optimisation, document creation, etc.

  • Some packages are part of the default R installation, called “Base R”. There are a huge number of additional packages available that allow you to do almost any computational analysis with R. We can even make use of packages written for other languages, such as for Python using the “reticulate” package.

    • The Comprehensive R Archive Network (CRAN) holds over 22,000 currently maintained packages, and there are more packages available from developers’ sites on GitHub (https://github.com/).
  • The packages pane shown above provides a list of installed packages and a menu for installing packages. The documentation and help for each package can be accessed by clicking on the package name in the packages list.

  • We will use several packages for the analyses in my courses. I will tell you which packages you need for each course.

Using packages

  • There are two steps to using a package.

    1. Install the package. This downloads the latest version of the package from an R repository (CRAN) and saves the package on your computer.

    2. Load a package for use. When you want to use a package you must load it first using the library command.
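
  • The two steps can be combined so that a package is installed only when it is missing. The sketch below is one possible way to do this; the helper name use_package is hypothetical (it is not part of R), and the demonstration uses the stats package, which ships with R, so that nothing is downloaded.

```r
# Hypothetical helper (not part of R): install a package if it is missing, then load it.
use_package <- function(pkg) {
  # Step 1: install only if the package is not already on this computer.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)  # needs an internet connection
  }
  # Step 2: load the package for use.
  library(pkg, character.only = TRUE)
  invisible(pkg %in% loadedNamespaces())
}

# "stats" ships with R, so this call does not download anything.
use_package("stats")

# For a package from CRAN, the call would look like this (left commented out):
# use_package("dynlm")
```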

Package installation

  • Packages can be installed via the RStudio packages menu or in the console using code.

  • To install a package using the menu, click the install button, type in the name of the package and hit install.

  • For example, in the image below, I install the package called “AER”.1

    • Usually, “Install from” and “Install to library” may be left as suggested by RStudio. It is best to install to a library located on your hard drive, not the cloud.

    • Note that R is case-sensitive, thus upper- and lower-case letters are interpreted as being different. Make sure you type the package name correctly.
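
  • As a small illustration of case sensitivity, the sketch below creates an object and then looks it up with different capitalisation. The object name value is arbitrary.

```r
# Create an object with a lower-case name.
value <- 10

# The lower-case name is found; the capitalised name is a different name.
found_lower <- exists("value")
found_upper <- exists("Value")
print(found_lower)
print(found_upper)

# The same applies to package names: "AER" and "aer" are different strings.
same_name <- "AER" == "aer"
print(same_name)
```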

1 The AER package contains functions and data to go with the applied econometrics textbook Kleiber and Zeileis (2008).

Kleiber, Christian, and Achim Zeileis. 2008. Applied Econometrics with R. 1st ed. New York: Springer US. https://doi.org/10.1007/978-0-387-77318-6.

  • Alternatively, you can install a package by running code in the console.

  • For example, I install the package called “dynlm” by typing the code below and pressing enter.2

2 The dynlm package is for dynamic linear regressions for time series data.

install.packages("dynlm", dependencies = TRUE)

  • Setting dependencies = TRUE (the “Install dependencies” option in the menu) automatically installs the other packages used by the package you wish to install. It is a good idea to include dependencies.
  • Package installation involves downloading the packages from the internet and installing them on your hard drive. You must be connected to the internet.

  • The installation process is usually fast, but may take some time for very large packages.

  • You will see messages displayed in the Console as the packages are downloaded.

    • A message like “The downloaded source packages are in [folder location]” usually means the packages were successfully installed, provided no error messages appear above it.

    • If you are asked “Do you want to install from sources the packages which need compilation? (Yes/no/cancel)” during package installation, you can (usually) answer “n” for the packages we use in my courses.

    • The warning “Warning in install.packages : package ‘xxxxxx’ is not available for this version of R” means either the package is not available, or more often, you have mistyped the package name.

    • If an installation results in a message like “Installation of package X had non-zero exit status” the package was not installed. Try again as it may just be an internet problem. If you still have a problem for packages used in one of my courses, contact me.

  • Packages need only be installed on your computer once, but they should be updated from time to time to make sure you are using the latest version of each package. This can be done by clicking the “Update” button in the packages pane.
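
  • You can also check from the console which packages are installed and which version you have. The sketch below uses the stats package, which ships with R, as the example; the update command is left commented out because it needs an internet connection.

```r
# Names of all packages installed on this computer.
installed <- rownames(installed.packages())
print(head(installed))

# Is a particular package installed? Remember that names are case-sensitive.
print("stats" %in% installed)

# Which version of a package is installed?
print(packageVersion("stats"))

# Update all installed packages without being asked about each one:
# update.packages(ask = FALSE)
```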

Loading a package for use

  • A package must be loaded in R before it can be used.

  • Packages can be loaded by ticking the box to the left of the package name in the RStudio packages pane, as shown in the example for the AER package below.

    • Clicking the box next to the AER package name in the Packages pane executed the library() function in the console. Messages were printed (in red) about loading the dependencies (other packages required by AER).

  • When you wish to use a package, it must be loaded using the library command. For example, to load the AER package, run library(AER) in the console.

  • A package can also be loaded using code. For example, the dynlm package is loaded using the code below.

library(dynlm)
  • Executing the library command for dynlm in the console caused the loaded packages box to become ticked in the packages pane.

  • As noted above, you can also load a package by ticking the box next to it in the Packages tab. The Packages tab lists all the packages that you have installed.
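
  • To check from the console whether a package has been loaded, you can inspect R’s search path. The sketch below uses the tools package as a stand-in for a package such as dynlm, because tools ships with R and needs no installation.

```r
# Was the package attached before calling library()?
before <- "package:tools" %in% search()

# Load (attach) the package.
library(tools)

# The package now appears on the search path.
after <- "package:tools" %in% search()
print(after)

# A function can also be called with the package name as a prefix.
extension <- tools::file_ext("analysis.R")
print(extension)
```

  • The prefix form package::function makes it clear which package a function comes from.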

Customise R startup using the .Rprofile file

  • The .Rprofile file executes code when R starts up.

  • By creating an .Rprofile file in a project’s working directory, you can customise the setup of R for your project as follows:

    • Automatically load the packages you use in the project.
    • Set the options of R.
  • You can use the .Rprofile file to automatically load the packages you will need to use for this course and to set some R options. Thus .Rprofile allows you to customise the start-up of R for your project.

    • Remember that the packages must already be installed on your computer to be loaded by the .Rprofile file.
  • Here is a simple example of an .Rprofile file.

    • You can add as many packages to auto-load as you need.

    • I set some of R’s options in my .Rprofile, including:

      • Change the prompt to appear as R>.
      • Set scipen to 999 to avoid output in scientific notation where possible.
      • Set the number of significant digits printed in output to 4 where possible.
      • Set the width of output to 80 characters.
    • Note that the lines in green beginning with a # are comments. It is always a good idea to comment your code so it is easily understandable later by other people and yourself.

# Example .Rprofile file.

# Auto-load the packages:
library(AER)
library(dynlm)

# Set R options:
options(prompt = "R> ", scipen = 999, digits = 4, width = 80)
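
  • To see what the scipen and digits options do, the sketch below sets them temporarily, formats a few numbers, and then restores the previous settings. Note that digits controls the number of significant digits rather than the number of decimal places. The variable names are arbitrary.

```r
# Set the options temporarily; options() returns the previous values.
old <- options(scipen = 999, digits = 4)

# digits = 4 gives four significant digits.
pi_text  <- format(pi)          # "3.142"
big_text <- format(123.456789)  # "123.5"

# A large scipen value makes R prefer fixed over scientific notation.
small_text <- format(0.00001)   # "0.00001" rather than "1e-05"

print(c(pi_text, big_text, small_text))

# Restore the previous settings.
options(old)
```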

Make the .Rprofile file

  • Make your .Rprofile file using the following steps.

  • Open a new R script file in the Source pane.

  • Your new R script file is shown below.

    • An R script file is a text file that is used to save code.

  • Copy the code from the example .Rprofile file above and paste it into your R script file.

  • Notice that the file’s name is red and is followed by an asterisk. This means the file is unsaved.

  • Click on the save icon to save the R script file as “Rprofile.R” in your project’s working directory.

  • The file will appear in your files pane in your project working directory as “Rprofile.R”.

  • Since the .Rprofile file must be a hidden file, we must rename it to “.Rprofile”.

  • Select the file by ticking the box to the left of the file name.

  • Click “Rename” from the menu at the top of the Files pane.

  • Rename the file as “.Rprofile”.

  • You should now have your .Rprofile file in your project working directory as shown below.
  • If you restart RStudio, the packages AER and dynlm will load automatically as R starts up.
  • You may edit the .Rprofile file any time to add library commands for additional packages you might want to load when R starts up.
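
  • As an alternative to the save-and-rename steps, an .Rprofile file can be written directly from code. The sketch below is only an illustration: it writes the example file into a temporary folder (a stand-in for your project working directory) so that nothing in your project is changed.

```r
# A temporary folder standing in for the project working directory.
project_dir <- tempfile("ExampleProject")
dir.create(project_dir)

# The lines of the example .Rprofile file shown earlier.
profile_lines <- c(
  "# Example .Rprofile file.",
  "",
  "# Auto-load the packages:",
  "library(AER)",
  "library(dynlm)",
  "",
  "# Set R options:",
  "options(prompt = \"R> \", scipen = 999, digits = 4, width = 80)"
)

# Write the file; the leading dot in the name makes it a hidden file.
profile_path <- file.path(project_dir, ".Rprofile")
writeLines(profile_lines, profile_path)

# Hidden files are listed only when all.files = TRUE.
print(list.files(project_dir, all.files = TRUE, no.. = TRUE))
```

  • To write into your real project, you would replace project_dir with getwd() while the project is open.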

Reproducible research

[Under construction]

Managing your files

[Under construction]

Three types of files to save your work

There are three types of files that you can use to save your coding work.

  1. R script file (x.R): contains code (and comments).

    • You have already used this type of file to make your .Rprofile file.
  2. R Notebook file (x.Rmd): uses a system called R Markdown, which allows dynamic, interactive documents to be created in R that contain written text, R code and the output from the code.3

    • An R Notebook allows for “code chunks” containing R code which can be executed independently and interactively.
    • The notebook generates an HTML file when saved.
    • The HTML is a rendered presentable version of the notebook which also contains the original notebook file.
    • R Notebooks are suitable for documenting reproducible research.
    • In my courses, I usually ask students to work in an R Notebook file during class and in their assignments.

3 For a thorough discussion of R Notebooks see https://bookdown.org/yihui/rmarkdown/notebook.html.

  3. Quarto file (x.qmd): contains written text, code and outputs from the code.4

    • Quarto is an open-source scientific and technical publishing system suited to documenting reproducible research consisting of text, code and outputs.
    • Quarto is capable of rendering articles, presentations, books, dashboards and websites.
    • My seminar students are required to produce a fully reproducible graduation paper, with the final output in PDF form, using R, RStudio and Quarto.
    • This guide to R and RStudio was made using R, RStudio and Quarto.

4 A guide to Quarto can be found at https://quarto.org/.
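
  • To give a feel for what a Quarto file contains, the sketch below writes a very small .qmd file from R: a header, a line of text and one code chunk. The file name example.qmd, the title and the chunk contents are made up for illustration, and the file is written to a temporary folder.

```r
# Three backticks mark the start and end of a code chunk in the document.
fence <- strrep("`", 3)

# The lines of a minimal Quarto document: header, text, and one R code chunk.
qmd_lines <- c(
  "---",
  "title: \"Example report\"",
  "format: html",
  "---",
  "",
  "The sum of a and b is computed below.",
  "",
  paste0(fence, "{r}"),
  "a <- 2",
  "b <- 3",
  "a + b",
  fence
)

# Write the document to a temporary folder and show its contents.
qmd_path <- file.path(tempdir(), "example.qmd")
writeLines(qmd_lines, qmd_path)
cat(readLines(qmd_path), sep = "\n")
```

  • Opening such a file in RStudio and clicking “Render” should produce the HTML output, assuming Quarto is installed.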

Installing TeX for markdown

  • Both R Notebooks and Quarto documents use the markdown language for formatting text. Markdown is an easy-to-write plain-text format. Markdown documents are converted to PDF via TeX (or LaTeX).

  • A TeX installation is required to render R Markdown or Quarto documents to PDF, such as MacTeX for macOS or MiKTeX, which is cross-platform. If you use LaTeX to write documents, you likely already have a TeX installation. If not, the easiest way to get a TeX installation for markdown is via the R tinytex package, which installs TinyTeX.

Install TinyTeX using the package called tinytex

  • The code below first installs the R package called tinytex (skip that line if you have already installed it). It then downloads the TinyTeX files from the internet and installs them on your computer so that you can make markdown documents. Run the code in the console.
install.packages("tinytex")
tinytex::install_tinytex()

  • TinyTeX will take some time to download and install.

  • When the installation has finished, restart RStudio.
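
  • After restarting, you may wish to check whether R can find a TeX installation. The sketch below is one rough way to do so. It assumes the LaTeX engine is called pdflatex, and it only asks the tinytex package about TinyTeX if that package is installed.

```r
# Look for the pdflatex program on the system search path (base R only).
latex_path <- Sys.which("pdflatex")

# If the tinytex package is installed, ask it whether TinyTeX is being used.
has_tinytex <- requireNamespace("tinytex", quietly = TRUE) &&
  isTRUE(tryCatch(tinytex::is_tinytex(), error = function(e) FALSE))

if (nzchar(latex_path) || has_tinytex) {
  message("A TeX installation was found.")
} else {
  message("No TeX installation was found. Try the installation steps again.")
}
```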

Where are my files?

  • The working directory is where R will look for your files and save files.

  • Show your current working directory:

getwd()

  • The working directory is set by your project.

  • If you want to change your working directory temporarily you can set it to another path. Here is an example for my folders:

setwd("/Users/Clinton/Dropbox/syncd_r_data/econometrics")
  • If your working directory is a folder on a network drive or on the cloud (like Dropbox or Google Drive) that is synced to your computer’s hard drive, you will be able to use it with RStudio on multiple computers.
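
  • When you change the working directory temporarily, it is a good habit to remember the original location and return to it afterwards. The sketch below shows one way to do this, using R’s temporary folder as a stand-in for the other path.

```r
# Remember the current working directory.
old_wd <- getwd()

# Move to another folder temporarily (R's temporary folder is used as a stand-in).
temp_dir <- tempdir()
setwd(temp_dir)
print(getwd())

# ... work with files in the other folder here ...

# Return to the original working directory.
setwd(old_wd)
print(getwd())
```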

Test your R Markdown environment

  • Run through a simple test to see if you can make an R Notebook, as follows.

  • Open a new R Notebook.

  • The notebook already contains example text and R code that plots a simple chart.

  • Run the code by pressing “Run All” from the “Run” menu.

  • After you have run the code, you should see the plot appear.

  • Include your name as I have at the top of the Notebook.

  • Save the file to your working directory.

  • You should see two files appear.

    • Your code is in the file with the extension “.Rmd”.
    • Your HTML R Notebook has the extension “.nb.html”.

  • Open the HTML file with your browser. It should look like this:

  • Well done! You have created a Notebook.

Resources for R

Venables, W. N., D. M. Smith, and R Core Team. 2019. “An Introduction to R.” https://doi.org/10.1201/9781420035025.ch1.

CRAN Task Views

  • CRAN Task Views provide information about packages that can be used with R. Packages extend the functionality of R. They provide routines for various types of data manipulation, mathematical and econometric models, optimisation methods, financial models and more. Packages are constantly being developed and updated by R users.

  • Three Task Views that are useful for my courses are:

Books on R

  • Many books have been published on using R, programming with R, and using R for finance.

    • Finance: Bennett and Hugen (2016), Andrecut (2010), Tsay (2010), Tsay (2014)
    • Econometrics: Kleiber and Zeileis (2008)
    • Coding: Matloff (2011)
Bennett, Mark J., and Dirk L. Hugen. 2016. Financial Analytics with R: Building a Laptop Laboratory for Data Science. Cambridge University Press. https://doi.org/10.1017/CBO9781316584460.
Andrecut, M. 2010. “Portfolio Optimization in R.” Cornell University, arXiv:1307.0450 [q-fin.PM]. https://doi.org/10.1201/b17178.
Tsay, Ruey S. 2010. Analysis of Financial Time Series. Wiley.
Tsay, Ruey S. 2014. Multivariate Time Series Analysis: With R and Financial Applications. Wiley.
Matloff, Norman. 2011. The Art of R Programming: A Tour of Statistical Software Design. No Starch Press. https://www.nostarch.com/artofr.htm.

R-Bloggers

Coding help via AI

  • AI can help with learning to code. AI chatbots often answer simple coding questions well, but for more complex questions they may hallucinate or get stuck in a loop of answers.

  • As with everything AI, the danger is failing to learn due to the convenience of getting an answer from AI. I recommend searching for solutions yourself.

RStudio Keyboard Shortcuts (for Mac)

  • Code completion:

    • Type part of the function you want and then press the Tab key.
    • Also works for function arguments, that is, press Tab when you are inside the brackets of a function.
    • Also recalls object names.
  • Press F1 (or fn+F1) for help on a function.

  • Retrieve previous commands:

    • Press the up or down arrows to scroll through your code history.
  • View a list of previous commands:

    • Press Control (or Command) and the up arrow.
  • View a list of previous commands that match a prefix:

    • Type the prefix and press Control (or Command) and the up arrow.
  • Enter a line of code from an R script to the console:

    • Command and enter.
  • Enter a line of code from history to an R script:

    • Shift and enter.
  • More:

Information on R Markdown and TinyTeX




© Copyright Clinton Watkins 2025 https://www.clinton.watkins.com