1 Preliminary

  • We want to analyze our data. As a first step, this means running descriptive analyses: how is the data structured, and what do we know about the variables?

  • To do so, we first need to get hold of some data, don't we?


This is how every one of your scripts should begin: 1. clear the environment and 2. load packages.

rm(list = ls())
# install.packages("tidyverse")
library(tidyverse)

2 Loading the dataset

To make life easier, we load a dataset that already ships with R: the mtcars dataset.

data <- mtcars

2.1 Explore the data structure

For a first overview of our data, we can use base functions (i.e. those that are part of R without installing any packages): names(), str(), head(), summary(), table(), quantile(), and View().

Task: Go through every function’s output and try to understand what it means!

names(data)
str(data)
head(data)
summary(data)
table(data$mpg) # you could use any other variable here.
quantile(data$mpg, na.rm = TRUE)
View(data)
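
As a small addition to the overview functions above, the following sketch checks the size of the dataset, the type of each column, and whether any values are missing. The object names (data_dim, column_types, missing_per_column) are only illustrative and not part of the original script.

```r
data <- mtcars

# Number of rows (cars) and columns (variables)
data_dim <- dim(data)
data_dim

# Storage type of every column
column_types <- sapply(data, class)
column_types

# Number of missing values in every column
missing_per_column <- colSums(is.na(data))
missing_per_column
```

If missing_per_column contains only zeros, the na.rm = TRUE arguments used in this script change nothing for this dataset, but they remain a sensible habit for real data.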

3 Data Distribution

At the beginning of each research project, we start with the most important summary statistics: we want to know the mean(), the median(), or the range() of a variable.

mean(data$cyl, na.rm = TRUE)
median(data$cyl, na.rm = TRUE)
range(data$cyl, na.rm = TRUE)
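
To see what these three functions actually compute, the sketch below reproduces them by hand for the cyl variable. It assumes the variable has no missing values and an even number of observations (both hold for mtcars); the *_by_hand names are illustrative only.

```r
data <- mtcars

# Mean: sum of all values divided by the number of values
mean_by_hand <- sum(data$cyl) / length(data$cyl)

# Median: with an even number of observations, the average of the
# two middle values of the sorted variable
sorted_cyl <- sort(data$cyl)
n <- length(sorted_cyl)
median_by_hand <- mean(sorted_cyl[c(n / 2, n / 2 + 1)])

# Range: smallest and largest value
range_by_hand <- c(min(data$cyl), max(data$cyl))

# Compare with the built-in functions
all.equal(mean_by_hand, mean(data$cyl))
all.equal(median_by_hand, median(data$cyl))
all.equal(range_by_hand, range(data$cyl))
```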

4 Frequency Table

4.1 One-dimensional

We may also want to check how often each value occurs. For this we use the table() function.

To give an example:

table(data$cyl)
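
The counts returned by table() are absolute frequencies. As a possible next step, the sketch below turns them into relative frequencies with prop.table(); the names cyl_counts and cyl_share are illustrative.

```r
data <- mtcars

# Absolute frequencies: number of cars per cylinder count
cyl_counts <- table(data$cyl)
cyl_counts

# Relative frequencies: share of cars per cylinder count
cyl_share <- prop.table(cyl_counts)

# The same shares as percentages, rounded to one decimal
round(100 * cyl_share, 1)
```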

4.2 n-dimensional table

Sometimes it is interesting to see whether frequencies differ across categories. For example, we may want the two-dimensional table of the number of cylinders and gears.

table(data$cyl, data$gear)
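
A plain two-dimensional table does not show which variable is in the rows and which is in the columns. The following sketch, one option among several, labels both dimensions, adds row and column totals, and computes row-wise shares. The labels cylinders and gears and the object name cyl_gear are chosen here for illustration.

```r
data <- mtcars

# Named arguments become the labels of the table dimensions
cyl_gear <- table(cylinders = data$cyl, gears = data$gear)
cyl_gear

# Add row and column totals
addmargins(cyl_gear)

# Share of each gear count within each cylinder group (rows sum to 1)
cyl_gear_share <- prop.table(cyl_gear, margin = 1)
round(cyl_gear_share, 2)
```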

5 Histogram

Moving beyond absolute or relative frequencies, we can analyse the shape of the distribution. Histograms are the most common tool for this:

hist(data$gear, main = "Gear Histogram", xlab = "Number of gears", ylab = "Frequency")
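
Histograms are most informative for variables with many distinct values. As a sketch, the example below draws a histogram of mpg with explicitly chosen bins and then retrieves the bin counts without plotting. The bin width of 5 is an arbitrary choice for illustration, and mpg_bins is an illustrative name.

```r
data <- mtcars

# Bins of width 5 that cover the whole range of mpg
bin_edges <- seq(10, 35, by = 5)

hist(data$mpg, breaks = bin_edges,
     main = "Histogram of mpg", xlab = "Miles per gallon", ylab = "Frequency")

# plot = FALSE returns the histogram object instead of drawing it
mpg_bins <- hist(data$mpg, breaks = bin_edges, plot = FALSE)
mpg_bins$counts
```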

6 Boxplot

Yes, R is pretty straightforward. For example, we can easily draw a boxplot for any variable in our data:

boxplot(data$mpg, main = "Boxplot mpg") # boxplot() ignores missing values by itself, so na.rm is not needed
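
Boxplots become more useful when they compare groups. The sketch below uses the formula interface to draw one box of mpg per number of cylinders, and boxplot.stats() to extract the five numbers behind a single box. The titles, axis labels and the name mpg_stats are illustrative choices.

```r
data <- mtcars

# One box per cylinder group: mpg split by cyl
boxplot(mpg ~ cyl, data = data,
        main = "mpg by number of cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")

# The five values a box is built from: lower whisker, lower hinge,
# median, upper hinge, upper whisker
mpg_stats <- boxplot.stats(data$mpg)$stats
mpg_stats
```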