We want to analyze our data. First of all, this means conducting descriptive analyses: How is the data structured, and what do we know about the variables?
To do so, we first need to get the data, don't we?
This is how the beginning of every one of your scripts should look: 1. clean the environment and 2. load the packages.
rm(list = ls())
# install.packages("tidyverse")
library(tidyverse)
To make life easier, we load a dataset that already ships with R: the mtcars dataset.
data <- mtcars
For a first overview of our data, we can use base functions (i.e. those that are part of R without installing any package): names(), str(), head(), summary(), table(), quantile(), and View().
Task: Go through every function's output and try to understand what it means!
names(data)
str(data)
head(data)
summary(data)
table(data$mpg) # you could use any other variable here.
quantile(data$mpg, na.rm = TRUE)
View(data)
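As a small addition to the overview functions above, it can help to check the size of the dataset and whether any values are missing. The following is a minimal sketch using only base R; the object name n_missing is just an illustrative choice.

```r
# Work on a copy of the built-in mtcars dataset
data <- mtcars

# Number of rows (observations) and columns (variables)
dim(data)

# Count missing values per variable; mtcars has none,
# but your own data often will
n_missing <- colSums(is.na(data))
n_missing
```

If a variable shows missing values here, that is the reason to set na.rm = TRUE in functions such as mean() or quantile().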
At the beginning of each research project, we start with the most important summary statistics: we want to know the mean(), the median(), or the range() of a variable.
mean(data$cyl, na.rm = TRUE)
median(data$cyl, na.rm = TRUE)
range(data$cyl, na.rm = TRUE)
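If you want these statistics for several variables at once, one possible approach is sapply(), which applies a function to every selected column. This is only a sketch; the selection of variables (vars) and the name stats are arbitrary choices for illustration.

```r
# Work on a copy of the built-in mtcars dataset
data <- mtcars

# Variables we want to summarise (an arbitrary selection)
vars <- c("mpg", "cyl", "hp")

# For each variable, compute mean, median, minimum and maximum
stats <- sapply(data[vars], function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE))
})

# One column per variable, one row per statistic
round(stats, 2)
```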
Or we want to check how often each value occurs: for this we use the table() function.
To give an example:
table(data$cyl)
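table() returns absolute frequencies. If you prefer relative frequencies (shares), you can pass the table to prop.table(). A short sketch, where cyl_counts and cyl_share are illustrative names:

```r
# Work on a copy of the built-in mtcars dataset
data <- mtcars

# Absolute frequencies of the number of cylinders
cyl_counts <- table(data$cyl)

# Relative frequencies: each count divided by the total
cyl_share <- prop.table(cyl_counts)

# Shown as percentages, rounded to one decimal place
round(cyl_share * 100, 1)
```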
Sometimes it is interesting to see whether there are differences between categories. For example, we want the two-dimensional table (cross-tabulation) of the number of cylinders and gears.
table(data$cyl, data$gear) # rows: cylinders, columns: gears
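The plain cross-tabulation does not say which dimension is which. One way to make it easier to read is to name the arguments and add totals with addmargins(); prop.table() with margin = 1 then gives the shares within each row. This is a sketch, and the object name cross is an assumption for illustration.

```r
# Work on a copy of the built-in mtcars dataset
data <- mtcars

# Named arguments become the dimension labels of the table
cross <- table(cyl = data$cyl, gear = data$gear)

# Add row and column totals
addmargins(cross)

# Share of each gear count within every cylinder group (rows sum to 1)
row_share <- prop.table(cross, margin = 1)
round(row_share, 2)
```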
Moving on from absolute and relative frequencies, we can look at the distribution of a variable. Histograms are the most common tool for this:
hist(data$gear, main = "Gear Histogram", xlab = "Number of gears", ylab = "Frequency")
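Histograms are most informative for continuous variables such as mpg; for a variable with only a few distinct values, like gear, a bar plot of the frequency table can be easier to read. The sketch below shows both. The choice of breaks = 10 is only a suggestion to R, not an exact number of bins, and h is an illustrative name.

```r
# Work on a copy of the built-in mtcars dataset
data <- mtcars

# Histogram of a continuous variable; hist() also returns
# the bin limits and counts it used
h <- hist(data$mpg, breaks = 10,
          main = "Histogram of mpg",
          xlab = "Miles per gallon", ylab = "Frequency")
h$breaks
h$counts

# Bar plot of a variable with few distinct values
barplot(table(data$gear),
        main = "Number of gears",
        xlab = "Gears", ylab = "Frequency")
```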
Yes, R is pretty straightforward. For example, we can just as easily draw a boxplot for any variable in our data:
boxplot(data$mpg, main = "Boxplot mpg") # boxplot() ignores missing values on its own
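A boxplot becomes more useful when it compares groups. As a sketch, the formula interface of boxplot() draws one box of mpg per number of cylinders; the returned object (called bp here, an illustrative name) holds the numbers behind the plot.

```r
# Work on a copy of the built-in mtcars dataset
data <- mtcars

# One box per cylinder group: mpg "explained by" cyl
bp <- boxplot(mpg ~ cyl, data = data,
              main = "mpg by number of cylinders",
              xlab = "Cylinders", ylab = "Miles per gallon")

# Group labels, group sizes, and the five statistics per box
# (lower whisker, lower hinge, median, upper hinge, upper whisker)
bp$names
bp$n
bp$stats
```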