Download the script here


1 General

Let's take a look at real data examples! There are several datasets you can use (feel free to use others). We now want to introduce:

  • UN General Assembly voting data by Voeten et al. (2000)
  • Survey data on immigration
  • A subset of aiddata (provided by aiddata.org)


2 Download the Data

In contrast to the eurostat data, there is no direct API attached to these datasets. This leaves us with two ways to get the data: we can download it manually and store it somewhere in the project folder (maybe “/data/xyz”), or we can tell R to download it directly from the website. I have already uploaded subsets of the latter two datasets here and here; the Voeten data we now download directly from the internet:

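rm(list = ls())
# install.packages("tidyverse")
library(tidyverse)

# Make sure the target folder exists before downloading into it
dir.create("data", showWarnings = FALSE)

# file.exists() checks for a file on disk; exists() would look for an R object of that name
if (!file.exists("data/UN-73new_small.RData")) {
  url <- "https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/LEJUQZ/KKG7SW"
  download.file(url,
                destfile = "data/UN-73new_small.RData",
                quiet = FALSE,
                mode = "wb",  # binary mode, so the .RData file is not corrupted on Windows
                cacheOK = TRUE)
}
load("data/UN-73new_small.RData")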

# Now we subset the data to drop irrelevant variables
completeVotes <- completeVotes[c("rcid", "vote", "Country", "year", "importantvote", "me", "nu", "di", "hr", "co", "ec")]

# Drop rows without a country
completeVotes <- completeVotes[!is.na(completeVotes$Country), ]

# Overwrite the downloaded file with the smaller subset
save(completeVotes, file = "data/UN-73new_small.RData")

Had we downloaded the full aiddata from here and saved it to our local data folder, we would have run into the problem of its sheer size. To handle files like this, the fread function in the data.table package reads huge csv files quickly and completely.

# library(data.table)
# aiddata <- fread("data/AidDataCoreFull_ResearchRelease_Level1_v3.1.csv",
#                  sep = ",",
#                  nrows = -1,
#                  na.strings = c("NA", "N/A", ""),
#                  stringsAsFactors = FALSE
# )
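
Before reading a file of that size in full, it can help to peek at it first. The following is a minimal sketch, assuming the data.table package is installed. It writes a small stand-in csv to a temporary file (the column names donor, year and amount are made up for illustration and are not the real aiddata columns) and then uses the nrows and select arguments of fread to read only part of it.

```r
library(data.table)

# Hypothetical stand-in for a large csv file; the column names are illustrative only
tmp <- tempfile(fileext = ".csv")
toy <- data.frame(
  donor  = c("United States", "Japan", "Germany", "N/A"),
  year   = c(2012, 2013, 2013, 2013),
  amount = c(100.5, NA, 42, 7)
)
write.csv(toy, tmp, row.names = FALSE, quote = FALSE)

# Read only the first two rows to inspect the layout cheaply
peek <- fread(tmp, nrows = 2)
print(peek)

# Read all rows, but only the columns we need
slim <- fread(tmp,
              select = c("donor", "amount"),
              na.strings = c("NA", "N/A", ""))
print(names(slim))
```

Restricting the columns with select keeps memory use down, which matters more than speed once a file no longer fits comfortably in RAM.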

3 Tasks

All three datasets should be stored in the data folder. As learned, we now want to do the following:

  • Import the data
  • Load it to the environment
  • Tell us about the structure of the data
  • Subset according to the tasks and gogogo!
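
The first three steps can be sketched with base R alone. The example below uses a tiny toy data frame instead of the real files, so the values are assumptions; only the variable names rcid, Country and vote are borrowed from the Voeten subset above.

```r
# Toy stand-in; in the tasks the objects come from load() or fread()
toy <- data.frame(
  rcid    = c(1, 1, 2),
  Country = c("USA", "RUS", "USA"),
  vote    = c(1, 3, 2),
  stringsAsFactors = FALSE
)

# Save and re-load, to mimic importing an .RData file
tmp <- tempfile(fileext = ".RData")
save(toy, file = tmp)
rm(toy)
loaded <- load(tmp)  # load() invisibly returns the names of the restored objects
print(loaded)

# Inspect the structure
dim(toy)      # number of rows and columns
names(toy)    # variable names
str(toy)      # type and first values of each variable
head(toy, 2)  # first two rows
summary(toy)  # per-variable summaries
```

Capturing the return value of load() is a handy way to find out what an unfamiliar .RData file actually contains.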

3.1 Voeten

  • Subset the data to only choose important votes (what do we notice?)
  • Subset the data and only focus on Russia, USA and China
  • Table the countries' decisions in the year 2006
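
One possible way to approach these steps is sketched below on toy data. The variable names follow the subset created above, but the values are assumptions: we assume that importantvote is coded 1 for important votes and that Country holds codes such as "USA", "RUS" and "CHN". Check both against the real completeVotes object before reusing the code.

```r
# Toy data with the same variable names as completeVotes; all values are made up
votes <- data.frame(
  rcid          = c(1, 1, 1, 2, 2, 2, 3, 3),
  vote          = c(1, 3, 2, 1, 1, 3, 1, 1),
  Country       = c("USA", "RUS", "CHN", "USA", "RUS", "CHN", "USA", "DEU"),
  year          = c(2006, 2006, 2006, 2006, 2006, 2006, 2005, 2006),
  importantvote = c(1, 1, 1, 0, 0, 0, 1, NA),
  stringsAsFactors = FALSE
)

# Important votes only; which() drops rows where the condition is NA
important <- votes[which(votes$importantvote == 1), ]

# Keep only the three countries of interest (country codes are an assumption)
big3 <- votes[votes$Country %in% c("USA", "RUS", "CHN"), ]

# Cross-tabulate country by vote for the year 2006
big3_2006 <- big3[big3$year == 2006, ]
tab <- table(big3_2006$Country, big3_2006$vote)
print(tab)
```

Without which(), a logical index containing NA would return rows filled with NA, which is easy to overlook in a large dataset.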

3.2 Immigration

  • Subset the data to only get those participants older than 23
  • Table the ethnocentrism scale (variable ethno)

3.3 Aiddata

  • Subset the data to get all aid in 2013
  • Subset donor countries and only focus on Russia, USA and China
  • Table the committed amount of these donor countries
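
For a continuous variable such as an amount, table() mostly returns counts of unique values, so a sum per donor is often more informative. The sketch below swaps in aggregate() for that purpose. The column names donor, year and commitment and all values are hypothetical; the real aiddata columns are named differently and should be looked up with names() first.

```r
# Toy data; column names and values are hypothetical
aid <- data.frame(
  donor      = c("United States", "United States", "China", "Russia", "Germany", "China"),
  year       = c(2013, 2013, 2013, 2012, 2013, 2013),
  commitment = c(10, 5, 7.5, 3, 4, NA),
  stringsAsFactors = FALSE
)

# All aid in 2013
aid_2013 <- subset(aid, year == 2013)

# Only the donors of interest
focus <- subset(aid_2013, donor %in% c("Russia", "United States", "China"))

# Sum of committed amounts per donor; the formula interface drops rows with NA
totals <- aggregate(commitment ~ donor, data = focus, FUN = sum)
print(totals)
```

In this toy example Russia disappears from the result because it has no entry for 2013, which is worth keeping in mind when a donor seems to be missing from the real data.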