---
title: "Practice Assignment 3"
author: "Padmakshi Parkhe"
date: "9/17/2017"
output: html_document
---
To import data into R, I first install (if necessary) and load the packages I need: ‘dplyr’, ‘haven’, ‘readxl’, ‘tibble’, ‘utils’, and ‘datasets’ (the last two ship with R and are attached by default). Next, I create a dataset with 15 rows (a header row plus 14 rows of data) and 7 columns in Excel. I save the dataset in the working directory in three formats: .xlsx (Excel), .csv (comma-separated values), and .txt (tab-delimited text).
I can import the datasets into R in two ways: using code or using the Import Dataset Tab in the Environment window.
I first need to load the right packages using the library function.
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readxl)
library(haven)
## Warning: package 'haven' was built under R version 3.4.1
library(utils)
library(datasets)
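The library calls above assume that every package is already installed. As a minimal sketch (the helper name missing_packages is my own and not part of any package), the following checks which packages are not yet installed, so that only those need to be passed to install.packages:

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("dplyr", "haven", "readxl", "tibble")
to_install <- missing_packages(needed)

# Install only what is missing (left commented out so nothing is
# installed by accident when the sketch is run).
# if (length(to_install) > 0) install.packages(to_install)
to_install
```

An empty result means that all four packages are available and the library calls will succeed.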
I then import the .xlsx file from my working directory into R using the read_excel function, and print the file.
Excel <- read_excel("PracticeAssignment3.xlsx")
Excel
## # A tibble: 14 x 7
## `Sr. No` Name `Test 1` `Test 2` `Test 3` `Test 4` `Test 5`
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 ANN 4 2 9 4 2
## 2 2 PAM 7 9 9 4 6
## 3 3 JAN 8 8 0 9 3
## 4 4 PAT 9 6 3 6 7
## 5 5 KAT 8 0 2 7 0
## 6 6 KIM 8 0 3 5 1
## 7 7 JIM 5 1 4 1 2
## 8 8 SAM 7 1 6 9 7
## 9 9 AMY 6 2 7 8 8
## 10 10 TIM 3 3 8 7 0
## 11 11 RON 1 6 8 6 8
## 12 12 ROD 4 7 5 6 8
## 13 13 JON 6 4 5 3 5
## 14 14 LUC 9 8 7 2 9
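The output above is already labelled as a tibble, because read_excel returns one. To convert a dataset to a tibble explicitly, I make sure the package ‘tibble’ is loaded and then use the tbl_df function from ‘dplyr’ (current versions of ‘dplyr’ deprecate tbl_df in favor of as_tibble).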
library(tibble)
## Warning: package 'tibble' was built under R version 3.4.1
Excel <- tbl_df(Excel)
Excel
## # A tibble: 14 x 7
## `Sr. No` Name `Test 1` `Test 2` `Test 3` `Test 4` `Test 5`
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 ANN 4 2 9 4 2
## 2 2 PAM 7 9 9 4 6
## 3 3 JAN 8 8 0 9 3
## 4 4 PAT 9 6 3 6 7
## 5 5 KAT 8 0 2 7 0
## 6 6 KIM 8 0 3 5 1
## 7 7 JIM 5 1 4 1 2
## 8 8 SAM 7 1 6 9 7
## 9 9 AMY 6 2 7 8 8
## 10 10 TIM 3 3 8 7 0
## 11 11 RON 1 6 8 6 8
## 12 12 ROD 4 7 5 6 8
## 13 13 JON 6 4 5 3 5
## 14 14 LUC 9 8 7 2 9
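In current releases of ‘dplyr’, tbl_df is deprecated and as_tibble from the ‘tibble’ package is the suggested replacement. A minimal sketch, assuming a small stand-in data frame (the first three rows of the table above, with shortened column names of my own choosing) rather than the actual Excel file:

```r
library(tibble)

# Stand-in data: first three rows of the dataset, typed in by hand.
scores <- data.frame(
  SrNo  = c(1, 2, 3),
  Name  = c("ANN", "PAM", "JAN"),
  Test1 = c(4, 7, 8),
  stringsAsFactors = FALSE
)

# as_tibble() converts a data frame to a tibble, as tbl_df() did.
scores_tbl <- as_tibble(scores)
class(scores_tbl)
```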
To import the .csv file, I use the ‘read.csv’ function and then print the file.
CSVfile <- read.csv("PracticeAssignment3.csv")
CSVfile
## Sr..No Name Test.1 Test.2 Test.3 Test.4 Test.5
## 1 1 ANN 4 2 9 4 2
## 2 2 PAM 7 9 9 4 6
## 3 3 JAN 8 8 0 9 3
## 4 4 PAT 9 6 3 6 7
## 5 5 KAT 8 0 2 7 0
## 6 6 KIM 8 0 3 5 1
## 7 7 JIM 5 1 4 1 2
## 8 8 SAM 7 1 6 9 7
## 9 9 AMY 6 2 7 8 8
## 10 10 TIM 3 3 8 7 0
## 11 11 RON 1 6 8 6 8
## 12 12 ROD 4 7 5 6 8
## 13 13 JON 6 4 5 3 5
## 14 14 LUC 9 8 7 2 9
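The header ‘Sr. No’ appears as Sr..No here because read.csv, by default, converts column names into syntactically valid R names. A small sketch, using a temporary file with three rows copied from the table above (the file and object names are illustrative), shows the default behaviour and the effect of check.names = FALSE:

```r
# Write a tiny CSV with the same style of header to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Sr. No,Name,Test 1",
             "1,ANN,4",
             "2,PAM,7",
             "3,JAN,8"), csv_path)

# Default: spaces and other invalid characters are replaced by dots.
default_names <- names(read.csv(csv_path))

# check.names = FALSE keeps the header exactly as written in the file.
raw_names <- names(read.csv(csv_path, check.names = FALSE))

default_names  # "Sr..No" "Name" "Test.1"
raw_names      # "Sr. No" "Name" "Test 1"

unlink(csv_path)
```

Names that contain spaces then have to be quoted with backticks when used in code, which is a reason to keep the default in many cases.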
To create a tibble of the file, I use the ‘tbl_df’ function again and print the tibble.
CSVfile <- tbl_df(CSVfile)
CSVfile
## # A tibble: 14 x 7
## Sr..No Name Test.1 Test.2 Test.3 Test.4 Test.5
## <int> <fctr> <int> <int> <int> <int> <int>
## 1 1 ANN 4 2 9 4 2
## 2 2 PAM 7 9 9 4 6
## 3 3 JAN 8 8 0 9 3
## 4 4 PAT 9 6 3 6 7
## 5 5 KAT 8 0 2 7 0
## 6 6 KIM 8 0 3 5 1
## 7 7 JIM 5 1 4 1 2
## 8 8 SAM 7 1 6 9 7
## 9 9 AMY 6 2 7 8 8
## 10 10 TIM 3 3 8 7 0
## 11 11 RON 1 6 8 6 8
## 12 12 ROD 4 7 5 6 8
## 13 13 JON 6 4 5 3 5
## 14 14 LUC 9 8 7 2 9
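The Name column is shown as <fctr> here but as <chr> in the Excel import, because read.csv in versions of R before 4.0.0 converts text columns to factors by default. A minimal sketch, assuming inline sample data passed through the text argument instead of the real file, keeps the names as plain character strings:

```r
# Three sample rows in CSV form (copied by hand from the table above).
sample_csv <- "Sr. No,Name,Test 1
1,ANN,4
2,PAM,7
3,JAN,8"

# stringsAsFactors = FALSE leaves text columns as character vectors.
# (From R 4.0.0 onward this is already the default.)
scores_chr <- read.csv(text = sample_csv, stringsAsFactors = FALSE)

str(scores_chr)
```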
To import the tab-delimited .txt file, I use the ‘read.delim’ function and print the file.
TXTfile <- read.delim("PracticeAssignment3.txt", header=TRUE)
TXTfile
## Sr..No Name Test.1 Test.2 Test.3 Test.4 Test.5
## 1 1 ANN 4 2 9 4 2
## 2 2 PAM 7 9 9 4 6
## 3 3 JAN 8 8 0 9 3
## 4 4 PAT 9 6 3 6 7
## 5 5 KAT 8 0 2 7 0
## 6 6 KIM 8 0 3 5 1
## 7 7 JIM 5 1 4 1 2
## 8 8 SAM 7 1 6 9 7
## 9 9 AMY 6 2 7 8 8
## 10 10 TIM 3 3 8 7 0
## 11 11 RON 1 6 8 6 8
## 12 12 ROD 4 7 5 6 8
## 13 13 JON 6 4 5 3 5
## 14 14 LUC 9 8 7 2 9
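read.delim is a convenience wrapper around read.table with defaults suited to tab-delimited files that have a header row, which is why header = TRUE is optional in the call above. A sketch with a temporary file (the file and object names are illustrative) comparing the two calls:

```r
# Write three sample rows as a tab-delimited text file.
txt_path <- tempfile(fileext = ".txt")
sample_rows <- data.frame(Name = c("ANN", "PAM", "JAN"),
                          Test1 = c(4L, 7L, 8L))
write.table(sample_rows, txt_path, sep = "\t",
            row.names = FALSE, quote = FALSE)

# read.delim() with its defaults ...
via_delim <- read.delim(txt_path)

# ... is the same as read.table() with these arguments spelled out.
via_table <- read.table(txt_path, header = TRUE, sep = "\t",
                        quote = "\"", dec = ".", fill = TRUE,
                        comment.char = "")

same_result <- identical(via_delim, via_table)
same_result  # TRUE

unlink(txt_path)
```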
To create a tibble of the file, I follow the same process as for the two earlier tibbles.
TXTfile <- tbl_df(TXTfile)
TXTfile
## # A tibble: 14 x 7
## Sr..No Name Test.1 Test.2 Test.3 Test.4 Test.5
## <int> <fctr> <int> <int> <int> <int> <int>
## 1 1 ANN 4 2 9 4 2
## 2 2 PAM 7 9 9 4 6
## 3 3 JAN 8 8 0 9 3
## 4 4 PAT 9 6 3 6 7
## 5 5 KAT 8 0 2 7 0
## 6 6 KIM 8 0 3 5 1
## 7 7 JIM 5 1 4 1 2
## 8 8 SAM 7 1 6 9 7
## 9 9 AMY 6 2 7 8 8
## 10 10 TIM 3 3 8 7 0
## 11 11 RON 1 6 8 6 8
## 12 12 ROD 4 7 5 6 8
## 13 13 JON 6 4 5 3 5
## 14 14 LUC 9 8 7 2 9
So far, I have worked with files that were saved on my local drive. I will now work with files from the web.
To import a file from the web, I need the URL where the file is stored. I use the ‘read.csv’ function and supply that URL instead of a filename. I then print the file.
CSVWeb <- read.csv("http://www.personal.psu.edu/dlp/alphaheight_weight_dataset.csv")
CSVWeb
This returns the entire dataset of 200 rows and 4 columns. I have chosen to ‘hide’ the results so the long table is not displayed in the report.
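When a dataset is too long to print in full, a few base R functions give a quick overview instead. A sketch using the built-in ‘women’ dataset from the ‘datasets’ package as a stand-in (it is not the height and weight file imported above):

```r
data_preview <- head(women, 5)  # first five rows only
data_size    <- dim(women)      # number of rows, then number of columns

data_preview
data_size
str(women)                      # column names and types
```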
To create a tibble for this dataset, I use the ‘tbl_df’ function as before.
CSVWeb <- tbl_df(CSVWeb)
CSVWeb
## # A tibble: 200 x 4
## Index Height Weight Gender
## <int> <dbl> <dbl> <fctr>
## 1 1 65.78 112.99 female
## 2 2 71.52 136.49 male
## 3 3 69.40 153.03 male
## 4 4 68.22 142.34 female
## 5 5 67.79 144.30 male
## 6 6 68.70 123.30 male
## 7 7 69.80 141.49 male
## 8 8 70.01 136.46 female
## 9 9 67.90 112.37 male
## 10 10 66.78 120.67 male
## # ... with 190 more rows
Printing the tibble shows only the first 10 rows of the 4 columns, followed by a note that 190 more rows are not displayed.
If I need to view a summary of the dataset, I use the ‘summary’ function.
summary(CSVWeb)
## Index Height Weight Gender
## Min. : 1.00 Min. :63.43 Min. : 97.9 female: 95
## 1st Qu.: 50.75 1st Qu.:66.52 1st Qu.:119.9 male :105
## Median :100.50 Median :67.94 Median :127.9
## Mean :100.50 Mean :67.95 Mean :127.2
## 3rd Qu.:150.25 3rd Qu.:69.20 3rd Qu.:136.1
## Max. :200.00 Max. :73.90 Max. :159.0
I can also import the same file from the web using the Import Dataset Tab in the Environment window. Some things to remember: click the ‘Update’ button after typing or pasting the URL; the package ‘readr’ is loaded automatically; and RStudio writes and runs the import code for me and displays the dataset in a new tab. I can then view the dataset using the ‘View’ function. To create a tibble of the data, I use the ‘tbl_df’ function again.
Next, I use the read.csv function again to import a second, larger dataset from the web and print the file.
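As far as I can tell, the code that the Import Dataset Tab generates for text files is based on read_csv from ‘readr’. As a rough sketch of that approach, assuming the ‘readr’ package is installed and using a small temporary file in place of the URL (the file contents and object names are illustrative):

```r
library(readr)

# A tiny CSV file standing in for the file on the web.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Sr. No,Name,Test 1", "1,ANN,4", "2,PAM,7"), csv_path)

# read_csv() accepts a local path or a URL and returns a tibble directly.
# suppressMessages() hides the column specification message.
web_style <- suppressMessages(read_csv(csv_path))

class(web_style)
names(web_style)   # header kept as written, unlike read.csv

unlink(csv_path)
```

Because the result is already a tibble, no separate conversion step is needed with this route.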
CSVWeb2 <- read.csv("http://www.personal.psu.edu/dlp/w540/datasets/titanicsurvival.csv")
CSVWeb2
Only part of the dataset is displayed, and R prints a message that the max.print limit was reached. So I raise the max.print limit to 10000 using the ‘options’ function and print again. The limit counts individual entries rather than rows; 2,201 rows of 4 columns make 8,804 entries, so the whole table now fits. I have chosen to ‘hide’ the results so the long table is not displayed in the report.
options(max.print = 10000)
CSVWeb2
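max.print limits the number of entries printed, and for a data frame the number of rows shown is roughly that limit divided by the number of columns. A small sketch of the arithmetic (rows_that_fit is a helper of my own, and the limit of 1000 is only an illustrative starting value):

```r
# Hypothetical helper: how many full rows fit within a max.print limit.
rows_that_fit <- function(max_print, n_cols) max_print %/% n_cols

rows_that_fit(1000, 4)    # 250 rows, fewer than the 2,201 in the dataset
rows_that_fit(10000, 4)   # 2500 rows, enough for the whole dataset

# options() returns the previous settings, so they can be restored later.
old_opts <- options(max.print = 10000)
getOption("max.print")
options(old_opts)
```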
To see a tibble of this dataset, I use the ‘tbl_df’ function and print the tibble.
CSVWeb2 <- tbl_df(CSVWeb2)
CSVWeb2
## # A tibble: 2,201 x 4
## Class Age Sex Survive
## <int> <int> <int> <int>
## 1 1 1 1 1
## 2 1 1 1 1
## 3 1 1 1 1
## 4 1 1 1 1
## 5 1 1 1 1
## 6 1 1 1 1
## 7 1 1 1 1
## 8 1 1 1 1
## 9 1 1 1 1
## 10 1 1 1 1
## # ... with 2,191 more rows
The last step of the assignment is to import an SPSS dataset from the web. To import SPSS files, which have the extension .sav, it is convenient to use the Import Dataset Tab. RStudio makes sure that the ‘haven’ package is loaded, runs the import code, and displays the dataset in a separate tab.
library(haven)
SPSSfile <- read_sav("https://cehd.gmu.edu/assets/dimitrovbook/EXAMPLE_23_1.sav")
View(SPSSfile)
I can view the dataset in the viewer as well as print it as a tibble. The descriptions of the items (the variable labels shown in the header row of the viewer) are not displayed when the tibble is printed.
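‘haven’ stores each variable's description as a ‘label’ attribute on the column, so the descriptions can still be retrieved with base R. A minimal sketch using a hand-made data frame as a stand-in for SPSSfile (the column names and label text are invented for illustration):

```r
# Stand-in for an imported SPSS file: two columns with "label" attributes.
items <- data.frame(item1 = c(1, 2, 3), item2 = c(3, 2, 1))
attr(items$item1, "label") <- "First item (illustrative label)"
attr(items$item2, "label") <- "Second item (illustrative label)"

# Hypothetical helper: collect the label of every column;
# columns without a label give NA.
get_labels <- function(df) {
  vapply(df, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else lab
  }, character(1))
}

item_labels <- get_labels(items)
item_labels
```

Calling get_labels(SPSSfile) on the imported data should list the item descriptions in the same way, assuming the .sav file defines them.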