---
title: "Practice Assignment 3"
author: "Padmakshi Parkhe"
date: "9/17/2017"
output: html_document
---

Importing data into R

Importing local files

To import data into R, I first install (if needed) and load the necessary packages: ‘dplyr’, ‘haven’, ‘readxl’, ‘tibble’, ‘utils’, and ‘datasets’. Next, I create a dataset with 15 rows and 7 columns in Excel. The first row is a header row, so R reads 14 rows of data. I save the dataset in the working directory in three formats: .xlsx (Excel), .csv (comma-separated values), and .txt (tab-delimited text).

I can import the datasets into R in two ways: with code, or with the Import Dataset tab in the Environment window.

3.1a: To import the .xlsx file

I first load the required packages using the ‘library’ function.

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.4.1

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(readxl)
library(haven)

## Warning: package 'haven' was built under R version 3.4.1

library(utils)
library(datasets)
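Before calling library, it can help to check which packages are actually installed. The following is a minimal sketch of such a check, not part of the assignment itself; the names pkgs, installed, and missing_pkgs are my own, and the check only reports what is missing without installing anything.

```r
# Packages this report loads with library(); utils and datasets ship with R
pkgs <- c("dplyr", "haven", "readxl", "tibble")

# requireNamespace() returns TRUE or FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs install.packages()
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package listed in the message can then be installed with install.packages before the library calls are run.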

I then import the .xlsx file from my working directory into R using the read_excel function, and print the file.

Excel <- read_excel("PracticeAssignment3.xlsx")
Excel

## # A tibble: 14 x 7
##    `Sr. No`  Name `Test 1` `Test 2` `Test 3` `Test 4` `Test 5`
##       <dbl> <chr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
##  1        1   ANN        4        2        9        4        2
##  2        2   PAM        7        9        9        4        6
##  3        3   JAN        8        8        0        9        3
##  4        4   PAT        9        6        3        6        7
##  5        5   KAT        8        0        2        7        0
##  6        6   KIM        8        0        3        5        1
##  7        7   JIM        5        1        4        1        2
##  8        8   SAM        7        1        6        9        7
##  9        9   AMY        6        2        7        8        8
## 10       10   TIM        3        3        8        7        0
## 11       11   RON        1        6        8        6        8
## 12       12   ROD        4        7        5        6        8
## 13       13   JON        6        4        5        3        5
## 14       14   LUC        9        8        7        2        9

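The output above is already a tibble, because the read_excel function returns one. To convert an object explicitly, I make sure the package ‘tibble’ is loaded and then use the tbl_df function from ‘dplyr’.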

library(tibble)

## Warning: package 'tibble' was built under R version 3.4.1

Excel <- tbl_df(Excel)
Excel

## # A tibble: 14 x 7
##    `Sr. No`  Name `Test 1` `Test 2` `Test 3` `Test 4` `Test 5`
##       <dbl> <chr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
##  1        1   ANN        4        2        9        4        2
##  2        2   PAM        7        9        9        4        6
##  3        3   JAN        8        8        0        9        3
##  4        4   PAT        9        6        3        6        7
##  5        5   KAT        8        0        2        7        0
##  6        6   KIM        8        0        3        5        1
##  7        7   JIM        5        1        4        1        2
##  8        8   SAM        7        1        6        9        7
##  9        9   AMY        6        2        7        8        8
## 10       10   TIM        3        3        8        7        0
## 11       11   RON        1        6        8        6        8
## 12       12   ROD        4        7        5        6        8
## 13       13   JON        6        4        5        3        5
## 14       14   LUC        9        8        7        2        9
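In current releases of ‘dplyr’ the tbl_df function is deprecated, and as_tibble from ‘tibble’ is the suggested replacement. The sketch below shows that conversion on a small data frame that I typed in by hand from the first three rows above; the object names scores and scores_tbl are illustrative, and the ‘tibble’ package is assumed to be installed.

```r
library(tibble)

# Small hand-typed data frame (first three names and Test 1 scores)
scores <- data.frame(
  Name  = c("ANN", "PAM", "JAN"),
  Test1 = c(4, 7, 8)
)

# as_tibble() converts a data frame to a tibble
scores_tbl <- as_tibble(scores)

# A tibble carries the class "tbl_df" in addition to "data.frame"
inherits(scores_tbl, "tbl_df")
scores_tbl
```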

3.1b: To import the .csv file

I use the ‘read.csv’ function and then print the file.

CSVfile <- read.csv("PracticeAssignment3.csv")
CSVfile

##    Sr..No Name Test.1 Test.2 Test.3 Test.4 Test.5
## 1       1  ANN      4      2      9      4      2
## 2       2  PAM      7      9      9      4      6
## 3       3  JAN      8      8      0      9      3
## 4       4  PAT      9      6      3      6      7
## 5       5  KAT      8      0      2      7      0
## 6       6  KIM      8      0      3      5      1
## 7       7  JIM      5      1      4      1      2
## 8       8  SAM      7      1      6      9      7
## 9       9  AMY      6      2      7      8      8
## 10     10  TIM      3      3      8      7      0
## 11     11  RON      1      6      8      6      8
## 12     12  ROD      4      7      5      6      8
## 13     13  JON      6      4      5      3      5
## 14     14  LUC      9      8      7      2      9

To create a tibble of the file, I use the ‘tbl_df’ function again and print the tibble.

CSVfile <- tbl_df(CSVfile)
CSVfile

## # A tibble: 14 x 7
##    Sr..No   Name Test.1 Test.2 Test.3 Test.4 Test.5
##     <int> <fctr>  <int>  <int>  <int>  <int>  <int>
##  1      1    ANN      4      2      9      4      2
##  2      2    PAM      7      9      9      4      6
##  3      3    JAN      8      8      0      9      3
##  4      4    PAT      9      6      3      6      7
##  5      5    KAT      8      0      2      7      0
##  6      6    KIM      8      0      3      5      1
##  7      7    JIM      5      1      4      1      2
##  8      8    SAM      7      1      6      9      7
##  9      9    AMY      6      2      7      8      8
## 10     10    TIM      3      3      8      7      0
## 11     11    RON      1      6      8      6      8
## 12     12    ROD      4      7      5      6      8
## 13     13    JON      6      4      5      3      5
## 14     14    LUC      9      8      7      2      9
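The output shows two side effects of the read.csv defaults at the time: the header ‘Sr. No’ became Sr..No, and Name was read as a factor. The sketch below shows how the check.names and stringsAsFactors arguments control this. It uses a three-row extract that I write to a temporary file, so csv_path and the object names are assumptions and not the assignment file. Since R 4.0.0, stringsAsFactors already defaults to FALSE.

```r
# Three-row extract of the scores, written to a temporary .csv file
csv_path <- tempfile(fileext = ".csv")
writeLines(c('"Sr. No","Name","Test 1"',
             '1,"ANN",4',
             '2,"PAM",7',
             '3,"JAN",8'), csv_path)

# Defaults: headers are converted to syntactic R names
default_read <- read.csv(csv_path)
names(default_read)

# Keep the headers as typed and read Name as character
scores <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)
names(scores)
class(scores$Name)
```

Columns with spaces in their names then need backticks, for example scores$`Test 1`, which is the same notation the read_excel tibble uses.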

3.1c: To import the .txt file

I use the ‘read.delim’ function and print the file.

TXTfile <- read.delim("PracticeAssignment3.txt", header=TRUE)
TXTfile

##    Sr..No Name Test.1 Test.2 Test.3 Test.4 Test.5
## 1       1  ANN      4      2      9      4      2
## 2       2  PAM      7      9      9      4      6
## 3       3  JAN      8      8      0      9      3
## 4       4  PAT      9      6      3      6      7
## 5       5  KAT      8      0      2      7      0
## 6       6  KIM      8      0      3      5      1
## 7       7  JIM      5      1      4      1      2
## 8       8  SAM      7      1      6      9      7
## 9       9  AMY      6      2      7      8      8
## 10     10  TIM      3      3      8      7      0
## 11     11  RON      1      6      8      6      8
## 12     12  ROD      4      7      5      6      8
## 13     13  JON      6      4      5      3      5
## 14     14  LUC      9      8      7      2      9

To create a tibble of the file, I follow the same process as for the two earlier tibbles.

TXTfile <- tbl_df(TXTfile)
TXTfile

## # A tibble: 14 x 7
##    Sr..No   Name Test.1 Test.2 Test.3 Test.4 Test.5
##     <int> <fctr>  <int>  <int>  <int>  <int>  <int>
##  1      1    ANN      4      2      9      4      2
##  2      2    PAM      7      9      9      4      6
##  3      3    JAN      8      8      0      9      3
##  4      4    PAT      9      6      3      6      7
##  5      5    KAT      8      0      2      7      0
##  6      6    KIM      8      0      3      5      1
##  7      7    JIM      5      1      4      1      2
##  8      8    SAM      7      1      6      9      7
##  9      9    AMY      6      2      7      8      8
## 10     10    TIM      3      3      8      7      0
## 11     11    RON      1      6      8      6      8
## 12     12    ROD      4      7      5      6      8
## 13     13    JON      6      4      5      3      5
## 14     14    LUC      9      8      7      2      9
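The read.delim function is a wrapper around read.table with defaults suited to tab-delimited files, which is why header=TRUE does not strictly need to be supplied. The sketch below illustrates this with a small temporary file; txt_path and the object names are my own and stand in for the assignment file.

```r
# Small stand-in dataset saved as tab-delimited text
scores <- data.frame(Name = c("ANN", "PAM", "JAN"), Test1 = c(4L, 7L, 8L))
txt_path <- tempfile(fileext = ".txt")
write.table(scores, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim() with its defaults
via_delim <- read.delim(txt_path)

# The same import written out with read.table()
via_table <- read.table(txt_path, header = TRUE, sep = "\t",
                        quote = "\"", fill = TRUE, comment.char = "")

# Both calls give the same data frame
identical(via_delim, via_table)
```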

Importing web-based files

So far, I have worked with files that were saved on my local drive. I will now work with files from the web.

3.2: To import a .csv file from the web

To import a file from the web, I need the URL where the file is stored. I use the ‘read.csv’ function and supply that URL instead of a filename. I then print the file.

CSVWeb <- read.csv("http://www.personal.psu.edu/dlp/alphaheight_weight_dataset.csv")
CSVWeb

This returns the entire dataset of 200 rows and 4 columns. I have chosen to ‘hide’ the results so the long table is not displayed in the report.
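Instead of printing a long table, I can inspect its size and first few rows. The sketch below uses the built-in mtcars data from the ‘datasets’ package as a stand-in, so that it runs without an internet connection; the name big is illustrative, and the same calls apply to any data frame.

```r
# Built-in data frame used as a stand-in for a downloaded dataset
big <- datasets::mtcars

# Number of rows and columns
dim(big)

# First five rows only
head(big, n = 5)

# Column names and types, one line per column
str(big)
```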

To create a tibble for this dataset, I use the ‘tbl_df’ function as before.

CSVWeb <- tbl_df(CSVWeb)
CSVWeb

## # A tibble: 200 x 4
##    Index Height Weight Gender
##    <int>  <dbl>  <dbl> <fctr>
##  1     1  65.78 112.99 female
##  2     2  71.52 136.49   male
##  3     3  69.40 153.03   male
##  4     4  68.22 142.34 female
##  5     5  67.79 144.30   male
##  6     6  68.70 123.30   male
##  7     7  69.80 141.49   male
##  8     8  70.01 136.46 female
##  9     9  67.90 112.37   male
## 10    10  66.78 120.67   male
## # ... with 190 more rows

Printing the tibble displays only the first 10 rows of the dataset, along with all 4 columns.

Summary of Data

If I need to view a summary of the dataset, I use the ‘summary’ function.

summary(CSVWeb)

##      Index            Height          Weight         Gender   
##  Min.   :  1.00   Min.   :63.43   Min.   : 97.9   female: 95  
##  1st Qu.: 50.75   1st Qu.:66.52   1st Qu.:119.9   male  :105  
##  Median :100.50   Median :67.94   Median :127.9               
##  Mean   :100.50   Mean   :67.95   Mean   :127.2               
##  3rd Qu.:150.25   3rd Qu.:69.20   3rd Qu.:136.1               
##  Max.   :200.00   Max.   :73.90   Max.   :159.0
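The counts for Gender appear because the column was read as a factor. In R 4.0.0 and later, read.csv reads text columns as character by default, and summary then reports only the length and type. The sketch below shows the difference on four rows typed in from the table above; the name people is my own.

```r
# First four rows of Height and Gender, typed in by hand
people <- data.frame(
  Height = c(65.78, 71.52, 69.40, 68.22),
  Gender = c("female", "male", "male", "female"),
  stringsAsFactors = FALSE
)

# Character column: summary() gives length, class and mode
summary(people$Gender)

# Factor column: summary() gives a count per level
people$Gender <- factor(people$Gender)
gender_counts <- summary(people$Gender)
gender_counts
```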

I can also import the same file from the web using the Import Dataset tab in the Environment window. A few things to remember: I click the ‘Update’ button after typing or pasting the URL, and the package ‘readr’ is loaded automatically. RStudio generates and runs the import code and opens the dataset in a new tab. I can then view the dataset using the ‘View’ function. To create a tibble of the data, I use the ‘tbl_df’ function again.

3.3: To import the second .csv dataset from the web

I will use the read.csv function again and print the file.

CSVWeb2 <- read.csv("http://www.personal.psu.edu/dlp/w540/datasets/titanicsurvival.csv")
CSVWeb2

Only some of the rows are displayed, and R prints a message that the max.print limit was reached. This limit counts entries (cells) rather than rows, so I raise max.print to 10000 using the ‘options’ function, which is enough for the 8,804 entries in this dataset (2,201 rows by 4 columns), and print again. I have chosen to ‘hide’ the results so the long table is not displayed in the report.

options(max.print = 10000)
CSVWeb2
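Changing max.print affects the rest of the session, so it can be useful to restore the earlier value afterwards. The sketch below shows one way to do that; the names n_entries, old, and fits are my own, and the row and column counts are those of the Titanic dataset in this section.

```r
# Entries needed to print the whole table: rows times columns
n_entries <- 2201 * 4

# options() returns the previous settings, which I keep for later
old <- options(max.print = 10000)

# Check that the new limit covers the whole table
fits <- n_entries <= getOption("max.print")
fits

# ... print the dataset here ...

# Restore the earlier limit
options(old)
```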

To see a tibble of this dataset, I use the ‘tbl_df’ function and print the tibble.

CSVWeb2 <- tbl_df(CSVWeb2)
CSVWeb2

## # A tibble: 2,201 x 4
##    Class   Age   Sex Survive
##    <int> <int> <int>   <int>
##  1     1     1     1       1
##  2     1     1     1       1
##  3     1     1     1       1
##  4     1     1     1       1
##  5     1     1     1       1
##  6     1     1     1       1
##  7     1     1     1       1
##  8     1     1     1       1
##  9     1     1     1       1
## 10     1     1     1       1
## # ... with 2,191 more rows

3.4: To import SPSS files from the web

The last step of the assignment is to import an SPSS dataset from the web. SPSS files have the extension .sav, and a convenient way to import them is the Import Dataset tab. RStudio makes sure that the ‘haven’ package is loaded and displays the dataset in a separate tab.

library(haven)
SPSSfile <- read_sav("https://cehd.gmu.edu/assets/dimitrovbook/EXAMPLE_23_1.sav")
View(SPSSfile)

I can view the dataset in the viewer and also print it as a tibble. The descriptions of the items (the SPSS variable labels) that the viewer shows in the header row are not displayed when the tibble is printed.
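The ‘haven’ package keeps each SPSS variable label as a "label" attribute on its column, so the descriptions can still be retrieved with code. The sketch below imitates this with a small base R data frame so that it runs without downloading the file; the column names and label texts are hypothetical and are not taken from EXAMPLE_23_1.sav.

```r
# Toy data frame with hypothetical labels attached the way haven stores them
spss_like <- data.frame(id = 1:3, score = c(4, 7, 8))
attr(spss_like$id, "label") <- "Participant number"
attr(spss_like$score, "label") <- "Test score"

# Collect the label of every column; NA where a column has none
var_labels <- vapply(spss_like, function(col) {
  lab <- attr(col, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else lab
}, character(1))

var_labels
```

Applied to an imported file, the same vapply call on SPSSfile should return one description per column, assuming each label is a single string.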