Set the working directory

setwd("E:/Desktop/ProyectoR")

The list.files() function in R is used to list all the files in a specified directory. By default, it shows the names of the files in the current working directory.

list.files()
##  [1] "Atriplex.txt"         "Atriplex.xlsx"        "data_structure.R"    
##  [4] "Importando_datos.R"   "Importar_datos.html"  "Importar_datos.Rmd"  
##  [7] "Manipulacion_datos.R" "Medidas_resumen.R"    "pt_data.csv"         
## [10] "pt_data.tsv"          "rsconnect"            "tablas_graficos.html"
## [13] "tablas_graficos.R"    "tablas_graficos.Rmd"
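When a folder holds many files, it can help to list only those with a given extension. The sketch below is an illustration under assumptions: it works in a temporary folder with made-up file names (not the project folder above) and uses the pattern argument of list.files(), which takes a regular expression.

```r
# Create a throwaway folder with a few empty files (hypothetical names).
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.csv", "b.tsv", "c.csv", "notes.R"))))

# Keep only the files whose names end in ".csv".
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
csv_files

# full.names = TRUE returns paths that can be passed straight to read.csv().
csv_paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
csv_paths
```

The same pattern argument works on the current working directory when the first argument is omitted.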

Alternatively, the working directory can be set from the RStudio menu: Session → Set Working Directory → Choose Directory, then select the folder where the data is located. Use the getwd() function to verify that the correct directory has been set.

getwd()
## [1] "E:/Desktop/ProyectoR"
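A wrong working directory is a common reason for an import to fail because the file cannot be found. As a hedged sketch, the helper check_data_file() below (a hypothetical name, not part of any package) stops with a message that shows the current working directory when the file is missing. The demonstration runs in a temporary folder and restores the previous directory afterwards.

```r
# Hypothetical helper: confirm that a file exists before trying to import it.
check_data_file <- function(file_name) {
  if (!file.exists(file_name)) {
    stop("'", file_name, "' not found in ", getwd(), call. = FALSE)
  }
  normalizePath(file_name)  # full path of the file that will be read
}

# Demonstration in a temporary folder (made-up file names).
old_wd <- setwd(tempdir())
writeLines(c("x,y", "1,2"), "demo.csv")

found <- check_data_file("demo.csv")
missing_msg <- tryCatch(
  check_data_file("no_such_file.csv"),
  error = function(e) conditionMessage(e)
)

setwd(old_wd)  # restore the original working directory
found
missing_msg
```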

The readr and readxl packages need to be installed only if they are not already on the computer. If necessary, uncomment and run the following line to install them:

# install.packages(c("readr", "readxl"))
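The "install only if missing" idea can also be written as code. The following is a minimal sketch, not the author's method: missing_packages() and install_if_missing() are hypothetical helper names, and the check relies on requireNamespace(), which returns FALSE when a package cannot be loaded.

```r
# Return the packages from `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install only what is missing (hypothetical helper).
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# install_if_missing(c("readr", "readxl"))  # uncomment to run; needs internet access

# Both of these ship with R, so nothing is reported as missing.
missing_packages(c("stats", "utils"))
```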

Load libraries

You can create .txt, .csv, and .tsv files using simple programs like Notepad (on Windows), TextEdit (on Mac), or any basic text editor. You just need to type your data and save the file with the correct extension, separating the values with commas in a .csv file and with tabs in a .tsv file. These formats store data as plain, human-readable text.

A tabular data file is structured in matrix form, such that each line of text reflects one example and each example has the same number of features. The feature values on each line are separated by a predefined symbol known as a delimiter. Often, the first line of a tabular data file lists the names of the data columns. This is called a header line.
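To make the roles of the delimiter and the header line concrete, here is a small sketch. The file contents and column names are made up for illustration, and a semicolon is used as the delimiter to show that any predefined symbol works.

```r
# Write a small semicolon-delimited file with a header line (made-up values).
tmp <- tempfile(fileext = ".txt")
writeLines(c("id;height;group",
             "1;1.72;a",
             "2;1.65;b",
             "3;1.80;a"), tmp)

# header = TRUE uses the first line as column names; sep names the delimiter.
tab <- read.table(tmp, header = TRUE, sep = ";")
str(tab)

# count.fields() reports the number of fields on each line, header included.
# Every line of a well-formed tabular file has the same count.
fields_per_line <- count.fields(tmp, sep = ";")
fields_per_line
```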

Importing data from .csv files

Perhaps the most common tabular text file format is the comma-separated values (CSV) file.

A .csv file representing the medical dataset constructed previously could be stored as:

name,temperature,status,gender,blood
jhon,98.1,FALSE,MALE,O
Jane,98.6,FALSE,FEMALE,AB
Steve,101.4,TRUE,MALE,A

Given a patient data file named pt_data.csv located in the R working directory, the read.csv() function loads it into a data frame:

pt_data <- read.csv("pt_data.csv", stringsAsFactors = FALSE,
                    header = TRUE)
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on 'pt_data.csv'
pt_data
##    name temperature status gender blood
## 1  jhon        98.1  FALSE   MALE     O
## 2  Jane        98.6  FALSE FEMALE    AB
## 3 Steve       101.4   TRUE   MALE     A
class(pt_data)
## [1] "data.frame"

If stringsAsFactors = TRUE: R automatically converts character strings into factors (categorical variables).

If stringsAsFactors = FALSE: R leaves character strings as character vectors, which is usually what you want for text data. This has been the default since R 4.0.0.
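The effect of stringsAsFactors can be checked directly. This sketch reads the same patient data from a character string through the text argument of read.csv(), so it does not depend on pt_data.csv being present. The object names pt_factor and pt_char are chosen here for illustration.

```r
csv_text <- "name,temperature,status,gender,blood
jhon,98.1,FALSE,MALE,O
Jane,98.6,FALSE,FEMALE,AB
Steve,101.4,TRUE,MALE,A"

# Same data, read twice with the two settings.
pt_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
pt_char   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(pt_factor$gender)   # "factor"
class(pt_char$gender)     # "character"
levels(pt_factor$gender)  # "FEMALE" "MALE"

# Non-text columns are unaffected: status is logical in both data frames.
class(pt_char$status)     # "logical"
```

Factors suit columns such as gender or blood, which take a small set of categories. A column such as name is usually better left as character.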

By using the readr package

library(readr)
pt_data2 <- read_csv("pt_data.csv")
## Rows: 3 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): name, gender, blood
## dbl (1): temperature
## lgl (1): status
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pt_data2
## # A tibble: 3 × 5
##   name  temperature status gender blood
##   <chr>       <dbl> <lgl>  <chr>  <chr>
## 1 jhon         98.1 FALSE  MALE   O    
## 2 Jane         98.6 FALSE  FEMALE AB   
## 3 Steve       101.  TRUE   MALE   A
class(pt_data2)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
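The message printed by read_csv() suggests specifying the column types. A possible way to do that is sketched below, assuming readr is installed. The file is a temporary copy of the patient data, and the choice to read gender and blood as factors is an assumption made for illustration.

```r
library(readr)  # assumes readr is installed

# Temporary copy of the patient data so the example runs on its own.
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,temperature,status,gender,blood",
             "jhon,98.1,FALSE,MALE,O",
             "Jane,98.6,FALSE,FEMALE,AB",
             "Steve,101.4,TRUE,MALE,A"), tmp)

# Declaring the types avoids guessing and silences the column specification message.
pt_typed <- read_csv(
  tmp,
  col_types = cols(
    name        = col_character(),
    temperature = col_double(),
    status      = col_logical(),
    gender      = col_factor(),
    blood       = col_factor()
  )
)
pt_typed

# A tibble can be converted to a plain data frame when needed.
pt_df <- as.data.frame(pt_typed)
class(pt_df)
```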

Importing data from .tsv files

Other delimited formats, such as tab-separated values (TSV), can be imported in a similar way.

A .tsv file representing the medical dataset constructed previously could be stored as:

name temperature status gender blood
jhon 98.1 FALSE MALE O
Jane 98.6 FALSE FEMALE AB
Steve 101.4 TRUE MALE A

The spaces between columns in the .tsv are actually tab characters, even though they may appear as regular spaces here.

pt_data3 <- read.delim("pt_data.tsv")
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on 'pt_data.tsv'
pt_data3
##    name temperature status gender blood
## 1  jhon        98.1  FALSE   MALE     O
## 2  Jane        98.6  FALSE FEMALE    AB
## 3 Steve       101.4   TRUE   MALE     A
class(pt_data3)
## [1] "data.frame"
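Because tabs and spaces look alike on screen, it can be useful to see the tab characters explicitly. The sketch below writes a temporary .tsv file with "\t" between the values and compares read.delim(), whose default separator is a tab, with read.csv(), which expects commas.

```r
# Build a temporary tab-separated copy of the patient data.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("name\ttemperature\tstatus\tgender\tblood",
             "jhon\t98.1\tFALSE\tMALE\tO",
             "Jane\t98.6\tFALSE\tFEMALE\tAB",
             "Steve\t101.4\tTRUE\tMALE\tA"), tmp)

readLines(tmp, n = 1)        # the tabs are printed as \t

tsv_ok <- read.delim(tmp)    # sep = "\t" by default
ncol(tsv_ok)                 # 5

tsv_wrong <- read.csv(tmp)   # looks for commas and finds none
ncol(tsv_wrong)              # 1
```

Since writeLines() ends the file with a newline, the "incomplete final line" warning seen above does not appear here. That warning indicates that the last line of the file lacks a final line break.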

By using the readr package

library(readr)
pt_data4 <- read_tsv("pt_data.tsv")
## Rows: 3 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): name, gender, blood
## dbl (1): temperature
## lgl (1): status
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pt_data4
## # A tibble: 3 × 5
##   name  temperature status gender blood
##   <chr>       <dbl> <lgl>  <chr>  <chr>
## 1 jhon         98.1 FALSE  MALE   O    
## 2 Jane         98.6 FALSE  FEMALE AB   
## 3 Steve       101.  TRUE   MALE   A
class(pt_data4)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

The dataset used in the following examples can be found in InfoStat by navigating to File → Open sample data → Atriplex. Once located, the dataset can be selected and exported in either Excel or .txt format.

Importing data from .txt files

datos2 <- read.table("Atriplex.txt", header = TRUE, dec = ",")
head(datos2, 4)
##   Tamaño Episperma PG PN     PS Bloque
## 1 chicas     claro 60 47 0.0030      1
## 2 chicas     claro 73 33 0.0030      2
## 3 chicas     claro 73 60 0.0031      3
## 4 chicas    rojizo 93  7 0.0030      1
class(datos2)
## [1] "data.frame"
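The dec = "," argument tells read.table() that the file uses a comma as the decimal mark. The following sketch uses a few made-up rows in the same style as the Atriplex file, passed through the text argument, to show what changes when the argument is left out.

```r
# Whitespace-delimited text with decimal commas (illustrative values).
txt <- "Episperma PG PS
claro 60 0,0030
claro 73 0,0031
rojizo 93 0,0026"

with_dec    <- read.table(text = txt, header = TRUE, dec = ",")
without_dec <- read.table(text = txt, header = TRUE)

class(with_dec$PS)          # "numeric"
is.numeric(without_dec$PS)  # FALSE: "0,0030" is not parsed as a number
mean(with_dec$PS)           # arithmetic works only on the numeric version
```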

Importing data from .xlsx files

By using the readxl package.

To list the sheets in an Excel file, use the excel_sheets() function.

library(readxl)
excel_sheets("Atriplex.xlsx")
## [1] "Hoja1"
datos3 <- read_xlsx("Atriplex.xlsx", sheet = "Hoja1")
head(datos3,4)
## # A tibble: 4 × 6
##   Tamaño Episperma    PG    PN     PS Bloque
##   <chr>  <chr>     <dbl> <dbl>  <dbl>  <dbl>
## 1 chicas claro        60    47 0.003       1
## 2 chicas claro        73    33 0.003       2
## 3 chicas claro        73    60 0.0031      3
## 4 chicas rojizo       93     7 0.003       1
class(datos3)
## [1] "tbl_df"     "tbl"        "data.frame"

Another way to achieve the same result is the read_excel() function from the same readxl package. It detects the format (.xls or .xlsx) from the file extension and reads the first sheet by default.

datos4 <- read_excel("Atriplex.xlsx")
head(datos4)
## # A tibble: 6 × 6
##   Tamaño Episperma    PG    PN     PS Bloque
##   <chr>  <chr>     <dbl> <dbl>  <dbl>  <dbl>
## 1 chicas claro        60    47 0.003       1
## 2 chicas claro        73    33 0.003       2
## 3 chicas claro        73    60 0.0031      3
## 4 chicas rojizo       93     7 0.003       1
## 5 chicas rojizo       66    33 0.0026      2
## 6 chicas rojizo       60    20 0.003       3
class(datos4)
## [1] "tbl_df"     "tbl"        "data.frame"
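When a workbook has several sheets, excel_sheets() and read_excel() can be combined to read all of them. This sketch assumes readxl is installed and uses datasets.xlsx, an example workbook shipped with the package, instead of Atriplex.xlsx, which has a single sheet.

```r
library(readxl)  # assumes readxl is installed

# Example workbook that ships with readxl (not the Atriplex file).
path <- readxl_example("datasets.xlsx")
sheets <- excel_sheets(path)
sheets

# Read every sheet into a named list of tibbles.
all_sheets <- lapply(sheets, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheets

# Number of rows in each sheet.
vapply(all_sheets, nrow, integer(1))
```

Each element of the list can then be accessed by its sheet name, for example all_sheets[[sheets[1]]].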

Importing data from the internet

If the data is available online in a format such as .csv, .tsv, or .txt, the import functions can read it directly from a URL.

url <- "https://raw.githubusercontent.com/hadley/data-fuel-economy/master/1984/mpg.csv"
datos5 <- read.csv(url)
head(datos5)
##   year class              model displ      trans cyl               manufacturer
## 1 1984     1 SPIDER VELOCE 2000   120 manual(m5)   4                 ALFA ROMEO
## 2 1984     1       BERTONE X1/9    91 manual(m5)   4                    BERTONE
## 3 1984     1           CORVETTE   350    auto(4)   8                  CHEVROLET
## 4 1984     1           CORVETTE   350 manual(m4)   8                  CHEVROLET
## 5 1984     1              300ZX   181    auto(4)   6 NISSAN MOTOR COMPANY, LTD.
## 6 1984     1              300ZX   181    auto(4)   6 NISSAN MOTOR COMPANY, LTD.
##   cty hwy    eng.dscr     T     G     S twodoor.p twodoor.l fourdoor.p
## 1  23  35      (FFS)  FALSE FALSE FALSE        NA        NA         NA
## 2  25  36      (FFS)  FALSE FALSE FALSE        NA        NA         NA
## 3  16  28      (FFS)  FALSE FALSE FALSE        NA        NA         NA
## 4  16  28      (FFS)  FALSE FALSE FALSE        NA        NA         NA
## 5  19  28 (FFS,TRBO)   TRUE FALSE FALSE        NA        NA         NA
## 6  20  28      (FFS)  FALSE FALSE FALSE        NA        NA         NA
##   fourdoor.l hatch.p hatch.l
## 1         NA      NA      NA
## 2         NA      NA      NA
## 3         NA      NA      NA
## 4         NA      NA      NA
## 5         NA      50      22
## 6         NA      50      22
class(datos5)
## [1] "data.frame"
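Reading from a URL downloads the file on every run. One option, sketched here under assumptions, is to keep a local copy and download only when it is missing. The helper read_csv_cached() and the file name mpg_1984.csv are hypothetical. The call with the real URL is left commented out so the example also runs without internet access.

```r
# Hypothetical helper: download the file once, then read the local copy.
read_csv_cached <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
  }
  read.csv(dest)
}

# url <- "https://raw.githubusercontent.com/hadley/data-fuel-economy/master/1984/mpg.csv"
# mpg84 <- read_csv_cached(url, "mpg_1984.csv")  # needs internet access on first run

# Offline demonstration: the local copy already exists, so nothing is downloaded
# and the placeholder URL is never contacted.
cached <- tempfile(fileext = ".csv")
writeLines(c("year,cyl", "1984,4", "1984,8"), cached)
demo <- read_csv_cached("https://example.invalid/not-used.csv", cached)
demo
```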

By using the readr package

library(readr)
url <- "https://raw.githubusercontent.com/hadley/data-fuel-economy/master/1984/mpg.csv"
datos6 <- read_csv(url)
## Rows: 1958 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): model, trans, manufacturer, eng.dscr
## dbl (12): year, class, displ, cyl, cty, hwy, twodoor.p, twodoor.l, fourdoor....
## lgl  (3): T, G, S
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(datos6)
## # A tibble: 6 × 19
##    year class model    displ trans   cyl manufacturer   cty   hwy eng.dscr T    
##   <dbl> <dbl> <chr>    <dbl> <chr> <dbl> <chr>        <dbl> <dbl> <chr>    <lgl>
## 1  1984     1 SPIDER …   120 manu…     4 ALFA ROMEO      23    35 (FFS)    FALSE
## 2  1984     1 BERTONE…    91 manu…     4 BERTONE         25    36 (FFS)    FALSE
## 3  1984     1 CORVETTE   350 auto…     8 CHEVROLET       16    28 (FFS)    FALSE
## 4  1984     1 CORVETTE   350 manu…     8 CHEVROLET       16    28 (FFS)    FALSE
## 5  1984     1 300ZX      181 auto…     6 NISSAN MOTO…    19    28 (FFS,TR… TRUE 
## 6  1984     1 300ZX      181 auto…     6 NISSAN MOTO…    20    28 (FFS)    FALSE
## # ℹ 8 more variables: G <lgl>, S <lgl>, twodoor.p <dbl>, twodoor.l <dbl>,
## #   fourdoor.p <dbl>, fourdoor.l <dbl>, hatch.p <dbl>, hatch.l <dbl>
class(datos6)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

In the next session, we will build frequency tables and create graphs for each variable.

References

Di Rienzo J.A., Casanoves F., Balzarini M.G., Gonzalez L., Tablada M., Robledo C.W. InfoStat versión 2020. Centro de Transferencia InfoStat, Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba, Argentina. URL http://www.infostat.com.ar
