setwd("E:/Desktop/ProyectoR")
The list.files() function in R lists the files and folders in a
specified directory. By default, it returns the names found in the
current working directory.
list.files()
## [1] "Atriplex.txt" "Atriplex.xlsx" "data_structure.R"
## [4] "Importando_datos.R" "Importar_datos.html" "Importar_datos.Rmd"
## [7] "Manipulacion_datos.R" "Medidas_resumen.R" "pt_data.csv"
## [10] "pt_data.tsv" "rsconnect" "tablas_graficos.html"
## [13] "tablas_graficos.R" "tablas_graficos.Rmd"
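When a folder holds many files, it can help to list only those of a given type. The sketch below assumes a hypothetical temporary folder (demo_dir) containing three empty files created only for illustration; pattern and full.names are standard arguments of list.files().

```r
# Hypothetical temporary folder with a few empty files, for illustration only
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.csv", "b.tsv", "notes.txt"))))

# List only the files whose names end in .csv (pattern is a regular expression)
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
csv_files

# Return full paths instead of bare file names
all_paths <- list.files(demo_dir, full.names = TRUE)
all_paths
```

Full paths are convenient when the files are passed straight to a reading function from a different working directory.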
The working directory can also be set from the RStudio menu: Session →
Set Working Directory → Choose Directory, and then selecting the folder
where the data is located. Use the getwd() function to verify that the
correct directory has been set.
getwd()
## [1] "E:/Desktop/ProyectoR"
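setwd() returns the previous working directory invisibly, which makes it possible to switch folders temporarily and then go back. A minimal sketch, using tempdir() as a stand-in for a project folder (the variable names are illustrative):

```r
start_wd <- getwd()          # remember where we started
old_wd <- setwd(tempdir())   # switch; setwd() returns the previous directory
getwd()                      # now the temporary folder
setwd(old_wd)                # go back to the original folder
identical(getwd(), start_wd)
```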
This session uses the R packages readr and readxl. They need to be
installed only if they are not already present on the computer; if
necessary, uncomment and run the following line:
# install.packages(c("readr", "readxl"))
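The check for missing packages can also be done in code. The sketch below defines a hypothetical helper, missing_packages(), built on requireNamespace(); the actual install.packages() call is left commented out, as in the line above.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("readr", "readxl"))

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)
}
```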
You can create .txt, .csv, and
.tsv files using simple programs like Notepad (on Windows),
TextEdit (on Mac), or any basic text editor. You just need to type your
data and save the file with the correct extension, separating values
with commas in .csv files and with tabs in .tsv files.
These formats are useful for organizing data in a plain, human-readable
form.
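The same files can be produced from R itself. This sketch writes the patient example used later in this session to a temporary folder with writeLines(); the file names pt_data_demo.csv and pt_data_demo.tsv are made up for illustration so that no existing file is overwritten.

```r
# Each element is one line of the file; the first line is the header
csv_lines <- c(
  "name,temperature,status,gender,blood",
  "jhon,98.1,FALSE,MALE,O",
  "Jane,98.6,FALSE,FEMALE,AB",
  "Steve,101.4,TRUE,MALE,A"
)

# Hypothetical output paths in a temporary folder
csv_path <- file.path(tempdir(), "pt_data_demo.csv")
tsv_path <- file.path(tempdir(), "pt_data_demo.tsv")

# Comma-separated version
writeLines(csv_lines, csv_path)

# Tab-separated version: same content, commas replaced by tab characters
tsv_lines <- gsub(",", "\t", csv_lines, fixed = TRUE)
writeLines(tsv_lines, tsv_path)
```

writeLines() ends every line, including the last one, with a newline character.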
A tabular data file is structured in matrix form, such that each line of text reflects one example and each example has the same number of features. The feature values on each line are separated by a predefined symbol known as a delimiter. Often, the first line of a tabular data file lists the names of the data columns. This is called a header line.
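A quick way to check that text is truly tabular is to count the fields on each line. The sketch below splits a few example lines on the comma delimiter; it is a simplification that ignores quoted fields containing commas and trailing empty fields. For files on disk, count.fields() in base R performs a similar check.

```r
file_lines <- c(
  "name,temperature,status,gender,blood",
  "jhon,98.1,FALSE,MALE,O",
  "Jane,98.6,FALSE,FEMALE,AB",
  "Steve,101.4,TRUE,MALE,A"
)

# Number of delimiter-separated values on each line
fields_per_line <- lengths(strsplit(file_lines, ",", fixed = TRUE))
fields_per_line

# A tabular file has the same number of fields on every line
is_tabular <- length(unique(fields_per_line)) == 1
is_tabular

# The header line holds the column names
header <- strsplit(file_lines[1], ",", fixed = TRUE)[[1]]
header
```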
.csv files
Perhaps the most common tabular text file format is the comma-separated values (CSV) file.
A .csv file representing the medical dataset constructed
previously could be stored as:
name,temperature,status,gender,blood
jhon,98.1,FALSE,MALE,O
Jane,98.6,FALSE,FEMALE,AB
Steve,101.4,TRUE,MALE,A
Given a patient data file named pt_data.csv located in
the R working directory, the read.csv() function loads it into a data
frame:
pt_data <- read.csv("pt_data.csv", stringsAsFactors = FALSE,
                    header = TRUE)
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on 'pt_data.csv'
pt_data
## name temperature status gender blood
## 1 jhon 98.1 FALSE MALE O
## 2 Jane 98.6 FALSE FEMALE AB
## 3 Steve 101.4 TRUE MALE A
class(pt_data)
## [1] "data.frame"
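The "incomplete final line" warning shown above usually means that the last line of the file does not end with a newline character; the data is still read. The sketch below, using two hypothetical files in a temporary folder, suggests that both variants produce the same data frame. suppressWarnings() is used only to keep the demonstration quiet.

```r
# Hypothetical demo files in a temporary folder
no_newline   <- file.path(tempdir(), "no_newline.csv")
with_newline <- file.path(tempdir(), "with_newline.csv")

# cat() writes the text exactly as given: the first file lacks a final newline
cat("name,temperature\njhon,98.1\nJane,98.6",   file = no_newline)
cat("name,temperature\njhon,98.1\nJane,98.6\n", file = with_newline)

a <- suppressWarnings(read.csv(no_newline))  # this read may trigger the warning
b <- read.csv(with_newline)

identical(a, b)
```

Adding an empty line at the end of the file in a text editor is normally enough to make the warning go away.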
If stringsAsFactors = TRUE, R automatically converts
character strings into factors (categorical variables).
If stringsAsFactors = FALSE, R leaves character strings
as character vectors, which is usually what you want for text data.
Since R 4.0.0, FALSE is the default.
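The difference is easy to see on a small example. This sketch reads the same text twice through the text argument of read.csv(); the object names are illustrative.

```r
csv_text <- "name,gender\njhon,MALE\nJane,FEMALE\nSteve,MALE"

as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$gender)   # character vector
class(as_fct$gender)   # factor
levels(as_fct$gender)  # the distinct categories, sorted alphabetically
```

Note that stringsAsFactors = TRUE converts every text column, including name, which is rarely a meaningful categorical variable. Converting selected columns with factor() afterwards gives finer control.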
The same file can be read with the read_csv() function from the readr package:
library(readr)
pt_data2 <- read_csv("pt_data.csv")
## Rows: 3 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): name, gender, blood
## dbl (1): temperature
## lgl (1): status
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pt_data2
## # A tibble: 3 × 5
## name temperature status gender blood
## <chr> <dbl> <lgl> <chr> <chr>
## 1 jhon 98.1 FALSE MALE O
## 2 Jane 98.6 FALSE FEMALE AB
## 3 Steve 101. TRUE MALE A
class(pt_data2)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
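As the message printed by read_csv() suggests, column types can be declared explicitly instead of being guessed. A minimal sketch, assuming a hypothetical copy of the patient data written to a temporary folder; here gender is read as a factor with the two levels seen in the example data.

```r
library(readr)

# Hypothetical demo file in a temporary folder, mirroring pt_data.csv
demo_csv <- file.path(tempdir(), "pt_data_types.csv")
writeLines(c(
  "name,temperature,status,gender,blood",
  "jhon,98.1,FALSE,MALE,O",
  "Jane,98.6,FALSE,FEMALE,AB",
  "Steve,101.4,TRUE,MALE,A"
), demo_csv)

# Declare the type of every column; no guessing message is printed
pt_typed <- read_csv(
  demo_csv,
  col_types = cols(
    name        = col_character(),
    temperature = col_double(),
    status      = col_logical(),
    gender      = col_factor(levels = c("FEMALE", "MALE")),
    blood       = col_character()
  )
)

pt_typed
```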
.tsv files
Other delimited formats, such as tab-separated values (TSV), can be read in a similar way.
A .tsv file representing the medical dataset constructed
previously could be stored as:
name temperature status gender blood
jhon 98.1 FALSE MALE O
Jane 98.6 FALSE FEMALE AB
Steve 101.4 TRUE MALE A
The spaces between columns in the .tsv are actually tab
characters, even though they may appear as regular spaces here.
pt_data3 <- read.delim("pt_data.tsv")
## Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
## incomplete final line found by readTableHeader on 'pt_data.tsv'
pt_data3
## name temperature status gender blood
## 1 jhon 98.1 FALSE MALE O
## 2 Jane 98.6 FALSE FEMALE AB
## 3 Steve 101.4 TRUE MALE A
class(pt_data3)
## [1] "data.frame"
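read.delim() is a convenience wrapper around read.table() with a tab as the default separator and header = TRUE. For simple data such as this example, the two calls in the sketch below should return the same data frame; the demo file name is hypothetical.

```r
# Hypothetical demo file in a temporary folder, mirroring pt_data.tsv
demo_tsv <- file.path(tempdir(), "pt_data_demo_delim.tsv")
writeLines(c(
  "name\ttemperature\tstatus\tgender\tblood",
  "jhon\t98.1\tFALSE\tMALE\tO",
  "Jane\t98.6\tFALSE\tFEMALE\tAB",
  "Steve\t101.4\tTRUE\tMALE\tA"
), demo_tsv)

via_delim <- read.delim(demo_tsv)
via_table <- read.table(demo_tsv, header = TRUE, sep = "\t")

identical(via_delim, via_table)
```

The two functions differ in other defaults (quoting, comment characters, filling of short rows), so results can diverge on messier files.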
The same file can be read with the read_tsv() function from the readr package:
library(readr)
pt_data4 <- read_tsv("pt_data.tsv")
## Rows: 3 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): name, gender, blood
## dbl (1): temperature
## lgl (1): status
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pt_data4
## # A tibble: 3 × 5
## name temperature status gender blood
## <chr> <dbl> <lgl> <chr> <chr>
## 1 jhon 98.1 FALSE MALE O
## 2 Jane 98.6 FALSE FEMALE AB
## 3 Steve 101. TRUE MALE A
class(pt_data4)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
The dataset used below is available in InfoStat under File
→ Open sample data → Atriplex. Once opened, the dataset can be
exported in either Excel or .txt
format.
.txt files
The exported file uses a comma as the decimal mark, so dec = "," is passed to read.table():
datos2 <- read.table("Atriplex.txt", header = TRUE, dec = ",")
head(datos2, 4)
## Tamaño Episperma PG PN PS Bloque
## 1 chicas claro 60 47 0.0030 1
## 2 chicas claro 73 33 0.0030 2
## 3 chicas claro 73 60 0.0031 3
## 4 chicas rojizo 93 7 0.0030 1
class(datos2)
## [1] "data.frame"
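The effect of the dec argument can be seen on a small inline example. This sketch uses made-up values in the same style as the PS column and reads them with and without dec = ",".

```r
# Whitespace-separated text with commas as decimal marks (made-up values)
txt <- "PS Bloque\n0,0030 1\n0,0031 2"

with_dec    <- read.table(text = txt, header = TRUE, dec = ",")
without_dec <- read.table(text = txt, header = TRUE)

class(with_dec$PS)     # numeric: the comma is treated as the decimal mark
class(without_dec$PS)  # not numeric: "0,0030" cannot be parsed as a number
```

Checking the column classes right after import is a simple way to catch a wrong decimal mark before any analysis.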
.xlsx files
Excel files can be read with the readxl package.
To view the sheets in an Excel file, use
the excel_sheets() function.
library(readxl)
excel_sheets("Atriplex.xlsx")
## [1] "Hoja1"
datos3 <- read_xlsx("Atriplex.xlsx", sheet = "Hoja1")
head(datos3, 4)
## # A tibble: 4 × 6
## Tamaño Episperma PG PN PS Bloque
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 chicas claro 60 47 0.003 1
## 2 chicas claro 73 33 0.003 2
## 3 chicas claro 73 60 0.0031 3
## 4 chicas rojizo 93 7 0.003 1
class(datos3)
## [1] "tbl_df" "tbl" "data.frame"
Another way to achieve the same result is to use the read_excel()
function from the same readxl package. It detects the file format from
the extension and reads the first sheet by default:
datos4 <- read_excel("Atriplex.xlsx")
head(datos4)
## # A tibble: 6 × 6
## Tamaño Episperma PG PN PS Bloque
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 chicas claro 60 47 0.003 1
## 2 chicas claro 73 33 0.003 2
## 3 chicas claro 73 60 0.0031 3
## 4 chicas rojizo 93 7 0.003 1
## 5 chicas rojizo 66 33 0.0026 2
## 6 chicas rojizo 60 20 0.003 3
class(datos4)
## [1] "tbl_df" "tbl" "data.frame"
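When a workbook has several sheets, excel_sheets() and read_excel() can be combined to read them all at once. Since Atriplex.xlsx has a single sheet, this sketch uses datasets.xlsx, an example workbook shipped with readxl and located with readxl_example(); the object names are illustrative.

```r
library(readxl)

# Path to an example workbook that ships with the readxl package
path <- readxl_example("datasets.xlsx")

# Names of all sheets in the workbook
sheets <- excel_sheets(path)
sheets

# Read every sheet into a named list of tibbles
all_sheets <- lapply(sheets, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheets

# Number of rows and columns in each sheet
lapply(all_sheets, dim)
```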
If the data is in a format such as .csv,
.tsv, or .txt and is available online, it can be read directly from its URL.
url <- "https://raw.githubusercontent.com/hadley/data-fuel-economy/master/1984/mpg.csv"
datos5 <- read.csv(url)
head(datos5)
## year class model displ trans cyl manufacturer
## 1 1984 1 SPIDER VELOCE 2000 120 manual(m5) 4 ALFA ROMEO
## 2 1984 1 BERTONE X1/9 91 manual(m5) 4 BERTONE
## 3 1984 1 CORVETTE 350 auto(4) 8 CHEVROLET
## 4 1984 1 CORVETTE 350 manual(m4) 8 CHEVROLET
## 5 1984 1 300ZX 181 auto(4) 6 NISSAN MOTOR COMPANY, LTD.
## 6 1984 1 300ZX 181 auto(4) 6 NISSAN MOTOR COMPANY, LTD.
## cty hwy eng.dscr T G S twodoor.p twodoor.l fourdoor.p
## 1 23 35 (FFS) FALSE FALSE FALSE NA NA NA
## 2 25 36 (FFS) FALSE FALSE FALSE NA NA NA
## 3 16 28 (FFS) FALSE FALSE FALSE NA NA NA
## 4 16 28 (FFS) FALSE FALSE FALSE NA NA NA
## 5 19 28 (FFS,TRBO) TRUE FALSE FALSE NA NA NA
## 6 20 28 (FFS) FALSE FALSE FALSE NA NA NA
## fourdoor.l hatch.p hatch.l
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA 50 22
## 6 NA 50 22
class(datos5)
## [1] "data.frame"
The same URL can be read with the read_csv() function from the readr package:
library(readr)
url <- "https://raw.githubusercontent.com/hadley/data-fuel-economy/master/1984/mpg.csv"
datos6 <- read_csv(url)
## Rows: 1958 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): model, trans, manufacturer, eng.dscr
## dbl (12): year, class, displ, cyl, cty, hwy, twodoor.p, twodoor.l, fourdoor....
## lgl (3): T, G, S
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(datos6)
## # A tibble: 6 × 19
## year class model displ trans cyl manufacturer cty hwy eng.dscr T
## <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl> <chr> <lgl>
## 1 1984 1 SPIDER … 120 manu… 4 ALFA ROMEO 23 35 (FFS) FALSE
## 2 1984 1 BERTONE… 91 manu… 4 BERTONE 25 36 (FFS) FALSE
## 3 1984 1 CORVETTE 350 auto… 8 CHEVROLET 16 28 (FFS) FALSE
## 4 1984 1 CORVETTE 350 manu… 8 CHEVROLET 16 28 (FFS) FALSE
## 5 1984 1 300ZX 181 auto… 6 NISSAN MOTO… 19 28 (FFS,TR… TRUE
## 6 1984 1 300ZX 181 auto… 6 NISSAN MOTO… 20 28 (FFS) FALSE
## # ℹ 8 more variables: G <lgl>, S <lgl>, twodoor.p <dbl>, twodoor.l <dbl>,
## # fourdoor.p <dbl>, fourdoor.l <dbl>, hatch.p <dbl>, hatch.l <dbl>
class(datos6)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
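Reading from a URL depends on the network and on the file still being available, so a script may stop with an error at that point. One possible safeguard is to wrap the call in tryCatch(). The helper safe_read_csv() below is hypothetical, not part of any package; it is demonstrated on local files so that the example runs offline.

```r
# Hypothetical helper: return a data frame, or NULL if the file or URL
# cannot be read
safe_read_csv <- function(path_or_url) {
  tryCatch(
    read.csv(path_or_url),
    error = function(e) {
      message("Could not read ", path_or_url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# A local file that exists (created here for illustration)
ok_path <- file.path(tempdir(), "safe_read_demo.csv")
writeLines(c("year,cyl", "1984,4", "1984,8"), ok_path)
ok <- safe_read_csv(ok_path)

# A path that does not exist: the helper returns NULL instead of stopping
missing <- suppressWarnings(
  safe_read_csv(file.path(tempdir(), "no_such_file_12345.csv"))
)
is.null(missing)
```

The same helper accepts a URL string, since read.csv() handles both paths and URLs.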
In the next session, we will build frequency tables and create graphs for each variable.
Di Rienzo J.A., Casanoves F., Balzarini M.G., Gonzalez L., Tablada M., Robledo C.W. InfoStat versión 2020. Centro de Transferencia InfoStat, Facultad de Ciencias Agropecuarias, Universidad Nacional de Córdoba, Argentina. URL http://www.infostat.com.ar