Training workshop: Introduction to Data Science with the R language.

Second Session:

Rafael Resendiz Ramirez
August 20, 2015

Extraction, scanning, cleaning and handling of data.

Compilation Data

First, we must learn to collect our data, bearing in mind that it usually reaches us through various media and in various formats. For this reason, we will write some scripts, both to connect to the site that holds the information we need and to read the file in question. This time, instead of loading a file from a predetermined path, we will use a function that lets us choose the file we want to work with.

  • Script_1 # Reading local files in "csv" format.
    # file.choose() opens a dialog to pick the file interactively.
    newFileName <- read.table(file.choose(), sep = ",", header = TRUE)
    head(newFileName)
    # Select only the column of interest (column 37 here)
    mycolumnselect <- newFileName[, 37]
    mycolumnselect
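
Because file.choose() opens an interactive dialog, Script_1 cannot run unattended. The following is a minimal sketch of the same steps on a small temporary CSV file. The file contents and the column names `station` and `rainfall` are invented for illustration and are not part of the workshop data.

```r
# Build a small CSV file in a temporary location (hypothetical sample data).
csvPath <- tempfile(fileext = ".csv")
writeLines(c("station,rainfall",
             "A,10.5",
             "B,3.2",
             "C,7.8"), csvPath)

# Read it the same way as Script_1, with a fixed path instead of file.choose().
sampleData <- read.table(csvPath, sep = ",", header = TRUE)
head(sampleData)

# Select a single column, either by position or by name.
byPosition <- sampleData[, 2]
byName <- sampleData[["rainfall"]]
```

Selecting by name is often safer than selecting by position (such as column 37), because it keeps working if the column order of the file changes.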

  • Script_4a
    Procedures for working with Fortran files from R

        After downloading the file, I think it is necessary to split the variables differently.
        In this example, I write the code to show how you can work with the file.

  • Script_4b

        # In this example, I write the code to show how you can work with the file.
        # First procedure for working with Fortran files:
        # read the fixed-width file. V1, V2, ..., Vn stand for the width of each field.
        yourworktempfile <- read.fwf("namefortranfile.for", widths = c(V1, V2, ..., Vn))
        # Extract the column you want to work with (Vn here stands for its column index).
        yourworktempdata <- yourworktempfile[, Vn]
        # Turn the extracted data into a matrix.
        dataextratedmatrix <- matrix(yourworktempdata, nrow = Numberows, ncol = Numbercols, byrow = FALSE, dimnames = NULL)
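
As a concrete illustration of the fixed-width procedure in Script_4b, the sketch below writes a tiny fixed-width file and reads it back. The field widths (4, 3, 5), the sample records, and all names used here are assumptions made only for this example.

```r
# Write a small fixed-width file (hypothetical records):
# 4-character id, 3-character code, 5-character value.
fwfPath <- tempfile(fileext = ".txt")
writeLines(c("ID01ABC 12.5",
             "ID02DEF  7.0",
             "ID03GHI 30.2",
             "ID04JKL  1.3"), fwfPath)

# Read the file, giving the width of each field in characters.
fixedData <- read.fwf(fwfPath, widths = c(4, 3, 5),
                      col.names = c("id", "code", "value"))

# Extract the third column.
values <- fixedData[, 3]

# Arrange the four values as a 2 x 2 matrix, filled column by column.
valueMatrix <- matrix(values, nrow = 2, ncol = 2, byrow = FALSE)
valueMatrix
```

With byrow = FALSE the first two values fill the first column and the last two fill the second column.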
    

  • Script_4c

        # Remove the header rows (only needed in this case)
        datamatrixclean <- dataextratedmatrix[Numberow_Ini:Numberow_End, ]
        # Convert character data to numeric data
        numericData <- as.numeric(datamatrixclean)
        # Apply the desired function 'x' (mean, sum, etc.)
        xFunction(numericData)
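
The cleaning steps in Script_4c can be tried on a small character matrix. In this sketch the matrix contents, the two header rows, and the choice of mean() as the summary function are hypothetical, chosen only to make each step visible.

```r
# A character matrix whose first two rows are header text rather than data.
rawMatrix <- matrix(c("Temperature", "Celsius", "21.5", "19.0", "23.5"),
                    ncol = 1)

# Keep only the data rows (rows 3 to 5 in this example).
cleanMatrix <- rawMatrix[3:5, , drop = FALSE]

# as.numeric() drops the matrix dimensions and returns a plain numeric vector.
numericValues <- as.numeric(cleanMatrix)

# Apply a summary function to the numeric values.
meanValue <- mean(numericValues)
meanValue
```

If the header rows were not removed first, as.numeric() would return NA for them with a warning, and mean() would then return NA unless na.rm = TRUE is set.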
    

  • Script_5

      # Find specific variables and values.
      # rootNode is assumed to be the root node of an XML document that has already been parsed.
      xpathSApply(rootNode, "//name", xmlValue)
      findzp <- xpathSApply(rootNode, "//zipcode", xmlValue)
      findzp
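
Script_5 relies on a `rootNode` object that must already exist. One way to obtain it with the XML package is sketched below. The inline XML document, its element values, and the use of xmlParse() with asText = TRUE are assumptions for illustration, since the original source of `rootNode` is not shown here.

```r
# The XML package is not part of base R, so the example is skipped
# when the package is not installed.
if (requireNamespace("XML", quietly = TRUE)) {
  library(XML)

  # A small hypothetical XML document held in a string.
  xmlText <- "<shops>
    <shop><name>North</name><zipcode>21231</zipcode></shop>
    <shop><name>South</name><zipcode>21202</zipcode></shop>
  </shops>"

  # Parse the text and take the top-level node.
  doc <- xmlParse(xmlText, asText = TRUE)
  rootNode <- xmlRoot(doc)

  # Same queries as Script_5: every <name> and every <zipcode> element.
  shopNames <- xpathSApply(rootNode, "//name", xmlValue)
  findzp <- xpathSApply(rootNode, "//zipcode", xmlValue)
  print(shopNames)
  print(findzp)
}
```

The XPath expression "//zipcode" matches the element at any depth, which is why no knowledge of the surrounding structure is needed.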
    

Homework

  1. Unzip about four zipped files.
  2. Download one file.
  3. Extract the names and the format of the data.
  4. Download and read several files of various kinds.

Thanks