Purpose: Retrieve those sentences that contain negations.
#install.packages("tokenizers")
#install.packages("FSelectorRcpp")
library(NLP)
library(FSelectorRcpp)
library(tokenizers)
## Warning: package 'tokenizers' was built under R version 3.4.4
library(tm)
## Warning: package 'tm' was built under R version 3.4.3
library(SnowballC)
library(stats)
library(ngram)
## Warning: package 'ngram' was built under R version 3.4.3
library(stringr)
library(readxl)
setwd("~/Google Drive/UM/Smart Services/Thesis/Thesis/Code/Feature Set3/Code/1. Negation Handling")
Data <- read_excel("~/Google Drive/UM/Smart Services/Thesis/Thesis/Code/Feature Set3/Input/1.Reviews split into Sentences.xlsx")
Text <- as.character(Data$Review.Fragments)
First, a list of common negations is retrieved to spot any sentence containing a negation.
#Get List of Negations
library(qdapDictionaries)
## Warning: package 'qdapDictionaries' was built under R version 3.4.4
Negations <- as.list(qdapDictionaries::negation.words)
Length2 <- length(Negations)
Negations.Clean <- as.list(rep(NA, Length2))
for (j in 1:Length2){
  # Anchor each negation word so grepl() only matches whole words,
  # and replace apostrophes with spaces to match the cleaned text
  Extract <- Negations[[j]]
  New_Word <- paste("^", Extract, "$", sep = "")
  New_Word <- gsub("'", " ", New_Word)
  Negations.Clean[[j]] <- New_Word
}
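As an illustration of this transformation, a contraction such as “isn't” becomes the anchored pattern “^isn t$”. The first few cleaned patterns can be inspected directly (the exact list contents depend on the installed qdapDictionaries version):
head(Negations.Clean, 3)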
A quick review showed that the text contained additional negation forms. These were appended to the list.
Negations.Clean[[24]] <- "was t"
Negations.Clean[[25]] <- "did t"
Negations.Clean[[26]] <- "can t"
trim.leading <- function (x){
  # Remove leading whitespace and return the trimmed string
  sub("^\\s+", "", x)
}
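A hypothetical call shows what this helper does:
trim.leading("   wasn t")  # returns "wasn t" with the leading whitespace removed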
In this step the reviews are checked for negations. For each word, the code tests whether the word matches any element of the negation list and returns a logical value. The results are stored in a two-dimensional list structure: the first level is a list of sentences, and each sentence element contains a list of word-level logical values.
Split.Text <- strsplit(Text, split = " ")
Negation.Detection <- list()
for (k in seq_along(Split.Text)){  # 4,735 sentence fragments in total
  Extract <- Split.Text[[k]]
  Length <- length(Extract)
  Words <- list()
  for (l in 1:Length){
    # TRUE if the word matches any of the cleaned negation patterns
    Word <- Extract[[l]]
    Word <- any(sapply(Negations.Clean, grepl, Word))
    Words[[l]] <- Word
  }
  Negation.Detection[[k]] <- Words
}
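To make the structure concrete, suppose (hypothetically) that the first fragment were “the room was not clean”; its entry would then hold five logical values with TRUE at the position of “not”:
unlist(Negation.Detection[[1]])  # e.g. FALSE FALSE FALSE TRUE FALSE for the hypothetical fragment above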
In order to retrieve the sentences, the two-dimensional list structure containing the logical values was used to determine the “location” of the negations in the text, and the ID of every sentence containing a negation was saved.
First, it was determined which sentences contained a negation:
Neg.Sentences <- list()
for (n in seq_along(Negation.Detection)){
  # A sentence counts as negated if any of its words matched
  Sentence <- unlist(Negation.Detection[[n]])
  Neg.Sentences[[n]] <- any(Sentence == TRUE)
}
Second, the IDs of those sentences were saved in the “Sentence.IDs” structure.
Sentence.IDs <- which(unlist(Neg.Sentences) == TRUE)
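Sentence.IDs is now a plain integer vector holding the row positions of the negation fragments (the actual values depend on the data):
head(Sentence.IDs)  # e.g. 3 8 15 ... (hypothetical values)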
Finally, the index list was used to retrieve the fragments containing negations. Those fragments were stored in the structure “Negative.Reviews”.
Negative.Reviews <- Text[Sentence.IDs]
Negative.Reviews <- as.list(Negative.Reviews)
df <- data.frame(matrix(1:420, nrow = 420, ncol = 1))  # 420 fragments contain negations
df$Neg.Text <- Negative.Reviews
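The resulting data frame should have 420 rows, one per negation fragment, with a running index in the first column and the fragment text in the Neg.Text list column:
dim(df)  # 420 rows, 2 columns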
The fragments containing negations are then written to an Excel file for further processing.
WriteXLS::WriteXLS(df,ExcelFileName = "1. Negative Fragments.xlsx")
In order to be able to reinsert the tagged sentences later, the IDs of the sentences containing negations are extracted as well.
df.Vec <- data.frame(Sentence.IDs)
Those indices are exported as an Excel file.
WriteXLS::WriteXLS(df.Vec,ExcelFileName = "1. Negative Sentence Indicators.xlsx")