La1

Problem Statement

Visualize Twitter sentiment over time using a smoothed line plot

Step 1: Load the dataset

Code:

# Adjust the path to wherever test.csv is stored on your machine
data <- read.csv("C:/Users/Admin/OneDrive/Desktop/test.csv", stringsAsFactors = FALSE)
head(data)

     id tweet
1 31963 #studiolife #aislife #requires #passion #dedication #willpower   to find #newmaterials…
2 31964 @user #white #supremacists want everyone to see the new ‘  #birds’ #movie — and here’s why
3 31965 safe ways to heal your #acne!!    #altwaystoheal #healthy   #healing!!
4 31966 is the hp and the cursed child book up for reservations already? if yes, where? if no, when? 😍😍😍   #harrypotter #pottermore #favorite
5 31967 3rd #bihday to my amazing, hilarious #nephew eli ahmir! uncle dave loves you and misses…
6 31968 choose to be   :) #momtips

Step 2: Create simple sentiment word lists

Code:

positive_words <- c("good", "great", "happy", "love", "excellent", "best", "nice", "awesome")

negative_words <- c("bad", "worst", "sad", "hate", "terrible", "poor", "angry", "awful")

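Explanation: We define two small word lists by hand, one positive and one negative.
This avoids a dependency on the tidytext package and its sentiment lexicons, at the cost of much narrower vocabulary coverage.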

Step 3: Convert tweets to lowercase

Code:

data$tweet <- tolower(data$tweet)

Explanation:

tolower() converts all text to lowercase, so matching against the lowercase word lists is case-insensitive.
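
A small illustration of why this step matters: %in% compares strings exactly, so a capitalized word does not match a lowercase list. The sample words below are made up for the example and are not taken from the dataset.

```r
# Hypothetical sample words, not taken from the dataset
sample_words <- c("Happy", "HAPPY", "happy")
positive_words <- c("good", "great", "happy", "love", "excellent", "best", "nice", "awesome")

# Exact matching: only the all-lowercase form is found
matches_raw <- sample_words %in% positive_words

# After tolower(), all three forms match
matches_lower <- tolower(sample_words) %in% positive_words

print(matches_raw)
print(matches_lower)
```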

Step 4: Calculate sentiment score for each tweet

Code:

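data$score <- vapply(data$tweet, function(x) {
  # Split the tweet into words on single spaces
  words <- unlist(strsplit(x, " ", fixed = TRUE))
  # Count matches against each word list
  pos_count <- sum(words %in% positive_words)
  neg_count <- sum(words %in% negative_words)
  # Net sentiment: positive matches minus negative matches
  pos_count - neg_count
}, integer(1), USE.NAMES = FALSE)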

Explanation:

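strsplit() splits each tweet into words on spaces.
%in% checks whether each word appears in the positive or negative list.
score = positive count - negative count
Because the split is on spaces only, tokens with attached punctuation or a hashtag sign, such as "#love" or "happy!", do not match the word lists.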
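
To make the scoring concrete, here is a minimal sketch that scores one made-up tweet in two ways: split on spaces, as in the code above, and split on any run of non-letter characters. The helper score_text() and the example tweet are hypothetical. The regex split is an alternative shown for comparison, not the method used in this report, and it discards digits and non-ASCII letters.

```r
positive_words <- c("good", "great", "happy", "love", "excellent", "best", "nice", "awesome")
negative_words <- c("bad", "worst", "sad", "hate", "terrible", "poor", "angry", "awful")

# Hypothetical helper: score one lowercase string, splitting on the given pattern
score_text <- function(x, split_pattern, fixed = FALSE) {
  words <- unlist(strsplit(x, split_pattern, fixed = fixed))
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

# Made-up example tweet, already lowercase
example_tweet <- "i love this! #happy but the ending was bad"

# Split on spaces: "#happy" is not matched, so love (+1) and bad (-1) cancel out
space_score <- score_text(example_tweet, " ", fixed = TRUE)

# Split on non-letters: "#happy" becomes "happy", so love and happy (+2), bad (-1)
letter_score <- score_text(example_tweet, "[^a-z]+")

print(space_score)
print(letter_score)
```

Since many sentiment-bearing words in tweets appear as hashtags, the choice of tokenizer can change the scores noticeably.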

Step 5: Use id as time

Code:

data$time <- data$id

Explanation:

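The dataset has no date column, so id is used as a proxy for time order. This assumes that ids were assigned in posting order, which the data itself does not confirm.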
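
Before treating id as a time axis, it may be worth checking that the ids are already in increasing order. The sketch below uses the six ids printed by head(data) as a stand-in vector; tweet_order is a hypothetical gap-free alternative to the raw id values.

```r
# Stand-in for data$id: the first six ids shown by head(data)
ids <- c(31963, 31964, 31965, 31966, 31967, 31968)

# TRUE if the ids are already in non-decreasing order
ids_sorted <- !is.unsorted(ids)

# A gap-free 1..n ordering that does not depend on the actual id values
tweet_order <- rank(ids, ties.method = "first")

print(ids_sorted)
print(tweet_order)
```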

Step 6: Load ggplot2

Code:

library(ggplot2)

Explanation: ggplot2 provides the plotting functions used in the following steps.

Step 7: Create smoothed line plot

Code:

ggplot(data, aes(x = time, y = score)) +
  geom_smooth(color = "blue", fill = "lightblue") +
  labs(
    title = "Twitter Sentiment Over Time",
    x = "Tweet Order",
    y = "Sentiment Score"
  ) +
  theme_minimal()

`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

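Explanation: ggplot() initializes the plot with the data frame.
aes() maps time to the x-axis and score to the y-axis.
geom_smooth() fits a smoothed trend line and draws it with a confidence band. No method is specified, so ggplot2 picks one automatically; as the message above shows, it chose a GAM, which is its default for 1,000 or more observations.
labs() sets the title and axis labels.
theme_minimal() applies a clean, minimal theme.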
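
The smoothing method can also be stated explicitly, which documents the choice and suppresses the informational message. The sketch below passes the same method and formula that the message reports. The data frame toy is simulated for illustration and is not the tweet data, and fitting the GAM when the plot is drawn assumes the mgcv package is installed.

```r
library(ggplot2)

# Simulated stand-in data (hypothetical), not the tweet dataset
set.seed(1)
toy <- data.frame(
  time = 1:1200,
  score = sample(-1:1, 1200, replace = TRUE)
)

# Same smoother that geom_smooth() reported choosing, written out explicitly
p <- ggplot(toy, aes(x = time, y = score)) +
  geom_smooth(
    method = "gam",
    formula = y ~ s(x, bs = "cs"),
    color = "blue",
    fill = "lightblue"
  ) +
  theme_minimal()
```

Calling print(p) draws the plot.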

Step 8: Add legend

Code:

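ggplot(data, aes(x = time, y = score, color = "Sentiment Trend")) +
  geom_smooth(fill = "lightblue", se = TRUE, method = "loess") +
  # Keep the line blue as in Step 7; a label mapped in aes() otherwise gets a default color
  scale_color_manual(values = c("Sentiment Trend" = "blue")) +
  labs(
    title = "Twitter Sentiment Over Time",
    x = "Tweet Order",
    y = "Sentiment Score",
    color = "Legend"
  ) +
  theme_minimal() +
  theme(legend.position = "top")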

`geom_smooth()` using formula = 'y ~ x'

Explanation:

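Mapping color to a constant label inside aes() makes ggplot2 draw a legend entry for the trend line. labs(color = "Legend") sets the legend title, and theme(legend.position = "top") places the legend above the plot. This plot also sets method = "loess", so the curve is a loess fit rather than the GAM fit from Step 7.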

Interpretation:

The graph shows how the smoothed sentiment score changes across tweets. Because the dataset has no timestamp, tweet order (the id column) stands in for time. Scores above zero indicate net positive sentiment, and scores below zero indicate net negative sentiment. The smoothed curve averages out tweet-to-tweet variation, which makes the overall trend easier to see.
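
The effect of smoothing can be seen on a small scale with a centered moving average. This is a simpler stand-in chosen for illustration, not the GAM or loess fit that geom_smooth() computes; the scores and the window size of 3 are made up.

```r
# Made-up scores for illustration, not taken from the dataset
scores <- c(0, 1, 0, -1, 0, 2, 0, 0, -1, 1)

# Centered moving average over a hypothetical window of 3 tweets
window <- 3
smoothed <- as.numeric(stats::filter(scores, rep(1 / window, window), sides = 2))

# The first and last values are NA because the window does not fit there
print(round(smoothed, 2))

# The smoothed values span a narrower range than the raw scores
print(range(scores))
print(range(smoothed, na.rm = TRUE))
```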