Part 1: Build at least three sets of variable combinations

# Load the necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load your data
data <- read.csv("C:\\Users\\mansi\\Downloads\\news+popularity+in+multiple+social+media+platforms\\News_Popularity_in_Multiple_Social_Media_Platforms.csv")

# A numeric summary of data for at least 10 columns
summary(data)
##      IDLink          Title             Headline            Source         
##  Min.   :     1   Length:93239       Length:93239       Length:93239      
##  1st Qu.: 24302   Class :character   Class :character   Class :character  
##  Median : 52275   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 51561                                                           
##  3rd Qu.: 76586                                                           
##  Max.   :104802                                                           
##     Topic           PublishDate        SentimentTitle      SentimentHeadline 
##  Length:93239       Length:93239       Min.   :-0.950694   Min.   :-0.75543  
##  Class :character   Class :character   1st Qu.:-0.079057   1st Qu.:-0.11457  
##  Mode  :character   Mode  :character   Median : 0.000000   Median :-0.02606  
##                                        Mean   :-0.005411   Mean   :-0.02749  
##                                        3rd Qu.: 0.064255   3rd Qu.: 0.05971  
##                                        Max.   : 0.962354   Max.   : 0.96465  
##     Facebook         GooglePlus          LinkedIn       
##  Min.   :   -1.0   Min.   :  -1.000   Min.   :   -1.00  
##  1st Qu.:    0.0   1st Qu.:   0.000   1st Qu.:    0.00  
##  Median :    5.0   Median :   0.000   Median :    0.00  
##  Mean   :  113.1   Mean   :   3.888   Mean   :   16.55  
##  3rd Qu.:   33.0   3rd Qu.:   2.000   3rd Qu.:    4.00  
##  Max.   :49211.0   Max.   :1267.000   Max.   :20341.00
# Set the seed for reproducibility
set.seed(123)

# Create Set 1: Variable Combination
set1 <- data %>%
  select(SentimentTitle, SentimentHeadline, Facebook, GooglePlus, LinkedIn) %>%
  mutate(
    # Create a new calculated variable
    CombinedSentiment = SentimentTitle + SentimentHeadline,
    ResponseVariable = Facebook
  )

# Create Set 2: Variable Combination
set2 <- data %>%
  select(SentimentTitle, SentimentHeadline, Facebook, GooglePlus, LinkedIn) %>%
  mutate(
    # Create a new calculated variable
    TotalSocial = Facebook + GooglePlus + LinkedIn,
    ResponseVariable = SentimentTitle
  )

# Create Set 3: Variable Combination
set3 <- data %>%
  select(SentimentTitle, SentimentHeadline, Facebook, GooglePlus, LinkedIn) %>%
  mutate(
    # Create a new calculated variable
    TotalSentiment = SentimentTitle + SentimentHeadline,
    ResponseVariable = GooglePlus
  )

# Print the first few rows of each set
head(set1)
##   SentimentTitle SentimentHeadline Facebook GooglePlus LinkedIn
## 1     0.00000000       -0.05330018       -1         -1       -1
## 2     0.20833333       -0.15638581       -1         -1       -1
## 3    -0.42521003        0.13975425       -1         -1       -1
## 4     0.00000000        0.02606430       -1         -1       -1
## 5     0.00000000        0.14108446       -1         -1       -1
## 6    -0.07537784        0.03677279       -1         -1       -1
##   CombinedSentiment ResponseVariable
## 1       -0.05330018               -1
## 2        0.05194752               -1
## 3       -0.28545578               -1
## 4        0.02606430               -1
## 5        0.14108446               -1
## 6       -0.03860504               -1
head(set2)
##   SentimentTitle SentimentHeadline Facebook GooglePlus LinkedIn TotalSocial
## 1     0.00000000       -0.05330018       -1         -1       -1          -3
## 2     0.20833333       -0.15638581       -1         -1       -1          -3
## 3    -0.42521003        0.13975425       -1         -1       -1          -3
## 4     0.00000000        0.02606430       -1         -1       -1          -3
## 5     0.00000000        0.14108446       -1         -1       -1          -3
## 6    -0.07537784        0.03677279       -1         -1       -1          -3
##   ResponseVariable
## 1       0.00000000
## 2       0.20833333
## 3      -0.42521003
## 4       0.00000000
## 5       0.00000000
## 6      -0.07537784
head(set3)
##   SentimentTitle SentimentHeadline Facebook GooglePlus LinkedIn TotalSentiment
## 1     0.00000000       -0.05330018       -1         -1       -1    -0.05330018
## 2     0.20833333       -0.15638581       -1         -1       -1     0.05194752
## 3    -0.42521003        0.13975425       -1         -1       -1    -0.28545578
## 4     0.00000000        0.02606430       -1         -1       -1     0.02606430
## 5     0.00000000        0.14108446       -1         -1       -1     0.14108446
## 6    -0.07537784        0.03677279       -1         -1       -1    -0.03860504
##   ResponseVariable
## 1               -1
## 2               -1
## 3               -1
## 4               -1
## 5               -1
## 6               -1
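
The first rows printed above show -1 in every popularity column, and the summary reports -1 as the minimum for Facebook, GooglePlus, and LinkedIn. Share counts cannot be negative, so -1 is presumably a placeholder for a missing count. That reading is an assumption and should be checked against the dataset's documentation. A minimal sketch of recoding the placeholder to NA before summing or averaging, using a small invented data frame (`toy` and `toy_clean` are hypothetical names, not part of the original analysis):

```r
# Toy data standing in for the real file; the values are invented for illustration.
toy <- data.frame(
  Facebook   = c(-1, 0, 5, 33),
  GooglePlus = c(-1, 0, 2, 7),
  LinkedIn   = c(-1, 1, 4, 0)
)

# Assumption: -1 marks "no count available" rather than a real count.
platforms <- c("Facebook", "GooglePlus", "LinkedIn")
toy_clean <- toy
toy_clean[platforms] <- lapply(
  toy_clean[platforms],
  function(x) replace(x, x == -1, NA)
)

# Rows with a missing platform now give NA instead of a misleading -3.
toy_clean$TotalSocial <- toy_clean$Facebook + toy_clean$GooglePlus + toy_clean$LinkedIn

# Compare the mean with and without the placeholder rows.
mean(toy$Facebook)                      # placeholder included
mean(toy_clean$Facebook, na.rm = TRUE)  # placeholder excluded
```

If the placeholder really does mean "missing", the same recoding would apply to `data` before building `set1`, `set2`, and `set3`, and later calls such as `cor()` would need `use = "complete.obs"`.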

Part 2: Plot a visualization for each response-explanatory relationship, and draw some conclusions based on the plot

# Plot a visualization for each response-explanatory relationship
# Visualization for Set 1 (CombinedSentiment vs. Facebook)
plot(set1$CombinedSentiment, set1$Facebook, main = "CombinedSentiment vs. Facebook", 
     xlab = "CombinedSentiment", ylab = "Facebook", col = "blue")

# Visualization for Set 2 (TotalSocial vs. SentimentTitle)
plot(set2$TotalSocial, set2$SentimentTitle, main = "TotalSocial vs. SentimentTitle", 
     xlab = "TotalSocial", ylab = "SentimentTitle", col = "green")

# Visualization for Set 3 (TotalSentiment vs. GooglePlus)
plot(set3$TotalSentiment, set3$GooglePlus, main = "TotalSentiment vs. GooglePlus", 
     xlab = "TotalSentiment", ylab = "GooglePlus", col = "red")

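Based on the three plots, none of the relationships shows a clear linear trend. The sentiment scores are bounded (roughly -1 to 1), while the share counts are strongly right-skewed: per the summary in Part 1, Facebook has a median of 5 but a maximum of 49,211, and GooglePlus has a median of 0 but a maximum of 1,267. Most points therefore sit near zero on the count axis, with a small number of articles far above the rest, so the count variables do contain extreme values even though no single sentiment value stands out.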
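
Because the share counts span several orders of magnitude (the Part 1 summary shows a Facebook median of 5 against a maximum of 49,211), outliers are hard to judge on a raw scale. A minimal sketch, on simulated data rather than the real file, of plotting counts on a `log1p` scale and counting points beyond the usual boxplot fence. The variable names and the simulated distributions are assumptions made for illustration, and negative placeholder values such as -1 would have to be handled first, since `log1p(-1)` is `-Inf`:

```r
set.seed(123)  # simulated data only, for a reproducible illustration

# Simulated stand-ins: bounded sentiment scores and heavily right-skewed counts.
sentiment <- runif(500, min = -1, max = 1)
shares    <- round(rlnorm(500, meanlog = 2, sdlog = 2))

# log1p(x) = log(1 + x), so zero counts stay defined.
log_shares <- log1p(shares)

# Side by side: raw counts versus log-transformed counts.
op <- par(mfrow = c(1, 2))
plot(sentiment, shares, main = "Raw counts",
     xlab = "Sentiment", ylab = "Shares", col = "blue")
plot(sentiment, log_shares, main = "log1p(counts)",
     xlab = "Sentiment", ylab = "log(1 + shares)", col = "blue")
par(op)

# Count points beyond the usual boxplot rule (Q3 + 1.5 * IQR) on the raw scale.
upper_fence <- quantile(shares, 0.75) + 1.5 * IQR(shares)
n_extreme   <- sum(shares > upper_fence)
n_extreme
```

The fence rule is only one convention for flagging extreme points; for count data this skewed it will flag many observations, which is itself a sign that the raw scale is a poor basis for judging outliers.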

Part 3: Calculate the appropriate correlation coefficient for each of these combinations

# Calculate the appropriate correlation coefficient
correlation_set1 <- cor(set1$CombinedSentiment, set1$Facebook)
correlation_set2 <- cor(set2$TotalSocial, set2$SentimentTitle)
correlation_set3 <- cor(set3$TotalSentiment, set3$GooglePlus)

# Print the correlation coefficients
cat("Correlation Set 1:", correlation_set1, "\n")
## Correlation Set 1: -0.002122828
cat("Correlation Set 2:", correlation_set2, "\n")
## Correlation Set 2: -0.003068925
cat("Correlation Set 3:", correlation_set3, "\n")
## Correlation Set 3: -0.005385344
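
All three coefficients above are within 0.006 of zero, which indicates essentially no linear association. `cor()` computes Pearson's coefficient by default, which measures linear association and is sensitive to extreme values. For heavily skewed counts, a rank-based coefficient such as Spearman's may be a more appropriate choice. A minimal sketch on simulated data (the variables `x` and `y` are hypothetical, not taken from the dataset) comparing the two and using `cor.test()` to attach a p-value:

```r
set.seed(123)  # simulated data only

# A monotone but nonlinear, right-skewed relationship.
x <- runif(200, min = -1, max = 1)
y <- round(exp(2 * x + rnorm(200)))

# Pearson (default) versus Spearman (rank-based).
pearson  <- cor(x, y)
spearman <- cor(x, y, method = "spearman")

# cor.test adds a p-value and, for Pearson, a confidence interval.
pearson_test <- cor.test(x, y)
pearson_test$conf.int

# exact = FALSE avoids the exact-p-value warning when ranks are tied.
spearman_test <- cor.test(x, y, method = "spearman", exact = FALSE)
spearman_test$p.value
```

On the real data, the same two calls could be applied to each explanatory-response pair to check whether the near-zero Pearson values also hold for ranks.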

Part 4: Build a confidence interval for each of the response variables. Provide a detailed conclusion of the response variable (i.e., the population) based on your confidence interval.

# Build confidence intervals for the response variables
# Confidence interval for Set 1 (Facebook)
conf_interval_set1 <- t.test(set1$Facebook)$conf.int
cat("Confidence Interval for Facebook (Set 1):", conf_interval_set1, "\n")
## Confidence Interval for Facebook (Set 1): 109.1606 117.1221
# Confidence interval for Set 2 (SentimentTitle)
conf_interval_set2 <- t.test(set2$SentimentTitle)$conf.int
cat("Confidence Interval for SentimentTitle (Set 2):", conf_interval_set2, "\n")
## Confidence Interval for SentimentTitle (Set 2): -0.006287091 -0.00453564
# Confidence interval for Set 3 (GooglePlus)
conf_interval_set3 <- t.test(set3$GooglePlus)$conf.int
cat("Confidence Interval for GooglePlus (Set 3):", conf_interval_set3, "\n")
## Confidence Interval for GooglePlus (Set 3): 3.769661 4.007063

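A confidence interval gives a range of plausible values for a population parameter, here the population mean. t.test() uses a 95% confidence level by default, so each interval comes from a procedure that would capture the true mean in about 95% of repeated samples. The code above builds an interval for the response variable in each set: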

Confidence Interval for Facebook (Set 1): The 95% interval runs from about 109.16 to 117.12 shares, around the sample mean of 113.1. With 93,239 articles the interval is narrow relative to the mean, so the mean is estimated precisely. The mean itself is pulled far above the median of 5 by a small number of very widely shared articles, so it does not describe a typical article.

Confidence Interval for SentimentTitle (Set 2): The 95% interval is about -0.0063 to -0.0045. It lies entirely below zero, so the mean title sentiment is slightly negative, although the magnitude is tiny compared with the observed range of roughly -0.95 to 0.96.

Confidence Interval for GooglePlus (Set 3): The 95% interval is about 3.77 to 4.01 shares, around the sample mean of 3.888. As with Facebook, the mean sits well above the median of 0, so the interval describes the average across all articles rather than a typical one.
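
To make explicit what `t.test()` reports, the interval for a mean is the sample mean plus or minus a t critical value times the standard error, sd / sqrt(n). A minimal sketch, on a simulated skewed sample (the vector `x` is hypothetical, not the real data), that computes the interval by hand and compares it with the built-in result:

```r
set.seed(123)  # simulated data only

# Hypothetical right-skewed sample standing in for a share-count column.
x <- rexp(1000, rate = 1 / 100)

n      <- length(x)
xbar   <- mean(x)              # sample mean
se     <- sd(x) / sqrt(n)      # standard error of the mean
t_crit <- qt(0.975, df = n - 1)  # two-sided 95% critical value

# Manual 95% confidence interval for the mean.
manual_ci <- c(xbar - t_crit * se, xbar + t_crit * se)

# Built-in interval; as.numeric() drops the conf.level attribute.
builtin_ci <- as.numeric(t.test(x)$conf.int)

# Compare the two intervals up to floating-point tolerance.
all.equal(manual_ci, builtin_ci)
```

Because the standard error shrinks with sqrt(n), a sample of tens of thousands of rows gives a narrow interval even when the data are highly variable, which is why the intervals reported above are tight.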