LA

Author

Aditi Hurkat, Sagar Tripathi

Problem statement:

Develop an R script that uses ggplot2 to plot a diverging bar chart showing positive and negative sentiment in public opinion survey data.

What we will do:

  1. Load required libraries
  2. Load and inspect the dataset
  3. Perform exploratory data analysis
  4. Transform the data into long format and prepare values for the diverging visualization
  5. Examine the transformed data
  6. Create and enhance the diverging bar chart

Step 1: Load required libraries

In this step, we load the required libraries: ggplot2 for visualization, dplyr for data manipulation (including the %>% pipe), and tidyr for reshaping data with pivot_longer(). All three packages are part of the tidyverse ecosystem.

library(ggplot2)
library(dplyr)
Warning: package 'dplyr' was built under R version 4.5.3

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr)

Step 2: Load and inspect the dataset

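Here, we load the dataset with read.csv() and preview its first six rows with head(). file.choose() prompts the user to select the CSV file interactively. read.csv() also converts the column headers into syntactic R names, replacing spaces and punctuation with dots, which is why the columns have names such as Opinion.of.Children...Quality.of.Meal..in.percent....Good.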

data <- read.csv(file.choose())

head(data)
              States Opinion.of.Children...Quality.of.Meal..in.percent....Good
1    Andhra Pradesh                                                      88.41
2 Arunachal Pradesh                                                      99.37
3              Bihar                                                      5.64
4           Haryana                                                      55.00
5  Himachal Pradesh                                                      90.00
6    Jammu & Kashmir                                                     97.00
  Opinion.of.Children...Quality.of.Meal..in.percent....Average
1                                                         9.92
2                                                         0.63
3                                                        22.06
4                                                        44.50
5                                                         9.55
6                                                         3.00
  Opinion.of.Children...Quality.of.Meal..in.percent....Poor
1                                                      1.68
2                                                      0.00
3                                                     72.30
4                                                      0.50
5                                                      0.45
6                                                        NA
  Opinion.of.Children...Satisfaction..in.percent....Yes
1                                                 93.85
2                                                100.00
3                                                 22.11
4                                                 99.50
5                                                 99.54
6                                                 98.00
  Opinion.of.Children...Satisfaction..in.percent....No
1                                                 6.15
2                                                   NA
3                                                77.89
4                                                 0.50
5                                                 0.46
6                                                 2.00
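If the script needs to run without the interactive dialog, read.csv() can take a file path directly. The sketch below is illustrative only: it writes two rows copied from the head() output to a temporary file, and the raw header text "Opinion of Children - Quality of Meal (in percent) - Good" is an assumption, reconstructed from the column name read.csv() produced. It shows how the default check.names = TRUE produces the dotted name, and how trimws() removes trailing spaces like the one str() reveals in "Andhra Pradesh ".

```r
# Illustrative sketch: two rows from the head() output, written to a temp file.
# ASSUMPTION: the raw CSV header text; only the converted column name is known.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "States,Opinion of Children - Quality of Meal (in percent) - Good",
  "Andhra Pradesh ,88.41",
  "Bihar,5.64"
), csv_path)

# An explicit path replaces file.choose(); check.names = TRUE (the default)
# passes the header through make.names(), turning spaces, "-" and "()" into dots.
demo <- read.csv(csv_path)
print(names(demo))

# Strip the trailing whitespace carried over from the raw file.
demo$States <- trimws(demo$States)
print(demo$States)
```

Trimming the state names this way keeps labels such as "Andhra Pradesh" and "Andhra Pradesh " from being treated as different categories if the data are later joined or filtered by name.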

Step 3: Exploratory Data Analysis

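In this step, we examine the structure and summary of the dataset. str() shows each column's type and first values, while summary() reports the minimum, quartiles, median, mean, and maximum of each numeric column, plus the number of missing values (NA's). The output reveals two details that matter later: state names such as "Andhra Pradesh " carry trailing spaces, and the Poor and No columns each contain one missing value.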

str(data)
'data.frame':   17 obs. of  6 variables:
 $ States                                                      : chr  "Andhra Pradesh " "Arunachal Pradesh " "Bihar" "Haryana " ...
 $ Opinion.of.Children...Quality.of.Meal..in.percent....Good   : num  88.41 99.37 5.64 55 90 ...
 $ Opinion.of.Children...Quality.of.Meal..in.percent....Average: num  9.92 0.63 22.06 44.5 9.55 ...
 $ Opinion.of.Children...Quality.of.Meal..in.percent....Poor   : num  1.68 0 72.3 0.5 0.45 ...
 $ Opinion.of.Children...Satisfaction..in.percent....Yes       : num  93.8 100 22.1 99.5 99.5 ...
 $ Opinion.of.Children...Satisfaction..in.percent....No        : num  6.15 NA 77.89 0.5 0.46 ...
summary(data)
    States          Opinion.of.Children...Quality.of.Meal..in.percent....Good
 Length:17          Min.   :  4.61                                           
 Class :character   1st Qu.: 55.85                                           
 Mode  :character   Median : 80.21                                           
                    Mean   : 70.45                                           
                    3rd Qu.: 90.00                                           
                    Max.   :100.00                                           
                                                                             
 Opinion.of.Children...Quality.of.Meal..in.percent....Average
 Min.   : 0.00                                               
 1st Qu.: 7.29                                               
 Median :16.00                                               
 Mean   :22.03                                               
 3rd Qu.:22.06                                               
 Max.   :89.72                                               
                                                             
 Opinion.of.Children...Quality.of.Meal..in.percent....Poor
 Min.   : 0.000                                           
 1st Qu.: 0.545                                           
 Median : 1.525                                           
 Mean   : 7.989                                           
 3rd Qu.: 7.112                                           
 Max.   :72.300                                           
 NA's   :1                                                
 Opinion.of.Children...Satisfaction..in.percent....Yes
 Min.   : 22.11                                       
 1st Qu.: 78.39                                       
 Median : 93.85                                       
 Mean   : 86.56                                       
 3rd Qu.: 98.73                                       
 Max.   :100.00                                       
                                                      
 Opinion.of.Children...Satisfaction..in.percent....No
 Min.   : 0.000                                      
 1st Qu.: 1.817                                      
 Median : 6.920                                      
 Mean   :14.283                                      
 3rd Qu.:21.872                                      
 Max.   :77.890                                      
 NA's   :1                                           
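summary() already reports the NA counts, but two quick checks make the missing values and the relationship between the Yes and No columns explicit. The stand-in data frame below is hypothetical: it uses the first three rows of head(data) with shortened column names (Yes, No) instead of the full read.csv() names. The second check assumes Yes and No are complementary shares of the same group of children, which holds for the rows shown above.

```r
# Hypothetical stand-in for `data`: first three rows of head(data),
# with shortened column names for readability.
satisfaction <- data.frame(
  States = c("Andhra Pradesh", "Arunachal Pradesh", "Bihar"),
  Yes    = c(93.85, 100.00, 22.11),
  No     = c(6.15, NA, 77.89)
)

# Count missing values per column (summary() reports the same as "NA's").
na_counts <- colSums(is.na(satisfaction))
print(na_counts)

# Yes and No should be complementary, so each row should total about 100;
# na.rm = TRUE lets rows with a missing value still be checked.
totals <- rowSums(satisfaction[, c("Yes", "No")], na.rm = TRUE)
print(totals)
```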

Step 4: Prepare the data for diverging bar chart

Here, the dataset is converted from wide to long format with pivot_longer(), giving the layout ggplot2 expects: one row per state and sentiment. Only the two satisfaction columns (Yes and No) are pivoted; the meal-quality columns are carried along unchanged. The "No" percentages are then negated so that their bars appear on the opposite side of zero, creating the diverging effect.

data_long <- data %>%
  pivot_longer(
    cols = c(
      Opinion.of.Children...Satisfaction..in.percent....Yes,
      Opinion.of.Children...Satisfaction..in.percent....No
    ),
    names_to = "Sentiment",
    values_to = "Percentage"
  )

# Negate the "No" percentages so their bars extend to the other side of zero
data_long$Percentage <- ifelse(
  data_long$Sentiment == "Opinion.of.Children...Satisfaction..in.percent....No",
  -data_long$Percentage,
  data_long$Percentage
)

# Remove rows with missing values (na.omit() checks every column, not only Percentage)
data_long <- na.omit(data_long)
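pivot_longer() can also shorten the Sentiment labels and drop missing percentages in the same call. In this sketch, the stand-in data frame `wide` is hypothetical: it holds the first two rows of head(data), keeping only the two satisfaction columns. names_prefix strips the long prefix so Sentiment becomes "Yes" or "No", and values_drop_na = TRUE removes a row only when its own Percentage is missing, whereas na.omit() also drops a row when an unrelated column such as Poor is NA.

```r
suppressPackageStartupMessages(library(tidyr))

# Hypothetical stand-in: first two rows of head(data), satisfaction columns only.
wide <- data.frame(
  States = c("Andhra Pradesh", "Arunachal Pradesh"),
  Opinion.of.Children...Satisfaction..in.percent....Yes = c(93.85, 100.00),
  Opinion.of.Children...Satisfaction..in.percent....No  = c(6.15, NA)
)

long <- pivot_longer(
  wide,
  cols = starts_with("Opinion.of.Children...Satisfaction"),
  names_to = "Sentiment",
  # Treated as a regular expression; each "." also matches a literal dot.
  names_prefix = "Opinion.of.Children...Satisfaction..in.percent....",
  values_to = "Percentage",
  # Drop a row only when its own Percentage value is NA
  values_drop_na = TRUE
)

# Negate the "No" shares so they plot to the left of zero.
long$Percentage <- ifelse(long$Sentiment == "No", -long$Percentage, long$Percentage)
print(long)
```

Short "Yes"/"No" values also make the chart legend readable without any extra relabeling.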

Step 5: Examine the transformed data

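In this step, we inspect the transformed dataset to confirm its structure and values before plotting. Each state now contributes up to two rows, one per sentiment, and the "No" percentages are negative. The na.action attribute shows that na.omit() removed three rows (4, 11, and 12): row 4 is Arunachal Pradesh's missing "No" value, while rows 11 and 12 are both Jammu & Kashmir rows, dropped because their Poor meal-quality value is NA even though their satisfaction values are present. As a result, Jammu & Kashmir does not appear in the chart.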

str(data_long)
tibble [31 × 6] (S3: tbl_df/tbl/data.frame)
 $ States                                                      : chr [1:31] "Andhra Pradesh " "Andhra Pradesh " "Arunachal Pradesh " "Bihar" ...
 $ Opinion.of.Children...Quality.of.Meal..in.percent....Good   : num [1:31] 88.41 88.41 99.37 5.64 5.64 ...
 $ Opinion.of.Children...Quality.of.Meal..in.percent....Average: num [1:31] 9.92 9.92 0.63 22.06 22.06 ...
 $ Opinion.of.Children...Quality.of.Meal..in.percent....Poor   : num [1:31] 1.68 1.68 0 72.3 72.3 ...
 $ Sentiment                                                   : chr [1:31] "Opinion.of.Children...Satisfaction..in.percent....Yes" "Opinion.of.Children...Satisfaction..in.percent....No" "Opinion.of.Children...Satisfaction..in.percent....Yes" "Opinion.of.Children...Satisfaction..in.percent....Yes" ...
 $ Percentage                                                  : num [1:31] 93.85 -6.15 100 22.11 -77.89 ...
 - attr(*, "na.action")= 'omit' Named int [1:3] 4 11 12
  ..- attr(*, "names")= chr [1:3] "4" "11" "12"
head(data_long)
# A tibble: 6 × 6
  States    Opinion.of.Children.…¹ Opinion.of.Children.…² Opinion.of.Children.…³
  <chr>                      <dbl>                  <dbl>                  <dbl>
1 "Andhra …                  88.4                    9.92                   1.68
2 "Andhra …                  88.4                    9.92                   1.68
3 "Arunach…                  99.4                    0.63                   0   
4 "Bihar"                     5.64                  22.1                   72.3 
5 "Bihar"                     5.64                  22.1                   72.3 
6 "Haryana…                  55                     44.5                    0.5 
# ℹ abbreviated names:
#   ¹​Opinion.of.Children...Quality.of.Meal..in.percent....Good,
#   ²​Opinion.of.Children...Quality.of.Meal..in.percent....Average,
#   ³​Opinion.of.Children...Quality.of.Meal..in.percent....Poor
# ℹ 2 more variables: Sentiment <chr>, Percentage <dbl>
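A few programmatic checks can back up the visual inspection. The sketch uses a hypothetical stand-in, `check_long`, built from the Percentage values of the first three states in the output above, with shortened Sentiment labels and trimmed state names. On the real objects, the same calls apply to data_long (with the Sentiment comparisons adjusted to the full column names), and setdiff(unique(data$States), unique(data_long$States)) lists any state that was removed completely.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical stand-in for data_long (values from the output above).
check_long <- tibble(
  States     = c("Andhra Pradesh", "Andhra Pradesh", "Arunachal Pradesh", "Bihar", "Bihar"),
  Sentiment  = c("Yes", "No", "Yes", "Yes", "No"),
  Percentage = c(93.85, -6.15, 100, 22.11, -77.89)
)

# 1. After the sign flip, "No" values should be <= 0 and "Yes" values >= 0.
signs_ok <- with(check_long,
  all(Percentage[Sentiment == "No"] <= 0) && all(Percentage[Sentiment == "Yes"] >= 0))
print(signs_ok)

# 2. Rows per state: n = 1 means one side of that state's bar is missing.
per_state <- count(check_long, States)
print(per_state)

# 3. States present in the raw data but absent after cleaning (hypothetical list).
raw_states <- c("Andhra Pradesh", "Arunachal Pradesh", "Bihar", "Jammu & Kashmir")
dropped <- setdiff(raw_states, unique(check_long$States))
print(dropped)
```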

Step 6: Create a diverging bar chart

In this step, we build the diverging bar chart with geom_bar(), adding one component at a time. Because the "No" values are negative, their bars extend to the opposite side of zero from the "Yes" bars. coord_flip() makes the chart horizontal for readability, and a theme and labels polish the presentation.

ggplot(): Initializes the plot by specifying the dataset and mapping variables to aesthetics: States to the x-axis, Percentage to the y-axis, and Sentiment to the fill color. On its own it draws only an empty panel; no bars appear until a geom layer is added.

ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment))

geom_bar(): Adds the bar layer. With stat = "identity", the bar heights are the Percentage values themselves rather than row counts (the default, stat = "count"). geom_col() is an equivalent shorthand.

ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment)) +
  geom_bar(stat = "identity")
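Why do the "No" bars extend downward instead of overlapping the "Yes" bars? geom_bar() uses position = "stack" by default, and ggplot2 stacks negative values downward from zero separately from positive ones. The sketch below checks this with layer_data() on a one-state example using Andhra Pradesh's values from the output above; the Sentiment labels are shortened to Yes/No for illustration, and geom_col() stands in for geom_bar(stat = "identity").

```r
suppressPackageStartupMessages(library(ggplot2))

# One state, values from the output above; shortened labels (hypothetical).
one_state <- data.frame(
  States = "Andhra Pradesh",
  Sentiment = c("Yes", "No"),
  Percentage = c(93.85, -6.15)
)

p <- ggplot(one_state, aes(x = States, y = Percentage, fill = Sentiment)) +
  geom_col()  # same as geom_bar(stat = "identity")

# layer_data() returns the computed rectangles of the first layer.
bars <- layer_data(p)
print(bars[, c("fill", "ymin", "ymax")])
```

The "Yes" rectangle spans 0 to 93.85 and the "No" rectangle spans -6.15 to 0, so the two bars meet at zero rather than stacking on top of each other.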

coord_flip(): Flips the axes to make the bar chart horizontal. This improves readability, especially when category names are long.

ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment)) +
  geom_bar(stat = "identity") +
  coord_flip()
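coord_flip() is one way to get horizontal bars. Since ggplot2 3.3.0, bar geoms also accept the discrete variable on the y-axis directly, so x and y keep describing what is actually drawn horizontally and vertically. A minimal sketch with the same hypothetical one-state data; orientation = "y" is set explicitly here, although ggplot2 can usually infer it.

```r
suppressPackageStartupMessages(library(ggplot2))

one_state <- data.frame(
  States = "Andhra Pradesh",
  Sentiment = c("Yes", "No"),
  Percentage = c(93.85, -6.15)
)

# States on the y-axis, Percentage on the x-axis: horizontal bars without coord_flip().
p_h <- ggplot(one_state, aes(x = Percentage, y = States, fill = Sentiment)) +
  geom_col(orientation = "y")

# The bars now extend along x, so the stacked extents appear in xmin/xmax.
bars_h <- layer_data(p_h)
print(bars_h[, c("xmin", "xmax")])
```

With this layout, the absolute-value labels would go on scale_x_continuous() and labs(x = ...) would name the horizontal axis.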

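theme_minimal(): Applies a clean, minimal theme that removes the grey panel background and reduces grid clutter. The following theme(legend.position = "top") call moves the legend above the plot; it must come after theme_minimal(), because adding a complete theme afterwards would reset that setting.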

ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "top")

scale_y_continuous(): Modifies the y-axis scale. Setting labels = abs displays the negative tick values as positive numbers, so both sides of the chart read as percentages. Because of coord_flip(), this y scale is drawn along the horizontal axis.

labs(): Adds a title and axis labels, improving the clarity and presentation of the plot.

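ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "top") +
  scale_y_continuous(labels = abs) +
  # Shorten legend entries to the text after the last "." ("Yes" / "No")
  scale_fill_discrete(labels = function(x) sub(".*\\.", "", x)) +
  labs(title = "Diverging Bar Chart of Public Opinion",
       x = "State",
       y = "Percentage",
       fill = "Satisfied?")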
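One possible refinement, sketched under assumptions: the hypothetical data frame `demo_long` holds the Yes/No percentages of three states from the output above with shortened Sentiment labels; reorder() sorts the states by their Yes share; the two fill colors are arbitrary choices; and a small labeller adds a % sign to the absolute axis values. Symmetric limits of -100 to 100 keep zero in the center of the chart.

```r
suppressPackageStartupMessages(library(ggplot2))

# Hypothetical stand-in for data_long (values from the output above).
demo_long <- data.frame(
  States = rep(c("Andhra Pradesh", "Bihar", "Haryana"), each = 2),
  Sentiment = rep(c("Yes", "No"), times = 3),
  Percentage = c(93.85, -6.15, 22.11, -77.89, 99.50, -0.50)
)

# Axis labeller: show magnitudes with a percent sign, e.g. -50 -> "50%".
abs_percent <- function(x) paste0(abs(x), "%")

p_final <- ggplot(
  demo_long,
  # Order states by their largest value, i.e. the "Yes" share
  aes(x = Percentage, y = reorder(States, Percentage, FUN = max), fill = Sentiment)
) +
  geom_col(orientation = "y") +
  geom_vline(xintercept = 0, color = "grey40") +
  scale_x_continuous(labels = abs_percent, limits = c(-100, 100)) +
  scale_fill_manual(values = c(Yes = "#1b9e77", No = "#d95f02")) +
  labs(title = "Children's satisfaction by state",
       x = "Percentage of children (No left, Yes right)",
       y = "State", fill = "Satisfied?") +
  theme_minimal() +
  theme(legend.position = "top")

print(p_final)
```

Applying this to the real data_long requires Sentiment values that match the names in scale_fill_manual(), for example by shortening them with names_prefix in pivot_longer().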