Develop a script in R to plot a diverging bar chart to show positive and negative sentiment in public opinion survey data using ggplot2.
What will we do?
Load required libraries
Load and inspect dataset
Perform exploratory data analysis
Transform data into long format
Prepare values for diverging visualization
Create and enhance the diverging bar chart
Step 1: Load required libraries
In this step, we load the required libraries. ggplot2 is used for visualization, while dplyr and tidyr help in data manipulation and reshaping. These packages are part of the tidyverse ecosystem and simplify data analysis.
library(ggplot2)
library(dplyr)
Warning: package 'dplyr' was built under R version 4.5.3
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(tidyr)
Step 2: Load and inspect the dataset
Here, we load the dataset using read.csv() and preview it using head(). This helps in understanding the variables, structure, and type of data present in the dataset.
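The loading code itself is not shown in this step, so the following is only a minimal sketch of what it might look like. The file path is a placeholder (a temporary file written on the spot so that the example runs on its own), the short column names Good, Average, and Poor are stand-ins for the longer names in the real file, and the four rows reuse values that appear in the str() output in the next step.

```r
# Illustrative stand-in for the survey file: four rows with values
# taken from the str() output; the short column names are an assumption.
sample_rows <- data.frame(
  States  = c("Andhra Pradesh ", "Arunachal Pradesh ", "Bihar", "Haryana "),
  Good    = c(88.41, 99.37, 5.64, 55),
  Average = c(9.92, 0.63, 22.06, 44.5),
  Poor    = c(1.68, 0, 72.3, 0.5)
)

# Hypothetical path: in practice this would point to the survey CSV.
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# Load the file and preview the first rows.
data <- read.csv(csv_path)
head(data)
```

With the real file, only the read.csv() and head() calls are needed. By default, read.csv() replaces spaces and punctuation in column headers with dots, which would explain the long dotted column names seen in the str() output.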
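Step 3: Perform exploratory data analysis
In this step, we examine the structure and summary of the dataset. The str() function shows each column's data type and first few values, while summary() reports statistics such as the minimum, quartiles, mean, and maximum, along with the number of missing values (NA's).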
str(data)
'data.frame': 17 obs. of 6 variables:
$ States : chr "Andhra Pradesh " "Arunachal Pradesh " "Bihar" "Haryana " ...
$ Opinion.of.Children...Quality.of.Meal..in.percent....Good : num 88.41 99.37 5.64 55 90 ...
$ Opinion.of.Children...Quality.of.Meal..in.percent....Average: num 9.92 0.63 22.06 44.5 9.55 ...
$ Opinion.of.Children...Quality.of.Meal..in.percent....Poor : num 1.68 0 72.3 0.5 0.45 ...
$ Opinion.of.Children...Satisfaction..in.percent....Yes : num 93.8 100 22.1 99.5 99.5 ...
$ Opinion.of.Children...Satisfaction..in.percent....No : num 6.15 NA 77.89 0.5 0.46 ...
summary(data)
States Opinion.of.Children...Quality.of.Meal..in.percent....Good
Length:17 Min. : 4.61
Class :character 1st Qu.: 55.85
Mode :character Median : 80.21
Mean : 70.45
3rd Qu.: 90.00
Max. :100.00
Opinion.of.Children...Quality.of.Meal..in.percent....Average
Min. : 0.00
1st Qu.: 7.29
Median :16.00
Mean :22.03
3rd Qu.:22.06
Max. :89.72
Opinion.of.Children...Quality.of.Meal..in.percent....Poor
Min. : 0.000
1st Qu.: 0.545
Median : 1.525
Mean : 7.989
3rd Qu.: 7.112
Max. :72.300
NA's :1
Opinion.of.Children...Satisfaction..in.percent....Yes
Min. : 22.11
1st Qu.: 78.39
Median : 93.85
Mean : 86.56
3rd Qu.: 98.73
Max. :100.00
Opinion.of.Children...Satisfaction..in.percent....No
Min. : 0.000
1st Qu.: 1.817
Median : 6.920
Mean :14.283
3rd Qu.:21.872
Max. :77.890
NA's :1
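The output above hints at two data-quality points: some state names carry trailing spaces (for example "Andhra Pradesh "), and two columns contain a missing value. A minimal sketch of how these might be checked and cleaned is shown below. The data frame is an illustrative four-row subset built from the str() output, and the short column names Yes and No are stand-ins for the full names.

```r
# Illustrative rows taken from the str() output (first four states);
# the short column names Yes and No are assumptions for readability.
data <- data.frame(
  States = c("Andhra Pradesh ", "Arunachal Pradesh ", "Bihar", "Haryana "),
  Yes    = c(93.8, 100, 22.1, 99.5),
  No     = c(6.15, NA, 77.89, 0.5)
)

# Count missing values in each column.
na_counts <- colSums(is.na(data))
print(na_counts)

# Remove stray whitespace around the state names.
data$States <- trimws(data$States)
```

Trimming the names early keeps the axis labels clean when the states are later used as categories in the plot.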
Step 4: Prepare the data for diverging bar chart
Here, the dataset is converted from wide format to long format using pivot_longer(), making it suitable for ggplot. Negative sentiment values are converted to negative numbers so that they appear on the opposite side of the chart, creating a diverging effect.
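The reshaping code is not shown, so the following is a sketch under stated assumptions rather than the original implementation. It assumes the two Satisfaction columns (Yes and No) are the ones being compared, which the text does not confirm. The output names data_long, States, Sentiment, and Percentage match those used in the plotting code. The four rows are an illustrative subset taken from the str() output.

```r
library(dplyr)
library(tidyr)

# Illustrative subset: first four states as shown in the str() output.
data <- data.frame(
  States = c("Andhra Pradesh ", "Arunachal Pradesh ", "Bihar", "Haryana "),
  Opinion.of.Children...Satisfaction..in.percent....Yes = c(93.8, 100, 22.1, 99.5),
  Opinion.of.Children...Satisfaction..in.percent....No  = c(6.15, NA, 77.89, 0.5)
)

data_long <- data %>%
  # Shorten the column names (assumed choice of columns).
  rename(
    Yes = Opinion.of.Children...Satisfaction..in.percent....Yes,
    No  = Opinion.of.Children...Satisfaction..in.percent....No
  ) %>%
  # Drop trailing spaces from the state names.
  mutate(States = trimws(States)) %>%
  # Wide to long: one row per state and sentiment.
  pivot_longer(cols = c(Yes, No),
               names_to = "Sentiment",
               values_to = "Percentage") %>%
  # Flip the sign of the negative sentiment so it plots left of zero.
  mutate(Percentage = if_else(Sentiment == "No", -Percentage, Percentage))

print(data_long)
```

Missing values stay NA after the sign flip, so ggplot2 will drop those bars with a warning unless they are filtered out beforehand.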
Step 5: Create the diverging bar chart
In this step, we build the diverging bar chart with geom_bar(), adding one layer at a time. Positive and negative values are drawn on opposite sides of the zero line. The coord_flip() function makes the chart horizontal, which improves readability, and labels and a theme are added for clarity and presentation.
ggplot(): Initializes the plot by defining the dataset and mapping variables to aesthetics such as x-axis, y-axis, and fill. It sets the foundation for the entire visualization.
ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment))
geom_bar(): Adds the bar chart layer. Using stat = "identity" ensures that actual values from the dataset are used instead of counts.
ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment)) +
  geom_bar(stat = "identity")
coord_flip(): Flips the axes to make the bar chart horizontal. This improves readability, especially when category names are long.
ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment)) +
  geom_bar(stat = "identity") +
  coord_flip()
theme_minimal(): Applies a clean and simple theme to the plot by removing unnecessary background elements and grid clutter. The additional theme(legend.position = "top") call moves the legend above the plot.
ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "top")
scale_y_continuous(): Modifies the y-axis scale, which is drawn horizontally after coord_flip(). Using labels = abs displays the negative values as positive percentages on the axis, which makes the chart easier to interpret.
labs(): Adds descriptive labels such as the title, x-axis label, and y-axis label, improving the clarity and presentation of the plot.
ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "top") +
  scale_y_continuous(labels = abs) +
  labs(
    title = "Diverging Bar Chart of Public Opinion",
    x = "Category",
    y = "Percentage"
  )
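Because the plotting calls above depend on a data_long object defined elsewhere, a self-contained version may be useful for experimenting. The sketch below builds a small illustrative data_long by hand (three states, with values taken from the str() output and the "No" percentages already negated; treating the Satisfaction columns as the sentiment pair is an assumption) and uses geom_col(), which is equivalent to geom_bar(stat = "identity").

```r
library(ggplot2)

# Illustrative long-format data: values from the str() output,
# with the negative sentiment ("No") stored as negative numbers.
data_long <- data.frame(
  States     = rep(c("Andhra Pradesh", "Bihar", "Haryana"), each = 2),
  Sentiment  = rep(c("Yes", "No"), times = 3),
  Percentage = c(93.8, -6.15, 22.1, -77.89, 99.5, -0.5)
)

p <- ggplot(data_long, aes(x = States, y = Percentage, fill = Sentiment)) +
  geom_col() +                          # same as geom_bar(stat = "identity")
  coord_flip() +                        # horizontal bars
  scale_y_continuous(labels = abs) +    # show magnitudes on the axis
  theme_minimal() +
  theme(legend.position = "top") +
  labs(
    title = "Diverging Bar Chart of Public Opinion",
    x = "Category",
    y = "Percentage"
  )

print(p)
```

Storing the plot in p makes it easy to save later, for example with ggsave().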