The dataset used in this analysis contains information on various pharmaceutical products, including reviews from users regarding the effectiveness of these drugs. The goal of this project is to explore and analyze the dataset to uncover insights into the relationships between different types of user reviews (Excellent, Average, and Poor) and how these reviews vary across different uses and manufacturers.
# Load the packages used in this analysis
library(readr)     # read_csv()
library(dplyr)     # filter(), group_by(), summarise()
library(ggplot2)   # scatter plot and histograms
library(corrplot)  # correlation matrix plot
## Rows: 11825 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Medicine Name, Composition, Uses, Side_effects, Image URL, Manufact...
## dbl (3): Excellent Review %, Average Review %, Poor Review %
##
## # A tibble: 6 × 9
## `Medicine Name` Composition Uses Side_effects `Image URL` Manufacturer
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Avastin 400mg Injecti… Bevacizuma… Canc… Rectal blee… https://on… Roche Produ…
## 2 Augmentin 625 Duo Tab… Amoxycilli… Trea… Vomiting Na… https://on… Glaxo Smith…
## 3 Azithral 500 Tablet Azithromyc… Trea… Nausea Abdo… https://on… Alembic Pha…
## 4 Ascoril LS Syrup Ambroxol (… Trea… Nausea Vomi… https://on… Glenmark Ph…
## 5 Aciloc 150 Tablet Ranitidine… Trea… Headache Di… https://on… Cadila Phar…
## 6 Allegra 120mg Tablet Fexofenadi… Trea… Headache Dr… https://on… Sanofi Indi…
## # ℹ 3 more variables: `Excellent Review %` <dbl>, `Average Review %` <dbl>,
## # `Poor Review %` <dbl>
## spc_tbl_ [11,825 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Medicine Name : chr [1:11825] "Avastin 400mg Injection" "Augmentin 625 Duo Tablet" "Azithral 500 Tablet" "Ascoril LS Syrup" ...
## $ Composition : chr [1:11825] "Bevacizumab (400mg)" "Amoxycillin (500mg) + Clavulanic Acid (125mg)" "Azithromycin (500mg)" "Ambroxol (30mg/5ml) + Levosalbutamol (1mg/5ml) + Guaifenesin (50mg/5ml)" ...
## $ Uses : chr [1:11825] "Cancer of colon and rectum Non-small cell lung cancer Kidney cancer Brain tumor Ovarian cancer Cervical cancer" "Treatment of Bacterial infections" "Treatment of Bacterial infections" "Treatment of Cough with mucus" ...
## $ Side_effects : chr [1:11825] "Rectal bleeding Taste change Headache Nosebleeds Back pain Dry skin High blood pressure Protein in urine Inflam"| __truncated__ "Vomiting Nausea Diarrhea Mucocutaneous candidiasis" "Nausea Abdominal pain Diarrhea" "Nausea Vomiting Diarrhea Upset stomach Stomach pain Allergic reaction Dizziness Headache Rash Hives Tremors Pal"| __truncated__ ...
## $ Image URL : chr [1:11825] "https://onemg.gumlet.io/l_watermark_346,w_480,h_480/a_ignore,w_480,h_480,c_fit,q_auto,f_auto/f5a26c491e4d48199a"| __truncated__ "https://onemg.gumlet.io/l_watermark_346,w_480,h_480/a_ignore,w_480,h_480,c_fit,q_auto,f_auto/wy2y9bdipmh6rgkrj0zm.jpg" "https://onemg.gumlet.io/l_watermark_346,w_480,h_480/a_ignore,w_480,h_480,c_fit,q_auto,f_auto/cropped/kqkouvaqejbyk47dvjfu.jpg" "https://onemg.gumlet.io/l_watermark_346,w_480,h_480/a_ignore,w_480,h_480,c_fit,q_auto,f_auto/3205599cc49d4073ae"| __truncated__ ...
## $ Manufacturer : chr [1:11825] "Roche Products India Pvt Ltd" "Glaxo SmithKline Pharmaceuticals Ltd" "Alembic Pharmaceuticals Ltd" "Glenmark Pharmaceuticals Ltd" ...
## $ Excellent Review %: num [1:11825] 22 47 39 24 34 35 40 43 36 35 ...
## $ Average Review % : num [1:11825] 56 35 40 41 37 42 34 28 43 41 ...
## $ Poor Review % : num [1:11825] 22 18 21 35 29 23 26 29 21 24 ...
## - attr(*, "spec")=
## .. cols(
## .. `Medicine Name` = col_character(),
## .. Composition = col_character(),
## .. Uses = col_character(),
## .. Side_effects = col_character(),
## .. `Image URL` = col_character(),
## .. Manufacturer = col_character(),
## .. `Excellent Review %` = col_double(),
## .. `Average Review %` = col_double(),
## .. `Poor Review %` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
## Medicine Name Composition Uses Side_effects
## Length:11825 Length:11825 Length:11825 Length:11825
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Image URL Manufacturer Excellent Review % Average Review %
## Length:11825 Length:11825 Min. : 0.00 Min. : 0.00
## Class :character Class :character 1st Qu.: 22.00 1st Qu.:27.00
## Mode :character Mode :character Median : 34.00 Median :35.00
## Mean : 38.52 Mean :35.76
## 3rd Qu.: 51.00 3rd Qu.:47.00
## Max. :100.00 Max. :88.00
## Poor Review %
## Min. : 0.00
## 1st Qu.: 0.00
## Median : 22.00
## Mean : 25.73
## 3rd Qu.: 35.00
## Max. :100.00
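In every row shown in the `str()` output above, the three review percentages add up to exactly 100. A minimal check of that pattern is sketched below. It uses only the first ten values printed by `str()`; on the real data you would apply the same `rowSums()` call to the three review columns of `data`. The data frame name `reviews` and its short column names are assumptions made for this sketch.

```r
# First ten values of the three review columns, copied from the str() output above
reviews <- data.frame(
  excellent = c(22, 47, 39, 24, 34, 35, 40, 43, 36, 35),
  average   = c(56, 35, 40, 41, 37, 42, 34, 28, 43, 41),
  poor      = c(22, 18, 21, 35, 29, 23, 26, 29, 21, 24)
)

# Row-wise total of the three percentages
totals <- rowSums(reviews)

# TRUE only if every row adds up to exactly 100
all(totals == 100)
## [1] TRUE
```

On the full dataset, `table(rowSums(...))` over the same three columns would show whether any medicines deviate from a total of 100.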
# Handle missing values: drop rows with a missing value in any of the three review columns
data_cleaned <- data %>%
  filter(!is.na(`Excellent Review %`), !is.na(`Average Review %`), !is.na(`Poor Review %`))
# Make sure the review columns are numeric (read_csv() already parsed them as double, so this is a safeguard)
data_cleaned$`Excellent Review %` <- as.numeric(data_cleaned$`Excellent Review %`)
data_cleaned$`Average Review %` <- as.numeric(data_cleaned$`Average Review %`)
data_cleaned$`Poor Review %` <- as.numeric(data_cleaned$`Poor Review %`)
# Optionally drop rows with a missing value in *any* column, including text columns such as Side_effects
data_cleaned <- na.omit(data_cleaned)
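`na.omit()` removes a row if any column is missing, so a medicine with complete review percentages but an empty `Side_effects` field would also be dropped. If only the review percentages matter, a narrower filter is one option. The sketch below uses `dplyr::if_all()` (available in dplyr 1.0.4 and later) on a small hypothetical tibble `toy`; the tibble and the helper vector `review_cols` are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data: row B lacks a side-effect text, row C lacks a review value
toy <- tibble(
  `Medicine Name`      = c("A", "B", "C"),
  Side_effects         = c("Nausea", NA, "Headache"),
  `Excellent Review %` = c(40, 30, NA),
  `Average Review %`   = c(35, 50, 45),
  `Poor Review %`      = c(25, 20, 55)
)

review_cols <- c("Excellent Review %", "Average Review %", "Poor Review %")

# Keep rows where all three review columns are non-missing;
# a missing Side_effects value does not cause a row to be dropped
toy_cleaned <- toy %>%
  filter(if_all(all_of(review_cols), ~ !is.na(.x)))

nrow(toy_cleaned)   # 2: rows A and B are kept
nrow(na.omit(toy))  # 1: na.omit() also drops row B
```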
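# Calculate the mean and standard deviation for each review percentage
mean(data_cleaned$`Excellent Review %`)
## [1] 38.51603
sd(data_cleaned$`Excellent Review %`)
## [1] 25.22534
mean(data_cleaned$`Average Review %`)
## [1] 35.75636
sd(data_cleaned$`Average Review %`)
## [1] 18.26813
mean(data_cleaned$`Poor Review %`)
## [1] 25.72761
sd(data_cleaned$`Poor Review %`)
## [1] 23.99198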
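The six separate `mean()`/`sd()` calls can also be collapsed into a single `summarise()` with `across()`, which returns all statistics in one tibble. A minimal sketch follows. It uses the ten rows from the `str()` output as stand-in data; applied to `data_cleaned`, the same pipeline should reproduce the six values printed above.

```r
library(dplyr)

# Ten rows from the str() output, used here as stand-in data
reviews <- tibble(
  `Excellent Review %` = c(22, 47, 39, 24, 34, 35, 40, 43, 36, 35),
  `Average Review %`   = c(56, 35, 40, 41, 37, 42, 34, 28, 43, 41),
  `Poor Review %`      = c(22, 18, 21, 35, 29, 23, 26, 29, 21, 24)
)

# One output row; columns are named "<column>_mean" and "<column>_sd"
review_stats <- reviews %>%
  summarise(across(everything(), list(mean = mean, sd = sd)))

review_stats
```

Keeping the statistics in one tibble makes them easy to reshape or to join with other summaries later.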
# Group by 'Uses' and calculate average review percentages
data_summary <- data_cleaned %>%
group_by(Uses) %>%
summarise(
avg_excellent = mean(`Excellent Review %`, na.rm = TRUE),
avg_average = mean(`Average Review %`, na.rm = TRUE),
avg_poor = mean(`Poor Review %`, na.rm = TRUE)
)
# View summary data
print(data_summary)
## # A tibble: 712 × 4
## Uses avg_excellent avg_average avg_poor
## <chr> <dbl> <dbl> <dbl>
## 1 Abdominal cramp 39.8 35.5 24.8
## 2 Abdominal pain 35.3 39.7 25
## 3 Abdominal pain Irritable bowel syndrome 73 27 0
## 4 Acidity Heartburn Anxiety 26 57 17
## 5 Acidity Heartburn Stomach ulcers 34.6 37.7 27.7
## 6 Acidity Indigestion 33 53 14
## 7 Acidity Stomach ulcers 22 34 44
## 8 Acidity Stomach ulcers Bloating 47.7 42.8 9.54
## 9 Acne 28.9 41.3 29.8
## 10 AcneTreatment of Bacterial infectionsTrea… 28 29 43
## # ℹ 702 more rows
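Because `Uses` is a free-text field, grouping on it treats every distinct combination of conditions (for example `Acidity Stomach ulcers` and `Acidity Stomach ulcers Bloating`) as a separate group, so some of the 712 averages may rest on very few medicines. Adding a count with `n()` makes this visible. In the sketch below, the toy data and the threshold `min_n` are hypothetical choices made for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for data_cleaned
toy <- tibble(
  Uses                 = c("Use A", "Use A", "Use A", "Use B"),
  `Excellent Review %` = c(30, 20, 40, 73),
  `Poor Review %`      = c(30, 40, 20, 0)
)

min_n <- 2  # hypothetical minimum number of medicines per use

use_summary <- toy %>%
  group_by(Uses) %>%
  summarise(
    n_medicines   = n(),
    avg_excellent = mean(`Excellent Review %`),
    avg_poor      = mean(`Poor Review %`),
    .groups = "drop"
  ) %>%
  filter(n_medicines >= min_n) %>%  # drop uses backed by too few medicines
  arrange(desc(avg_excellent))      # best-rated uses first

use_summary
```

Here "Use B" is dropped because a single medicine backs its average.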
# Scatter plot of Excellent vs Poor Reviews
ggplot(data_cleaned, aes(x = `Excellent Review %`, y = `Poor Review %`)) +
geom_point(color = "blue") +
labs(title = "Excellent vs Poor Reviews",
x = "Excellent Review %",
y = "Poor Review %")
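With about 11,800 medicines and integer percentages, many points in the scatter plot fall on the same coordinates and overplot. One option is to draw semi-transparent points. If the three percentages always sum to 100, every point must also lie on or below the line Excellent + Poor = 100, which the sketch draws as a dashed reference. The `alpha` value is an arbitrary choice, and the ten stand-in rows come from the `str()` output.

```r
library(ggplot2)

# Ten rows from the str() output, used as stand-in data
reviews <- data.frame(
  excellent = c(22, 47, 39, 24, 34, 35, 40, 43, 36, 35),
  poor      = c(22, 18, 21, 35, 29, 23, 26, 29, 21, 24)
)

p <- ggplot(reviews, aes(x = excellent, y = poor)) +
  geom_point(color = "blue", alpha = 0.2) +  # transparency reveals dense regions
  geom_abline(intercept = 100, slope = -1,   # boundary where Excellent + Poor = 100
              linetype = "dashed") +
  labs(title = "Excellent vs Poor Reviews",
       x = "Excellent Review %",
       y = "Poor Review %")

p
```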
# Histogram for Excellent Review Percentage
ggplot(data_cleaned, aes(x = `Excellent Review %`)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Excellent Reviews",
       x = "Excellent Review %",
       y = "Frequency")
# Histogram for Average Review Percentage
ggplot(data_cleaned, aes(x = `Average Review %`)) +
  geom_histogram(binwidth = 5, fill = "lightgreen", color = "black") +
  labs(title = "Distribution of Average Reviews",
       x = "Average Review %",
       y = "Frequency")
# Histogram for Poor Review Percentage
ggplot(data_cleaned, aes(x = `Poor Review %`)) +
  geom_histogram(binwidth = 5, fill = "lightcoral", color = "black") +
  labs(title = "Distribution of Poor Reviews",
       x = "Poor Review %",
       y = "Frequency")
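The three histograms can also be drawn as stacked panels of one faceted plot. This puts them on a shared x axis and makes the distributions easier to compare. The sketch below reshapes the data to long format with base R's `stack()`, which avoids adding another package. The stand-in data is again the ten rows from the `str()` output, and the name `long` is an assumption.

```r
library(ggplot2)

# Ten rows from the str() output, used as stand-in data
reviews <- data.frame(
  `Excellent Review %` = c(22, 47, 39, 24, 34, 35, 40, 43, 36, 35),
  `Average Review %`   = c(56, 35, 40, 41, 37, 42, 34, 28, 43, 41),
  `Poor Review %`      = c(22, 18, 21, 35, 29, 23, 26, 29, 21, 24),
  check.names = FALSE  # keep the original column names with spaces and "%"
)

# stack() reshapes to long format: `values` holds the percentages,
# `ind` (a factor) holds the name of the column each value came from
long <- stack(reviews)

p <- ggplot(long, aes(x = values, fill = ind)) +
  geom_histogram(binwidth = 5, color = "black", show.legend = FALSE) +
  facet_wrap(~ ind, ncol = 1) +  # one panel per review category
  labs(title = "Distribution of Review Percentages",
       x = "Review %",
       y = "Frequency")

p
```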
# Compute correlation matrix for review percentages
cor_matrix <- cor(data_cleaned[, c("Excellent Review %",
                                   "Average Review %", "Poor Review %")],
                  use = "complete.obs")
# Print the correlation matrix
print(cor_matrix)
## Excellent Review % Average Review % Poor Review %
## Excellent Review % 1.0000000 -0.4279625 -0.7255451
## Average Review % -0.4279625 1.0000000 -0.3114637
## Poor Review % -0.7255451 -0.3114637 1.0000000
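The three percentages for each medicine appear to sum to 100, as they do in every row of the `str()` output. If so, part of the negative correlation is built into the data. Write $E$, $A$ and $P$ for the Excellent, Average and Poor percentages, $s$ for the standard deviations printed earlier, and $r$ for the correlations above. The following is a short derivation that assumes the sum-to-100 constraint holds for every row:

$$
E + A + P = 100
\;\Rightarrow\; \operatorname{Cov}(E,\, E + A + P) = 0
\;\Rightarrow\; \operatorname{Cov}(E, A) + \operatorname{Cov}(E, P) = -\operatorname{Var}(E)
$$

$$
r_{EA}\, s_E s_A + r_{EP}\, s_E s_P
\approx (-0.4279625)(25.22534)(18.26813) + (-0.7255451)(25.22534)(23.99198)
\approx -197.21 - 439.10 = -636.32
$$

$$
-\operatorname{Var}(E) = -s_E^2 = -(25.22534)^2 \approx -636.32
$$

The two sides agree, which is consistent with the constraint holding across the full dataset. Because $\operatorname{Var}(E) > 0$, at least one of $r_{EA}$ and $r_{EP}$ must be negative. The strongly negative Excellent-Poor correlation therefore partly reflects how the percentages are defined, not only differences between drugs.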
# Visualize the correlation matrix
corrplot(cor_matrix, method = "circle", type = "upper", tl.col = "black", tl.srt = 45)
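In this analysis, we explored the drug review dataset by cleaning the data, calculating descriptive statistics, and visualizing the review percentages. We also examined the relationships between the three review types and grouped the data by each drug's listed uses, which produced 712 distinct `Uses` groups. On average, medicines received 38.5% Excellent, 35.8% Average, and 25.7% Poor reviews. Excellent and Poor percentages are strongly negatively correlated (r ≈ -0.73), while Average is moderately negatively correlated with both Excellent (r ≈ -0.43) and Poor (r ≈ -0.31). Because the three percentages for each medicine appear to sum to 100, some negative correlation between them is expected by construction. Together, these summaries give an overview of user sentiment across drugs. They also identify uses with comparatively high shares of Poor reviews as candidates for closer examination.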