1. Introduction

The dataset used in this analysis describes 11,825 pharmaceutical products: their composition, uses, side effects, and manufacturer, together with the percentage of user reviews rated Excellent, Average, and Poor. The goal of this project is to explore how the three review percentages relate to one another and how they vary across different uses. Manufacturer-level comparisons are not covered here.

Packages

# Load essential packages
library(readr)
## Warning: package 'readr' was built under R version 4.4.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
library(corrplot)
## corrplot 0.94 loaded

2. Data Import and Inspection

# Import the dataset using readr
data <- read_csv("D:/r codes/Medicine_Details.csv")
## Rows: 11825 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Medicine Name, Composition, Uses, Side_effects, Image URL, Manufact...
## dbl (3): Excellent Review %, Average Review %, Poor Review %
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Check the first few rows of the dataset
head(data)
## # A tibble: 6 × 9
##   `Medicine Name`        Composition Uses  Side_effects `Image URL` Manufacturer
##   <chr>                  <chr>       <chr> <chr>        <chr>       <chr>       
## 1 Avastin 400mg Injecti… Bevacizuma… Canc… Rectal blee… https://on… Roche Produ…
## 2 Augmentin 625 Duo Tab… Amoxycilli… Trea… Vomiting Na… https://on… Glaxo Smith…
## 3 Azithral 500 Tablet    Azithromyc… Trea… Nausea Abdo… https://on… Alembic Pha…
## 4 Ascoril LS Syrup       Ambroxol (… Trea… Nausea Vomi… https://on… Glenmark Ph…
## 5 Aciloc 150 Tablet      Ranitidine… Trea… Headache Di… https://on… Cadila Phar…
## 6 Allegra 120mg Tablet   Fexofenadi… Trea… Headache Dr… https://on… Sanofi Indi…
## # ℹ 3 more variables: `Excellent Review %` <dbl>, `Average Review %` <dbl>,
## #   `Poor Review %` <dbl>
# Check the structure of the dataset to understand data types
str(data)
## spc_tbl_ [11,825 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Medicine Name     : chr [1:11825] "Avastin 400mg Injection" "Augmentin 625 Duo Tablet" "Azithral 500 Tablet" "Ascoril LS Syrup" ...
##  $ Composition       : chr [1:11825] "Bevacizumab (400mg)" "Amoxycillin  (500mg) +  Clavulanic Acid (125mg)" "Azithromycin (500mg)" "Ambroxol (30mg/5ml) + Levosalbutamol (1mg/5ml) + Guaifenesin (50mg/5ml)" ...
##  $ Uses              : chr [1:11825] "Cancer of colon and rectum Non-small cell lung cancer Kidney cancer Brain tumor Ovarian cancer Cervical cancer" "Treatment of Bacterial infections" "Treatment of Bacterial infections" "Treatment of Cough with mucus" ...
##  $ Side_effects      : chr [1:11825] "Rectal bleeding Taste change Headache Nosebleeds Back pain Dry skin High blood pressure Protein in urine Inflam"| __truncated__ "Vomiting Nausea Diarrhea Mucocutaneous candidiasis" "Nausea Abdominal pain Diarrhea" "Nausea Vomiting Diarrhea Upset stomach Stomach pain Allergic reaction Dizziness Headache Rash Hives Tremors Pal"| __truncated__ ...
##  $ Image URL         : chr [1:11825] "https://onemg.gumlet.io/l_watermark_346,w_480,h_480/a_ignore,w_480,h_480,c_fit,q_auto,f_auto/f5a26c491e4d48199a"| __truncated__ "https://onemg.gumlet.io/l_watermark_346,w_480,h_480/a_ignore,w_480,h_480,c_fit,q_auto,f_auto/wy2y9bdipmh6rgkrj0zm.jpg" "https://onemg.gumlet.io/l_watermark_346,w_480,h_480/a_ignore,w_480,h_480,c_fit,q_auto,f_auto/cropped/kqkouvaqejbyk47dvjfu.jpg" "https://onemg.gumlet.io/l_watermark_346,w_480,h_480/a_ignore,w_480,h_480,c_fit,q_auto,f_auto/3205599cc49d4073ae"| __truncated__ ...
##  $ Manufacturer      : chr [1:11825] "Roche Products India Pvt Ltd" "Glaxo SmithKline Pharmaceuticals Ltd" "Alembic Pharmaceuticals Ltd" "Glenmark Pharmaceuticals Ltd" ...
##  $ Excellent Review %: num [1:11825] 22 47 39 24 34 35 40 43 36 35 ...
##  $ Average Review %  : num [1:11825] 56 35 40 41 37 42 34 28 43 41 ...
##  $ Poor Review %     : num [1:11825] 22 18 21 35 29 23 26 29 21 24 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Medicine Name` = col_character(),
##   ..   Composition = col_character(),
##   ..   Uses = col_character(),
##   ..   Side_effects = col_character(),
##   ..   `Image URL` = col_character(),
##   ..   Manufacturer = col_character(),
##   ..   `Excellent Review %` = col_double(),
##   ..   `Average Review %` = col_double(),
##   ..   `Poor Review %` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
# Summary statistics of the dataset
summary(data)
##  Medicine Name      Composition            Uses           Side_effects      
##  Length:11825       Length:11825       Length:11825       Length:11825      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##   Image URL         Manufacturer       Excellent Review % Average Review %
##  Length:11825       Length:11825       Min.   :  0.00     Min.   : 0.00   
##  Class :character   Class :character   1st Qu.: 22.00     1st Qu.:27.00   
##  Mode  :character   Mode  :character   Median : 34.00     Median :35.00   
##                                        Mean   : 38.52     Mean   :35.76   
##                                        3rd Qu.: 51.00     3rd Qu.:47.00   
##                                        Max.   :100.00     Max.   :88.00   
##  Poor Review %   
##  Min.   :  0.00  
##  1st Qu.:  0.00  
##  Median : 22.00  
##  Mean   : 25.73  
##  3rd Qu.: 35.00  
##  Max.   :100.00
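
The summary above does not report missing values directly, and it does not show whether the three percentages for a product add up to 100. A minimal sketch of both checks follows. It uses a toy data frame (`reviews`, an illustrative name) built from the first six rows of review percentages printed by `str()` above, not the full dataset.

```r
# Toy stand-in for `data`: first six rows of the review columns shown by str() above
reviews <- data.frame(
  `Excellent Review %` = c(22, 47, 39, 24, 34, 35),
  `Average Review %`   = c(56, 35, 40, 41, 37, 42),
  `Poor Review %`      = c(22, 18, 21, 35, 29, 23),
  check.names = FALSE
)

# Count missing values in each column
na_counts <- colSums(is.na(reviews))
print(na_counts)

# Row-wise total of the three percentages; totals far from 100 would flag odd records
row_totals <- rowSums(reviews)
print(table(row_totals))
```

Running the same two checks on the full `data` object would show whether any cleaning is needed before the later sections.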

3. Data Cleaning and Transformation

# Keep only rows where all three review percentages are present
data_cleaned <- data %>%
  filter(!is.na(`Excellent Review %`), !is.na(`Average Review %`), !is.na(`Poor Review %`))

# Ensure the review columns are numeric
# (read_csv already parsed them as dbl, so this is only a safeguard)
data_cleaned$`Excellent Review %` <- as.numeric(data_cleaned$`Excellent Review %`)
data_cleaned$`Average Review %` <- as.numeric(data_cleaned$`Average Review %`)
data_cleaned$`Poor Review %` <- as.numeric(data_cleaned$`Poor Review %`)

# Drop rows with a missing value in any remaining column (optional, depending on your needs)
data_cleaned <- na.omit(data_cleaned)
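
The column names contain spaces and a percent sign, so every reference needs backticks. One optional alternative is to rename the review columns during cleaning. The sketch below shows the idea on a toy data frame; the new names (`excellent_pct`, `average_pct`, `poor_pct`) are illustrative assumptions and are not used elsewhere in this report.

```r
library(dplyr)

# Toy data with one missing value, standing in for the real dataset
toy <- data.frame(
  `Excellent Review %` = c(22, 47, NA),
  `Average Review %`   = c(56, 35, 40),
  `Poor Review %`      = c(22, 18, 21),
  check.names = FALSE
)

# Rename to syntactic names (hypothetical), then drop rows missing any review percentage
toy_cleaned <- toy %>%
  rename(
    excellent_pct = `Excellent Review %`,
    average_pct   = `Average Review %`,
    poor_pct      = `Poor Review %`
  ) %>%
  filter(!is.na(excellent_pct), !is.na(average_pct), !is.na(poor_pct))

print(toy_cleaned)
```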

4. Descriptive Statistics

# Calculate the mean and standard deviation for review percentages
mean(data_cleaned$`Excellent Review %`)
## [1] 38.51603
sd(data_cleaned$`Excellent Review %`)
## [1] 25.22534
mean(data_cleaned$`Average Review %`)
## [1] 35.75636
sd(data_cleaned$`Average Review %`)
## [1] 18.26813
mean(data_cleaned$`Poor Review %`)
## [1] 25.72761
sd(data_cleaned$`Poor Review %`)
## [1] 23.99198
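
The six calls above can also be written as a single `summarise()` with `across()`, which returns all means and standard deviations in one row. This is a sketch on a toy data frame built from the first six rows shown earlier, so its numbers differ from the full-dataset results above.

```r
library(dplyr)

# Toy stand-in for `data_cleaned`: first six rows of the review columns
toy <- data.frame(
  `Excellent Review %` = c(22, 47, 39, 24, 34, 35),
  `Average Review %`   = c(56, 35, 40, 41, 37, 42),
  `Poor Review %`      = c(22, 18, 21, 35, 29, 23),
  check.names = FALSE
)

# Mean and standard deviation of every column whose name ends in "Review %"
review_stats <- toy %>%
  summarise(across(ends_with("Review %"), list(mean = mean, sd = sd)))

print(review_stats)
```

The result has one column per statistic, named by joining the column name and the function name with an underscore (for example `Excellent Review %_mean`).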

5. Grouped Analysis by Uses

# Group by 'Uses' and calculate average review percentages
data_summary <- data_cleaned %>%
  group_by(Uses) %>%
  summarise(
    avg_excellent = mean(`Excellent Review %`, na.rm = TRUE),
    avg_average = mean(`Average Review %`, na.rm = TRUE),
    avg_poor = mean(`Poor Review %`, na.rm = TRUE)
  )

# View summary data
print(data_summary)
## # A tibble: 712 × 4
##    Uses                                       avg_excellent avg_average avg_poor
##    <chr>                                              <dbl>       <dbl>    <dbl>
##  1 Abdominal cramp                                     39.8        35.5    24.8 
##  2 Abdominal pain                                      35.3        39.7    25   
##  3 Abdominal pain Irritable bowel syndrome             73          27       0   
##  4 Acidity Heartburn Anxiety                           26          57      17   
##  5 Acidity Heartburn Stomach ulcers                    34.6        37.7    27.7 
##  6 Acidity Indigestion                                 33          53      14   
##  7 Acidity Stomach ulcers                              22          34      44   
##  8 Acidity Stomach ulcers Bloating                     47.7        42.8     9.54
##  9 Acne                                                28.9        41.3    29.8 
## 10 AcneTreatment of Bacterial infectionsTrea…          28          29      43   
## # ℹ 702 more rows
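
Some of the 712 groups may contain only a handful of products, in which case a group average reflects very few records. A possible refinement is to count the products per group and rank only groups above a minimum size. The sketch below uses made-up review values and a hypothetical threshold (`min_products`); neither comes from the dataset.

```r
library(dplyr)

# Hypothetical toy data: the Uses labels appear in the output above, the values are invented
toy <- data.frame(
  Uses = c("Acne", "Acne", "Acne", "Abdominal pain", "Abdominal pain", "Acidity Indigestion"),
  `Excellent Review %` = c(30, 25, 35, 40, 30, 33),
  `Poor Review %`      = c(30, 35, 25, 20, 30, 14),
  check.names = FALSE
)

min_products <- 2  # assumed cut-off, chosen only for illustration

toy_summary <- toy %>%
  group_by(Uses) %>%
  summarise(
    n_products    = n(),                          # number of products in the group
    avg_excellent = mean(`Excellent Review %`),
    avg_poor      = mean(`Poor Review %`)
  ) %>%
  filter(n_products >= min_products) %>%          # drop groups that are too small
  arrange(desc(avg_excellent))                    # best-reviewed uses first

print(toy_summary)
```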

6. Data Visualizations

# Scatter plot of Excellent vs Poor Reviews
ggplot(data_cleaned, aes(x = `Excellent Review %`, y = `Poor Review %`)) +
  geom_point(color = "blue") +
  labs(title = "Excellent vs Poor Reviews",
       x = "Excellent Review %", 
       y = "Poor Review %")

# Histogram for Excellent Review Percentage
ggplot(data_cleaned, aes(x = `Excellent Review %`)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Excellent Reviews", 
       x = "Excellent Review %", 
       y = "Frequency")

# Histogram for Average Review Percentage
ggplot(data_cleaned, aes(x = `Average Review %`)) +
  geom_histogram(binwidth = 5, fill = "lightgreen", color = "black") +
  labs(title = "Distribution of Average Reviews", 
       x = "Average Review %", 
       y = "Frequency")

# Histogram for Poor Review Percentage
ggplot(data_cleaned, aes(x = `Poor Review %`)) +
  geom_histogram(binwidth = 5, fill = "lightcoral", color = "black") +
  labs(title = "Distribution of Poor Reviews", 
       x = "Poor Review %", 
       y = "Frequency")
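
The three histograms share the same structure, so they could also be drawn as one faceted plot after reshaping the review columns into long format. The following is a sketch that assumes the tidyr package is available (it is not loaded in the Packages section) and uses a toy data frame; `review_type` and `percentage` are illustrative column names.

```r
library(tidyr)
library(ggplot2)

# Toy stand-in for `data_cleaned`: first six rows of the review columns
toy <- data.frame(
  `Excellent Review %` = c(22, 47, 39, 24, 34, 35),
  `Average Review %`   = c(56, 35, 40, 41, 37, 42),
  `Poor Review %`      = c(22, 18, 21, 35, 29, 23),
  check.names = FALSE
)

# One row per (product, review type) pair
toy_long <- pivot_longer(
  toy,
  cols      = ends_with("Review %"),
  names_to  = "review_type",
  values_to = "percentage"
)

# One histogram panel per review type, sharing the same x-axis
p <- ggplot(toy_long, aes(x = percentage)) +
  geom_histogram(binwidth = 5, fill = "skyblue", color = "black") +
  facet_wrap(~ review_type) +
  labs(title = "Distribution of Review Percentages by Type",
       x = "Review %",
       y = "Frequency")

print(p)
```

A shared axis makes the three distributions easier to compare than separate plots with independent scales.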

7. Correlation Analysis

# Compute correlation matrix for review percentages
cor_matrix <- cor(
  data_cleaned[, c("Excellent Review %", "Average Review %", "Poor Review %")],
  use = "complete.obs"
)

# Print the correlation matrix
print(cor_matrix)
##                    Excellent Review % Average Review % Poor Review %
## Excellent Review %          1.0000000       -0.4279625    -0.7255451
## Average Review %           -0.4279625        1.0000000    -0.3114637
## Poor Review %              -0.7255451       -0.3114637     1.0000000
# Visualize the correlation matrix
corrplot(cor_matrix, method = "circle", type = "upper", tl.col = "black", tl.srt = 45)
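
All three off-diagonal correlations are negative. In the rows printed earlier, the three percentages of each product add up to 100, and under that constraint some negative correlation arises by construction: the covariance of any one column with the constant row total is zero, so its covariances with the other two columns must sum to minus its own variance. The sketch below illustrates this on the first six rows shown by `str()`; it is a toy check, not a result for the full dataset.

```r
# Toy stand-in: first six rows of the review columns; each row sums to 100
toy <- data.frame(
  `Excellent Review %` = c(22, 47, 39, 24, 34, 35),
  `Average Review %`   = c(56, 35, 40, 41, 37, 42),
  `Poor Review %`      = c(22, 18, 21, 35, 29, 23),
  check.names = FALSE
)

cov_matrix <- cov(toy)

# Each row of the covariance matrix is Cov(column, Excellent + Average + Poor).
# Because that total is constant, every row sum is zero up to rounding error.
cov_row_sums <- rowSums(cov_matrix)
print(round(cov_row_sums, 10))
```

The correlations are therefore best read relative to one another (Excellent vs Poor is the strongest) rather than as evidence of independent effects.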

8. Conclusion

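In this analysis, we explored the drug review dataset by cleaning the data, calculating descriptive statistics, and visualizing the review percentages. On average, 38.5% of a product’s reviews were Excellent, 35.8% were Average, and 25.7% were Poor. The Excellent and Poor percentages were strongly negatively correlated (r ≈ -0.73), with weaker negative correlations between Excellent and Average (r ≈ -0.43) and between Average and Poor (r ≈ -0.31). Part of this negative association is expected, since the three percentages sum to 100 in the rows shown and the column means add up to 100. Grouping by the drug’s uses produced 712 distinct use categories with differing average review profiles, which gives a starting point for identifying uses where user feedback is less favorable.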