library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.4.3
## corrplot 0.95 loaded
library(readr)
brain_tumor_data <- read_csv("C:/Users/hp/OneDrive/Desktop/project 1 eda/Brain_Tumor_Prediction_Dataset.csv")
## Rows: 250000 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (17): Gender, Country, Tumor_Location, MRI_Findings, Smoking_History, Al...
## dbl (4): Age, Tumor_Size, Genetic_Risk, Survival_Rate(%)
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(brain_tumor_data)
colSums(is.na(brain_tumor_data))
## Age Gender Country Tumor_Size
## 0 0 0 0
## Tumor_Location MRI_Findings Genetic_Risk Smoking_History
## 0 0 0 0
## Alcohol_Consumption Radiation_Exposure Head_Injury_History Chronic_Illness
## 0 0 0 0
## Blood_Pressure Diabetes Tumor_Type Treatment_Received
## 0 0 0 0
## Survival_Rate(%) Tumor_Growth_Rate Family_History Symptom_Severity
## 0 0 0 0
## Brain_Tumor_Present
## 0
Interpretation
– The dataset, “Brain_Tumor_Prediction_Dataset.csv”, is loaded with read_csv().
– It contains 250,000 rows and 21 columns.
– 17 columns are categorical (e.g., Gender, Country, Tumor_Type) and 4 are numeric (Age, Tumor_Size, Genetic_Risk, Survival_Rate(%)).
– The columns include information such as:
– Demographics: Age, Gender, Country
– Tumor Characteristics: Tumor Size, Tumor Location, Tumor Type
– Health Factors: MRI Findings, Genetic Risk, Smoking History, Alcohol Consumption
– Other Medical Conditions: Diabetes, Chronic Illness, Head Injury History
– Outcome Measures: Survival Rate, Brain Tumor Present (Yes/No)
– There are no missing values.
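is.na() only catches true NA values. Placeholder labels such as “Unknown” or “N/A” in character columns would pass this check unnoticed. A minimal sketch of a broader check, using a small made-up data frame (the values and the list of placeholder labels are assumptions for illustration, not taken from the dataset):

```r
# Hypothetical toy data; column names mirror the dataset, values are made up.
toy <- data.frame(
  Age = c(34, 61, NA, 47),
  Gender = c("Male", "Unknown", "Female", "Other"),
  Diabetes = c("Yes", "No", "N/A", "No"),
  stringsAsFactors = FALSE
)

# Labels we choose to treat as disguised missing values (an assumption).
placeholders <- c("", "Unknown", "N/A")

# Count true NAs plus placeholder labels in each column.
missing_like <- sapply(toy, function(col) {
  sum(is.na(col) | (is.character(col) & col %in% placeholders))
})
print(missing_like)
```

Running the same function on the real data frame would show whether any character column hides missing values behind a label.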
tumor_distribution <- brain_tumor_data %>%
group_by(Tumor_Type) %>%
summarise(Count = n()) %>%
arrange(desc(Count))
print(tumor_distribution)
## # A tibble: 2 × 2
## Tumor_Type Count
## <chr> <int>
## 1 Benign 125204
## 2 Malignant 124796
Interpretation
It shows the count of two types of brain tumors in the dataset:
– Benign tumors: 125,204 cases
– Malignant tumors: 124,796 cases
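Whether the gap between the two counts reflects a real imbalance can be checked with a chi-square goodness-of-fit test. The sketch below uses only the two counts printed above and assumes an even 50/50 split as the null hypothesis (that choice of null is an assumption, not something stated in the dataset).

```r
# Counts copied from the tumor_distribution output above.
tumor_counts <- c(Benign = 125204, Malignant = 124796)

# Share of each tumor type.
tumor_share <- prop.table(tumor_counts)
print(round(tumor_share, 4))

# Goodness-of-fit test against equal proportions (the default for a count vector).
fit <- chisq.test(tumor_counts)
print(fit)
```

The test statistic is far below 3.84, the 5% critical value for one degree of freedom, so the two classes are consistent with an even split.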
#Level 2: Data Extraction & Filtering
most_common_tumor <- brain_tumor_data %>%
count(Tumor_Type) %>%
arrange(desc(n)) %>%
head(1)
print(most_common_tumor)
## # A tibble: 1 × 2
## Tumor_Type n
## <chr> <int>
## 1 Benign 125204
Interpretation
– “Benign” is the most common tumor type, with 125,204 records.
– The margin over “Malignant” (124,796) is only 408 records, so the two classes are effectively balanced.
young_patients <- brain_tumor_data %>%
filter(Age < 30)
print(head(young_patients))
## # A tibble: 6 × 21
## Age Gender Country Tumor_Size Tumor_Location MRI_Findings Genetic_Risk
## <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl>
## 1 29 Male Germany 7.97 Frontal Abnormal 70
## 2 5 Other Brazil 8.65 Parietal Abnormal 68
## 3 19 Other Russia 6.86 Temporal Normal 81
## 4 16 Female USA 8.06 Frontal Normal 47
## 5 17 Other USA 9.66 Parietal Abnormal 89
## 6 29 Female India 1.11 Parietal Abnormal 34
## # ℹ 14 more variables: Smoking_History <chr>, Alcohol_Consumption <chr>,
## # Radiation_Exposure <chr>, Head_Injury_History <chr>, Chronic_Illness <chr>,
## # Blood_Pressure <chr>, Diabetes <chr>, Tumor_Type <chr>,
## # Treatment_Received <chr>, `Survival_Rate(%)` <dbl>,
## # Tumor_Growth_Rate <chr>, Family_History <chr>, Symptom_Severity <chr>,
## # Brain_Tumor_Present <chr>
Interpretation
– The filter keeps every record with Age < 30; the first six rows are printed.
– It does not filter on Brain_Tumor_Present, so the subset may include patients without a tumor.
– The preview shows age, gender, country, tumor details, MRI findings, and genetic risk; the remaining 14 columns are truncated.
– This subset is a starting point for examining tumor trends and common risk factors among younger patients.
high_malignancy_cases <- brain_tumor_data %>%
filter(Tumor_Type == "Malignant")
print(head(high_malignancy_cases))
## # A tibble: 6 × 21
## Age Gender Country Tumor_Size Tumor_Location MRI_Findings Genetic_Risk
## <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl>
## 1 66 Other China 8.7 Cerebellum Severe 81
## 2 87 Female Australia 8.14 Temporal Normal 65
## 3 84 Female Brazil 7.94 Temporal Abnormal 47
## 4 29 Male Germany 7.97 Frontal Abnormal 70
## 5 19 Other Russia 6.86 Temporal Normal 81
## 6 43 Other Australia 1.59 Temporal Abnormal 58
## # ℹ 14 more variables: Smoking_History <chr>, Alcohol_Consumption <chr>,
## # Radiation_Exposure <chr>, Head_Injury_History <chr>, Chronic_Illness <chr>,
## # Blood_Pressure <chr>, Diabetes <chr>, Tumor_Type <chr>,
## # Treatment_Received <chr>, `Survival_Rate(%)` <dbl>,
## # Tumor_Growth_Rate <chr>, Family_History <chr>, Symptom_Severity <chr>,
## # Brain_Tumor_Present <chr>
Interpretation
– It shows the patients diagnosed with malignant (cancerous) brain tumors.
– Displays age, gender, country, tumor details, MRI findings, and risk factors.
– It helps to understand common traits in malignant cases.
– It is useful for early detection, treatment planning, and risk assessment.
avg_age <- brain_tumor_data %>%
group_by(Tumor_Type) %>%
summarise(Average_Age = mean(Age, na.rm = TRUE))
print(avg_age)
## # A tibble: 2 × 2
## Tumor_Type Average_Age
## <chr> <dbl>
## 1 Benign 47.0
## 2 Malignant 47.0
Interpretation
– The average age is 47.0 years for both benign (non-cancerous) and malignant (cancerous) brain tumors.
– Age therefore does not distinguish the two tumor types in this dataset.
– Identifying at-risk age groups would require comparing patients with and without a tumor, not the two tumor types.
location_count <- table(brain_tumor_data$Tumor_Location)
most_common_location <- names(location_count[which.max(location_count)])
print(most_common_location)
## [1] "Parietal"
Interpretation
“Parietal” is the most common location where brain tumors are found in this dataset.
age_risk <- brain_tumor_data %>%
group_by(Age) %>%
summarise(Count = n()) %>%
arrange(desc(Count))
print(head(age_risk))
## # A tibble: 6 × 2
## Age Count
## <dbl> <int>
## 1 25 3061
## 2 46 3060
## 3 28 3025
## 4 39 3014
## 5 6 3011
## 6 26 3011
Interpretation
– The table lists the six individual ages with the most records in the dataset.
– The counts are nearly identical (3,011 to 3,061), so no single age stands out as high-risk.
– The counts include every record, so they describe how ages are represented in the dataset rather than tumor risk at each age.
tumor_rank <- brain_tumor_data %>%
mutate(Symptom_Severity_Num = as.numeric(factor(Symptom_Severity, levels = c("Mild", "Moderate", "Severe")))) %>%
group_by(Tumor_Type) %>%
summarise(Average_Severity = mean(Symptom_Severity_Num, na.rm = TRUE)) %>%
arrange(desc(Average_Severity))
print(tumor_rank)
## # A tibble: 2 × 2
## Tumor_Type Average_Severity
## <chr> <dbl>
## 1 Benign 2.00
## 2 Malignant 2.00
Interpretation
– Conversion of symptom severity into numerical values:
o “Mild” → 1
o “Moderate” → 2
o “Severe” → 3
– Then, it calculated the average severity score for each tumor type (Benign and Malignant).
– Both Benign and Malignant tumors have an average severity score of 2.00.
severity_distribution <- table(brain_tumor_data$Tumor_Type, brain_tumor_data$Symptom_Severity)
print(severity_distribution)
##
## Mild Moderate Severe
## Benign 41818 41522 41864
## Malignant 41663 41443 41690
Interpretation
– The number of cases in each severity category is almost equal for both Benign and Malignant tumors.
– Example:
– Benign Tumors
o Mild: 41,818
o Moderate: 41,522
o Severe: 41,864
– Malignant Tumors
o Mild: 41,663
o Moderate: 41,443
o Severe: 41,690
– Because the three severity levels are nearly equal in size for each tumor type, the Mild (1) and Severe (3) scores balance around Moderate (2), and the mean severity score is almost exactly 2 for both.
– This explains why the earlier ranking showed Benign = 2.00 and Malignant = 2.00.
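The mean scores can be reproduced directly from the table. A short sketch using the printed counts and the 1/2/3 coding described earlier:

```r
# Counts copied from the severity_distribution table above.
severity_counts <- matrix(
  c(41818, 41522, 41864,
    41663, 41443, 41690),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Benign", "Malignant"), c("Mild", "Moderate", "Severe"))
)

# Numeric codes: Mild = 1, Moderate = 2, Severe = 3.
scores <- c(Mild = 1, Moderate = 2, Severe = 3)

# Weighted mean severity per tumor type: sum(count * score) / sum(count).
mean_severity <- as.vector(severity_counts %*% scores) / rowSums(severity_counts)
print(round(mean_severity, 4))
```

Both values differ from 2 only in the fourth decimal place, which is why they print as 2.00.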
risk_factors <- colSums(brain_tumor_data[, c("Genetic_Risk", "Smoking_History", "Radiation_Exposure", "Family_History")] == "Yes", na.rm = TRUE)
print(risk_factors)
## Genetic_Risk Smoking_History Radiation_Exposure Family_History
## 0 125150 0 124964
Interpretation
– It counted the number of “Yes” responses for four major risk factors of brain tumors:
o Genetic Risk
o Smoking History
o Radiation Exposure
o Family History
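– Genetic Risk: 0 → Genetic_Risk is a numeric column (values such as 70, 68, 81), so it never equals “Yes”. The zero is an artifact of the comparison, not evidence that no patient has genetic risk.
– Smoking History: 125,150 → About half of the 250,000 records have a smoking history.
– Radiation Exposure: 0 → No value equals “Yes”, which indicates the column is not coded Yes/No; its actual categories need to be inspected before it can be counted.
– Family History: 124,964 → About half of the records have a family history.
– Only Smoking History and Family History are usable as Yes/No counts here, and each splits roughly 50/50, so this tally alone does not show that either is a significant risk factor.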
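Comparing a column to "Yes" silently returns zero matches when the column uses a different coding. A minimal sketch of a safer tally, using a made-up data frame (the Radiation_Exposure labels below are hypothetical; the real categories should be checked with unique()):

```r
# Hypothetical toy data; Genetic_Risk is numeric as in the column specification.
toy <- data.frame(
  Genetic_Risk = c(70, 68, 81, 47),
  Smoking_History = c("Yes", "No", "Yes", "No"),
  Radiation_Exposure = c("Low", "High", "Medium", "Low"),  # assumed labels
  Family_History = c("No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# The naive tally reports 0 for columns that are not coded Yes/No.
naive <- colSums(toy == "Yes")
print(naive)

# Keep only columns whose values are all "Yes" or "No" before counting.
is_yes_no <- sapply(toy, function(col) all(col %in% c("Yes", "No")))
yes_counts <- colSums(toy[, is_yes_no, drop = FALSE] == "Yes")
print(yes_counts)
```

Filtering first makes it explicit which columns were counted, so a zero can no longer be mistaken for "no cases".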
gender_distribution <- brain_tumor_data %>%
group_by(Gender) %>%
summarise(Count = n())
print(gender_distribution)
## # A tibble: 3 × 2
## Gender Count
## <chr> <int>
## 1 Female 83375
## 2 Male 83407
## 3 Other 83218
Interpretation
The output shows the number of records for each Gender category.
– Female: 83,375
– Male: 83,407
– Other: 83,218
– The counts are very close to each other, so no gender is overrepresented.
– Because the categories are almost equal in size, gender does not appear to be a differentiating factor in this dataset.
– If one gender had substantially more cases, it would be worth exploring potential risk factors.
brain_tumor_data <- brain_tumor_data %>%
mutate(Risk_Category = case_when(
Age <= 25 ~ "Low Risk",
Age > 25 & Age <= 50 ~ "Moderate Risk",
Age > 50 ~ "High Risk"
))
# Count of each risk category
table(brain_tumor_data$Risk_Category)
##
## High Risk Low Risk Moderate Risk
## 114510 61647 73843
Interpretation
This output assigns each patient an age-based risk label and counts the records in each category.
– Risk Categories Based on Age:
o Low Risk (Age ≤ 25): 61,647 records
o Moderate Risk (Age 26–50): 73,843 records
o High Risk (Age > 50): 114,510 records
– The High Risk group is the largest, followed by Moderate Risk and then Low Risk.
– These counts should not be read as evidence that brain tumor risk increases with age:
o The labels are defined by age alone, so the counts only describe how many patients fall in each age band.
o The bands have different widths; the open-ended “Age > 50” band covers more ages than either of the other two, so it collects more records even if every age is equally represented.
o The counts include all records regardless of Brain_Tumor_Present.
– To assess whether age is related to tumor occurrence, compare the proportion of Brain_Tumor_Present = “Yes” across the age bands.
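Unequal age bands produce unequal counts even when every age is equally common. The sketch below illustrates this with a hypothetical age vector in which each age from 5 to 88 appears exactly 100 times (both the range and the flat distribution are assumptions for illustration, not properties read from the dataset), then divides each band's count by the number of ages it spans.

```r
# Hypothetical ages: every age from 5 to 88 appears exactly 100 times.
ages <- rep(5:88, times = 100)

# Same cut points as the case_when() rules above.
band <- cut(ages,
            breaks = c(-Inf, 25, 50, Inf),
            labels = c("Low Risk", "Moderate Risk", "High Risk"))

# Raw counts differ although the age distribution is perfectly flat.
band_counts <- table(band)
print(band_counts)

# Dividing by the number of distinct ages in each band removes the width effect.
ages_per_band <- tapply(ages, band, function(a) length(unique(a)))
per_age <- as.vector(band_counts) / as.vector(ages_per_band)
print(per_age)
```

In this toy case the raw counts are 2,100, 2,500, and 3,800, yet the per-age count is 100 in every band.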
treatment_effectiveness <- brain_tumor_data %>%
group_by(Treatment_Received) %>%
summarise(Average_Survival = mean(`Survival_Rate(%)`, na.rm = TRUE)) %>%
arrange(desc(Average_Survival))
print(treatment_effectiveness)
## # A tibble: 4 × 2
## Treatment_Received Average_Survival
## <chr> <dbl>
## 1 Surgery 54.6
## 2 None 54.5
## 3 Radiation 54.4
## 4 Chemotherapy 54.3
Interpretation
The table shows the average survival rate (%) for patients grouped by the treatment they received.
– Surgery has the highest average survival rate (54.6%), followed by None (54.5%), Radiation (54.4%), and Chemotherapy (54.3%).
– The spread between the highest and lowest group is only 0.3 percentage points, which is too small to call any treatment more effective than another.
– Patients who received no treatment have nearly the same average survival rate as treated patients, so the treatment label does not separate survival outcomes in this dataset.
– Other factors (such as patient condition, tumor type, or severity) may influence survival more than the treatment received.
# Select only Age and Survival Rate columns
correlation_data <- brain_tumor_data[, c("Age", "Survival_Rate(%)")]
cor_matrix <- cor(correlation_data)
print(cor_matrix)
## Age Survival_Rate(%)
## Age 1.000000000 0.002885231
## Survival_Rate(%) 0.002885231 1.000000000
corrplot(cor_matrix, method = "circle", type = "upper",
tl.col = "black", tl.srt = 45)
Interpretation
– Correlation analysis was done between Age and Survival Rate (%).
– The correlation coefficient is 0.0029, which is very close to 0.
– This means there is almost no linear relationship between Age and Survival Rate.
– Finding: Patients’ age does not meaningfully affect their survival rate.
– Conclusion: Whether a patient is younger or older makes no major difference to their survival rate in this dataset.
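A correlation coefficient can be turned into a significance test with t = r * sqrt(n - 2) / sqrt(1 - r^2). A short sketch using the r and n printed above (on the full data, cor.test() performs the same test directly):

```r
# Values copied from the output above.
r <- 0.002885231   # correlation between Age and Survival_Rate(%)
n <- 250000        # number of rows

# t statistic for H0: true correlation = 0, with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(round(t_stat, 3))
print(round(p_value, 3))
```

Even with 250,000 rows the p-value stays well above 0.05, so the correlation is not distinguishable from zero.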
# Select only Tumor Size and Survival Rate columns
correlation_data <- brain_tumor_data[, c("Tumor_Size", "Survival_Rate(%)")]
# Calculate correlation matrix (default without complete.obs)
cor_matrix <- cor(correlation_data)
print(cor_matrix)
## Tumor_Size Survival_Rate(%)
## Tumor_Size 1.000000000 0.001916102
## Survival_Rate(%) 0.001916102 1.000000000
corrplot(cor_matrix, method = "circle", type = "upper",
tl.col = "black", tl.srt = 45)
Interpretation
– The correlation coefficient between Tumor Size and Survival Rate is 0.0019.
– This value is very close to 0, meaning almost no linear correlation.
– Finding: Patients with large tumors and small tumors have similar survival rates.
– Conclusion: Tumor Size does not meaningfully affect Survival Rate in this dataset.
model1 <- lm(`Survival_Rate(%)` ~ Age, data = brain_tumor_data)
summary(model1)
##
## Call:
## lm(formula = `Survival_Rate(%)` ~ Age, data = brain_tumor_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.610 -22.497 0.393 22.507 44.647
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.337840 0.112362 483.596 <2e-16 ***
## Age 0.003060 0.002121 1.443 0.149
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26 on 249998 degrees of freedom
## Multiple R-squared: 8.325e-06, Adjusted R-squared: 4.325e-06
## F-statistic: 2.081 on 1 and 249998 DF, p-value: 0.1491
# Scatter plot with a trend line
ggplot(brain_tumor_data, aes(x = Age, y = `Survival_Rate(%)`)) +
geom_point(color = "darkblue", size = 2) +
geom_smooth(method = "lm", color = "red") + # Regression line
labs(title = "Scatter Plot: Age vs Survival Rate",
x = "Age",
y = "Survival Rate (%)") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Interpretation
–Age has a very weak and non-significant association with Survival Rate.
–Each additional year of age is associated with a 0.003 percentage-point increase in Survival Rate, which is practically negligible.
–The p-value (0.149) is above 0.05, so the slope is not statistically distinguishable from zero.
–The R-squared (about 0.0008%) means that Age explains essentially none of the variation in survival rate among patients.
–In the scatter plot of Age against Survival Rate (%), blue dots show individual patients.
–The red trend line is almost flat, which confirms that Age has little effect on Survival Rate.
–Conclusion: There is no meaningful relationship between Age and Survival Rate.
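To put the slope in practical terms, the sketch below uses the estimate and standard error from the summary(model1) output to form a 95% confidence interval and the predicted difference over a 60-year age gap (the 60-year gap is an arbitrary choice for illustration).

```r
# Values copied from the summary(model1) coefficient table.
slope <- 0.003060
slope_se <- 0.002121
df_resid <- 249998

# 95% confidence interval for the slope.
t_crit <- qt(0.975, df = df_resid)
slope_ci <- slope + c(-1, 1) * t_crit * slope_se
print(round(slope_ci, 4))

# Predicted change in Survival_Rate(%) across a 60-year age difference.
age_gap <- 60  # arbitrary gap chosen for illustration
print(slope * age_gap)
```

The interval includes zero, and even a 60-year gap corresponds to less than 0.2 percentage points of survival rate.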
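# Encode severity as 1-3 with an explicit level order instead of relying on alphabetical order
brain_tumor_data$Symptom_Score <- as.numeric(factor(brain_tumor_data$Symptom_Severity, levels = c("Mild", "Moderate", "Severe")))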
model2 <- lm(`Survival_Rate(%)` ~ Age + Tumor_Size + Symptom_Score, data = brain_tumor_data)
summary(model2)
##
## Call:
## lm(formula = `Survival_Rate(%)` ~ Age + Tumor_Size + Symptom_Score,
## data = brain_tumor_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -44.723 -22.499 0.322 22.508 44.772
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 54.150782 0.196455 275.640 <2e-16 ***
## Age 0.003051 0.002121 1.438 0.150
## Tumor_Size 0.018125 0.018960 0.956 0.339
## Symptom_Score 0.046133 0.063617 0.725 0.468
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26 on 249996 degrees of freedom
## Multiple R-squared: 1.409e-05, Adjusted R-squared: 2.086e-06
## F-statistic: 1.174 on 3 and 249996 DF, p-value: 0.318
Interpretation
–The p-values for all variables (Age, Tumor Size, and Symptom Severity) are greater than 0.05, meaning none of them are statistically significant.
–The R-squared (~0%) tells us that these three variables together explain almost none of the variations in Survival Rate.
–The F-statistic p-value (0.318) confirms that the overall model is not significant.
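The overall F-statistic follows from R-squared: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), where k is the number of predictors. A sketch with the values printed by summary(model2):

```r
# Values copied from the summary(model2) output.
r_squared <- 1.409e-05
k <- 3            # predictors: Age, Tumor_Size, Symptom_Score
df_resid <- 249996

# F statistic for H0: all slopes are zero.
f_stat <- (r_squared / k) / ((1 - r_squared) / df_resid)

# Upper-tail p-value from the F distribution.
p_value <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

print(round(f_stat, 3))
print(round(p_value, 3))
```

This ties the two numbers together: an R-squared this close to zero cannot produce a significant F-test, even with a very large sample.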
ggplot(brain_tumor_data, aes(x = Symptom_Severity)) +
geom_bar(fill = "skyblue") +
labs(title = "Number of Patients by Symptom Severity", x = "Severity", y = "Count") +
theme_minimal()
Interpretation
–Mild and Severe symptoms are almost equal in number.
–Moderate symptoms are slightly less common compared to Mild and Severe.
–Overall, the number of patients is very similar across all three symptom severity levels.
anova_location <- aov(`Survival_Rate(%)` ~ Tumor_Location, data = brain_tumor_data)
summary(anova_location)
## Df Sum Sq Mean Sq F value Pr(>F)
## Tumor_Location 4 3358 839.5 1.242 0.291
## Residuals 249995 168995087 676.0
Interpretation
Since the p-value = 0.291 (> 0.05):
–We fail to reject the null hypothesis that the mean survival rate is the same across tumor locations.
–Tumor location does not significantly affect the survival rate.
This means that where the tumor is located in the brain makes no detectable difference to patients’ survival rate in this dataset.
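The F value in an ANOVA table is the ratio of the between-group mean square to the residual mean square. A sketch that rebuilds it, and the p-value, from the sums of squares printed above:

```r
# Values copied from the summary(anova_location) table.
ss_between <- 3358
df_between <- 4
ss_resid <- 168995087
df_resid <- 249995

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between <- ss_between / df_between
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail p-value.
f_stat <- ms_between / ms_resid
p_value <- pf(f_stat, df_between, df_resid, lower.tail = FALSE)

print(round(f_stat, 3))
print(round(p_value, 3))
```

An F value near 1 means the variation between locations is about the same size as the variation within them.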
ggplot(brain_tumor_data, aes(x = `Survival_Rate(%)`)) +
geom_histogram(fill = "tomato", bins = 20, color = "black") +
labs(title = "Distribution of Survival Rates", x = "Survival Rate (%)", y = "Count") +
theme_minimal()
Interpretation
–Survival rates are widely spread across patients — no single survival rate is dominating.
–Patients are almost evenly distributed across different survival percentages.
–This suggests a lot of variation among patients’ survival outcomes.
##Compare Survival Rates by Gender (Boxplot)
ggplot(brain_tumor_data, aes(x = Gender, y = `Survival_Rate(%)`)) +
geom_boxplot(fill = "lightgreen") +
labs(title = "Survival Rate by Gender", x = "Gender", y = "Survival Rate (%)") +
theme_minimal()
Interpretation:
–Median survival rates for Female, Male, and Other genders are almost the same (roughly 55%, consistent with the overall average of about 54.5%).
–The spread (variation) of survival rates is similar for all genders.
–Minimum and maximum survival rates are nearly equal across all gender groups.
–No extreme outliers are visible in any gender category.
ggplot(brain_tumor_data, aes(x = Treatment_Received)) +
geom_bar(fill = "salmon") +
labs(title = "Number of Patients per Treatment", x = "Treatment", y = "Count") +
theme_minimal()
Interpretation:
–The number of patients is almost equal across all treatment types: Chemotherapy, Radiation, Surgery, and None.
–No treatment type stands out with a very high or very low patient count.
–Each treatment category has roughly 62,000 patients, about a quarter of the 250,000 records.
–Almost as many patients received no treatment as received each active treatment (chemotherapy, radiation, or surgery).
ggplot(brain_tumor_data, aes(x = Tumor_Type, y = Tumor_Size)) +
geom_boxplot(fill = "violet") +
labs(title = "Tumor Size per Tumor Type", x = "Tumor Type", y = "Tumor Size") +
theme_minimal()
Interpretation:
–Benign and Malignant tumors have almost the same tumor size distribution.
–The median tumor size (middle line inside the box) is very similar for both tumor types.
–Both tumor types have similar ranges — from very small tumors (close to 0) up to tumors larger than 10 units.
–No major outliers or extreme differences are visible between benign and malignant tumors.
–Key Findings:
–Tumor size alone may not clearly differentiate between Benign and Malignant tumors.
–Even small tumors can be malignant, and large tumors can still be benign.
–Other factors (like tumor location, symptom severity, or genetic risk) might be more important for predicting tumor type.
##Histogram of Age Distribution
ggplot(brain_tumor_data, aes(x = as.numeric(Age))) +
geom_histogram(fill = "lightblue", bins = 20, color = "black") +
labs(title = "Distribution of Patient Ages", x = "Age", y = "Number of Patients") +
theme_minimal()
Interpretation
–The age distribution of patients is fairly even across age groups.
–Most age groups, whether young (20s) or older (60s-70s), have a similar number of patients.
–The bars for the youngest (under 10 years) and oldest (above 80 years) patients are slightly lower. This may be a binning effect at the edges of the age range rather than a real shortage: in the per-age counts shown earlier, age 6 is among the most frequent ages (3,011 records).
–Key Findings:
–The dataset represents all age groups almost equally, from children to older adults.
–Because ages are evenly represented, the histogram does not single out any age range as having more cases.
##Visualizing Risk Category Count
ggplot(brain_tumor_data, aes(x = Risk_Category)) +
geom_bar(fill = "orange") +
labs(title = "Patient Count by Risk Category", x = "Risk Level", y = "Number of Patients") +
theme_minimal()
Interpretation:
–High Risk patients (above 50 years old) are the largest group.
–Moderate Risk patients (aged 26–50) are fewer than High Risk patients but still a large group.
–Low Risk patients (aged 25 and below) are the fewest.
–Key Findings:
–The ordering of the bars follows the width of the age bands: the “above 50” band spans the most ages, so it contains the most patients.
–The chart therefore shows how patients are distributed across the age bands, not that older patients are more likely to have a brain tumor.
–Comparing Brain_Tumor_Present rates across the three bands would be needed before prioritizing any group for screening.
ggplot(brain_tumor_data, aes(x = Tumor_Location, y = `Survival_Rate(%)`)) +
geom_boxplot(fill = "gold") +
labs(title = "Survival Rate by Tumor Location", x = "Tumor Location", y = "Survival Rate (%)") +
theme_minimal()
Interpretation:
–The median survival rate is almost the same across all tumor locations (Cerebellum, Frontal, Occipital, Parietal, Temporal).
–Spread (variability) of survival rates looks very similar for each location.
–There are some patients with very low survival rates in every location (seen as longer lower whiskers).
–Key Findings:
–Tumor location does not noticeably affect the survival rate.
–Median survival rates are almost identical across different parts of the brain.
–Survival varies widely within each location, so it depends on factors other than tumor location.
avg_survival <- brain_tumor_data %>%
group_by(Age) %>%
summarise(Average_Survival = mean(`Survival_Rate(%)`, na.rm = TRUE))
ggplot(avg_survival, aes(x = Age, y = Average_Survival)) +
geom_line(color = "purple") +
labs(title = "Average Survival Rate by Age", x = "Age", y = "Average Survival Rate (%)") +
theme_minimal()
Interpretation:
–The average survival rate fluctuates from one age to the next rather than staying constant.
–It mostly stays between 54% and 55.5%.
–There are some sharp increases and decreases at certain ages.
Key Findings:
–The fluctuations are small (about 1.5 percentage points in total) and irregular, which is what random variation between age groups would look like.
–No clear trend, such as a steady increase or decrease, is visible.
–Average survival rate is effectively stable across ages, in line with the near-zero correlation between Age and Survival Rate found earlier.
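How much should a per-age average move by chance alone? The standard error of a mean is sd / sqrt(n). The sketch below uses the residual standard error of 26 from the regression output as the spread, and assumes roughly 3,000 patients per age and an overall mean of about 54.5 (both are approximations read off earlier outputs, not exact values).

```r
# Spread of Survival_Rate(%), taken from the residual standard error above.
sd_survival <- 26

# Approximate patients per single age (assumption: about 3,000).
n_per_age <- 3000

# Standard error of each per-age average.
se_mean <- sd_survival / sqrt(n_per_age)
print(round(se_mean, 2))

# Range expected to cover about 95% of per-age averages around the overall mean.
overall_mean <- 54.5  # approximate, from the treatment averages above
noise_band <- overall_mean + c(-1.96, 1.96) * se_mean
print(round(noise_band, 1))
```

Under these assumptions the band runs from roughly 53.6% to 55.4%, about as wide as the fluctuations in the plot, so the ups and downs are consistent with sampling noise.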
anova_treatment <- aov(`Survival_Rate(%)` ~ Treatment_Received, data = brain_tumor_data)
summary(anova_treatment)
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment_Received 3 2877 958.9 1.418 0.235
## Residuals 249996 168995568 676.0
Interpretation:
–The p-value is 0.235 (greater than 0.05).
–F-value is 1.418, which is not very high.
–Key Findings:
–Since p-value > 0.05, we fail to reject the null hypothesis.
–Conclusion:
–There is no statistically significant difference in survival rates between different treatments (Surgery, Radiation, Chemotherapy, or None).
–Treatments did not show a strong impact on survival rate in this dataset.
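An effect size complements the p-value. Eta-squared, the share of total variation explained by the grouping, can be computed from the sums of squares in the table above:

```r
# Values copied from the summary(anova_treatment) table.
ss_treatment <- 2877
ss_resid <- 168995568

# Eta-squared = between-group sum of squares / total sum of squares.
eta_squared <- ss_treatment / (ss_treatment + ss_resid)
print(signif(eta_squared, 3))
```

Treatment group accounts for roughly 0.002% of the variation in survival rate, which matches the non-significant F-test.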
ggplot(brain_tumor_data, aes(x = Treatment_Received, fill = Gender)) +
geom_bar(position = "dodge") +
labs(title = "Treatment Distribution by Gender", x = "Treatment", y = "Count") +
theme_minimal()
Interpretation:
–The bar chart shows the number of patients (Count) by Treatment Type and Gender (Female, Male, Other).
–The bars for all three genders (red for Female, green for Male, blue for Other) are almost equal height for each treatment type (Chemotherapy, None, Radiation, Surgery).
–Key Findings:
–Treatment usage is fairly balanced across all genders.
–No major gender-based difference in the number of patients receiving a particular treatment.
–Whether it is Chemotherapy, Radiation, Surgery, or no treatment, males, females, and others are getting similar treatments.