datasetcalifornia <- read.csv("C:/Users/Manuel/Desktop/CASchools.csv", sep = ",", header = TRUE)
datasetcanada <- read.csv("C:/Users/Manuel/Desktop/CanadianDrugs.csv", sep = ",", header = TRUE)
summary(datasetcalifornia)
## rownames district school county
## Min. : 1.0 Min. :61382 Length:420 Length:420
## 1st Qu.:105.8 1st Qu.:64308 Class :character Class :character
## Median :210.5 Median :67761 Mode :character Mode :character
## Mean :210.5 Mean :67473
## 3rd Qu.:315.2 3rd Qu.:70419
## Max. :420.0 Max. :75440
## grades students teachers calworks
## Length:420 Min. : 81.0 Min. : 4.85 Min. : 0.000
## Class :character 1st Qu.: 379.0 1st Qu.: 19.66 1st Qu.: 4.395
## Mode :character Median : 950.5 Median : 48.56 Median :10.520
## Mean : 2628.8 Mean : 129.07 Mean :13.246
## 3rd Qu.: 3008.0 3rd Qu.: 146.35 3rd Qu.:18.981
## Max. :27176.0 Max. :1429.00 Max. :78.994
## lunch computer expenditure income
## Min. : 0.00 Min. : 0.0 Min. :3926 Min. : 5.335
## 1st Qu.: 23.28 1st Qu.: 46.0 1st Qu.:4906 1st Qu.:10.639
## Median : 41.75 Median : 117.5 Median :5215 Median :13.728
## Mean : 44.71 Mean : 303.4 Mean :5312 Mean :15.317
## 3rd Qu.: 66.86 3rd Qu.: 375.2 3rd Qu.:5601 3rd Qu.:17.629
## Max. :100.00 Max. :3324.0 Max. :7712 Max. :55.328
## english read math
## Min. : 0.000 Min. :604.5 Min. :605.4
## 1st Qu.: 1.941 1st Qu.:640.4 1st Qu.:639.4
## Median : 8.778 Median :655.8 Median :652.5
## Mean :15.768 Mean :655.0 Mean :653.3
## 3rd Qu.:22.970 3rd Qu.:668.7 3rd Qu.:665.9
## Max. :85.540 Max. :704.0 Max. :709.5
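Interpretation: The summary covers 420 records. The count variables (students, teachers, computer) are strongly right-skewed, with means well above their medians, while the read and math scores are roughly symmetric (mean and median nearly equal).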
summary(datasetcalifornia$teachers)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.85 19.66 48.56 129.07 146.35 1429.00
Interpretation: The number of teachers ranges from 4.85 to 1429, with a median of 48.56 and a mean of 129.07. The mean is well above the median, so a small number of very large observations pull the average up.
summary(datasetcalifornia$computer)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 46.0 117.5 303.4 375.2 3324.0
Interpretation: The number of computers ranges from 0 to 3324, with a median of 117.5 and a mean of 303.4. As with teachers, the mean far exceeds the median, which indicates a right-skewed distribution.
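The summaries above show means well above the medians (129.07 vs 48.56 for teachers, 303.4 vs 117.5 for computers). A minimal sketch of how such a gap can be checked is given below; the vector `x` holds made-up counts chosen only to mimic a long right tail, not values from CASchools.csv.

```r
# Hypothetical counts with one very large value (not taken from the dataset)
x <- c(5, 20, 45, 50, 120, 150, 1400)

# The mean is pulled up by the large value; the median is not
center <- c(mean = mean(x), median = median(x))
print(center)

# A ratio well above 1 points to right skew
skew_ratio <- mean(x) / median(x)
print(skew_ratio)

# Upper quantiles show how far the tail extends
print(quantile(x, c(0.5, 0.9, 0.99)))
```

For skewed counts like these, the median and quartiles are usually a more representative description than the mean.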
correlation <- cor(datasetcalifornia$teachers, datasetcalifornia$computer)
print(correlation)
## [1] 0.9372423
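Interpretation: A correlation of about 0.94 indicates a strong positive linear relationship between the number of teachers and the number of computers. Both counts grow with enrollment, so part of this association likely reflects size rather than a direct link between the two variables.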
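Because both variables are heavily right-skewed, a Pearson coefficient can be dominated by a few very large observations. The sketch below compares Pearson, Spearman, and log-scale correlations on simulated data; the variables `size`, `teachers`, and `computers` are hypothetical stand-ins generated for illustration, not the columns of CASchools.csv.

```r
# Simulated data (hypothetical): both counts scale with a common size variable
set.seed(1)
size      <- rlnorm(200, meanlog = 7, sdlog = 1)
teachers  <- size / 20 * exp(rnorm(200, sd = 0.1))
computers <- size / 8  * exp(rnorm(200, sd = 0.4))

# Pearson: linear association, sensitive to large observations
pearson <- cor(teachers, computers)

# Spearman: rank-based, less sensitive to the long tail
spearman <- cor(teachers, computers, method = "spearman")

# Pearson on the log scale; log1p() tolerates zero counts
log_r <- cor(log(teachers), log1p(computers))

print(round(c(pearson = pearson, spearman = spearman, log = log_r), 3))
```

If the three coefficients agree on the real data, the strong association is unlikely to be an artifact of a few large observations.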
plot(datasetcalifornia$teachers, datasetcalifornia$computer,
xlab = "Number of Teachers",
ylab = "Number of Computers",
main = "Relationship between Teachers and Computers")
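Explanation: The scatter plot shows a clear upward trend, consistent with the correlation of about 0.94 computed above. Since three quarters of the teacher counts are below about 146 while the maximum is 1429, most points are expected to cluster in the lower-left corner of the plot.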
sum_by_party <- aggregate(Contributions ~ Party, data = datasetcanada, FUN = sum)
barplot(sum_by_party$Contributions, names.arg = sum_by_party$Party,
xlab = "Political Party", ylab = "Money received from the pharmaceutical industry",
main = "Money received by political party from the pharmaceutical industry")
Explanation: The bar plot displays the total contributions received by each political party from the pharmaceutical industry.
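To make the aggregation step concrete, here is a small sketch of `aggregate()` with a formula on a toy data frame. The column names mirror those used above, but the party labels and amounts are invented for illustration.

```r
# Hypothetical toy data; values are not taken from CanadianDrugs.csv
toy <- data.frame(
  Party         = c("A", "B", "A", "C", "B"),
  Contributions = c(100, 250, 50, 75, 25)
)

# One row per party, with contributions summed within each party
totals <- aggregate(Contributions ~ Party, data = toy, FUN = sum)

# Sort so the largest total comes first, which makes a bar plot easier to read
totals <- totals[order(-totals$Contributions), ]
print(totals)   # B = 275, A = 150, C = 75
```

Passing the sorted `totals$Contributions` and `totals$Party` to `barplot()` would then draw the bars in descending order.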
datasetcanada$Party <- factor(datasetcanada$Party)
correlation_canada <- cor(datasetcanada$Contributions, as.numeric(datasetcanada$Party))
print(correlation_canada)
## [1] 0.05766161
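Interpretation: The coefficient of about 0.06 is close to zero, so it gives no evidence of a linear association. It should also be read with caution: Party is a nominal variable, and as.numeric() on a factor returns arbitrary level codes (alphabetical order by default), so the value changes if the levels are reordered.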
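The following sketch illustrates why a correlation with `as.numeric()` factor codes is hard to interpret for a nominal variable: reordering the levels changes the coefficient even though the data are identical. The toy data frame is hypothetical, and the Kruskal-Wallis test is shown only as one possible coding-independent alternative, not as the method used in this analysis.

```r
# Hypothetical toy data, not taken from CanadianDrugs.csv
toy <- data.frame(
  Party         = factor(c("A", "A", "B", "B", "C", "C")),
  Contributions = c(10, 12, 30, 28, 11, 9)
)

# Default coding: levels sorted alphabetically (A = 1, B = 2, C = 3)
r_alpha <- cor(toy$Contributions, as.numeric(toy$Party))

# Same data, different but equally arbitrary coding (B = 1, A = 2, C = 3)
reordered <- factor(toy$Party, levels = c("B", "A", "C"))
r_other   <- cor(toy$Contributions, as.numeric(reordered))

# The two coefficients differ although nothing in the data changed
print(c(alphabetical = r_alpha, reordered = r_other))

# A group comparison that does not depend on how the levels are numbered
kw <- kruskal.test(Contributions ~ Party, data = toy)
print(kw$p.value)
```

Comparing group totals or distributions, as the bar plot does, is generally a safer way to describe contributions by party.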
Descriptive analyses and visualizations were used to examine the relationship between teachers and computers in California, as well as pharmaceutical industry contributions to political parties in Canada. Teachers and computers are strongly positively correlated (about 0.94), whereas the numeric coding of party shows essentially no correlation with contributions (about 0.06). These patterns could be explored further in future multivariate analyses.