1. Data Loading

datasetcalifornia <- read.csv("C:/Users/Manuel/Desktop/CASchools.csv", sep = ",", header = TRUE)
datasetcanada <- read.csv("C:/Users/Manuel/Desktop/CanadianDrugs.csv", sep = ",", header = TRUE)
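
Before any analysis it can help to confirm that each file loaded with the expected columns. The following is a minimal sketch, not part of the original script: `check_loaded()` is a hypothetical helper, and it is demonstrated on a small made-up data frame rather than on the real CSV files.

```r
# Hypothetical helper: stop early if an expected column is missing,
# otherwise report the dimensions and the number of NAs per column.
check_loaded <- function(df, required_cols) {
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  list(n_rows = nrow(df), n_cols = ncol(df), n_missing = colSums(is.na(df)))
}

# Toy data standing in for a loaded CSV (values are made up).
toy <- data.frame(teachers = c(10, 25, NA), computer = c(30, 80, 120))
info <- check_loaded(toy, c("teachers", "computer"))
print(info)

# With the real data the call would look like:
# check_loaded(datasetcalifornia, c("teachers", "computer"))
```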

2. Descriptive Analysis

2.1 Summary of California Data

summary(datasetcalifornia)
##     rownames        district        school             county         
##  Min.   :  1.0   Min.   :61382   Length:420         Length:420        
##  1st Qu.:105.8   1st Qu.:64308   Class :character   Class :character  
##  Median :210.5   Median :67761   Mode  :character   Mode  :character  
##  Mean   :210.5   Mean   :67473                                        
##  3rd Qu.:315.2   3rd Qu.:70419                                        
##  Max.   :420.0   Max.   :75440                                        
##     grades             students          teachers          calworks     
##  Length:420         Min.   :   81.0   Min.   :   4.85   Min.   : 0.000  
##  Class :character   1st Qu.:  379.0   1st Qu.:  19.66   1st Qu.: 4.395  
##  Mode  :character   Median :  950.5   Median :  48.56   Median :10.520  
##                     Mean   : 2628.8   Mean   : 129.07   Mean   :13.246  
##                     3rd Qu.: 3008.0   3rd Qu.: 146.35   3rd Qu.:18.981  
##                     Max.   :27176.0   Max.   :1429.00   Max.   :78.994  
##      lunch           computer       expenditure       income      
##  Min.   :  0.00   Min.   :   0.0   Min.   :3926   Min.   : 5.335  
##  1st Qu.: 23.28   1st Qu.:  46.0   1st Qu.:4906   1st Qu.:10.639  
##  Median : 41.75   Median : 117.5   Median :5215   Median :13.728  
##  Mean   : 44.71   Mean   : 303.4   Mean   :5312   Mean   :15.317  
##  3rd Qu.: 66.86   3rd Qu.: 375.2   3rd Qu.:5601   3rd Qu.:17.629  
##  Max.   :100.00   Max.   :3324.0   Max.   :7712   Max.   :55.328  
##     english            read            math      
##  Min.   : 0.000   Min.   :604.5   Min.   :605.4  
##  1st Qu.: 1.941   1st Qu.:640.4   1st Qu.:639.4  
##  Median : 8.778   Median :655.8   Median :652.5  
##  Mean   :15.768   Mean   :655.0   Mean   :653.3  
##  3rd Qu.:22.970   3rd Qu.:668.7   3rd Qu.:665.9  
##  Max.   :85.540   Max.   :704.0   Max.   :709.5

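Interpretation: The dataset contains 420 observations. Enrollment ranges from 81 to 27,176 students, and the mean (2,628.8) is well above the median (950.5), so the distribution is strongly right-skewed. Average reading and math scores are close to each other (655.0 and 653.3).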

2.2 Teachers Summary

summary(datasetcalifornia$teachers)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.85   19.66   48.56  129.07  146.35 1429.00

Interpretation: The number of teachers ranges from 4.85 to 1,429, with a median of 48.56 and a mean of 129.07. The mean is more than twice the median, so a small number of very large observations pull the average up.

2.3 Computers Summary

summary(datasetcalifornia$computer)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    46.0   117.5   303.4   375.2  3324.0

Interpretation: The number of computers ranges from 0 to 3,324, with a median of 117.5 and a mean of 303.4. As with teachers, the mean is far above the median, which points to a right-skewed distribution.
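
In both summary() outputs above the mean sits well above the median, which usually signals a long right tail. One way to check this is sketched below on simulated data; the `rlnorm()` values are illustrative and are not the real observations.

```r
set.seed(1)
x <- rlnorm(420, meanlog = 4, sdlog = 1)  # simulated right-skewed counts

# A mean/median ratio well above 1 points to a long right tail.
skew_ratio <- mean(x) / median(x)
print(skew_ratio)

# After a log transform, mean and median typically move close together.
log_ratio <- mean(log(x)) / median(log(x))
print(log_ratio)

# For the real data, log1p() would be needed for `computer`,
# because its minimum is 0 and log(0) is -Inf:
# hist(log1p(datasetcalifornia$computer))
```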


3. Correlation Analysis

3.1 Relationship between the number of teachers and computers

correlation <- cor(datasetcalifornia$teachers, datasetcalifornia$computer)
print(correlation)
## [1] 0.9372423

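Interpretation: A correlation of approximately 0.94 indicates a very strong positive linear relationship between the number of teachers and the number of computers. Both counts are likely to grow with enrollment, so part of this association may simply reflect size.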
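
A single coefficient does not convey its uncertainty, and with skewed counts a rank-based measure is a useful cross-check. The sketch below uses `cor.test()` from base R on simulated data in which both variables scale with a made-up enrollment figure. That common driver is an assumption for illustration, not a finding from the dataset.

```r
set.seed(42)
size     <- rlnorm(200, meanlog = 7, sdlog = 1)        # simulated enrollment
teachers <- size / 20 * exp(rnorm(200, sd = 0.2))      # about 1 teacher per 20 students
computer <- size / 8  * exp(rnorm(200, sd = 0.5))      # about 1 computer per 8 students

pearson  <- cor.test(teachers, computer)                       # linear association, with 95% CI
spearman <- cor.test(teachers, computer, method = "spearman")  # rank-based association

print(pearson$estimate)
print(pearson$conf.int)
print(spearman$estimate)

# Correlating per-student rates removes the shared size effect.
rate_cor <- cor(teachers / size, computer / size)
print(rate_cor)

# On the real data the equivalent rates would use the `students` column:
# cor(datasetcalifornia$teachers / datasetcalifornia$students,
#     datasetcalifornia$computer / datasetcalifornia$students)
```

In this simulation the raw counts are strongly correlated while the per-student rates are not, which shows how a shared size variable can inflate a correlation between counts.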


4. Data Visualization

plot(datasetcalifornia$teachers, datasetcalifornia$computer,
     xlab = "Number of Teachers",
     ylab = "Number of Computers",
     main = "Relationship between Teachers and Computers")

Explanation: The scatter plot shows a clear trend indicating a positive correlation between the number of teachers and computers in California schools.
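
When most points cluster near the origin and a few are very large, logarithmic axes can make the trend easier to read. The following sketch uses simulated data (all values are made up) and adds a fitted line; the fit uses `log10()` so that `abline()` matches the base-10 coordinate system of the log-scaled plot.

```r
set.seed(7)
teachers <- rlnorm(150, meanlog = 4, sdlog = 1)         # simulated counts
computer <- 2.5 * teachers * exp(rnorm(150, sd = 0.4))  # simulated counts

# Log-log axes spread out the small values that crowd the origin.
plot(teachers, computer, log = "xy",
     xlab = "Number of Teachers (log scale)",
     ylab = "Number of Computers (log scale)",
     main = "Teachers vs. computers on log-log axes")

# Straight-line fit in log10 space, drawn in the plot's transformed coordinates.
fit <- lm(log10(computer) ~ log10(teachers))
abline(fit, col = "red")
print(coef(fit))
```

With the real data, observations where `computer` is 0 cannot be shown on a log axis and would need to be handled separately.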


5. Analysis of Canadian Data

5.1 Sum of Contributions by Party

sum_by_party <- aggregate(Contributions ~ Party, data = datasetcanada, FUN = sum)
barplot(sum_by_party$Contributions, names.arg = sum_by_party$Party,
        xlab = "Political Party", ylab = "Money received from the pharmaceutical industry",
        main = "Money received by political party from the pharmaceutical industry")

Explanation: The bar plot displays the total contributions received by each political party from the pharmaceutical industry.
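
To make the aggregation step concrete, the sketch below runs the same `aggregate()` call on a tiny made-up table and then sorts the result so that a bar plot would read from largest to smallest. The party labels and amounts are invented for illustration.

```r
# Toy data: two rows each for parties A and B, one row for C.
toy <- data.frame(
  Party = c("A", "B", "A", "C", "B"),
  Contributions = c(100, 250, 50, 75, 25)
)

# One row per party, holding the total of Contributions.
toy_sums <- aggregate(Contributions ~ Party, data = toy, FUN = sum)

# Sort in descending order of total contributions.
toy_sums <- toy_sums[order(-toy_sums$Contributions), ]
print(toy_sums)

# barplot(toy_sums$Contributions, names.arg = toy_sums$Party)
```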

5.2 Correlation between Contributions and Political Party

datasetcanada$Party <- factor(datasetcanada$Party)
correlation_canada <- cor(datasetcanada$Contributions, as.numeric(datasetcanada$Party))
print(correlation_canada)
## [1] 0.05766161

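Interpretation: The coefficient of approximately 0.06 is close to zero. It should be read with caution: Party is a nominal variable, and as.numeric() only returns the arbitrary integer codes of the factor levels, so a Pearson correlation with those codes does not measure a meaningful association. Comparing contributions across parties with a group-based method (for example a one-way ANOVA or a Kruskal-Wallis test) is more appropriate.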
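
Because party is a category rather than a quantity, a comparison across groups is usually more informative than a correlation with factor codes. The sketch below is one possible alternative, not the original analysis: it applies `kruskal.test()` and a one-way `aov()` to simulated data in which one party receives larger amounts by construction.

```r
set.seed(3)
toy <- data.frame(
  Party = factor(rep(c("A", "B", "C"), each = 30)),
  Contributions = c(rlnorm(30, 6, 1), rlnorm(30, 6, 1), rlnorm(30, 7.5, 1))
)

# as.numeric() on a factor yields level codes (1, 2, 3 in alphabetical order),
# which carry no quantitative meaning.
codes <- as.numeric(toy$Party)
print(unique(codes))

# Kruskal-Wallis rank-sum test: do the parties share a common distribution?
kw <- kruskal.test(Contributions ~ Party, data = toy)
print(kw$p.value)

# One-way ANOVA on the log scale; eta-squared is the share of variance
# explained by party (between-group SS divided by total SS).
fit <- aov(log(Contributions) ~ Party, data = toy)
ss <- summary(fit)[[1]][["Sum Sq"]]
eta_sq <- ss[1] / sum(ss)
print(eta_sq)
```

On the real data the same calls would take `data = datasetcanada`.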


6. Conclusion

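Descriptive analyses and visualizations were used to examine the relationship between teachers and computers in California, as well as pharmaceutical industry contributions to political parties in Canada. In the California data, the numbers of teachers and computers are both strongly right-skewed and very highly correlated (r ≈ 0.94). In the Canadian data, the correlation between contributions and the numeric party codes is close to zero (r ≈ 0.06) and is not a reliable measure for a nominal variable. Future multivariate analyses could control for enrollment in the California data and compare contributions across parties with group-based methods.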