Introduction:

In Lab 4, I conducted a data analysis in R to explore patterns, relationships, and summary statistics within the Forbes2000 dataset from the HSAUR2 package. Techniques such as data cleaning, visualization, and descriptive analysis were used to better understand the structure and behaviour of the data.

library(HSAUR2)
## Loading required package: tools
data("Forbes2000", package = "HSAUR2")
head(Forbes2000)
##   rank                name        country             category  sales profits
## 1    1           Citigroup  United States              Banking  94.71   17.85
## 2    2    General Electric  United States        Conglomerates 134.19   15.59
## 3    3 American Intl Group  United States            Insurance  76.66    6.46
## 4    4          ExxonMobil  United States Oil & gas operations 222.88   20.96
## 5    5                  BP United Kingdom Oil & gas operations 232.57   10.27
## 6    6     Bank of America  United States              Banking  49.01   10.81
##    assets marketvalue
## 1 1264.03      255.30
## 2  626.93      328.54
## 3  647.66      194.87
## 4  166.99      277.02
## 5  177.57      173.54
## 6  736.45      117.55
summary(Forbes2000)
##       rank            name                     country   
##  Min.   :   1.0   Length:2000        United States :751  
##  1st Qu.: 500.8   Class :character   Japan         :316  
##  Median :1000.5   Mode  :character   United Kingdom:137  
##  Mean   :1000.5                      Germany       : 65  
##  3rd Qu.:1500.2                      France        : 63  
##  Max.   :2000.0                      Canada        : 56  
##                                      (Other)       :612  
##                    category        sales            profits        
##  Banking               : 313   Min.   :  0.010   Min.   :-25.8300  
##  Diversified financials: 158   1st Qu.:  2.018   1st Qu.:  0.0800  
##  Insurance             : 112   Median :  4.365   Median :  0.2000  
##  Utilities             : 110   Mean   :  9.697   Mean   :  0.3811  
##  Materials             :  97   3rd Qu.:  9.547   3rd Qu.:  0.4400  
##  Oil & gas operations  :  90   Max.   :256.330   Max.   : 20.9600  
##  (Other)               :1120                     NA's   :5         
##      assets          marketvalue    
##  Min.   :   0.270   Min.   :  0.02  
##  1st Qu.:   4.025   1st Qu.:  2.72  
##  Median :   9.345   Median :  5.15  
##  Mean   :  34.042   Mean   : 11.88  
##  3rd Qu.:  22.793   3rd Qu.: 10.60  
##  Max.   :1264.030   Max.   :328.54  
## 
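The summary() output above reports five missing values (NA's) in profits, so a small cleaning step is worth sketching here. The following is a minimal example, assuming the HSAUR2 package is installed; the object names n_missing and missing_rows are illustrative and not part of the original lab.

```r
library(HSAUR2)
data("Forbes2000", package = "HSAUR2")

# Count the missing profit values (summary() above reports 5 NA's)
n_missing <- sum(is.na(Forbes2000$profits))

# Show which companies have at least one missing value in any column
missing_rows <- Forbes2000[!complete.cases(Forbes2000),
                           c("name", "country", "profits")]
print(missing_rows)

# Statistics on profits must skip the NA entries explicitly,
# otherwise mean() returns NA
mean_profit <- mean(Forbes2000$profits, na.rm = TRUE)
print(mean_profit)
```

Passing na.rm = TRUE leaves the data frame unchanged and only skips the missing entries in this one calculation, which is usually safer than deleting rows early.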

BASIC R COMMANDS

R commands are the basis for data analysis and statistical modelling in the R environment. The commands below check the class, structure, and dimensions of the dataset.

class(Forbes2000)
## [1] "data.frame"
str(Forbes2000)
## 'data.frame':    2000 obs. of  8 variables:
##  $ rank       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ name       : chr  "Citigroup" "General Electric" "American Intl Group" "ExxonMobil" ...
##  $ country    : Factor w/ 61 levels "Africa","Australia",..: 60 60 60 60 56 60 56 28 60 60 ...
##  $ category   : Factor w/ 27 levels "Aerospace & defense",..: 2 6 16 19 19 2 2 8 9 20 ...
##  $ sales      : num  94.7 134.2 76.7 222.9 232.6 ...
##  $ profits    : num  17.85 15.59 6.46 20.96 10.27 ...
##  $ assets     : num  1264 627 648 167 178 ...
##  $ marketvalue: num  255 329 195 277 174 ...
dim(Forbes2000)
## [1] 2000    8

Plotting a histogram

A histogram displays the frequency distribution of a variable: the range of the variable is divided into consecutive numerical intervals (bins), and the height of each bar shows how many observations fall within that interval.
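The description above can be illustrated with a short sketch using the sales variable. It assumes the HSAUR2 package is installed; the number of bins, the titles, and the log transformation are illustrative choices rather than the settings used in the original lab, and the unit in the axis label follows the package documentation.

```r
library(HSAUR2)
data("Forbes2000", package = "HSAUR2")

# Histogram of raw sales: most companies fall into the lowest bins,
# with a long tail towards the largest companies
h_raw <- hist(Forbes2000$sales,
              breaks = 30,  # suggested number of bins (illustrative)
              main = "Sales of Forbes 2000 companies",
              xlab = "Sales (billion USD)")

# Histogram of log-transformed sales: all sales are positive
# (minimum 0.01), so the logarithm is defined for every company
h_log <- hist(log(Forbes2000$sales),
              breaks = 30,
              main = "Log sales of Forbes 2000 companies",
              xlab = "log(Sales)")

# hist() returns the bin counts invisibly; they add up to the
# number of observations
sum(h_raw$counts)
```

The log scale is a common choice for strongly skewed variables because it spreads out the many small values that would otherwise be squeezed into the first few bars.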

Discussion:

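The analysis revealed several patterns in the data. The summary statistics point to strongly right-skewed distributions: mean sales (9.697) are more than twice the median (4.365), and the same holds for assets (mean 34.042, median 9.345) and market value (mean 11.88, median 5.15). The companies are also unevenly spread across groups: the United States (751) and Japan (316) together account for more than half of the 2000 companies, and Banking (313) is the largest category. Visualizations such as histograms help highlight trends that are not easily observed from raw data. Although the analysis provides a good overall understanding, it has some limitations: the dataset is restricted to 2000 companies and eight variables, profits has five missing values, and extreme values (for example, assets of up to 1264.03) can act as outliers. Overall, this lab provides a basic understanding of the data and forms a foundation for further analysis.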