Introduction:
In Lab 4, I used R to analyse the Forbes2000 dataset from the HSAUR2 package, exploring patterns, relationships, and summary statistics. Techniques such as data cleaning, visualization, and descriptive analysis were used to better understand the structure and behaviour of the data.
library(HSAUR2)
## Loading required package: tools
data("Forbes2000", package = "HSAUR2")
head(Forbes2000)
##   rank                name        country             category  sales profits  assets marketvalue
## 1    1           Citigroup  United States              Banking  94.71   17.85 1264.03      255.30
## 2    2    General Electric  United States        Conglomerates 134.19   15.59  626.93      328.54
## 3    3 American Intl Group  United States            Insurance  76.66    6.46  647.66      194.87
## 4    4          ExxonMobil  United States Oil & gas operations 222.88   20.96  166.99      277.02
## 5    5                  BP United Kingdom Oil & gas operations 232.57   10.27  177.57      173.54
## 6    6     Bank of America  United States              Banking  49.01   10.81  736.45      117.55
summary(Forbes2000)
## rank name country
## Min. : 1.0 Length:2000 United States :751
## 1st Qu.: 500.8 Class :character Japan :316
## Median :1000.5 Mode :character United Kingdom:137
## Mean :1000.5 Germany : 65
## 3rd Qu.:1500.2 France : 63
## Max. :2000.0 Canada : 56
## (Other) :612
## category sales profits
## Banking : 313 Min. : 0.010 Min. :-25.8300
## Diversified financials: 158 1st Qu.: 2.018 1st Qu.: 0.0800
## Insurance : 112 Median : 4.365 Median : 0.2000
## Utilities : 110 Mean : 9.697 Mean : 0.3811
## Materials : 97 3rd Qu.: 9.547 3rd Qu.: 0.4400
## Oil & gas operations : 90 Max. :256.330 Max. : 20.9600
## (Other) :1120 NA's :5
## assets marketvalue
## Min. : 0.270 Min. : 0.02
## 1st Qu.: 4.025 1st Qu.: 2.72
## Median : 9.345 Median : 5.15
## Mean : 34.042 Mean : 11.88
## 3rd Qu.: 22.793 3rd Qu.: 10.60
## Max. :1264.030 Max. :328.54
##
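The summary above reports 5 NA's in profits and, for sales, assets, and market value, means that lie well above their medians. The short sketch below checks both points with base R. The object names `na_counts`, `forbes_clean`, and `skew_check` are illustrative choices rather than part of the original lab, and dropping incomplete rows is only one possible way to handle the missing profits.

```r
# Load the Forbes 2000 data (requires the HSAUR2 package to be installed)
data("Forbes2000", package = "HSAUR2")

# Count missing values per column; summary() above reports 5 NA's in profits
na_counts <- colSums(is.na(Forbes2000))
print(na_counts)

# Simple cleaning step (one option among several): drop rows with missing profits
forbes_clean <- Forbes2000[!is.na(Forbes2000$profits), ]
print(dim(forbes_clean))

# Compare mean and median of each numeric variable;
# a mean far above the median points to a right-skewed distribution
num_vars <- c("sales", "profits", "assets", "marketvalue")
skew_check <- data.frame(
  mean   = sapply(Forbes2000[num_vars], mean, na.rm = TRUE),
  median = sapply(Forbes2000[num_vars], median, na.rm = TRUE)
)
print(skew_check)
```

Using `na.rm = TRUE` on the full data keeps the means and medians consistent with the values printed by `summary()`.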
BASIC R COMMANDS
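Basic R commands are the foundation of data analysis and statistical modelling in the R environment. The commands below report the class of the Forbes2000 object, its structure (the type and first values of each variable), and its dimensions (rows and columns).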
class(Forbes2000)
## [1] "data.frame"
str(Forbes2000)
## 'data.frame': 2000 obs. of 8 variables:
## $ rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ name : chr "Citigroup" "General Electric" "American Intl Group" "ExxonMobil" ...
## $ country : Factor w/ 61 levels "Africa","Australia",..: 60 60 60 60 56 60 56 28 60 60 ...
## $ category : Factor w/ 27 levels "Aerospace & defense",..: 2 6 16 19 19 2 2 8 9 20 ...
## $ sales : num 94.7 134.2 76.7 222.9 232.6 ...
## $ profits : num 17.85 15.59 6.46 20.96 10.27 ...
## $ assets : num 1264 627 648 167 178 ...
## $ marketvalue: num 255 329 195 277 174 ...
dim(Forbes2000)
## [1] 2000 8
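A few other base R functions give a similar overview and extend it to simple group comparisons. This is a sketch: `names()`, `nrow()`, `ncol()`, `sapply()`, `table()`, and `tapply()` are standard base R functions, while the variable names `col_classes`, `country_counts`, and `median_sales` are hypothetical. The last step compares median sales across categories as one simple way of looking at differences between groups.

```r
data("Forbes2000", package = "HSAUR2")

# Column names and size, as alternatives to str() and dim()
print(names(Forbes2000))
print(nrow(Forbes2000))  # number of companies (rows)
print(ncol(Forbes2000))  # number of variables (columns)

# Storage class of every column
col_classes <- sapply(Forbes2000, class)
print(col_classes)

# Frequency table of countries, largest first (top 5 shown)
country_counts <- sort(table(Forbes2000$country), decreasing = TRUE)
print(head(country_counts, 5))

# Median sales per category, highest first (top 5 shown)
median_sales <- tapply(Forbes2000$sales, Forbes2000$category, median)
print(head(sort(median_sales, decreasing = TRUE), 5))
```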
PLOTTING A HISTOGRAM
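A histogram shows the distribution of a numeric variable: the range of the variable is divided into consecutive numerical intervals (bins), and the height of each bar gives the frequency, i.e. the number of observations that fall within that interval.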
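As a minimal sketch, assuming the base R `hist()` function and a PDF output file, the market value can be plotted on its raw scale and on a log10 scale. The file name `forbes_marketvalue_hist.pdf`, the 30 suggested breaks, and the log transformation are arbitrary choices made here, not part of the original lab. The HSAUR2 documentation gives market value in billions of US dollars.

```r
data("Forbes2000", package = "HSAUR2")

# Write to a PDF file so the script also runs without a screen device
pdf("forbes_marketvalue_hist.pdf", width = 10, height = 5)
par(mfrow = c(1, 2))  # two panels side by side

# Raw scale: most companies fall into the lowest bins (strong right skew)
h_raw <- hist(Forbes2000$marketvalue, breaks = 30,
              main = "Market value", xlab = "Market value (billion USD)")

# Log10 scale: spreads out the many small companies
h_log <- hist(log10(Forbes2000$marketvalue), breaks = 30,
              main = "log10(market value)", xlab = "log10(billion USD)")

invisible(dev.off())

# Bar heights are the counts per interval; they add up to the number of values
print(sum(h_raw$counts))
```

`hist()` returns its bins invisibly, so `h_raw$counts` holds the bar heights. Because market value has no missing values, the counts sum to the 2000 companies.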
Discussion:
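The results of the analysis revealed several important patterns in the distribution of the data and in differences between groups. Sales, assets, and market value are strongly right-skewed: each mean lies well above its median (for example, mean assets of 34.04 against a median of 9.345), because a small number of very large companies dominate the upper tail. The companies are also unevenly spread across groups: the United States (751) and Japan (316) together account for more than half of the 2000 companies, and Banking (313) is the largest category. Visualizations such as histograms help highlight trends like this skew that are not easily seen in the raw data. Although the analysis provides a good overall understanding, it has some limitations: the profits variable contains 5 missing values, and extreme values (outliers) can strongly influence summary statistics such as the mean. Overall, this lab provides a basic understanding of the data and forms a foundation for further analysis.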