INTRODUCTION
In this report, we explore the basics of RStudio, a popular integrated development environment (IDE) for R that is widely used for data analysis and statistical work. RStudio provides an organized environment for writing R commands, viewing their output, and analyzing datasets.
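This introduction covers fundamental commands and basic data visualization techniques, which are essential first steps in data analysis and statistical modelling. We start by installing the HSAUR2 package and loading its Forbes2000 dataset, then preview the data with head() and summary().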
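# Install the package once, then attach it for this session
install.packages("HSAUR2")
library(HSAUR2)
# Load the Forbes2000 dataset and preview its first six rows
data("Forbes2000", package = "HSAUR2")
head(Forbes2000)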
##   rank                name        country             category  sales profits
## 1    1           Citigroup  United States              Banking  94.71   17.85
## 2    2    General Electric  United States        Conglomerates 134.19   15.59
## 3    3 American Intl Group  United States            Insurance  76.66    6.46
## 4    4          ExxonMobil  United States Oil & gas operations 222.88   20.96
## 5    5                  BP United Kingdom Oil & gas operations 232.57   10.27
## 6    6     Bank of America  United States              Banking  49.01   10.81
##    assets marketvalue
## 1 1264.03      255.30
## 2  626.93      328.54
## 3  647.66      194.87
## 4  166.99      277.02
## 5  177.57      173.54
## 6  736.45      117.55
summary(Forbes2000)
##       rank            name                     country
##  Min.   :   1.0   Length:2000        United States :751
##  1st Qu.: 500.8   Class :character   Japan         :316
##  Median :1000.5   Mode  :character   United Kingdom:137
##  Mean   :1000.5                      Germany       : 65
##  3rd Qu.:1500.2                      France        : 63
##  Max.   :2000.0                      Canada        : 56
##                                      (Other)       :612
##                    category        sales            profits
##  Banking               : 313   Min.   :  0.010   Min.   :-25.8300
##  Diversified financials: 158   1st Qu.:  2.018   1st Qu.:  0.0800
##  Insurance             : 112   Median :  4.365   Median :  0.2000
##  Utilities             : 110   Mean   :  9.697   Mean   :  0.3811
##  Materials             :  97   3rd Qu.:  9.547   3rd Qu.:  0.4400
##  Oil & gas operations  :  90   Max.   :256.330   Max.   : 20.9600
##  (Other)               :1120                     NA's   :5
##      assets           marketvalue
##  Min.   :   0.270   Min.   :  0.02
##  1st Qu.:   4.025   1st Qu.:  2.72
##  Median :   9.345   Median :  5.15
##  Mean   :  34.042   Mean   : 11.88
##  3rd Qu.:  22.793   3rd Qu.: 10.60
##  Max.   :1264.030   Max.   :328.54
##
BASIC R COMMANDS
R commands are the foundation for data analysis and statistical modelling in the R environment. In this activity, several basic functions were used to understand the Forbes2000 dataset:
class(Forbes2000): Returns the class of the object. It tells us whether the dataset is a data frame, matrix, list, or another structure.
str(Forbes2000): Displays the internal structure of the dataset, including the number of observations, each variable's name and type, and its first few values. It gives a quick overview of what the dataset contains.
dim(Forbes2000): Returns the dimensions of the dataset as the number of rows followed by the number of columns. This tells us the size of the data we are working with.
These basic commands are useful for getting an initial understanding of any dataset before performing further analysis.
class(Forbes2000)
## [1] "data.frame"
str(Forbes2000)
## 'data.frame': 2000 obs. of 8 variables:
## $ rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ name : chr "Citigroup" "General Electric" "American Intl Group" "ExxonMobil" ...
## $ country : Factor w/ 61 levels "Africa","Australia",..: 60 60 60 60 56 60 56 28 60 60 ...
## $ category : Factor w/ 27 levels "Aerospace & defense",..: 2 6 16 19 19 2 2 8 9 20 ...
## $ sales : num 94.7 134.2 76.7 222.9 232.6 ...
## $ profits : num 17.85 15.59 6.46 20.96 10.27 ...
## $ assets : num 1264 627 648 167 178 ...
## $ marketvalue: num 255 329 195 277 174 ...
dim(Forbes2000)
## [1] 2000 8
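Plotting histograms
A histogram shows the distribution of a numeric variable. The variable's range is divided into consecutive intervals (bins), and the height of each bar gives the number of observations that fall into that interval. In this activity, histograms were plotted for market value, profits, assets, and sales.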
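The plotting code itself is not reproduced in this report. The following is a minimal sketch of how the four histograms could be produced with base R's hist(), assuming the HSAUR2 package is installed. The 2 x 2 layout, titles, colour, and the object name hists are illustrative choices, and the axis label assumes the values are in billions of US dollars, as described in the HSAUR2 documentation.

```r
# Load the dataset used in this report (assumes HSAUR2 is installed).
data("Forbes2000", package = "HSAUR2")

# The four numeric variables discussed in this report.
vars <- c("marketvalue", "profits", "assets", "sales")

# Draw the plots in a 2 x 2 grid; par() returns the previous settings.
old_par <- par(mfrow = c(2, 2))

# hist() drops missing values, so the 5 NAs in profits need no special handling.
# Each call also returns the bin breaks and counts, which we keep for inspection.
hists <- lapply(vars, function(v) {
  hist(Forbes2000[[v]],
       main = paste("Histogram of", v),
       xlab = paste(v, "(billion USD)"),  # unit is an assumption, see above
       col  = "lightblue")
})
names(hists) <- vars

# Restore the original plotting layout.
par(old_par)
```

After running this, hists$profits$counts gives the number of companies in each bin of the profits histogram, which is useful when a bar is too short to read from the plot.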
Discussion
From this activity, we learned how to use basic R commands to understand and explore a dataset in RStudio. By using class(), str(), and dim(), we were able to identify the type of data, see the structure of the variables, and know the size of the Forbes2000 dataset. These commands are important because they give us a clear overview of the dataset before doing any analysis.
The histograms that we plotted for market value, profits, assets, and sales helped us visualize how the data is distributed. Through the graphs, we could observe patterns such as whether the data is spread out, concentrated in certain ranges, or has extreme values. Visualization makes it easier to compare variables and understand the characteristics of the dataset.
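The impression of extreme values can also be checked numerically. The sketch below, again assuming the HSAUR2 package is installed, compares the median, mean, and maximum of each variable. A mean well above the median, together with a maximum far beyond both, is the numerical signature of a long right tail. The object name tail_check is illustrative.

```r
# Load the dataset used in this report (assumes HSAUR2 is installed).
data("Forbes2000", package = "HSAUR2")
vars <- c("marketvalue", "profits", "assets", "sales")

# One column per variable, one row per statistic.
# na.rm = TRUE is needed because profits contains missing values.
tail_check <- sapply(Forbes2000[, vars], function(x) {
  c(median = median(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE))
})

print(round(tail_check, 2))
```

These are the same statistics reported by summary(Forbes2000) earlier in the report, for example a median of 4.365 against a mean of 9.697 and a maximum of 256.330 for sales.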
Overall, this activity introduced us to essential RStudio skills. We learned how to inspect datasets using basic commands and how to create simple visualizations to support data interpretation. These skills are important for more advanced data analysis and statistical modelling in the future.