INTRODUCTION

In this report, we explore the basics of using RStudio, a popular integrated development environment (IDE) for data analysis and statistical work in R. RStudio provides an organized workspace for writing R commands, viewing their output, and analyzing datasets.

This introduction to RStudio covers fundamental R commands and basic data visualization techniques, applied to the Forbes2000 dataset from the HSAUR2 package. These are essential first steps toward data analysis and statistical modelling.

install.packages("HSAUR2")  # download and install the package (needed only once)
library(HSAUR2)             # load the package into the current session
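Running install.packages() every time the document is knitted downloads the package again and may fail if no CRAN mirror is configured for the session. A common alternative, sketched here, installs HSAUR2 only when it is missing. The choice of the https://cloud.r-project.org mirror is an assumption; any CRAN mirror works.

```r
# Install HSAUR2 only if it is not already available.
# The repos URL (CRAN cloud mirror) is an assumed choice, not from the report.
if (!requireNamespace("HSAUR2", quietly = TRUE)) {
  install.packages("HSAUR2", repos = "https://cloud.r-project.org")
}
library(HSAUR2)  # attach the package so its datasets can be loaded
```

With this pattern the install step is skipped on every run after the first, so knitting does not depend on a network connection.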

data("Forbes2000",package="HSAUR2")
head(Forbes2000)
##   rank                name        country             category  sales profits
## 1    1           Citigroup  United States              Banking  94.71   17.85
## 2    2    General Electric  United States        Conglomerates 134.19   15.59
## 3    3 American Intl Group  United States            Insurance  76.66    6.46
## 4    4          ExxonMobil  United States Oil & gas operations 222.88   20.96
## 5    5                  BP United Kingdom Oil & gas operations 232.57   10.27
## 6    6     Bank of America  United States              Banking  49.01   10.81
##    assets marketvalue
## 1 1264.03      255.30
## 2  626.93      328.54
## 3  647.66      194.87
## 4  166.99      277.02
## 5  177.57      173.54
## 6  736.45      117.55

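Summary statistics

The summary() function gives a per-column overview of the data frame: the minimum, quartiles, mean and maximum for numeric columns, the most frequent levels and their counts for factors, the length, class and mode for character columns, and the number of missing values (NA's) where there are any.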

summary(Forbes2000)
##       rank            name                     country   
##  Min.   :   1.0   Length:2000        United States :751  
##  1st Qu.: 500.8   Class :character   Japan         :316  
##  Median :1000.5   Mode  :character   United Kingdom:137  
##  Mean   :1000.5                      Germany       : 65  
##  3rd Qu.:1500.2                      France        : 63  
##  Max.   :2000.0                      Canada        : 56  
##                                      (Other)       :612  
##                    category        sales            profits        
##  Banking               : 313   Min.   :  0.010   Min.   :-25.8300  
##  Diversified financials: 158   1st Qu.:  2.018   1st Qu.:  0.0800  
##  Insurance             : 112   Median :  4.365   Median :  0.2000  
##  Utilities             : 110   Mean   :  9.697   Mean   :  0.3811  
##  Materials             :  97   3rd Qu.:  9.547   3rd Qu.:  0.4400  
##  Oil & gas operations  :  90   Max.   :256.330   Max.   : 20.9600  
##  (Other)               :1120                     NA's   :5         
##      assets          marketvalue    
##  Min.   :   0.270   Min.   :  0.02  
##  1st Qu.:   4.025   1st Qu.:  2.72  
##  Median :   9.345   Median :  5.15  
##  Mean   :  34.042   Mean   : 11.88  
##  3rd Qu.:  22.793   3rd Qu.: 10.60  
##  Max.   :1264.030   Max.   :328.54  
## 
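The summary() output reports five missing values (NA's) in profits. Functions such as mean() return NA when any input is missing unless na.rm = TRUE is set, so it is worth checking for missing values before computing statistics. A minimal sketch:

```r
library(HSAUR2)
data("Forbes2000", package = "HSAUR2")

# Count missing values in every column (is.na() on a data frame gives a logical matrix)
na_counts <- colSums(is.na(Forbes2000))
print(na_counts)

# mean() propagates NA unless missing values are dropped explicitly
mean(Forbes2000$profits)                # NA
mean(Forbes2000$profits, na.rm = TRUE)  # about 0.381, matching summary()

# Show the companies whose profits are missing
Forbes2000[is.na(Forbes2000$profits), c("name", "country", "profits")]
```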

BASIC R COMMANDS

R commands are the foundation for data analysis and statistical modelling in the R environment. In this activity, several basic functions were used to understand the Forbes2000 dataset:

class(Forbes2000): returns the class of the object, which tells us whether the dataset is a data frame, matrix, list, or another structure.

str(Forbes2000): displays the internal structure of the dataset, including the number of observations and variables, the type of each variable, and its first few values. It gives a quick overview of what the dataset contains.

dim(Forbes2000): returns the dimensions of the dataset, that is, the number of rows and columns. This tells us how large the dataset is.

These basic commands are useful for getting an initial understanding of any dataset before performing further analysis.

class(Forbes2000)
## [1] "data.frame"
str(Forbes2000)
## 'data.frame':    2000 obs. of  8 variables:
##  $ rank       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ name       : chr  "Citigroup" "General Electric" "American Intl Group" "ExxonMobil" ...
##  $ country    : Factor w/ 61 levels "Africa","Australia",..: 60 60 60 60 56 60 56 28 60 60 ...
##  $ category   : Factor w/ 27 levels "Aerospace & defense",..: 2 6 16 19 19 2 2 8 9 20 ...
##  $ sales      : num  94.7 134.2 76.7 222.9 232.6 ...
##  $ profits    : num  17.85 15.59 6.46 20.96 10.27 ...
##  $ assets     : num  1264 627 648 167 178 ...
##  $ marketvalue: num  255 329 195 277 174 ...
dim(Forbes2000)
## [1] 2000    8
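dim() returns the row and column counts together as one vector. The same information is available piece by piece from nrow(), ncol() and names(), and applying class() to every column gives a compact version of the type information that str() prints. A short sketch using only base R functions:

```r
library(HSAUR2)
data("Forbes2000", package = "HSAUR2")

nrow(Forbes2000)   # number of rows (companies): 2000
ncol(Forbes2000)   # number of columns (variables): 8
names(Forbes2000)  # variable names

# Class of each column as a named character vector
col_classes <- sapply(Forbes2000, class)
print(col_classes)

# Number of levels in the two factor columns (61 and 27, as str() reports)
nlevels(Forbes2000$country)
nlevels(Forbes2000$category)
```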

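Plotting histograms

A histogram divides the range of a numeric variable into consecutive, non-overlapping intervals (bins) and draws a bar over each bin whose height is the number of observations (the frequency) falling in it. It therefore shows the shape of the variable's distribution: where values are concentrated, how widely they are spread, and whether there are extreme values.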
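The report does not reproduce the code used to draw the histograms. A minimal sketch with base R's hist() follows; it assumes base graphics were used (the original plots may have been produced differently), and the axis labels take the units to be billions of US dollars, as stated in the HSAUR2 documentation for Forbes2000.

```r
library(HSAUR2)
data("Forbes2000", package = "HSAUR2")

vars <- c("marketvalue", "profits", "assets", "sales")

# Draw the four histograms in a 2 x 2 grid, then restore the previous layout
old_par <- par(mfrow = c(2, 2))
for (v in vars) {
  hist(Forbes2000[[v]],
       main = paste("Histogram of", v),
       xlab = paste(v, "(billion US dollars)"))
}
par(old_par)

# plot = FALSE returns the bins and counts without drawing anything
h_profits <- hist(Forbes2000$profits, plot = FALSE)
h_profits$breaks  # bin boundaries (chosen by the default Sturges rule)
h_profits$counts  # number of companies in each bin
```

hist() drops missing values before binning, so the counts for profits sum to 1995 rather than 2000.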

Discussion

From this activity, we learned how to use basic R commands to understand and explore a dataset in RStudio. By using class(), str(), and dim(), we were able to identify the type of object, see the structure of the variables, and find the size of the Forbes2000 dataset (2000 rows and 8 columns). These commands are important because they give us a clear overview of the dataset before doing any analysis.

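The histograms that we plotted for market value, profits, assets, and sales helped us visualize how each variable is distributed. From the graphs we could see whether the values are spread out, concentrated in certain ranges, or include extreme values. The summary() output points the same way for sales, assets, and market value: in each case the mean is more than twice the median and the maximum lies far above the third quartile, which indicates right-skewed distributions in which most companies are relatively small and a few are very large. Visualization makes it easier to compare variables and understand the characteristics of the dataset.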
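The visual impression of skew and extreme values can also be put into numbers. A sketch under two choices that are not from the report: comparing mean with median as a rough skew check, and counting the points that boxplot.stats() flags with its default rule (values more than 1.5 times the box length beyond the hinges, which are close to the quartiles).

```r
library(HSAUR2)
data("Forbes2000", package = "HSAUR2")

vars <- c("marketvalue", "profits", "assets", "sales")

# Mean vs. median per variable: a mean well above the median suggests a long right tail
skew_check <- sapply(vars, function(v) {
  x <- Forbes2000[[v]]
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
})
print(round(skew_check, 3))

# Number of values flagged as outliers by the boxplot rule (coef = 1.5 by default);
# boxplot.stats() ignores missing values
n_extreme <- sapply(vars, function(v) length(boxplot.stats(Forbes2000[[v]])$out))
print(n_extreme)
```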

Overall, this activity introduced us to essential RStudio skills. We learned how to inspect datasets using basic commands and how to create simple visualizations to support data interpretation. These skills are important for more advanced data analysis and statistical modelling in the future.