R Markdown

library(dslabs)
data("longley")  # longley comes from R's built-in datasets package, not from dslabs
head(longley)
##      GNP.deflator     GNP Unemployed Armed.Forces Population Year Employed
## 1947         83.0 234.289      235.6        159.0    107.608 1947   60.323
## 1948         88.5 259.426      232.5        145.6    108.632 1948   61.122
## 1949         88.2 258.054      368.2        161.6    109.773 1949   60.171
## 1950         89.5 284.599      335.1        165.0    110.929 1950   61.187
## 1951         96.2 328.975      209.9        309.9    112.075 1951   63.221
## 1952         98.1 346.999      193.2        359.4    113.270 1952   63.639

Formula method

barplot(GNP ~ Year, data = longley)

barplot(cbind(Employed, Unemployed) ~ Year, data = longley)

The first call draws a barplot of GNP against Year using the longley dataset. The ~ symbol separates the variable to plot (left) from the categorical variable that defines the bars (right). The second call puts two variables, Employed and Unemployed, on the left-hand side: the cbind function combines them into a matrix, and barplot draws one bar per Year with the two series stacked, because beside defaults to FALSE. Pass beside = TRUE to get grouped (side-by-side) bars instead.
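To make the cbind form more concrete, the sketch below builds a matrix by hand and passes it to the default method. It assumes the formula method produces an equivalent matrix (series in rows, years in columns); the names h, mid_stacked and mid_grouped are only illustrative.

```r
data("longley")

# Rows are the two series, columns are the years
h <- t(as.matrix(longley[, c("Employed", "Unemployed")]))
colnames(h) <- longley$Year

# plot = FALSE returns the bar midpoints without drawing anything
mid_stacked <- barplot(h, plot = FALSE)                 # one midpoint per year
mid_grouped <- barplot(h, beside = TRUE, plot = FALSE)  # one midpoint per series and year

# Side-by-side version of the second formula plot
barplot(h, beside = TRUE, legend.text = TRUE, xlab = "Year")
```

With beside = TRUE the midpoints come back as a matrix with the same shape as h, which is convenient for adding labels or error bars later.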

3rd form of formula - 2 categories :

op <- par(mfrow = 2:1, mgp = c(3,1,0)/2, mar = .1+c(3,3:1))
summary(d.Titanic <- as.data.frame(Titanic))
##   Class       Sex        Age     Survived      Freq       
##  1st :8   Male  :16   Child:16   No :16   Min.   :  0.00  
##  2nd :8   Female:16   Adult:16   Yes:16   1st Qu.:  0.75  
##  3rd :8                                   Median : 13.50  
##  Crew:8                                   Mean   : 68.78  
##                                           3rd Qu.: 77.00  
##                                           Max.   :670.00
barplot(Freq ~ Class + Survived, data = d.Titanic,
        subset = Age == "Adult" & Sex == "Male",
        main = "barplot(Freq ~ Class + Survived, *)", ylab = "# {passengers}", legend.text = TRUE)

This code calls barplot through its formula method. The formula Freq ~ Class + Survived cross-tabulates the frequency (Freq) by Class and Survived and draws the resulting two-way table as a barplot. The subset argument restricts the plot to cases where Age is “Adult” and Sex is “Male”, and legend.text = TRUE adds a legend.

Corresponding table :

(xt <- xtabs(Freq ~ Survived + Class + Sex, d.Titanic, subset = Age=="Adult"))
## , , Sex = Male
## 
##         Class
## Survived 1st 2nd 3rd Crew
##      No  118 154 387  670
##      Yes  57  14  75  192
## 
## , , Sex = Female
## 
##         Class
## Survived 1st 2nd 3rd Crew
##      No    4  13  89    3
##      Yes 140  80  76   20
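The two-way table behind the adult-male barplot can also be built directly. This is a minimal sketch that assumes the formula method tabulates Freq by Class and Survived in the same way; tab and surv_rate are illustrative names.

```r
d.Titanic <- as.data.frame(Titanic)

# Class x Survived counts for adult males only
tab <- xtabs(Freq ~ Class + Survived, data = d.Titanic,
             subset = Age == "Adult" & Sex == "Male")
tab

# Row-wise proportions: share of each class that survived
surv_rate <- prop.table(tab, margin = 1)[, "Yes"]
round(surv_rate, 3)

# Same numbers as the "Male" slice of the three-way table, transposed
xt <- xtabs(Freq ~ Survived + Class + Sex, d.Titanic, subset = Age == "Adult")
all(as.vector(t(tab)) == as.vector(xt[, , "Male"]))
```

Working from the table first makes it easy to check the counts before deciding how to plot them.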

This code uses the xtabs function to create a contingency table (xt) from the d.Titanic data frame with the formula Freq ~ Survived + Class + Sex, so Freq is summed within each combination of the three factors. The subset argument keeps only the cases where Age is “Adult”.

Alternatively, a mosaic plot :

mosaicplot(xt[,,"Male"], main = "mosaicplot(Freq ~ Class + Survived, *)", color=TRUE)

par(op)
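The save-and-restore pattern used with op can be wrapped in a function so that the restore also happens if a plotting call fails. The helper name two_panels is hypothetical and only serves as an illustration.

```r
# Hypothetical helper: draws two stacked panels, then restores the layout
two_panels <- function(x) {
  op <- par(mfrow = c(2, 1))  # par() returns the previous values of what it changes
  on.exit(par(op))            # restore them when the function exits, even on error
  hist(x, main = "hist")
  boxplot(x, horizontal = TRUE, main = "boxplot")
  invisible(NULL)
}

two_panels(sqrt(islands))
par("mfrow")  # back to the layout that was active before the call
```

Using on.exit keeps the cleanup next to the change, which is easier to maintain than a separate par(op) at the end of a long script.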

The mosaicplot call draws a mosaic plot from the Male slice of the contingency table (xt[,,“Male”]). The main title is set to “mosaicplot(Freq ~ Class + Survived, *)”, and color = TRUE shades the tiles.

The par(op) call then restores the graphics parameters saved in op. That object was created by the earlier par(mfrow = 2:1, ...) call, which returns the previous settings while installing the two-row layout. Restoring them ensures that later plots are not affected by the layout and margin changes.

Default method

# Default method
require(grDevices) # for colours
tN <- table(Ni <- stats::rpois(100, lambda = 5))
r <- barplot(tN, col = rainbow(20))
#- type = "h" plotting *is* 'bar'plot
lines(r, tN, type = "h", col = "red", lwd = 2)

barplot(tN, space = 1.5, axisnames = FALSE,
        sub = "barplot(..., space= 1.5, axisnames = FALSE)")

barplot(VADeaths, plot = FALSE)
## [1] 0.7 1.9 3.1 4.3
barplot(VADeaths, plot = FALSE, beside = TRUE)
##      [,1] [,2] [,3] [,4]
## [1,]  1.5  7.5 13.5 19.5
## [2,]  2.5  8.5 14.5 20.5
## [3,]  3.5  9.5 15.5 21.5
## [4,]  4.5 10.5 16.5 22.5
## [5,]  5.5 11.5 17.5 23.5
mp <- barplot(VADeaths) # default
tot <- colMeans(VADeaths)
text(mp, tot + 3, format(tot), xpd = TRUE, col = "blue")

barplot(VADeaths, beside = TRUE,
        col = c("lightblue", "mistyrose", "lightcyan",
                "lavender", "cornsilk"),
        legend.text = rownames(VADeaths), ylim = c(0, 100))
title(main = "Death Rates in Virginia", font.main = 4)

hh <- t(VADeaths)[, 5:1]
mybarcol <- "gray20"
mp <- barplot(hh, beside = TRUE,
        col = c("lightblue", "mistyrose",
                "lightcyan", "lavender"),
        legend.text = colnames(VADeaths), ylim = c(0,100),
        main = "Death Rates in Virginia", font.main = 4,
        sub = "Faked upper 2*sigma error bars", col.sub = mybarcol,
        cex.names = 1.5)
segments(mp, hh, mp, hh + 2*sqrt(1000*hh/100), col = mybarcol, lwd = 1.5)
stopifnot(dim(mp) == dim(hh))  # corresponding matrices
mtext(side = 1, at = colMeans(mp), line = -2,
      text = paste("Mean", formatC(colMeans(hh))), col = "red")

# Bar shading example
barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black",
        legend.text = rownames(VADeaths))
title(main = list("Death Rates in Virginia", font = 4))

Generate Poisson-distributed Data:

Generates a random sample Ni of 100 values from a Poisson distribution with mean 5, and creates a table tN that counts how often each distinct value occurs in Ni.

Create a Basic Bar Plot:

Draws a bar plot of the counts in tN, colored with a rainbow palette, and stores the bar midpoints in r.

Add Lines to the Bar Plot:

Adds red vertical lines (type = "h") at the bar midpoints r, reaching up to the counts. This shows that a type = "h" line plot is itself a kind of bar plot.

Customize Bar Plot:

Creates a bar plot with wider spacing (space = 1.5) and without axis names. A subtitle records the arguments used.

Bar Midpoints without Plotting:

Calls barplot on the “VADeaths” dataset twice with plot = FALSE, once stacked and once with beside = TRUE. Nothing is drawn; the calls only return the bar midpoints, as a vector for the stacked form and as a matrix for the grouped form.

Annotate Bar Plot:

Creates a default (stacked) bar plot of “VADeaths” and stores the midpoints in mp. Computes the column means of the dataset and writes them in blue at height tot + 3 over each bar's midpoint.

Colored Grouped Bar Plot with Legend:

Creates a grouped bar plot with one color per age group. Adds a legend, sets the y-axis limits, and adds a main title.

Error Bars and Statistical Annotations:

Creates a grouped bar plot of the transposed data with custom colors. Adds faked upper error bars with segments (each bar is extended by 2*sqrt(1000*hh/100)), states this in the subtitle, and writes the column means in red along the x-axis with mtext.

Bar Shading Example:

Creates a bar plot filled with black shading lines at a different angle for each age group, with a density of 20 lines per inch. Adds a legend and a main title in font 4.
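Several of the steps above rely on the midpoints that barplot returns. The sketch below uses them to label stacked bars; note that it uses colSums rather than the colMeans of the original code, on the assumption that the label should sit on top of each full stack.

```r
# Stacked bars: barplot() returns one midpoint per column
mp <- barplot(VADeaths, ylim = c(0, 230), main = "Death Rates in Virginia")
tot <- colSums(VADeaths)  # assumption: label the full height of each stack
text(mp, tot, labels = format(tot), pos = 3, col = "blue")

# Grouped bars: the midpoints come back as a matrix shaped like the data
mp_grouped <- barplot(VADeaths, beside = TRUE, plot = FALSE)
dim(mp_grouped)
```

The ylim value is chosen by hand here so that the labels above the tallest stack are not clipped.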

Bar shading example

barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black",
        legend.text = rownames(VADeaths))
title(main = list("Death Rates in Virginia", font = 4))
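The angle expression is compact and depends on operator precedence, so it may help to evaluate it on its own. The pairing with row names below assumes that barplot applies the angles to the rows (age groups) in order; angles is an illustrative name.

```r
# ":" binds tighter than "*", so this is 15 + 10 * (1:5)
angles <- 15 + 10 * 1:5
angles

# One shading angle per age group (row of VADeaths)
setNames(angles, rownames(VADeaths))
```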

This call draws a stacked bar chart of VADeaths with shading lines instead of a solid fill. The angle of the lines differs for each age group (angle = 15+10*1:5, that is 25, 35, 45, 55 and 65 degrees), the density is 20 lines per inch, and the lines are black. The legend shows the row names of VADeaths, and a main title is added in font 4 (bold italic).

Border color

barplot(VADeaths, border = "dark blue") 

This barplot draws VADeaths with the default fill colors and no shading lines, and adds a dark blue border around each bar segment.

Log scales (not much sense here)

barplot(tN, col = heat.colors(12), log = "y")

barplot(tN, col = gray.colors(20), log = "xy")

The first barplot uses a logarithmic scale on the y-axis (log = “y”) with a color palette from heat.colors(12). The second uses logarithmic scales on both the x and y axes (log = “xy”) with a gray palette from gray.colors(20). As the heading says, log scales add little for these counts.

Legend location

barplot(height = cbind(x = c(465, 91) / 465 * 100,
                       y = c(840, 200) / 840 * 100,
                       z = c(37, 17) / 37 * 100),
        beside = FALSE,
        width = c(465, 840, 37),
        col = c(1, 2),
        legend.text = c("A", "B"),
        args.legend = list(x = "topleft"))

This barplot creates a chart with bars specified by the height matrix. The beside = FALSE argument arranges the bars in a stacked manner. Widths of the bars are set, and colors are assigned. A legend is added with custom text (“A” and “B”) at the top-left corner of the plot.
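The height matrix is easier to read once it is evaluated separately. The sketch below computes it and asks barplot for the midpoints only; h, w and mids are illustrative names.

```r
# Each column is rescaled so that its first row equals 100
h <- cbind(x = c(465, 91) / 465 * 100,
           y = c(840, 200) / 840 * 100,
           z = c(37, 17) / 37 * 100)
round(h, 1)

# Bar widths differ per column, so the midpoints are not evenly spaced
w <- c(465, 840, 37)
mids <- barplot(h, width = w, plot = FALSE)
mids
```

Because the widths differ so much, the narrow third bar is easy to miss, which is one reason to place the legend explicitly with args.legend.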

Histogram

op <- par(mfrow = c(2, 2))
hist(islands)
utils::str(hist(islands, col = "gray", labels = TRUE))
## List of 6
##  $ breaks  : num [1:10] 0 2000 4000 6000 8000 10000 12000 14000 16000 18000
##  $ counts  : int [1:9] 41 2 1 1 1 1 0 0 1
##  $ density : num [1:9] 4.27e-04 2.08e-05 1.04e-05 1.04e-05 1.04e-05 ...
##  $ mids    : num [1:9] 1000 3000 5000 7000 9000 11000 13000 15000 17000
##  $ xname   : chr "islands"
##  $ equidist: logi TRUE
##  - attr(*, "class")= chr "histogram"
hist(sqrt(islands), breaks = 12, col = "lightblue", border = "pink")
##-- For non-equidistant breaks, counts should NOT be graphed unscaled:
r <- hist(sqrt(islands), breaks = c(4*0:5, 10*3:5, 70, 100, 140),
          col = "blue1")
text(r$mids, r$density, r$counts, adj = c(.5, -.5), col = "blue3")
sapply(r[2:3], sum)
##    counts   density 
## 48.000000  0.215625
sum(r$density * diff(r$breaks)) # == 1
## [1] 1
lines(r, lty = 3, border = "purple") # -> lines.histogram(*)

par(op)
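The density values used for the non-equidistant histogram can be reproduced by hand, which shows why the areas, not the heights, represent the counts. This is a small check with illustrative names (brk, widths, dens).

```r
brk <- c(4 * 0:5, 10 * 3:5, 70, 100, 140)
r <- hist(sqrt(islands), breaks = brk, plot = FALSE)

widths <- diff(r$breaks)                      # bin widths are unequal
dens <- r$counts / (sum(r$counts) * widths)   # count share divided by bin width

all.equal(dens, r$density)  # same values as hist() reports
sum(dens * widths)          # total area under the histogram
```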

require(utils) # for str
str(hist(islands, breaks = 12, plot =  FALSE)) #-> 10 (~= 12) breaks
## List of 6
##  $ breaks  : num [1:10] 0 2000 4000 6000 8000 10000 12000 14000 16000 18000
##  $ counts  : int [1:9] 41 2 1 1 1 1 0 0 1
##  $ density : num [1:9] 4.27e-04 2.08e-05 1.04e-05 1.04e-05 1.04e-05 ...
##  $ mids    : num [1:9] 1000 3000 5000 7000 9000 11000 13000 15000 17000
##  $ xname   : chr "islands"
##  $ equidist: logi TRUE
##  - attr(*, "class")= chr "histogram"
str(hist(islands, breaks = c(12,20,36,80,200,1000,17000), plot = FALSE))
## List of 6
##  $ breaks  : num [1:7] 12 20 36 80 200 1000 17000
##  $ counts  : int [1:6] 12 11 8 6 4 7
##  $ density : num [1:6] 0.03125 0.014323 0.003788 0.001042 0.000104 ...
##  $ mids    : num [1:6] 16 28 58 140 600 9000
##  $ xname   : chr "islands"
##  $ equidist: logi FALSE
##  - attr(*, "class")= chr "histogram"
hist(islands, breaks = c(12,20,36,80,200,1000,17000), freq = TRUE,
     main = "WRONG histogram") # and warning
## Warning in plot.histogram(r, freq = freq1, col = col, border = border, angle =
## angle, : the AREAS in the plot are wrong -- rather use 'freq = FALSE'
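Following the hint in the warning, the same breaks can be plotted on the density scale. The log10 variant afterwards is an additional suggestion, not part of the original code.

```r
brk <- c(12, 20, 36, 80, 200, 1000, 17000)
h <- hist(islands, breaks = brk, plot = FALSE)
h$equidist  # FALSE: the bins have different widths

# Density scale: bar areas are proportional to the counts
hist(islands, breaks = brk, freq = FALSE, main = "Density-scaled histogram")

# Alternative (assumption: a log scale suits this skewed data)
hist(log10(islands), main = "log10(islands)")
```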

## Extreme outliers; the "FD" rule would take very large number of 'breaks':
XXL <- c(1:9, c(-1,1)*1e300)
hh <- hist(XXL, "FD") # did not work in R <= 3.4.1; now gives warning
## Warning in hist.default(XXL, "FD"): 'breaks = 4.44796e+299' is too large and
## set to 1e6

## pretty() determines how many counts are used (platform dependently!):
length(hh$breaks) ## typically 1 million -- though 1e6 was "a suggestion only"
## [1] 1000001
## R >= 4.2.0: no "*.5" labels on y-axis:
hist(c(2,3,3,5,5,6,6,6,7))
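The size of the number in the "FD" warning can be reproduced from the Freedman-Diaconis bin width, 2 * IQR / n^(1/3). The sketch below does the arithmetic directly; h_fd and n_bins are illustrative names.

```r
XXL <- c(1:9, c(-1, 1) * 1e300)

# Freedman-Diaconis bin width: driven by the IQR, which ignores the outliers
h_fd <- 2 * IQR(XXL) / length(XXL)^(1/3)
h_fd

# The range, however, is dominated by the outliers
n_bins <- diff(range(XXL)) / h_fd
n_bins
```

A modest bin width combined with an enormous range is what produces the request for roughly 4.4e299 breaks, which hist then caps.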

require(stats)
set.seed(14)
x <- rchisq(100, df = 4)

## Histogram with custom x-axis:
hist(x, xaxt = "n")
axis(1, at = 0:17)

## Comparing data with a model distribution should be done with qqplot()!
qqplot(x, qchisq(ppoints(x), df = 4)); abline(0, 1, col = 2, lty = 2)

## if you really insist on using hist() ... :
hist(x, freq = FALSE, ylim = c(0, 0.2))
curve(dchisq(x, df = 4), col = 2, lty = 2, lwd = 2, add = TRUE)
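The Q-Q comparison can be unpacked into its parts to see what is being plotted. This sketch assumes the same seed and sample as above; p, q_theo and qq are illustrative names.

```r
set.seed(14)
x <- rchisq(100, df = 4)

p <- ppoints(length(x))      # plotting positions (i - 0.5) / n for n > 10
q_theo <- qchisq(p, df = 4)  # matching chi-squared quantiles

# qqplot() pairs the sorted sample with the sorted theoretical quantiles
qq <- qqplot(x, q_theo, plot.it = FALSE)

plot(qq$x, qq$y, xlab = "sample quantiles", ylab = "chi-squared(4) quantiles")
abline(0, 1, col = 2, lty = 2)  # points near this line indicate a good fit
```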

Description of the above code

The R code above demonstrates how to use the `hist` function to create histograms and run related checks. Here is a breakdown:

1. `op <- par(mfrow = c(2, 2))`: Sets up a 2x2 plotting layout with `par` and saves the previous settings in `op`.

2. `hist(islands)`: Creates a histogram of the `islands` dataset.

3. `utils::str(hist(islands, col = "gray", labels = TRUE))`: Creates a histogram of `islands` with gray bars and count labels, and prints the structure of the returned histogram object.

4. `hist(sqrt(islands), breaks = 12, col = "lightblue", border = "pink")`: Creates a histogram of the square root of `islands` with about 12 breaks and the given fill and border colors.

5. `r <- hist(sqrt(islands), breaks = c(4*0:5, 10*3:5, 70, 100, 140), col = "blue1")`: Creates a histogram with non-equidistant breaks and stores the result in `r`.

6. `text(r$mids, r$density, r$counts, adj = c(.5, -.5), col = "blue3")`: Writes the counts just above the bars, at the bin midpoints and density heights.

7. `sapply(r[2:3], sum)`: Calculates the sums of the histogram counts and densities.

8. `sum(r$density * diff(r$breaks))`: Checks that the densities multiplied by the bin widths sum to 1.

9. `lines(r, lty = 3, border = "purple")`: Redraws the histogram outline with dotted purple lines, demonstrating the use of `lines.histogram`.

10. `par(op)`: Restores the original plotting parameters.

11. `require(utils) # for str`: Loads the `utils` package for the `str` function.

12. `str(hist(islands, breaks = 12, plot = FALSE))`: Prints the structure of a histogram object without plotting it. The `breaks = 12` value is only a suggestion; 10 breaks are actually used.

13. `str(hist(islands, breaks = c(12,20,36,80,200,1000,17000), plot = FALSE))`: Prints the structure of a histogram object with custom breaks without plotting it.

14. `hist(islands, breaks = c(12,20,36,80,200,1000,17000), freq = TRUE, main = "WRONG histogram")`: Creates a histogram with custom, unequal breaks on the frequency scale. The bar areas are then misleading, so R issues a warning that recommends `freq = FALSE`.

15. `XXL <- c(1:9, c(-1,1)*1e300)`: Creates a vector with extreme outliers.

16. `hh <- hist(XXL, "FD")`: Creates a histogram using the "FD" (Freedman-Diaconis) rule. The rule asks for far too many breaks, so R warns and caps the number at 1e6.

17. `length(hh$breaks)`: Prints the number of breaks, which is platform-dependent but typically about 1 million.

18. `hist(c(2,3,3,5,5,6,6,6,7))`: Creates a histogram with default settings for a small dataset. In R >= 4.2.0 the y-axis has no ".5" labels.

19. `require(stats)`: Loads the `stats` package.

20. `set.seed(14)`: Sets the seed for reproducibility.

21. `x <- rchisq(100, df = 4)`: Generates a random sample of size 100 from a chi-squared distribution with 4 degrees of freedom.

22. `hist(x, xaxt = "n")`: Creates a histogram without drawing the x-axis.

23. `axis(1, at = 0:17)`: Adds a custom x-axis with ticks at 0 to 17.

24. `qqplot(x, qchisq(ppoints(x), df = 4)); abline(0, 1, col = 2, lty = 2)`: Creates a quantile-quantile plot comparing the distribution of `x` with a chi-squared distribution with 4 degrees of freedom, and adds the reference line y = x.

25. `hist(x, freq = FALSE, ylim = c(0, 0.2))`: Creates a histogram on the density scale with a specified y-axis limit.

26. `curve(dchisq(x, df = 4), col = 2, lty = 2, lwd = 2, add = TRUE)`: Adds the chi-squared density curve to the previous histogram.

The code covers histogram creation and customization, the pitfalls of unequal bin widths and extreme outliers, the Freedman-Diaconis rule, and a Q-Q plot for comparing a sample with a theoretical distribution.