library(dslabs)
data("longley")
head(longley)
## GNP.deflator GNP Unemployed Armed.Forces Population Year Employed
## 1947 83.0 234.289 235.6 159.0 107.608 1947 60.323
## 1948 88.5 259.426 232.5 145.6 108.632 1948 61.122
## 1949 88.2 258.054 368.2 161.6 109.773 1949 60.171
## 1950 89.5 284.599 335.1 165.0 110.929 1950 61.187
## 1951 96.2 328.975 209.9 309.9 112.075 1951 63.221
## 1952 98.1 346.999 193.2 359.4 113.270 1952 63.639
barplot(GNP ~ Year, data = longley)
barplot(cbind(Employed, Unemployed) ~ Year, data = longley)
The first call draws a barplot of GNP by Year from the longley dataset.
The ~ symbol separates the quantity to plot (left-hand side) from the
variable that defines the bars (right-hand side). The second call puts
cbind(Employed, Unemployed) on the left-hand side; cbind combines the two
variables into a matrix, and barplot draws one bar per Year with the two
series stacked, because beside = FALSE is the default. Pass beside = TRUE
to get a grouped (side-by-side) barplot instead.
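The difference between stacked and grouped bars is easiest to see from the matrix form. The following is a minimal sketch, assuming the formula call corresponds to a height matrix with one row per series and one column per year; the names `h`, `mp_stacked` and `mp_grouped` are illustrative and not part of the original code.

```r
data("longley")  # ships with base R's 'datasets' package

# Build the 2 x 16 height matrix by hand: one row per series, one column per year.
h <- t(as.matrix(longley[, c("Employed", "Unemployed")]))
colnames(h) <- longley$Year

# plot = FALSE returns the bar midpoints without drawing anything.
mp_stacked <- barplot(h, plot = FALSE)                 # one midpoint per year
mp_grouped <- barplot(h, beside = TRUE, plot = FALSE)  # one midpoint per individual bar

# Draw the grouped version; legend.text = TRUE takes the labels from rownames(h).
barplot(h, beside = TRUE, legend.text = TRUE,
        args.legend = list(x = "topleft"))
```

With `beside = TRUE` the returned midpoints form a matrix with the same shape as `h`, which is what makes it possible to annotate individual bars later.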
op <- par(mfrow = 2:1, mgp = c(3,1,0)/2, mar = .1+c(3,3:1))
summary(d.Titanic <- as.data.frame(Titanic))
## Class Sex Age Survived Freq
## 1st :8 Male :16 Child:16 No :16 Min. : 0.00
## 2nd :8 Female:16 Adult:16 Yes:16 1st Qu.: 0.75
## 3rd :8 Median : 13.50
## Crew:8 Mean : 68.78
## 3rd Qu.: 77.00
## Max. :670.00
barplot(Freq ~ Class + Survived, data = d.Titanic,
subset = Age == "Adult" & Sex == "Male",
main = "barplot(Freq ~ Class + Survived, *)", ylab = "# {passengers}", legend.text = TRUE)
This code creates a barplot using the formula method of the barplot
function. The formula Freq ~ Class + Survived cross-tabulates the
frequency (Freq) by Class and Survived: the second term (Survived)
defines the bars, and the first term (Class) defines the stacked segments
within each bar, which are labelled by the legend. The subset argument
restricts the data to cases where Age is "Adult" and Sex is "Male".
# Corresponding table :
(xt <- xtabs(Freq ~ Survived + Class + Sex, d.Titanic, subset = Age=="Adult"))
## , , Sex = Male
##
## Class
## Survived 1st 2nd 3rd Crew
## No 118 154 387 670
## Yes 57 14 75 192
##
## , , Sex = Female
##
## Class
## Survived 1st 2nd 3rd Crew
## No 4 13 89 3
## Yes 140 80 76 20
This code uses the xtabs function to create a three-way contingency table (xt) from the d.Titanic data with the formula Freq ~ Survived + Class + Sex. The subset argument restricts the data to cases where Age is "Adult".
# Alternatively, a mosaic plot :
mosaicplot(xt[,,"Male"], main = "mosaicplot(Freq ~ Class + Survived, *)", color=TRUE)
par(op)
The first line uses the mosaicplot function to create a mosaic plot. The data for the plot is the male slice of the contingency table xt (xt[,,"Male"]). The main title is set to "mosaicplot(Freq ~ Class + Survived, *)", and color = TRUE draws the tiles in color.
The second line, par(op), resets the graphics parameters to their original values. The op object was defined earlier, when par() was called to set up the two-row layout, and stores the parameter values that were in effect before that call. Restoring them ensures that subsequent plots are not affected by the layout and margin changes made for the barplot and the mosaic plot.
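To make the link between the formula barplot and the contingency table concrete, the sketch below rebuilds the height matrix by hand. It assumes that the formula method cross-tabulates Freq with the first term in rows and the second in columns; `m` is an illustrative name not used in the original.

```r
d.Titanic <- as.data.frame(Titanic)

# Class in rows (stacked segments), Survived in columns (bars), adult males only.
m <- xtabs(Freq ~ Class + Survived, data = d.Titanic,
           subset = Age == "Adult" & Sex == "Male")

# The three-way table from the text, restricted to adults.
xt <- xtabs(Freq ~ Survived + Class + Sex, data = d.Titanic,
            subset = Age == "Adult")

# The male slice of xt holds the same counts with rows and columns swapped.
stopifnot(all(m == t(xt[, , "Male"])))

# Passing the matrix directly to barplot() draws one bar per column.
barplot(m, legend.text = TRUE, ylab = "# {passengers}")
```

Working from the matrix makes it explicit which variable ends up on the x-axis, which is easy to misread from the formula alone.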
# Default method
require(grDevices) # for colours
tN <- table(Ni <- stats::rpois(100, lambda = 5))
r <- barplot(tN, col = rainbow(20))
#- type = "h" plotting *is* 'bar'plot
lines(r, tN, type = "h", col = "red", lwd = 2)
barplot(tN, space = 1.5, axisnames = FALSE,
sub = "barplot(..., space= 1.5, axisnames = FALSE)")
barplot(VADeaths, plot = FALSE)
## [1] 0.7 1.9 3.1 4.3
barplot(VADeaths, plot = FALSE, beside = TRUE)
## [,1] [,2] [,3] [,4]
## [1,] 1.5 7.5 13.5 19.5
## [2,] 2.5 8.5 14.5 20.5
## [3,] 3.5 9.5 15.5 21.5
## [4,] 4.5 10.5 16.5 22.5
## [5,] 5.5 11.5 17.5 23.5
mp <- barplot(VADeaths) # default
tot <- colMeans(VADeaths)
text(mp, tot + 3, format(tot), xpd = TRUE, col = "blue")
barplot(VADeaths, beside = TRUE,
col = c("lightblue", "mistyrose", "lightcyan",
"lavender", "cornsilk"),
legend.text = rownames(VADeaths), ylim = c(0, 100))
title(main = "Death Rates in Virginia", font.main = 4)
hh <- t(VADeaths)[, 5:1]
mybarcol <- "gray20"
mp <- barplot(hh, beside = TRUE,
col = c("lightblue", "mistyrose",
"lightcyan", "lavender"),
legend.text = colnames(VADeaths), ylim = c(0,100),
main = "Death Rates in Virginia", font.main = 4,
sub = "Faked upper 2*sigma error bars", col.sub = mybarcol,
cex.names = 1.5)
segments(mp, hh, mp, hh + 2*sqrt(1000*hh/100), col = mybarcol, lwd = 1.5)
stopifnot(dim(mp) == dim(hh)) # corresponding matrices
mtext(side = 1, at = colMeans(mp), line = -2,
text = paste("Mean", formatC(colMeans(hh))), col = "red")
# Bar shading example
barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black",
legend.text = rownames(VADeaths))
title(main = list("Death Rates in Virginia", font = 4))
Generate Poisson-distributed Data:
Generates a random sample Ni of 100 values from a Poisson distribution with mean 5, and creates a table tN that counts the occurrences of each distinct value in Ni.
Create a Basic Bar Plot:
Draws a bar plot of the counts in tN, colored with a rainbow palette, and stores the bar midpoints returned by barplot in r.
Add Lines to the Bar Plot:
Adds vertical red lines (type = "h") at the bar midpoints r, each reaching the height of its bar, to show that type = "h" plotting is itself a kind of bar plot.
Customize Bar Plot:
Creates a bar plot with wider spacing between bars (space = 1.5) and without axis names. A subtitle records the arguments used.
Bar Midpoints Without Plotting:
Calls barplot on the VADeaths dataset twice with plot = FALSE, once stacked and once with beside = TRUE. Nothing is drawn; each call only returns the bar midpoints (a vector for the stacked version, a matrix for the grouped version).
Annotate Bar Plot:
Creates a default (stacked) bar plot of VADeaths and stores the midpoints in mp. Computes the column means of the dataset and writes them in blue at each bar's midpoint, 3 units above the mean value.
Colored Grouped Bar Plot with Legend:
Creates a grouped bar plot with a custom color for each age group. Adds a legend, sets the y-axis limits, and adds a main title.
Error Bars and Statistical Annotations:
Creates a grouped bar plot of the transposed data (hh, with the age groups in reverse order) with custom colors. Adds faked upper error bars with segments(), notes this in the subtitle, checks that the midpoint matrix mp has the same dimensions as hh, and writes each group's mean in red with mtext().
Bar Shading Example:
Creates a bar plot shaded with black lines, using a different angle for each age group and a density of 20 shading lines per inch. Adds a legend and a main title with a specific font.
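The annotation step places the column means inside the stacked bars. If the aim is instead to label the top of each stacked bar with its total, a minimal sketch could look like this; using `colSums` and the `ylim` value are choices made for this illustration, not part of the original example.

```r
# barplot() returns the bar midpoints; ylim leaves room for the labels.
mp <- barplot(VADeaths, ylim = c(0, 220))

# The full height of each stacked bar is the column sum, not the column mean.
tot <- colSums(VADeaths)

# pos = 3 places each label just above its (x, y) anchor point.
text(mp, tot, labels = format(tot), pos = 3, col = "blue")
```

The same pattern works for grouped bars: with `beside = TRUE` the midpoints come back as a matrix that lines up element by element with the height matrix.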
barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black",
legend.text = rownames(VADeaths))
title(main = list("Death Rates in Virginia", font = 4))
This call draws a stacked bar chart of the data in VADeaths. The bar
segments are filled with shading lines instead of solid color: the angle
differs for each age group (angle = 15+10*1:5), the density is 20 lines
per inch, and the lines are black. The legend displays the row names of
VADeaths, and a main title is added to the chart.
# Border color
barplot(VADeaths, border = "dark blue")
This call draws the default stacked barplot of VADeaths, without the
shading used above, and outlines each bar segment in dark blue.
barplot(tN, col = heat.colors(12), log = "y")
barplot(tN, col = gray.colors(20), log = "xy")
The first barplot uses a logarithmic scale on the y-axis (log = "y")
with a color palette from heat.colors(12). The second barplot uses a
logarithmic scale on both the x and y axes (log = "xy") with a gray
color palette from gray.colors(20).
# Legend location
barplot(height = cbind(x = c(465, 91) / 465 * 100,
y = c(840, 200) / 840 * 100,
z = c(37, 17) / 37 * 100),
beside = FALSE,
width = c(465, 840, 37),
col = c(1, 2),
legend.text = c("A", "B"),
args.legend = list(x = "topleft"))
This call draws one bar for each column of the height matrix (x, y, z).
The beside = FALSE argument stacks the two rows within each bar. The
width argument makes each bar's width proportional to the given values,
and col assigns one color per row. A legend with custom text ("A" and
"B") is placed in the top-left corner through args.legend.
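The height matrix above is written out by hand as percentages. A hedged sketch of the same computation from the raw counts is shown below; the names `counts` and `pct` are illustrative, and reading the first row as the group size is an assumption based on how the original expression divides each column by its first entry.

```r
counts <- cbind(x = c(465, 91), y = c(840, 200), z = c(37, 17))

# Express every column as a percentage of its first row.
pct <- sweep(counts, 2, counts[1, ], "/") * 100

mp <- barplot(pct, beside = FALSE,
              width = counts[1, ],   # bar width proportional to the first-row count
              col = c(1, 2),
              legend.text = c("A", "B"),
              args.legend = list(x = "topleft"))
```

Deriving both `height` and `width` from one matrix keeps the two arguments consistent if the counts change.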
### Histogram
op <- par(mfrow = c(2, 2))
hist(islands)
utils::str(hist(islands, col = "gray", labels = TRUE))
## List of 6
## $ breaks : num [1:10] 0 2000 4000 6000 8000 10000 12000 14000 16000 18000
## $ counts : int [1:9] 41 2 1 1 1 1 0 0 1
## $ density : num [1:9] 4.27e-04 2.08e-05 1.04e-05 1.04e-05 1.04e-05 ...
## $ mids : num [1:9] 1000 3000 5000 7000 9000 11000 13000 15000 17000
## $ xname : chr "islands"
## $ equidist: logi TRUE
## - attr(*, "class")= chr "histogram"
hist(sqrt(islands), breaks = 12, col = "lightblue", border = "pink")
##-- For non-equidistant breaks, counts should NOT be graphed unscaled:
r <- hist(sqrt(islands), breaks = c(4*0:5, 10*3:5, 70, 100, 140),
col = "blue1")
text(r$mids, r$density, r$counts, adj = c(.5, -.5), col = "blue3")
sapply(r[2:3], sum)
## counts density
## 48.000000 0.215625
sum(r$density * diff(r$breaks)) # == 1
## [1] 1
lines(r, lty = 3, border = "purple") # -> lines.histogram(*)
par(op)
require(utils) # for str
str(hist(islands, breaks = 12, plot = FALSE)) #-> 10 (~= 12) breaks
## List of 6
## $ breaks : num [1:10] 0 2000 4000 6000 8000 10000 12000 14000 16000 18000
## $ counts : int [1:9] 41 2 1 1 1 1 0 0 1
## $ density : num [1:9] 4.27e-04 2.08e-05 1.04e-05 1.04e-05 1.04e-05 ...
## $ mids : num [1:9] 1000 3000 5000 7000 9000 11000 13000 15000 17000
## $ xname : chr "islands"
## $ equidist: logi TRUE
## - attr(*, "class")= chr "histogram"
str(hist(islands, breaks = c(12,20,36,80,200,1000,17000), plot = FALSE))
## List of 6
## $ breaks : num [1:7] 12 20 36 80 200 1000 17000
## $ counts : int [1:6] 12 11 8 6 4 7
## $ density : num [1:6] 0.03125 0.014323 0.003788 0.001042 0.000104 ...
## $ mids : num [1:6] 16 28 58 140 600 9000
## $ xname : chr "islands"
## $ equidist: logi FALSE
## - attr(*, "class")= chr "histogram"
hist(islands, breaks = c(12,20,36,80,200,1000,17000), freq = TRUE,
main = "WRONG histogram") # and warning
## Warning in plot.histogram(r, freq = freq1, col = col, border = border, angle =
## angle, : the AREAS in the plot are wrong -- rather use 'freq = FALSE'
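The warning appears because, with unequal bin widths, bar height should be a density rather than a count. The following sketch recomputes the density by hand to show what hist() stores; `w` and `dens` are illustrative names.

```r
h <- hist(islands, breaks = c(12, 20, 36, 80, 200, 1000, 17000), plot = FALSE)

w    <- diff(h$breaks)                     # bin widths (unequal here)
dens <- h$counts / (sum(h$counts) * w)     # relative frequency per unit of width

stopifnot(isTRUE(all.equal(dens, h$density)))  # same values hist() computed
sum(dens * w)                                  # bar areas add up to 1

# freq = FALSE draws densities, so bar area (not height) reflects the counts.
plot(h, freq = FALSE, main = "Density-scaled histogram")
```

Dividing by the bin width is what keeps a wide bin from looking more populated than it is.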
## Extreme outliers; the "FD" rule would take very large number of 'breaks':
XXL <- c(1:9, c(-1,1)*1e300)
hh <- hist(XXL, "FD") # did not work in R <= 3.4.1; now gives warning
## Warning in hist.default(XXL, "FD"): 'breaks = 4.44796e+299' is too large and
## set to 1e6
## pretty() determines how many counts are used (platform dependently!):
length(hh$breaks) ## typically 1 million -- though 1e6 was "a suggestion only"
## [1] 1000001
## R >= 4.2.0: no "*.5" labels on y-axis:
hist(c(2,3,3,5,5,6,6,6,7))
require(stats)
set.seed(14)
x <- rchisq(100, df = 4)
## Histogram with custom x-axis:
hist(x, xaxt = "n")
axis(1, at = 0:17)
## Comparing data with a model distribution should be done with qqplot()!
qqplot(x, qchisq(ppoints(x), df = 4)); abline(0, 1, col = 2, lty = 2)
## if you really insist on using hist() ... :
hist(x, freq = FALSE, ylim = c(0, 0.2))
curve(dchisq(x, df = 4), col = 2, lty = 2, lwd = 2, add = TRUE)
##description of above code
#The R code above is a series of commands demonstrating how to use the `hist` function to create histograms and perform related analyses. Here's a breakdown of the code:
#1. `op <- par(mfrow = c(2, 2))`: Sets up a 2x2 plotting layout using the `par` function and saves the previous settings in `op`.
#2. `hist(islands)`: Creates a histogram of the 'islands' dataset.
#3. `utils::str(hist(islands, col = "gray", labels = TRUE))`: Creates a histogram of 'islands' with custom settings (gray color, count labels on the bars) and prints the structure of the resulting histogram object.
#4. `hist(sqrt(islands), breaks = 12, col = "lightblue", border = "pink")`: Creates a histogram of the square root of 'islands' with a suggested number of breaks, a fill color, and a border color.
#5. `r <- hist(sqrt(islands), breaks = c(4*0:5, 10*3:5, 70, 100, 140), col = "blue1")`: Creates a histogram with non-equidistant breaks and stores the result in 'r'.
#6. `text(r$mids, r$density, r$counts, adj = c(.5, -.5), col = "blue3")`: Adds text to the plot, annotating each bar with its count at the bar's midpoint.
#7. `sapply(r[2:3], sum)`: Calculates the sums of the histogram counts and densities.
#8. `sum(r$density * diff(r$breaks))`: Checks that the sum of density times bin width equals 1.
#9. `lines(r, lty = 3, border = "purple")`: Redraws the histogram outline with dotted purple lines, demonstrating the use of `lines.histogram`.
#10. `par(op)`: Restores the original plotting parameters.
#11. `require(utils) # for str`: Makes sure the 'utils' package is attached for the `str` function.
#12. `str(hist(islands, breaks = 12, plot = FALSE))`: Prints the structure of a histogram object without plotting it; the 12 requested breaks are only a suggestion, and 10 are used.
#13. `str(hist(islands, breaks = c(12,20,36,80,200,1000,17000), plot = FALSE))`: Prints the structure of a histogram object with custom breaks without plotting it.
#14. `hist(islands, breaks = c(12,20,36,80,200,1000,17000), freq = TRUE, main = "WRONG histogram")`: Creates a histogram with custom, unequal breaks on a frequency scale. R warns that the areas in the plot are wrong, which the title also flags.
#15. `XXL <- c(1:9, c(-1,1)*1e300)`: Creates a vector with extreme outliers.
#16. `hh <- hist(XXL, "FD")`: Creates a histogram using the "FD" (Freedman-Diaconis) rule; R warns that the computed number of breaks is too large and sets it to 1e6.
#17. `length(hh$breaks)`: Prints the number of breaks, which is platform-dependent but typically about 1 million.
#18. `hist(c(2,3,3,5,5,6,6,6,7))`: Creates a histogram with default settings for a small dataset; in R >= 4.2.0 the y-axis has no "*.5" labels.
#19. `require(stats)`: Makes sure the 'stats' package is attached.
#20. `set.seed(14)`: Sets the seed for reproducibility.
#21. `x <- rchisq(100, df = 4)`: Generates a random sample of 100 values from a chi-squared distribution with 4 degrees of freedom.
#22. `hist(x, xaxt = "n")`: Creates a histogram with the x-axis suppressed.
#23. `axis(1, at = 0:17)`: Adds a custom x-axis with ticks at 0 to 17.
#24. `qqplot(x, qchisq(ppoints(x), df = 4)); abline(0, 1, col = 2, lty = 2)`: Creates a quantile-quantile plot comparing the distribution of 'x' to a chi-squared distribution with 4 degrees of freedom, with a dashed reference line.
#25. `hist(x, freq = FALSE, ylim = c(0, 0.2))`: Creates a histogram on a density scale with a specified y-axis limit.
#26. `curve(dchisq(x, df = 4), col = 2, lty = 2, lwd = 2, add = TRUE)`: Adds a chi-squared density curve to the previous histogram.
#The code covers various aspects of histogram creation and customization, including the problems caused by unequal breaks and extreme outliers. It also demonstrates the Freedman-Diaconis rule and includes a Q-Q plot for comparing the distribution of a sample to a theoretical distribution.
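The Freedman-Diaconis rule mentioned above chooses a bin width of 2 * IQR / n^(1/3) and divides the data range by it. The sketch below works this out by hand for the chi-squared sample; `bw` and `k` are illustrative names, and the hand computation may differ slightly from `nclass.FD` because of rounding details inside that helper.

```r
set.seed(14)
x <- rchisq(100, df = 4)

# Freedman-Diaconis bin width: twice the interquartile range over the cube root of n.
bw <- 2 * IQR(x) / length(x)^(1/3)

# Number of bins needed to cover the data range at that width.
k <- ceiling(diff(range(x)) / bw)

k
nclass.FD(x)   # grDevices helper that implements the same rule

# hist() treats the count as a suggestion and lets pretty() pick round break points,
# so the number of bins actually used can differ.
h <- hist(x, breaks = "FD", plot = FALSE)
length(h$counts)
```

The same arithmetic explains the XXL example: the two outliers barely change the IQR but stretch the range to 2e300, so the rule asks for an enormous number of bins.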