R Markdown

barplot(GNP ~ Year, data = longley)

## The bar plot shows the Gross National Product (GNP) for each year in the longley dataset.
## Each bar corresponds to one year, and its height gives the GNP value for that year.
## Comparing bar heights shows how GNP changed from year to year.
barplot(cbind(Employed, Unemployed) ~ Year, data = longley)

## The bar for 1947 stacks an Employed value of about 60.3 under an Unemployed value of 235.6, both in the dataset's own units.
## The bar for 1962, the last year in longley, stacks about 70.6 (Employed) under 400.7 (Unemployed).
#The two columns are recorded on different scales, so the Unemployed segment makes up most of every bar.
#Employed rises overall from 1947 to 1962, with small dips in 1949, 1954, 1958, and 1961.
#Unemployed fluctuates rather than following a steady trend: it peaks in 1949, 1954, 1958, and 1961 and is lowest in 1953.
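#The values behind the first and last bars can be checked directly. This is a minimal sketch; the variable names yrs and first_last are illustrative and not part of the original example.

```r
# The rows of longley are named by year, so they can be indexed by character.
yrs <- range(longley$Year)                       # first and last year in the data
first_last <- longley[as.character(yrs),
                      c("Year", "Employed", "Unemployed")]
print(first_last)
```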
op <- par(mfrow = 2:1, mgp = c(3,1,0)/2, mar = .1+c(3,3:1))
summary(d.Titanic <- as.data.frame(Titanic))
##   Class       Sex        Age     Survived      Freq       
##  1st :8   Male  :16   Child:16   No :16   Min.   :  0.00  
##  2nd :8   Female:16   Adult:16   Yes:16   1st Qu.:  0.75  
##  3rd :8                                   Median : 13.50  
##  Crew:8                                   Mean   : 68.78  
##                                           3rd Qu.: 77.00  
##                                           Max.   :670.00
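#op <- par(mfrow = 2:1, mgp = c(3,1,0)/2, mar = .1+c(3,3:1)): This line sets up a 2x1 grid of plots (mfrow), moves the axis title and tick labels closer to the axis (mgp), and tightens the margins (mar). The previous settings are saved in op so they can be restored later with par(op).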

#summary(d.Titanic <- as.data.frame(Titanic)): This line does several things:

#It converts the built-in Titanic dataset into a data frame (as.data.frame(Titanic)).
#Assigns this data frame to d.Titanic.
#Utilizes the summary() function to generate summary statistics for the d.Titanic dataset.
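#The save-and-restore pattern around par() can be sketched in isolation. This sketch assumes a throwaway pdf(NULL) device so that nothing is drawn on screen or written to disk; the names during and after are illustrative.

```r
pdf(NULL)                      # null PDF device: no file is produced
op <- par(mfrow = c(2, 1))     # par() returns the old values of what it changes
during <- par("mfrow")         # layout while the change is active
par(op)                        # restore the saved settings
after <- par("mfrow")          # layout after restoring
invisible(dev.off())
```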
barplot(Freq ~ Class + Survived, data = d.Titanic,
        subset = Age == "Adult" & Sex == "Male",
        main = "barplot(Freq ~ Class + Survived, *)", ylab = "# {passengers}", legend.text = TRUE)

### Alternative graph
(xt <- xtabs(Freq ~ Survived + Class + Sex, d.Titanic, subset = Age=="Adult"))
## , , Sex = Male
## 
##         Class
## Survived 1st 2nd 3rd Crew
##      No  118 154 387  670
##      Yes  57  14  75  192
## 
## , , Sex = Female
## 
##         Class
## Survived 1st 2nd 3rd Crew
##      No    4  13  89    3
##      Yes 140  80  76   20
#The barplot() call draws the frequency (Freq) of passengers by Class and Survived status, restricted to adult males.

#The subset argument filters the data to adult male passengers only (Age == "Adult" & Sex == "Male").

#The main argument sets the title of the plot to "barplot(Freq ~ Class + Survived, *)".

#ylab sets the label for the y-axis to "# {passengers}".

#legend.text = TRUE adds a legend built from the row names of the plotted table.

#The second part of the code builds a contingency table with xtabs(). It cross-tabulates the frequency (Freq) by Survived, Class, and Sex for adult passengers (Age == "Adult").
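#The counts in the contingency table can be turned into survival proportions with prop.table(). This is a sketch that rebuilds xt so it runs on its own; surv_rate is an illustrative name.

```r
d.Titanic <- as.data.frame(Titanic)
xt <- xtabs(Freq ~ Survived + Class + Sex, d.Titanic, subset = Age == "Adult")

# Proportions within each Class x Sex cell (margins 2 and 3),
# then keep only the "Yes" slice: share of adults who survived.
surv_rate <- prop.table(xt, margin = c(2, 3))["Yes", , ]
print(round(surv_rate, 2))
```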
mosaicplot(xt[,,"Male"], main = "mosaicplot(Freq ~ Class + Survived, *)", color=TRUE)

#The mosaic plot shows the Survived-by-Class table for adult males, xt[,,"Male"]. Each tile's area is proportional to the count in that cell.
#Note that the title names the formula Freq ~ Class + Survived, while the table passed in has Survived as its first dimension and Class as its second.
#color = TRUE shades the tiles so that the categories are easier to tell apart.
par(op)


# Default method
require(grDevices) # for colours
tN <- table(Ni <- stats::rpois(100, lambda = 5))
r <- barplot(tN, col = rainbow(20))
#- type = "h" plotting *is* 'bar'plot
lines(r, tN, type = "h", col = "red", lwd = 2)

#The code draws a bar plot of the counts of 100 values sampled from a Poisson distribution with mean 5.
#Each bar gives how often one distinct value occurred in the sample.
#The bars are coloured with the first colours of rainbow(20).
#Red vertical lines (type = "h") are then drawn from the axis up to each count in tN, at the bar midpoints returned in r.

barplot(tN, space = 1.5, axisnames = FALSE,
        sub = "barplot(..., space= 1.5, axisnames = FALSE)")

#tN: The table of Poisson counts computed above. Each entry gives the height of one bar.

#space = 1.5: This parameter sets the gap left before each bar, as a fraction of the average bar width. With 1.5, the gap between adjacent bars is 1.5 times the width of a single bar.

#axisnames = FALSE: This suppresses the bar names (here the observed values) along the category axis. The numeric y-axis is still drawn.

#sub = "barplot(..., space= 1.5, axisnames = FALSE)": This parameter specifies the subtitle of the plot. In this case, it's set to a string that provides information about how the plot was created, mentioning the parameters used (space and axisnames).
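#The effect of space can be seen in the bar midpoints that barplot() returns. This is a small sketch with made-up heights 1:3; plot = FALSE is used so that nothing is drawn.

```r
default_mids <- barplot(1:3, plot = FALSE)               # space defaults to 0.2
wide_mids    <- barplot(1:3, space = 1.5, plot = FALSE)  # gap of 1.5 bar widths

print(default_mids)
print(wide_mids)
print(diff(wide_mids))   # bar width (1) plus the gap (1.5)
```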
barplot(VADeaths, plot = FALSE)
## [1] 0.7 1.9 3.1 4.3
#Data Used: VADeaths holds death rates per 1000 in Virginia in 1940, for five age groups (rows) and four population groups (columns: Rural Male, Rural Female, Urban Male, Urban Female).

#Return Value: barplot(VADeaths, plot = FALSE) draws nothing. It only returns the midpoints of the bars on the category axis, here 0.7, 1.9, 3.1, and 4.3 for the four stacked bars.

#No Plot Object: Base graphics do not produce a plot object that can be displayed later. To see the plot, call barplot() again without plot = FALSE.

#Using the Midpoints: The returned coordinates are useful for positioning text, lines, or segments that are added to a bar plot.
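#The layout of VADeaths and the value returned with plot = FALSE can be inspected as follows. A minimal sketch; mp is the same name the later examples use for the midpoints.

```r
print(dimnames(VADeaths))               # rows: age groups, columns: population groups

mp <- barplot(VADeaths, plot = FALSE)   # nothing is drawn
print(class(mp))                        # a plain numeric vector, not a plot object
print(mp)                               # one midpoint per column of VADeaths
```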
barplot(VADeaths, plot = FALSE, beside = TRUE)
##      [,1] [,2] [,3] [,4]
## [1,]  1.5  7.5 13.5 19.5
## [2,]  2.5  8.5 14.5 20.5
## [3,]  3.5  9.5 15.5 21.5
## [4,]  4.5 10.5 16.5 22.5
## [5,]  5.5 11.5 17.5 23.5
#With beside = TRUE, the result is a 5 x 4 matrix of midpoints, one entry per bar. Nothing is drawn because plot = FALSE.
#Columns: the four population groups, each of which would be drawn as a cluster of bars.
#Rows: the five age groups, arranged side by side within each cluster.
mp <- barplot(VADeaths) # default
tot <- colMeans(VADeaths)
text(mp, tot + 3, format(tot), xpd = TRUE, col = "blue")

#The code draws the default (stacked) bar plot of VADeaths and stores the bar midpoints in mp.
#It computes the mean death rate of each column (one column per population group) and stores the means in tot.
#The text() call prints each mean in blue at height tot + 3. Because a stacked bar's total height is the column sum, not the mean, these labels land inside the bars rather than on top of them.
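#If the labels should sit on top of the stacked bars instead, one option is to position them at the column sums. This is an alternative sketch, not the original example; top is an illustrative name, and pdf(NULL) is assumed as a throwaway device.

```r
pdf(NULL)                                    # null device, nothing written to disk
top <- colSums(VADeaths)                     # full height of each stacked bar
mp  <- barplot(VADeaths, ylim = c(0, 1.1 * max(top)))
text(mp, top, labels = format(top), pos = 3, col = "blue")  # pos = 3: above
invisible(dev.off())
```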
barplot(VADeaths, beside = TRUE,
        col = c("lightblue", "mistyrose", "lightcyan",
                "lavender", "cornsilk"),
        legend.text = rownames(VADeaths), ylim = c(0, 100))
title(main = "Death Rates in Virginia", font.main = 4)

#The code draws a grouped bar plot of the Virginia death rates.
#Each cluster corresponds to one population group (a column of VADeaths), and within a cluster the five age groups are drawn side by side in lightblue, mistyrose, lightcyan, lavender, and cornsilk.
#The legend maps each colour to its age group.
#The y-axis runs from 0 to 100; the values are death rates per 1000, not percentages.
#The title of the plot is "Death Rates in Virginia", set in bold italic (font.main = 4).
hh <- t(VADeaths)[, 5:1]
mybarcol <- "gray20"
mp <- barplot(hh, beside = TRUE,
        col = c("lightblue", "mistyrose",
                "lightcyan", "lavender"),
        legend.text = colnames(VADeaths), ylim = c(0,100),
        main = "Death Rates in Virginia", font.main = 4,
        sub = "Faked upper 2*sigma error bars", col.sub = mybarcol,
        cex.names = 1.5)
segments(mp, hh, mp, hh + 2*sqrt(1000*hh/100), col = mybarcol, lwd = 1.5)
stopifnot(dim(mp) == dim(hh))  # corresponding matrices
mtext(side = 1, at = colMeans(mp), line = -2,
      text = paste("Mean", formatC(colMeans(hh))), col = "red")

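#This code transposes VADeaths and reverses the age groups, so each cluster is now one age group with the four population groups side by side. It adds faked upper error bars with segments() and prints each cluster's mean in red near the bottom of the plot region with mtext().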
barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black",
        legend.text = rownames(VADeaths))
title(main = list("Death Rates in Virginia", font = 4))

#The bar plot uses the VADeaths dataset of 1940 Virginia death rates by age group and population group.
#The shading lines have angles given by 15+10*1:5, that is 25, 35, 45, 55, and 65 degrees, one per age group.
#The shading lines are black and drawn at a density of 20 lines per inch.
#The legend displays the row names of VADeaths, which are the age groups.
#The main title is "Death Rates in Virginia", set in bold italic (font = 4).
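#The angle expression is easy to misread because the colon operator binds more tightly than multiplication and addition. A short sketch that evaluates it; shade_angles is an illustrative name.

```r
# 1:5 is built first, then multiplied by 10, then 15 is added.
shade_angles <- 15 + 10 * 1:5
print(shade_angles)
```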
barplot(VADeaths, border = "dark blue")

#The resulting bar plot shows the 1940 Virginia death rates by age group and population group. Each bar segment is outlined in dark blue, which makes the boundaries between stacked segments easier to see.
barplot(tN, col = heat.colors(12), log = "y")

#The plot shows the Poisson counts in tN on a logarithmic y-axis, so equal ratios between counts appear as equal vertical distances, and differences among small counts are stretched relative to differences among large ones. heat.colors(12) colours the bars in a sequence running from red through orange and yellow towards white.
barplot(tN, col = gray.colors(20), log = "xy")

#Bar heights: The heights represent the counts in tN, drawn on a logarithmic y-axis, so equal differences in counts do not appear as equal differences in bar height.

#Color palette: The bars are filled with shades of gray from gray.colors(20), running from dark to light.

#Logarithmic scale: With log = "xy" the x-axis is logarithmic as well, so bars further to the right are drawn narrower and closer together.
barplot(height = cbind(x = c(465, 91) / 465 * 100,
                       y = c(840, 200) / 840 * 100,
                       z = c(37, 17) / 37 * 100),
        beside = FALSE,
        width = c(465, 840, 37),
        col = c(1, 2),
        legend.text = c("A", "B"),
        args.legend = list(x = "topleft"))

#This call runs on its own and draws three stacked bars for the groups x, y, and z. The heights are percentages relative to each group's first value, the bar widths are proportional to the group sizes 465, 840, and 37, and the legend in the top-left corner labels the two stacked components A and B.

boxplot on a formula:

boxplot(count ~ spray, data = InsectSprays, col = "lightgray")
# *add* notches (somewhat funny here <--> warning "notches .. outside hinges"):
boxplot(count ~ spray, data = InsectSprays,
        notch = TRUE, add = TRUE, col = "blue")
## Warning in (function (z, notch = FALSE, width = NULL, varwidth = FALSE, : some
## notches went outside hinges ('box'): maybe set notch=FALSE

boxplot(decrease ~ treatment, data = OrchardSprays, col = "bisque",
        log = "y")

#This boxplot illustrates the distribution of decrease values across different treatment categories in the OrchardSprays dataset. The boxes' height and spread indicate the variability in decrease values for each treatment, with the y-axis being logarithmically scaled to potentially handle a wide range of values more effectively.

horizontal=TRUE, switching y <--> x :

boxplot(decrease ~ treatment, data = OrchardSprays, col = "bisque",
        log = "x", horizontal=TRUE)

rb <- boxplot(decrease ~ treatment, data = OrchardSprays, col = "bisque")
title("Comparing boxplot()s and non-robust mean +/- SD")
mn.t <- tapply(OrchardSprays$decrease, OrchardSprays$treatment, mean)
sd.t <- tapply(OrchardSprays$decrease, OrchardSprays$treatment, sd)
xi <- 0.3 + seq(rb$n)
points(xi, mn.t, col = "orange", pch = 18)
arrows(xi, mn.t - sd.t, xi, mn.t + sd.t,
       code = 3, col = "pink", angle = 75, length = .1)

#The code generates a boxplot comparing the 'decrease' variable across different 'treatment' levels in the OrchardSprays dataset.
#Additionally, it calculates the mean and standard deviation separately for each treatment level and overlays these statistics on the boxplot using orange points for means and pink arrows representing the standard deviation.
#This visualization helps to compare the distributions of 'decrease' across different 'treatment' categories and shows how the means and standard deviations vary between these groups. The use of different visual elements aids in understanding the spread and central tendency of the 'decrease' variable within each treatment category.
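#The overlaid statistics can also be tabulated next to the median that the boxplot itself displays. This is a sketch; stats_by_trt and the helper function are illustrative and not part of the original example.

```r
# One column per treatment, with the non-robust (mean, sd) and robust (median) summaries.
stats_by_trt <- sapply(
  split(OrchardSprays$decrease, OrchardSprays$treatment),
  function(v) c(mean = mean(v), sd = sd(v), median = median(v))
)
print(round(stats_by_trt, 1))
```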

boxplot on a matrix:

mat <- cbind(Uni05 = (1:100)/21, Norm = rnorm(100),
             `5T` = rt(100, df = 5), Gam2 = rgamma(100, shape = 2))
boxplot(mat) # directly, calling boxplot.matrix()

#Interpreting boxplots involves looking at the median (line inside the box), the range of the data (whiskers), and potential outliers (points outside the whiskers) for each distribution. They offer a visual summary of the data's central tendency, spread, and skewness.

boxplot on a data frame:

df. <- as.data.frame(mat)
par(las = 1) # all axis labels horizontal
boxplot(df., main = "boxplot(*, horizontal = TRUE)", horizontal = TRUE)

#This code converts the matrix to a data frame and draws a horizontal boxplot of the values in each column.

Using ‘at =’ and adding boxplots – example idea by Roger Bivand :

boxplot(len ~ dose, data = ToothGrowth,
        boxwex = 0.25, at = 1:3 - 0.2,
        subset = supp == "VC", col = "yellow",
        main = "Guinea Pigs' Tooth Growth",
        xlab = "Vitamin C dose mg",
        ylab = "tooth length",
        xlim = c(0.5, 3.5), ylim = c(0, 35), yaxs = "i")
boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
        boxwex = 0.25, at = 1:3 + 0.2,
        subset = supp == "OJ", col = "orange")
legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))

#This code produces side-by-side boxplots comparing the effect of different vitamin C doses, given as ascorbic acid or orange juice, on guinea pigs' tooth growth. The plot makes it easy to compare the distribution of tooth lengths across doses and supplement types.

With less effort (slightly different) using factor interaction:

boxplot(len ~ dose:supp, data = ToothGrowth,
        boxwex = 0.5, col = c("orange", "yellow"),
        main = "Guinea Pigs' Tooth Growth",
        xlab = "Vitamin C dose mg", ylab = "tooth length",
        sep = ":", lex.order = TRUE, ylim = c(0, 35), yaxs = "i")

#It shows the distribution of tooth lengths for each combination of dose and supplement given to the guinea pigs.
#The x-axis shows the dose:supp combinations; with lex.order = TRUE they are ordered by dose first, then by supplement (OJ, VC).
#The y-axis shows tooth length, running from 0 to 35.
#The orange boxes correspond to orange juice (OJ) and the yellow boxes to ascorbic acid (VC).
op <- par(mfrow = c(2, 2))
hist(islands)
utils::str(hist(islands, col = "gray", labels = TRUE))
## List of 6
##  $ breaks  : num [1:10] 0 2000 4000 6000 8000 10000 12000 14000 16000 18000
##  $ counts  : int [1:9] 41 2 1 1 1 1 0 0 1
##  $ density : num [1:9] 4.27e-04 2.08e-05 1.04e-05 1.04e-05 1.04e-05 ...
##  $ mids    : num [1:9] 1000 3000 5000 7000 9000 11000 13000 15000 17000
##  $ xname   : chr "islands"
##  $ equidist: logi TRUE
##  - attr(*, "class")= chr "histogram"
hist(sqrt(islands), breaks = 12, col = "lightblue", border = "pink")

#The first histogram shows the original distribution of landmass areas: how many landmasses fall into each size range.

#The second call, str(hist(...)), draws the same data again with gray bars and count labels, and str() prints the structure of the "histogram" object that hist() returns (breaks, counts, density, and so on).

#The third histogram shows the distribution of the square root of the areas, a transformation that reduces the strong right skew. breaks = 12 is only a suggestion for the number of cells, and the bars are light blue with pink borders.

##-- For non-equidistant breaks, counts should NOT be graphed unscaled:

r <- hist(sqrt(islands), breaks = c(4*0:5, 10*3:5, 70, 100, 140),
          col = "blue1")
text(r$mids, r$density, r$counts, adj = c(.5, -.5), col = "blue3")
sapply(r[2:3], sum)
##    counts   density 
## 48.000000  0.215625
sum(r$density * diff(r$breaks)) # == 1
## [1] 1
lines(r, lty = 3, border = "purple") # -> lines.histogram(*)

par(op)

require(utils) # for str
str(hist(islands, breaks = 12, plot =  FALSE)) #-> 10 (~= 12) breaks
## List of 6
##  $ breaks  : num [1:10] 0 2000 4000 6000 8000 10000 12000 14000 16000 18000
##  $ counts  : int [1:9] 41 2 1 1 1 1 0 0 1
##  $ density : num [1:9] 4.27e-04 2.08e-05 1.04e-05 1.04e-05 1.04e-05 ...
##  $ mids    : num [1:9] 1000 3000 5000 7000 9000 11000 13000 15000 17000
##  $ xname   : chr "islands"
##  $ equidist: logi TRUE
##  - attr(*, "class")= chr "histogram"
str(hist(islands, breaks = c(12,20,36,80,200,1000,17000), plot = FALSE))
## List of 6
##  $ breaks  : num [1:7] 12 20 36 80 200 1000 17000
##  $ counts  : int [1:6] 12 11 8 6 4 7
##  $ density : num [1:6] 0.03125 0.014323 0.003788 0.001042 0.000104 ...
##  $ mids    : num [1:6] 16 28 58 140 600 9000
##  $ xname   : chr "islands"
##  $ equidist: logi FALSE
##  - attr(*, "class")= chr "histogram"
hist(islands, breaks = c(12,20,36,80,200,1000,17000), freq = TRUE,
     main = "WRONG histogram") # and warning
## Warning in plot.histogram(r, freq = freq1, col = col, border = border, angle =
## angle, : the AREAS in the plot are wrong -- rather use 'freq = FALSE'

#This code inspects the objects returned by hist() and shows the effect of custom breaks. With breaks = 12 the result still has 10 break points, because the number is only a suggestion. With unequal breaks, equidist is FALSE, and plotting counts (freq = TRUE) gives bars whose areas misrepresent the data, which is why R issues a warning.
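#The density values are what make unequal cells comparable: each count is divided by the total count and by the cell width. This is a sketch of that calculation; brk and manual_density are illustrative names.

```r
brk <- c(12, 20, 36, 80, 200, 1000, 17000)
h <- hist(islands, breaks = brk, plot = FALSE)

# density = count / (total count * cell width)
manual_density <- h$counts / (sum(h$counts) * diff(h$breaks))

print(cbind(counts = h$counts, width = diff(h$breaks), density = manual_density))
print(sum(manual_density * diff(h$breaks)))   # the bar areas add up to 1
```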

Extreme outliers; the “FD” rule would take very large number of ‘breaks’:

XXL <- c(1:9, c(-1,1)*1e300)
hh <- hist(XXL, "FD") # did not work in R <= 3.4.1; now gives warning
## Warning in hist.default(XXL, "FD"): 'breaks = 4.44796e+299' is too large and
## set to 1e6

## pretty() determines how many counts are used (platform dependently!):
length(hh$breaks) ## typically 1 million -- though 1e6 was "a suggestion only"
## [1] 1000001
#With a data range of about 2e300, the "FD" rule asks for far more cells than R allows, so hist() caps the suggestion at 1e6 and warns. The resulting histogram has about a million cells and is of little practical use.

R >= 4.2.0: no “*.5” labels on y-axis:

hist(c(2,3,3,5,5,6,6,6,7))

require(stats)
set.seed(14)
x <- rchisq(100, df = 4)

#The histogram command creates a visual representation of the frequency distribution of a specific dataset (2, 3, 3, 5, 5, 6, 6, 6, 7).
#Loading the 'stats' package ensures access to various statistical functions provided by R.
#Setting the seed to 14 allows for reproducibility in generating random numbers.
#Generating 100 random numbers from a chi-squared distribution with 4 degrees of freedom and storing them in 'x' can be used for various statistical analyses or simulations.

Histogram with custom x-axis:

hist(x, xaxt = "n")
axis(1, at = 0:17)

## Comparing data with a model distribution should be done with qqplot()!
qqplot(x, qchisq(ppoints(x), df = 4)); abline(0, 1, col = 2, lty = 2)

#For the histogram, observe the shape, center, spread, and any unusual patterns or outliers within the data.
#For the QQ plot, check the alignment of the points with the diagonal line. If most points fall along the line, it indicates good conformity between the data and the chi-squared distribution. If points deviate significantly, it suggests that the chi-squared distribution might not be the best fit for the data.

if you really insist on using hist() … :

hist(x, freq = FALSE, ylim = c(0, 0.2))
curve(dchisq(x, df = 4), col = 2, lty = 2, lwd = 2, add = TRUE)

#The code generates a visualization that displays the distribution of the data in x using a histogram and overlays a theoretical chi-square distribution curve with 4 degrees of freedom on the same plot. This can be helpful for visually comparing the empirical distribution of the data with the theoretical chi-square distribution to assess how closely they match. The chi-square distribution is often used in statistical inference for hypothesis testing and confidence intervals, especially in the context of testing goodness of fit or in certain types of regression analysis.
require(grDevices)
pie(rep(1, 24), col = rainbow(24), radius = 0.9)

#The resulting pie chart will display 24 slices with different colors, representing equal portions of a whole circle.
pie.sales <- c(0.12, 0.3, 0.26, 0.16, 0.04, 0.12)
names(pie.sales) <- c("Blueberry", "Cherry",
    "Apple", "Boston Cream", "Other", "Vanilla Cream")
pie(pie.sales) # default colours

#The resulting pie chart visually displays the proportions of each pie flavor in the dataset, allowing for a quick comparison of sales distribution among the different types of pies.
pie(pie.sales, col = c("purple", "violetred1", "green3",
                       "cornsilk", "cyan", "white"))

#Pie charts are often not the best choice for data visualization, because proportions are hard to compare accurately by eye. Bar plots or other displays are often more effective, especially with many categories or segments.
pie(pie.sales, col = gray(seq(0.4, 1.0, length.out = 6)))

#This code draws the pie chart with one shade of gray per slice. The shades come from gray(seq(0.4, 1.0, length.out = 6)) and run from dark gray (level 0.4) for the first slice to white (level 1.0) for the last.
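#The direction of the gray scale can be confirmed by printing the colours that gray() returns. A minimal sketch; shades is an illustrative name.

```r
# gray() maps levels in [0, 1] to hex colours: 0 is black, 1 is white.
shades <- gray(seq(0.4, 1.0, length.out = 6))
print(shades)   # first entry is the darkest, last entry is white
```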
pie(pie.sales, density = 10, angle = 15 + 10 * 1:6)

#This draws a pie chart where each slice's size comes from pie.sales and the slices are filled with shading lines instead of colour. density = 10 sets the line density, and the lines in successive slices are drawn at angles of 25, 35, 45, 55, 65, and 75 degrees.
pie(pie.sales, clockwise = TRUE, main = "pie(*, clockwise = TRUE)")
segments(0, 0, 0, 1, col = "red", lwd = 2)
text(0, 1, "init.angle = 90", col = "red")

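#The pie() function draws pie.sales with the slices arranged clockwise, starting at 12 o'clock (init.angle defaults to 90 when clockwise = TRUE).
#The segments() call draws a red line straight up from the centre to mark that starting angle, and text() labels it.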
n <- 200
pie(rep(1, n), labels = "", col = rainbow(n), border = NA,
    main = "pie(*, labels=\"\", col=rainbow(n), border=NA,..")

#This code generates a pie chart with 200 slices, each having a different color from the rainbow spectrum. Since the values for each slice are the same (1), all slices will have an equal share of the pie. No labels will be displayed for the slices, and there won't be any visible borders around the slices. The title of the chart displays the function parameters used to create the chart.

Another case showing pie() is rather fun than science:

(original by FinalBackwardsGlance on http://imgur.com/gallery/wWrpU4X)

pie(c(Sky = 78, "Sunny side of pyramid" = 17, "Shady side of pyramid" = 5),
    init.angle = 315, col = c("deepskyblue", "yellow", "yellow3"), border = FALSE)

#The pie() function in R is used to create pie charts, where each slice represents a proportion of the whole. In this case, the chart would display three slices representing the "Sky," "Sunny side of pyramid," and "Shady side of pyramid" categories, with their respective proportions of 78%, 17%, and 5%. The slices would be colored with the specified colors and positioned according to the specified initial angle (315 degrees).