Advanced R presentation

Introduction

The main aim of this work is to present interesting plotting techniques using historical demographic and economic data from Poland. R offers many packages that allow the user to customize even the smallest details of a plot. Thanks to that, it is possible to visualize the data in a way that provides as much information as possible while remaining clear and easy to understand.

Historical rather than current data is used because the second goal of this work is to show that history is easier to learn when it is visualised. Most of the time history is taught by reading and listening. It is not easy for everyone to remember all of the details that way, but with a plot it usually takes just one glance, and most people are able to remember its shape. Unfortunately, plots are still underappreciated, especially in the humanities, since they are usually considered an element of science. In this work we want to show that plots are an efficient source of information and that they can be successfully applied to teaching history.

The variables chosen for the analysis may seem unrelated to each other, but they have one thing in common: apart from one, they decreased significantly over the analysed period. Only variables with a decreasing trend were analysed because the final goal of this work is to show the reader what used to be characteristic of Poland and no longer is.

The data

As mentioned above, the data used in this work consists of historical data from Poland between 1950 and 1993. This was the period when Poland changed the most, going from a post-war communist country to the modern republic that we live in now. Two datasets are used in this project. The first one was downloaded from www.stat.gov.pl and includes 15 numerical variables that are listed below. They cover economic and demographic information such as the total number of people or the airtime of black and white programs on TV. The second dataset was custom made and includes the technical variables needed for building the plot as well as variables containing data: important events, names of Polish leaders and the political system in the analysed period.

First dataset

df <- read.csv("advr 2.csv")
df <- df[ , c(-13,-14,-15, -16)]
colnames(df) <- c("year","number of people in thousands","number of people in the cities in thousands","number of people in the villages in thousands", "marrieges for 1000 people", "live births for 1000 people", "deaths for 1000 people", "birth rate for 1000 people", "number of cinemas", "airtime of black and white programs on tv", "number of passangers using railway transport", "number of passangers using water trasport", "total number of people employed with a contract", "total number of men employed with  a contract" , "total number of women employed with a contract")
colnames(df)

##  [1] "year"                                           
##  [2] "number of people in thousands"                  
##  [3] "number of people in the cities in thousands"    
##  [4] "number of people in the villages in thousands"  
##  [5] "marrieges for 1000 people"                      
##  [6] "live births for 1000 people"                    
##  [7] "deaths for 1000 people"                         
##  [8] "birth rate for 1000 people"                     
##  [9] "number of cinemas"                              
## [10] "airtime of black and white programs on tv"      
## [11] "number of passangers using railway transport"   
## [12] "number of passangers using water trasport"      
## [13] "total number of people employed with a contract"
## [14] "total number of men employed with  a contract"  
## [15] "total number of women employed with a contract"

str(df)

## 'data.frame':    44 obs. of  15 variables:
##  $ year                                           : int  1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 ...
##  $ number of people in thousands                  : int  25035 25507 25999 26511 27012 27550 28080 28540 29000 29480 ...
##  $ number of people in the cities in thousands    : int  9243 10126 10525 10858 11316 12067 12594 12978 13471 13958 ...
##  $ number of people in the villages in thousands  : int  15792 15381 15474 15653 15696 15483 15486 15562 15529 15522 ...
##  $ marrieges for 1000 people                      : Factor w/ 33 levels "10","10,40","10,70",..: 4 3 2 1 33 31 30 27 28 31 ...
##  $ live births for 1000 people                    : Factor w/ 35 levels "12,90","13,50",..: 34 35 33 32 31 31 30 29 28 27 ...
##  $ deaths for 1000 people                         : Factor w/ 27 levels "10","10,10","10,20",..: 7 8 6 3 4 26 21 25 16 17 ...
##  $ birth rate for 1000 people                     : Factor w/ 33 levels "10","10,10","10,20",..: 16 14 16 17 15 17 16 13 12 11 ...
##  $ number of cinemas                              : int  1376 1895 2033 2202 2419 2672 2881 2913 3005 3111 ...
##  $ airtime of black and white programs on tv      : int  21 22 23 21 48 214 757 2104 2992 3216 ...
##  $ number of passangers using railway transport   : int  612841000 713868000 831287000 848936000 904964000 940316000 955466000 955463000 963181000 904572000 ...
##  $ number of passangers using water trasport      : int  2882000 2805000 3796000 4083000 3827000 3666000 3659000 3616758 2496183 2612719 ...
##  $ total number of people employed with a contract: int  4909600 5375300 5710600 6101200 6287400 6443102 6810600 6817700 6803800 6826900 ...
##  $ total number of men employed with  a contract  : int  3399521 3686500 3920900 4170300 4277200 4368811 4639200 4647900 4625100 4563500 ...
##  $ total number of women employed with a contract : int  1510079 1688800 1789700 1930900 2010200 2074291 2171400 2169800 2178700 2263400 ...
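
The str() output above shows the per-1000 rates stored as factors, which is what typically happens when values with decimal commas (such as "10,40") are read with the default dec = ".". The following is a minimal sketch of one way to turn such a column into numbers; the helper name to_numeric is hypothetical and the sample values are taken from the factor levels printed above.

```r
# Sample values as they appear in the factor levels above
rates <- factor(c("10,40", "10,70", "10"))

# Hypothetical helper: replace the decimal comma and convert to numeric
to_numeric <- function(x) {
  as.numeric(sub(",", ".", as.character(x), fixed = TRUE))
}

rates_num <- to_numeric(rates)
print(rates_num)
```

Going through as.character() first matters: calling as.numeric() directly on a factor would return the internal level codes instead of the values.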

Second dataset

Initial data:

df <- read.csv("historia.csv", sep = ",", dec = ".")
colnames(df)

## [1] "date"      "Event"     "Direction" "Position"  "X"         "X.1"      
## [7] "X.2"

head(df)

##         date
## 1 14.04.1950
## 2 22.07.1952
## 3 12.03.1956
## 4 28.06.1956
## 5 08.04.1965
## 6 12.12.1970
##                                                                                        Event
## 1                              the communists and the Polish Episcopate signed the agreement
## 2                       \nthe constitution of the Polish People's Republic (PRL) was adopted
## 3                                                           \nBolesław Bierut died in Moscow
## 4                                                          \nan uprising broke out in Poznań
## 5    \nthe friendship treaty with the USSR was extended, the new agreement lasted until 1985
## 6 an increase in food prices was announced on the radio, which was to apply from December 13
##   Direction Position  X X.1 X.2
## 1         1      0.5 NA  NA  NA
## 2        -1     -0.5 NA  NA  NA
## 3         1      1.0 NA  NA  NA
## 4        -1     -1.0 NA  NA  NA
## 5         1      1.5 NA  NA  NA
## 6        -1     -1.5 NA  NA  NA

Adding more data:

library(png)
library(ggplot2)
library(scales)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

# two-digit year taken from the "dd.mm.yyyy" date string
yy <- as.integer(substr(df$date, 9, 10))

# political system in force in a given year
df$status <- ifelse(yy <= 90, "PRL", "RP")

# leader in power in a given year
df$x <- ifelse(yy <= 57, "Bierut",
        ifelse(yy <= 70, "Gomułka",
        ifelse(yy <= 80, "Gierek",
        ifelse(yy <= 90, "Jaruzelski",
        ifelse(yy <  95, "Wałęsa", "Kwaśniewski")))))



df2 <- df[,c(1,2,3,4)]
df2 <- cbind(df2,df[,c(8,9)])
df2 <- df2[1:17,]
df<- df2

df$date <- as.Date(df$date, format = "%d.%m.%Y")

text_offset <- 0.05

df$month_count <- ave(df$date==df$date, df$date, FUN=cumsum)
df$text_position <- (df$month_count * text_offset * df$Direction) + df$Position

month_buffer <- 2

month_date_range <- seq(min(df$date) - months(month_buffer), max(df$date) + months(month_buffer) , by='month')


month_format <- format(month_date_range, '%b')
month_df <- data.frame(month_date_range, month_format)

head(df)

##         date
## 1 1950-04-14
## 2 1952-07-22
## 3 1956-03-12
## 4 1956-06-28
## 5 1965-04-08
## 6 1970-12-12
##                                                                                        Event
## 1                              the communists and the Polish Episcopate signed the agreement
## 2                       \nthe constitution of the Polish People's Republic (PRL) was adopted
## 3                                                           \nBolesław Bierut died in Moscow
## 4                                                          \nan uprising broke out in Poznań
## 5    \nthe friendship treaty with the USSR was extended, the new agreement lasted until 1985
## 6 an increase in food prices was announced on the radio, which was to apply from December 13
##   Direction Position status       x month_count text_position
## 1         1      0.5    PRL  Bierut           1          0.55
## 2        -1     -0.5    PRL  Bierut           1         -0.55
## 3         1      1.0    PRL  Bierut           1          1.05
## 4        -1     -1.0    PRL  Bierut           1         -1.05
## 5         1      1.5    PRL Gomułka           1          1.55
## 6        -1     -1.5    PRL Gomułka           1         -1.55

str(df)

## 'data.frame':    17 obs. of  8 variables:
##  $ date         : Date, format: "1950-04-14" "1952-07-22" ...
##  $ Event        : Factor w/ 18 levels "","\nan uprising broke out in Poznań",..: 16 5 3 2 7 11 18 12 4 13 ...
##  $ Direction    : int  1 -1 1 -1 1 -1 1 -1 1 -1 ...
##  $ Position     : num  0.5 -0.5 1 -1 1.5 -1.5 2 -2 2.5 -2.5 ...
##  $ status       : chr  "PRL" "PRL" "PRL" "PRL" ...
##  $ x            : chr  "Bierut" "Bierut" "Bierut" "Bierut" ...
##  $ month_count  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ text_position: num  0.55 -0.55 1.05 -1.05 1.55 -1.55 2.05 -2.05 2.55 -2.55 ...

The technical variables used for building a plot are: month_count, text_position, Direction and Position.

Timeline

To bring the analysed period closer to the reader, the plot below presents some important events and people between 1950 and 1994. The events range from the introduction of communism in Poland to the creation of the Republic of Poland. The people presented in the chart are the country's former leaders. The plot was generated using the png and ggplot2 packages. What is interesting about this plot is that it is a timeline on which the distances between the points are proportional to the real distances between the dates. Using geom_segment, mapply, readPNG and annotation_raster, the pictures of the leaders were placed in the periods in which each of them was in power. The names of the leaders, from the left, are: Bierut, Gomułka, Gierek, Jaruzelski, Wałęsa, Kwaśniewski. The colour of the dots on the timeline indicates the political system in force at the time: the Polish People’s Republic (PRL) or the Republic of Poland (RP).
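
The proportional spacing comes from using the number of days since the first event as the x coordinate. A minimal sketch of that idea in base R is shown here, using the first three dates from the event table; the variable names are illustrative only.

```r
# First three event dates from the table, in the original dd.mm.yyyy format
dates <- as.Date(c("14.04.1950", "22.07.1952", "12.03.1956"),
                 format = "%d.%m.%Y")

# Days elapsed since the first event; these values serve as x positions,
# so gaps on the plot match gaps in real time
offset_days <- as.numeric(dates - dates[1])
print(offset_days)
```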

status_levels <- c("PRL", "RP")
status_colors <- c("#0070C0", "#FFC000")

df$status <- factor(df$status, levels=status_levels, ordered=TRUE)
df$Position

##  [1]  0.5 -0.5  1.0 -1.0  1.5 -1.5  2.0 -2.0  2.5 -2.5  3.0 -3.0  3.5 -3.5  4.0
## [16] -4.0  4.5

df$numb <- df$date - df$date[1]

df$Event <- as.character(df$Event)
df2 <- df
df <- df2
df <- df[-c(3, 7, 12,15), ]


img1 <- readPNG("bierut.png")
img2 <- readPNG("gomulka.png")
img3 <- readPNG("gierek.png")
img4 <- readPNG("jaruzelski.png")
img5 <- readPNG("walesa.png")
img6 <- readPNG("kwasniewski.png")


v <- c(1000, 5473, 8755, 11566, 15862, 16624)
add <- as.data.frame(v)
add$y <- c(-0.025,0.025,-0.025,0.025,-0.025,0.025)

df$text_position <- - 0.002
ggplot() + 
  theme_classic()+
  geom_hline(yintercept=0, color='coral', size=2) +
  geom_point(data = df,aes(x=numb, y=0,col=status, label=Event),size = 5) +
  geom_text(data = df, aes(x = numb, y = text_position, label = date), size = 3,angle = 45, hjust = 1) +
  geom_point(data = add, aes(x = v, y = y), size = 4)+
  ylim(-0.05,0.05) + 
  xlim(-1000, 18000)+
  scale_color_manual(values=status_colors, labels=status_levels, drop = FALSE)+
  geom_segment(data=add, aes(x = v, xend = v, y=y,yend=0), color='black', size=0.2)+
  mapply(function(xx, yy) 
    annotation_raster(img1, xmin=xx-1500, xmax=xx+1500, ymin=yy-0.02, ymax=yy+0.01),
    1000, -0.03) +
  mapply(function(xx, yy) 
    annotation_raster(img2, xmin=xx-1500, xmax=xx+1500, ymin=yy-0.01, ymax=yy+0.02),
    5473, 0.025) +
  mapply(function(xx, yy) 
    annotation_raster(img3, xmin=xx-1500, xmax=xx+1500, ymin=yy-0.02, ymax=yy+0.01),
    8755, -0.03) +
  mapply(function(xx, yy) 
    annotation_raster(img4, xmin=xx-1500, xmax=xx+1500, ymin=yy-0.01, ymax=yy+0.02),
    11566, 0.025) +
  mapply(function(xx, yy) 
    annotation_raster(img5, xmin=xx-1500, xmax=xx+1500, ymin=yy-0.02, ymax=yy+0.01),
    15862, -0.03) +
  mapply(function(xx, yy) 
    annotation_raster(img6, xmin=xx-1500, xmax=xx+1500, ymin=yy-0.01, ymax=yy+0.02),
    16624, 0.025) +
  theme(axis.line.y=element_blank(),
        axis.text.y=element_blank(),
        axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.text.x =element_blank(),
        axis.ticks.x =element_blank(),
        axis.line.x =element_blank(),
        legend.position = "bottom"
  ) +
  ggtitle("Timeline of important dates and people in Poland between 1950 and 1993") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(plot.title = element_text(face = "bold"))

## Warning: Ignoring unknown aesthetics: label

Since putting the event labels on the plot would take too much space, they are described below.

df[,c(1,2)]

##          date
## 1  1950-04-14
## 2  1952-07-22
## 4  1956-06-28
## 5  1965-04-08
## 6  1970-12-12
## 8  1978-10-16
## 9  1981-12-13
## 10 1983-10-05
## 11 1989-06-04
## 13 1990-12-09
## 14 1991-10-27
## 16 1994-04-09
## 17 1995-10-19
##                                                                                                                                                                                                                                                           Event
## 1                                                                                                                                                                                                 the communists and the Polish Episcopate signed the agreement
## 2                                                                                                                                                                                          \nthe constitution of the Polish People's Republic (PRL) was adopted
## 4                                                                                                                                                                                                                             \nan uprising broke out in Poznań
## 5                                                                                                                                                                       \nthe friendship treaty with the USSR was extended, the new agreement lasted until 1985
## 6                                                                                                                                                                    an increase in food prices was announced on the radio, which was to apply from December 13
## 8                                                                                                                                                                 election of Cardinal Karol Wojtyła as bishop of Rome. The new pope took the name John Paul II
## 9                                                                                                                                                                                           \ncommunist authorities of the PRL introduced martial law in Poland
## 10                                                                                                                                                                                                                      Lech Wałęsa received the Nobel Peace Prize
## 11                                                                                                                                                                                      This date is most often adopted as the end of communist power in Poland
## 13                                                                                                                                                                                                                       Lech Wałęsa won the presidential election
## 14 \nthe first free parliamentary elections after World War II took place, in which the Democratic Union and the Alliance of the Democratic Left managed as the only groups to win over 10% of the vote - the turnout was 43.2%, and 29 parties got to the Sejm
## 16                                                                                                                                                                                                                Poland has applied to join the European Union
## 17                                                                                                                                                                                                         Aleksander Kwasniewski won the presidential election

From the plot and the table we can learn about the events that changed the Polish economy and social structure. 1950 was a year when communism was introduced into Polish law. Just before that, the country had slowly started rebuilding from the ruins left by World War II. Still under the influence of the USSR, Poland went through a rough period which included a dramatic increase in food prices, a rigged parliamentary election and the uprising in Poznań. There were, however, good moments as well. The election of Cardinal Karol Wojtyła as bishop of Rome and the Nobel Peace Prize awarded to Lech Wałęsa gave hope to everyone, and in 1989 Poland ceased to be a communist country. Later on it officially became the Republic of Poland, the Poland that we know now.

Demography

The demographic structure of Poland has gone through a lot of changes since World War II. In order to show the basic demographic differences between 1950 and 1993, a custom-made plot was created using the geom_segment and geom_vline functions. The y axis represents the values of the selected demographic characteristics. The x axis has no meaning; it is used for technical reasons only. The plot shows very clearly which variables decreased and which increased over time by colouring the geom_segment line red or green respectively.
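
The red or green class can also be derived from the two values instead of being typed by hand. The sketch below assumes the same 1950 and 1993 figures that are hard-coded in this section's code; the data frame and column names are illustrative.

```r
# 1950 and 1993 values as hard-coded in this section
demo <- data.frame(
  indicator = c("marriages per 1000", "live births per 1000",
                "deaths per 1000", "birth rate per 1000",
                "total number of people in mln"),
  y1950 = c(10.8, 30.7, 11.6, 19.1, 25.035),
  y1993 = c(5.4, 12.9, 10.2, 2.7, 38.239)
)

# green when the value went up, red when it went down
demo$class <- ifelse(demo$y1993 > demo$y1950, "green", "red")
print(demo)
```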

library(data.table)

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year

# prep data
df <- read.csv("advr 2.csv")
df <- df[ ,c(1,5,6,7,8)]
df <- df[c(1,44),]

# transpose


a <- c(10.8,30.7,11.6,19.1,25.035)
b <- c(5.4,12.9, 10.2,2.7,38.239)
df <- data.frame(a,b)
df$class <- c('red','red','red','red','green')

rownames(df) <- c("\nmarriages per 1000", "live births per 1000", "deaths per 1000", "birth rate per 1000","total number\n of people in mln" )

colnames(df) <- c("1950","1993","class")


left_label <- rownames(df)

rownames(df) <- c("marriages per 1000", "live births per 1000", "deaths per 1000", "birth rate per 1000","total number\n of people in mln" )
right_label <- rownames(df)


# Plot
p <- ggplot(df) + geom_segment(aes(x=2, xend=3, y=`1950`, yend=`1993`,col = class), size=1, show.legend=T) + 
  geom_vline(xintercept=2, color = "black", size=1) + 
  geom_vline(xintercept=3, color = "black", size=1) +
  scale_color_manual(labels = c("Increased", "Decreased"), 
                     values = c("green"="#00ba38", "red"="#f8766d")) +  # color of lines
  labs(x="", y="Amount") +  # Axis labels
  xlim(0, 5) + ylim(-2,45)  # X and Y axis


# Add texts
p <- p + geom_text(label=left_label, y=df$`1950`, x=rep(2, NROW(df)), hjust=1.1, size=3.5 , aes(col = class), show.legend=F)
p <- p + geom_text(label=right_label, y=df$`1993`, x=rep(3, NROW(df)), hjust=-0.1, size=3.5,  aes(col = class), show.legend=F)
p <- p + geom_text(label="1950", x=2, y=45, hjust=1.2, size=5)  # title
p <- p + geom_text(label="1993", x=3, y=45, hjust=-0.1, size=5)  # title

# Minify theme
p + theme_gray() +
    theme( panel.grid.minor.x = element_blank(),
           panel.grid.major.x = element_blank(),
          axis.ticks.x = element_blank(),
          axis.text.x = element_blank(),
          panel.border = element_blank(),
          plot.margin = unit(c(1,2,1,2), "cm")) +
  ggtitle("Differences in demography in Poland between 1950 and 1993") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "right")

The variables that decreased over time are live births per 1000 people, birth rate per 1000 people, deaths per 1000 people and marriages per 1000 people. The only variable that increased is the total number of people in millions. The slopes of the lines make it easy to see the differences between the two years without even looking at the raw numbers on the left. We can see that live births decreased and so did the birth rate, while deaths per 1000 people did not rise, which suggests that the health system did not get worse; what became worse is the demographic situation itself. Presenting differences this way makes the analysis less prone to false conclusions that could arise when looking at raw data, where some details might be overlooked.

Transport

One of the things that changed significantly over the years is the type of transport that is used most often. Nowadays, the most common forms of transport are probably cars and buses, because cars are more available and roads are built much more quickly. A dumbbell chart is one of the best ways to show small changes of a variable. It can be drawn with the geom_dumbbell function from the ggalt package, which extends ggplot2. Using the png package, pictures indicating each form of transport can be added, which removes the need to provide a legend. The y axis presents years, and the x axis the number of passengers using each form of transport, in millions; the water figures are multiplied by 100 in the code so that both series fit on one axis. Looking at the chart, one can easily see how the number of passengers changed over time.
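
Each dumbbell joins the previous year's value with the current one, so its length is the year-to-year change. As a rough illustration, that change can be computed directly with diff(); the four rail passenger counts below are the 1950-1953 values shown in the str() output of the first dataset, and the variable names are illustrative.

```r
# Rail passengers, 1950-1953, from the first dataset
train <- c(612841000, 713868000, 831287000, 848936000)

# Year-to-year change in millions; the first year has no predecessor
change_mln <- c(0, diff(train)) / 1e6
print(data.frame(year = 1950:1953, change_mln))
```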

library(ggplot2)
library(ggalt)

## Warning: package 'ggalt' was built under R version 3.6.2

## Registered S3 methods overwritten by 'ggalt':
##   method                  from   
##   grid.draw.absoluteGrob  ggplot2
##   grobHeight.absoluteGrob ggplot2
##   grobWidth.absoluteGrob  ggplot2
##   grobX.absoluteGrob      ggplot2
##   grobY.absoluteGrob      ggplot2

df <- read.csv("advr 2.csv")
df <- df[, c(1, 11, 12)]
colnames(df) <- c("date", "train", "water")

# scale water transport by 100 so that both series fit on one axis
df$water <- df$water * 100

# previous year's value for each series (the first year is paired with itself)
df$diff  <- c(df$train[1], head(df$train, -1))
df$diff2 <- c(df$water[1], head(df$water, -1))

# express all values in millions
df$train <- df$train / 1000000
df$water <- df$water / 1000000
df$diff  <- df$diff / 1000000
df$diff2 <- df$diff2 / 1000000
theme_set(theme_classic())


img1 <- readPNG("trains.png")
img2 <- readPNG("ship.png")
cols <- c("#ff9966","#a3c4dc")

ggplot(df)+ 
  mapply(function(xx, yy) 
    annotation_raster(img2, xmin=xx-300, xmax=xx+200, ymin=yy-8, ymax=yy+10),
    300, 1975) +
  mapply(function(xx, yy) 
    annotation_raster(img1, xmin=xx-300, xmax=xx+200, ymin=yy-8, ymax=yy+10),
    1500, 1975) +
  geom_dumbbell(aes(x=train, xend = diff, y=date,
                col="#ff9966"),
                col = "#ff9966",
                size=0.75, 
                point.colour.l="#ff9966") + 
  geom_dumbbell(aes(x=water, xend = diff2, y=date,
                col="#a3c4dc"),
                col = "#a3c4dc",
                size=0.75, 
                point.colour.l="#a3c4dc",show.legend = TRUE) +
  scale_color_manual(name="Legend",values=c("#ff9966","#a3c4dc"), labels = c("water","train"))+
#  scale_x_continuous(label=train) + 
  labs(x="number of passengers in millions (water transport scaled by 100)", 
       y="year", 
       title="Dumbbell Chart", 
       subtitle="Year-to-year changes in water and train transport") +
  theme(plot.title = element_text(hjust=0.5, face="bold"),
        plot.background=element_rect(fill="#f7f7f7"),
        panel.background=element_rect(fill="#f7f7f7"),
        panel.grid.minor=element_blank(),
        panel.grid.major.y=element_blank(),
        panel.grid.major.x=element_line(),
        axis.ticks=element_blank(),
        legend.position="right",
        panel.border=element_blank()) +
  xlim(0,1700)

## Warning: Ignoring unknown parameters: point.colour.l

## Warning: Ignoring unknown parameters: point.colour.l

 # scale_color_manual(values = c("#a3c4dc","#ffff99")) +
 # xlim(0,1700)

The chart shows that water and train transport followed a similar pattern over the years; the levels themselves are not directly comparable, because the water figures are multiplied by 100 in the code. For both of them there was a high peak between the 1970s and the 1980s. It might be due to the fact that Poland was going through economic changes in the period referred to as the “Decade of Gierek”, who aimed to improve the Polish economy. Food prices went down and people became relatively richer.

Contract employees

“Whoever does not work, does not eat,” declared Vladimir Lenin, introducing a work order in the Soviet Union. The authorities of People’s Poland followed in the footsteps of the revolutionary leaders. The history of Polish workers is very complex. With the Act of March 7, 1950 on Preventing Staff Turnover in Professions or Specialties of Special Importance for the Socialized Economy, the authorities of People’s Poland introduced a real work order. Sanctions were also introduced for, as the act stated, leaving the workplace without just cause. An ordinary employee could be sentenced to imprisonment of up to 6 months or a fine of up to PLN 250,000. The consequence of the statutory work order was corruption and cronyism: for money or thanks to acquaintances one could be delegated to a more attractive workplace.

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:data.table':
## 
##     between, first, last

## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo

## 
## Attaching package: 'xts'

## The following objects are masked from 'package:dplyr':
## 
##     first, last

## The following objects are masked from 'package:data.table':
## 
##     first, last

#### employed women vs men - pyramid plot ####
data<- read.csv2("advr.csv")
data<- as.data.frame(data)
data$year<-as.data.frame(data$year)

male<-data.frame(year=data$year, number= data$mezczyzni.zatrudnieni.na.podstawie.stosunku.pracy..osoby., sex='male')
female<-data.frame(year=data$year, number= -data$kobiety.zatrudnieni.na.podstawie.stosunku.pracy..osoby., sex='female')

praca<-rbind(male, female)

ggplot(praca, aes(x = data.year, y = number, fill = sex)) + 
  geom_bar(subset = subset(praca, sex == "female"), stat = "identity") + 
  geom_bar(subset = subset(praca,sex == "male"), stat = "identity") + 
  scale_y_continuous(breaks = seq(-10000000, 10000000, 2000000), 
                     labels = paste0(as.character(c(seq(10, 0, -2), seq(2, 10, 2))), "m")) + 
  coord_flip() + 
  theme_bw()+
  geom_curve(
    aes(x = 1950, y = -100000, xend = 1950, yend = -1500000),
    data = praca,curvature = 0.2,
    arrow = arrow(length = unit(0.03, "npc")))+
  annotate("text", x = 1950, y = -4000000, label = "7 March 1950",fontface="italic")+
  scale_fill_manual(values = c("blue", "pink"))+
  labs(title="Contract employees" ,y = "number", x = "year")

The pyramid graph shows the distribution of Polish contract employees by gender. Blue marks male and pink female workers. As we can see, the chart has a vase shape. The number of workers increased from 1950 until the late 1970s, after which a narrowing is visible. Throughout the whole period the number of male employees exceeds the number of female employees.
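
The pyramid shape relies on a simple trick: one group is stored with negative values so that its bars grow in the opposite direction, and the axis labels are then printed as absolute values. A minimal sketch of that preparation step follows, using the 1950 and 1951 employment figures from the str() output of the first dataset; the object names are illustrative.

```r
# Men and women employed with a contract in 1950 and 1951
employed <- data.frame(year  = c(1950, 1951),
                       men   = c(3399521, 3686500),
                       women = c(1510079, 1688800))

# Long format: women get negative values so their bars point the other way
pyramid <- rbind(
  data.frame(year = employed$year, number = employed$men,    sex = "male"),
  data.frame(year = employed$year, number = -employed$women, sex = "female")
)

# Symmetric axis breaks, labelled with absolute values in millions
breaks <- seq(-4000000, 4000000, by = 2000000)
labels <- paste0(abs(breaks) / 1e6, "m")
print(pyramid)
print(labels)
```

These breaks and labels could then be passed to scale_y_continuous() in the same way as in the plot above.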

Cinematography

Shortly after the war, the role of cinema within society and its relation to the government was decided. The communist regime saw film as a propaganda tool which would be fundamental in building a truly socialist country.

The works of documentary and feature filmmakers were marked by socialist realism. The first Polish film that was faithful to socialist realism and displayed a vision of socialism favourable to the ruling class was Jasne łany by Eugeniusz Cękalski. There were many others: Uczta Baltazara by Jerzy Zarzycki and Jerzy Passendorfer from 1954, and Przygoda na Mariensztacie by Leonard Buczkowski from 1953 (the first Polish colour film). The first films of the greats were also affected by the dogma of socialist realism: Celuloza by Jerzy Kawalerowicz, Generation by Andrzej Wajda or Piątka z ulicy Barskiej by Aleksander Ford.

#display time of black and white films 
df<-data.frame(year=data$year,
               cinema=data$kina..szt., 
               time=data$czas.antenowy.programow.czarno.bialych..h.)



scaleFactor <- max(df$cinema) / max(df$time)

ggplot(df, aes(x=data.year)) +
  geom_smooth(aes(y=cinema), method="loess", col="blue") +
  geom_smooth(aes(y=time * scaleFactor), method="loess", col="red") +
  scale_y_continuous(name="number of cinemas", sec.axis=sec_axis(~./scaleFactor, name="display time (h)")) +
  theme(
    axis.title.y.left=element_text(color="blue"),
    axis.text.y.left=element_text(color="blue"),
    axis.title.y.right=element_text(color="red"),
    axis.text.y.right=element_text(color="red")
  )+
  labs(title="Number of cinemas\nvs\nblack-white movies display time")+
  xlim(1950, 1993)+
  scale_fill_manual(values = c("blue", "pink"))

The above example might be quite confusing at first. The left y axis shows the number of cinemas and the right y axis the display time of black and white movies in hours. This kind of chart is useful when we want to compare the general trends of variables measured in different units. We can observe an increase in both values until the mid-1960s, when they begin to decrease.
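
The two axes are tied together by a single scale factor: the second series is multiplied by it before drawing, and the secondary axis divides by it again when printing the labels. The sketch below illustrates that round trip with the 1950-1954 values of both variables from the first dataset; the variable names are illustrative.

```r
cinema <- c(1376, 1895, 2033, 2202, 2419)  # number of cinemas, 1950-1954
time   <- c(21, 22, 23, 21, 48)            # airtime in hours, 1950-1954

# Factor that stretches the airtime series to the range of the cinema series
scale_factor <- max(cinema) / max(time)

time_scaled <- time * scale_factor         # values drawn against the left axis
time_back   <- time_scaled / scale_factor  # values shown on the right axis

print(scale_factor)
print(time_back)
```

Because the same factor is applied in both directions, the labels on the right axis show the original hours even though the line itself is drawn on the scale of the left axis.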

Culture

After World War II, all culture in Poland was subjected to communist ideology. The doctrine of socialist realism was introduced to cinematography, theater, literature, painting, architecture and other fields of culture, which was to show the enthusiasm of Poles in building a new system, point out a class enemy, and praise the power of the socialist state and its economy.

#graph
widzowie_i_sluchacze<- read.csv2("widzowie_i_sluchacze.csv")

PL <- widzowie_i_sluchacze$widzowie.i.sluchacze
CNT <- widzowie_i_sluchacze$sum
YEAR <- widzowie_i_sluchacze$years

df <- data.frame(PL, YEAR, CNT)


# code to add colors to data frame follows
# first the additional packages needed
library(dplyr)
library(colorspace)  # install via: install.packages("colorspace", repos = "http://R-Forge.R-project.org")
library(scales)
library(treemapify)  # provides geom_treemap() and the related layers used below

# each color scale is defined by a hue, a number between 0 and 360
hues <- c(300, 50, 250, 100, 200, 150)

# now calculate the colors for each data point
df2 <- df %>%
  mutate(index = as.numeric(factor(YEAR))) %>%
  group_by(index) %>%
  mutate(
    max_CNT = max(CNT),
    color = gradient_n_pal(
      sequential_hcl(
        6,
        h = hues[index[1]],
        c = c(45, 20),
        l = c(30, 80),
        power = .5)
    )(CNT/max_CNT)
  )

ggplot(df2, aes(area = CNT, fill = color, label=PL, subgroup=YEAR)) +
  geom_treemap() +
  geom_treemap_subgroup_border(colour="white") +
  geom_treemap_text(fontface = "italic",
                    colour = "white",
                    place = "centre",
                    grow = F,
                    reflow=T) +
  geom_treemap_subgroup_text(place = "centre",
                             grow = T,
                             alpha = 0.5,
                             colour = "#FAFAFA",
                             min.size = 0) +
  scale_fill_identity()

The graph above was created with the treemapify package. It shows which forms of cultural activity were the most popular between the 1950s and the 1980s. The output is very clear for the viewer. The different colours make the individual periods easy to distinguish and remember, and the gradient shows in a simple way which form was the most popular within each period. Theatrical performances turn out to be the favourite among the variables.
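
The gradient within each period depends on how large a value is relative to the largest value of the same period. A small base R sketch of that normalisation step is given below; the toy numbers are hypothetical and only stand in for the real counts read from widzowie_i_sluchacze.csv.

```r
# Hypothetical toy data, not the real audience counts
toy <- data.frame(YEAR = c("1950s", "1950s", "1960s", "1960s"),
                  CNT  = c(10, 40, 30, 15))

# Largest value within each period, repeated for every row of that period
toy$max_CNT <- ave(toy$CNT, toy$YEAR, FUN = max)

# Share of the period maximum, between 0 and 1; this drives the shade
toy$share <- toy$CNT / toy$max_CNT
print(toy)
```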

Summary

The above graphs show various events and statistics from the years 1950-1993 in Poland. We are deeply convinced that the charts presented here clearly convey the most important elements of the data. If such charts were used in literature and history textbooks, learning would be much more enjoyable. A simple and transparent form of presentation matters most for understanding the topics discussed.