1 MOMA Data analysis

1.1 Data pre-processing

We will merge two dataset artist and art to acquire the personal info about each artists as well as their art works.

moma <- merge(art, artists , y.by=Artist.ID)

head(moma, 5)
##   Artist.ID                      Name Artwork.ID
## 1         1            Robert Arneson      33599
## 2         1            Robert Arneson      64139
## 3        10             Irene Aronson      64320
## 4       100 Ivan Le Lorraine Albright      70450
## 5       100 Ivan Le Lorraine Albright      81066
##                                         Title             Date
## 1                         Study for Head Bath             1977
## 2                                General Nuke             1986
## 3                                  Moon Night             1951
## 4 Into the World There Came A Soul Called Ida             1940
## 5                         The Artist's Father 1935 (cast 1952)
##                              Medium
## 1 Conté crayon and pencil on paper
## 2                        Lithograph
## 3              Etching and aquatint
## 4                        Lithograph
## 5                            Bronze
##                                                                                           Dimensions
## 1                                                                 41 5/8 x 29 7/8" (105.8 x 75.8 cm)
## 2 composition (irreg.): 31 11/16 x 23 5/16" (80.5 x 59.2 cm); sheet: 36 1/4 x 27 15/16" (92 x 71 cm)
## 3                           plate: 5 13/16 x 4" (14.7 x 10.1cm); sheet: 13 1/8 x 10" (33.3 x 25.4cm)
## 4                composition: 16 15/16 x 14" (43 x 35.5 cm); sheet: 19 5/16 x 15 3/8" (49 x 39.1 cm)
## 5                                                        15 x 6 1/2 x 4 3/4" (38.1 x 16.5 x 12.1 cm)
##   Acquisition.Date                                      Credit Catalogue
## 1       1981-04-28 Gift of the Friends of Contemporary Drawing         Y
## 2       1997-05-28                      Gift of Landfall Press         Y
## 3       1952-11-07                           Given Anonymously         Y
## 4       1965-06-08                         John B. Turner Fund         Y
## 5       1952-10-21                        Gift of Earle Ludgin         Y
##                   Department Classification Object.Number Diameter..cm.
## 1                   Drawings        Drawing       67.1981            NA
## 2 Prints & Illustrated Books          Print      192.1997            NA
## 3 Prints & Illustrated Books          Print      201.1952            NA
## 4 Prints & Illustrated Books          Print      521.1965            NA
## 5       Painting & Sculpture      Sculpture      172.1952            NA
##   Circumference..cm. Height..cm. Length..cm. Width..cm. Depth..cm.
## 1                 NA       105.7          NA       75.9         NA
## 2                 NA        80.5          NA       59.2         NA
## 3                 NA        14.7          NA       10.1         NA
## 4                 NA        43.0          NA       35.5         NA
## 5                 NA        38.1          NA       16.5       12.1
##   Weight..kg. Duration..s. Nationality Gender Birth.Year Death.Year
## 1          NA           NA    American   Male       1930       1992
## 2          NA           NA    American   Male       1930       1992
## 3          NA           NA    American Female       1918         NA
## 4          NA           NA    American   Male       1897       1983
## 5          NA           NA    American   Male       1897       1983

1.2 Artists Overview

In this section, we will explore the nationality , age and gender of the artists.

1.2.1 Nationality

We can see MOMA has majority the collection from American artists, followed by German, French and European artists. Japanese and Argentina or Brazil are the South American representation.

#nationality
country <- group_by(artists, Nationality )
country2 <- dplyr::summarise(country, count=n())
country2 <- country2[order(-country2$count),]
country3 <- filter (country2, count!=2488 & count!=255)
country4<- country3[1:20,]

ggplot(data=country4, aes(x=reorder(Nationality,count), y=count))+geom_bar(stat="identity", aes(fill=Nationality))+theme(axis.text.x = element_text(angle = 60, hjust = 1)) +ggtitle("Top 20 Nationality of Artists in MOMA") + xlab("Nationality") +ylab("Total number of Artists")+theme(legend.position = "none")  

1.2.2 Most ollected artists

We picked 20 artists has the most art work in MOMA, it seems like MOMA adore American and French artists, there are some German artists. Picasso is the only Spain artists made to the top 20 lists. Atget is a photographer, Bourgoies is famous for her large scaled scupltures. I am very suprised to see MOMA has extremely large amount of collections of Picasso and Matisse.

#artists
countryn <- group_by(moma, Artist.ID, Name, Nationality)
countryn2 <- dplyr::summarise(countryn, count=n())
countryn2 <- countryn2[order(-countryn2$count),]
countryn3 <- filter(countryn2, Name !="Unknown photographer" & Name !="Unknown Artist" & count !=1460)

countryn4<- countryn3[1:20,]

ggplot(data=countryn4, aes(x=reorder(Name,count), y=count))+geom_bar(stat="identity", aes(fill=Nationality))+theme(axis.text.x = element_text(angle = 60, hjust = 1)) +ggtitle("Artists of the most artworks in MOMA") + xlab("Artists") +ylab("Total number of artists")+theme(legend.position = "bottom")  

#treemap(countryn4, index="Name", vSize="count", type="index", 
       #  title="Artists Treemap", fontsize.title=11)

1.2.3 Gender

From the Gender distribution, we can see the majority of the artists are Male 65%, felmale is less than 20%. The unknown category may include an institution or group work.

#Female
artists$s<- ifelse(artists$Gender %in% c("Male", "male"), "Male",
                   ifelse(artists$Gender %in% c("Female", "female"), "Female",
                          "Unknown"))
sex <- group_by(artists, s )
sex2 <- dplyr::summarise(sex, count=n())

p1<-plot_ly(sex2, labels=~s, values= ~count, type="pie",
           textposition="inside", textinfo="label+percent",
           insidetextfont=list(color="#FFFFFF")) %>%
  layout(title="MOMA artists Gender distribution")
p1

1.2.4 Age distribution

In this section, we used the non-missing birth date and death date to calculated the artists survival age, then use the histogram graph to show the age distribution. In addition, we divided the artists into 3 groups, died before 40 years older, died between 40 to 65 years old, and live up to 65 years old. From the plot, we can see majority of the artists lived up to 65 years old. B. Efimov from Russian lived to 108 years old.

painter <- filter(artists, Birth.Year >0)
age <- filter(artists, Birth.Year >0 & Death.Year >0)
age$age <- age$Death.Year- age$Birth.Year
age1 <- filter(age, age >14 & age <110)
age1$agegp <- ifelse(age1$age <= 40 & age1$age >14, " <= 40 years",
                     ifelse(age1$age >40 & age1$age <= 65, "<= 65 years", " >65 years"))

#histogram
page <- plot_ly(age1, x=~age,  alpha=0.6, color="orange", type="histogram")%>%
  layout(title="MOMA artists Age Distribution", list(yaxis=list(title="Count")))
page
#box
ageg1 <- group_by(age1, age, agegp )
ageg2 <- dplyr::summarise(ageg1, count=n())
page1 <- plot_ly(data=ageg2,x=~agegp, y= ~count, color=~agegp, type="box") %>% layout (scene= list(xaxis=list(title="Age Group "),
                                        yaxis=list(title="Count"),
                                        title= "MOMA artists Age Group Box Plot"))


page1

1.3 Overall Artworks

In this section, we will explore the artworks in MOMA. Such as, the dimensions and durations. What time is the artwork produced and when is the artwork been collected. Furthermore, through which method the museum collected those artworks.

1.3.1 Collection Method

1.3.1.1 Top 20 Collection Method

we can see the Louis E.stern collection is the top contributor. Then a lot of artworks are been purchased. In addition, many of the artworks are gifts from various artists or Funds.

credit <- group_by(art, Credit)
credit1 <- dplyr::summarise(credit, count=n())
credit2 <- filter(credit1, count != 3070)
credit2 <- credit2[order(-credit2$count),]
credit3 <- credit2[c(1:20),]

treemap(credit3, index="Credit", vSize="count", type="index", 
         title="Art Collection Treemap", fontsize.title=11)

1.3.1.2 Generalized Collection Method

In this section, we group all the artworks into 5 groups. Purchased, Gift, Acquired through Fund, Exchange and Other. As we can see, Majority of Artworks are gifts. We are very luckly to have so many generous people and insititutes will to sharing their art collections with public. In addition, we can see the most gift are for books.

credit2 <- na.omit(credit2)

credit2$gc <-as.numeric(0)
credit2$gc<- ifelse (grepl("Purchase", credit2$Credit,  ignore.case=TRUE)=="TRUE","Purchase",
             ifelse(grepl("gift", credit2$Credit, ignore.case=TRUE)=="TRUE","Gift",
            ifelse(grepl("given", credit2$Credit, ignore.case=TRUE)=="TRUE", "Gift",
            ifelse(grepl("generrosity", credit2$Credit, ignore.case=TRUE)=="TRUE", "Gift",
            ifelse(grepl("partial", credit2$Credit, ignore.case=TRUE)=="TRUE", "Partial Gift Partial Purchase",
            ifelse(grepl("exchange", credit2$Credit, ignore.case=TRUE)=="TRUE","Exchange",
            ifelse(grepl("fund", credit2$Credit, ignore.case=TRUE)=="TRUE","Acquired through Fund", "Other")))))))

creditg <- group_by(credit2, gc)
creditg1 <- dplyr::summarise(creditg, count=n())

treemap(creditg1, index="gc", vSize="count", type="index", 
         title="Art Collection Treemap", fontsize.title=11)

art$gc <- ifelse (grepl("Purchase", art$Credit,  ignore.case=TRUE)=="TRUE","Purchase",
             ifelse(grepl("gift", art$Credit, ignore.case=TRUE)=="TRUE","Gift",
            ifelse(grepl("given", art$Credit, ignore.case=TRUE)=="TRUE", "Gift",
            ifelse(grepl("generrosity", art$Credit, ignore.case=TRUE)=="TRUE", "Gift",
            ifelse(grepl("partial", art$Credit, ignore.case=TRUE)=="TRUE", "Partial Gift Partial Purchase",
            ifelse(grepl("exchange", art$Credit, ignore.case=TRUE)=="TRUE","Exchange",
            ifelse(grepl("fund", art$Credit, ignore.case=TRUE)=="TRUE","Acquired through Fund", "Other")))))))

artcgc <- group_by(art, gc, Department)
artcgc1 <- dplyr::summarise(artcgc, count=n())


 ggplot(artcgc1, aes(y=gc, x=Department, fill = count)) + 
    geom_tile(colour = "white")  +
    scale_fill_gradient(low="skyblue", high="Pink") +
    labs(y="Credit", x="Department", title="Heatmap of Department and credit" )+theme(axis.text.x = element_text(angle = 60, hjust = 1))

1.3.2 Artwork Collection Time

1.3.2.1 Artwork Timeline

As we can see from the histogram graph, around 1900, we see a local peak for the art productions. It is very interesting, 1900 is around the Impression art time. Then around 1930s, there is another period where a large amount of artworks are produced. In 1966 that’s the absolute max for MOMA art work production. I think it is the time when people recovered from the damage of WWII. the artwork after 2000 are realitively steady. For the acquistion time, MOMA acquired a lot of pieces in 1960s.

#date histogram
art$date1 <- str_extract(art$Date, '(?<!\\d)\\d{4}(?!\\d)')
art$adate <- str_extract(art$Acquisition.Date, '(?<!\\d)\\d{4}(?!\\d)')

arty <- filter(art,  date1 !="NA")
aart<- filter(art,  adate !="NA")
year <- group_by(art, date1 , adate )

yeart<- filter(year,  date1 !="NA" & adate != "NA")
year2 <- yeart[, 23:24]
year2$daten <- as.numeric(year2$date1)
year2$adaten <- as.numeric(year2$adate)




# painting year
py <- plot_ly(x=arty$date1, type="histogram", alpha=0.6, color="green") %>%
  layout(yaxis=list(type="linear") , title="MOMA artwork producing Year")
py
#acqusition year
ay <- plot_ly(x=aart$adate, type="histogram", alpha=0.6, color="lightgreen") %>%
  layout(yaxis=list(type="linear") , title="MOMA artwork Acquisition Year")
ay

1.3.2.2 Production vs Collection Year

In this section, we can see the 2 dimensional Production vs Collection year graph are consistent with our previous conclusion: 1960s is the golden age for both art productions as well as art collections in MOMA. Furthermore, we noticed around year 2000, the artists reached another high production period.

#scatter plot of making and acqusition
year3 <- group_by(year2, daten , adaten )
year4 <- dplyr::summarise(year3, count=n())

p <- ggplot(year4, aes(daten, adaten, color=count))+ geom_point(aes(size=count))+xlim(1750,2020)+ylim(1925, 2020) + scale_color_gradient(low="blue", high="red")+ labs(x="Year of Artwork Produced", y="Year of Artwork Collected", title="MOMA artwork production vs collection year")
p
## Warning: Removed 2 rows containing missing values (geom_point).

1.3.3 Dimension of Art work

1.3.3.1 Artwork Dimension 3D

Now we look the dimension of the art works

#3d
art3d <- filter(art, art$Height..cm. !="NA" & art$Width..cm. !="NA" & art$Length..cm. !="NA" & art$Height..cm. >0)

art2dd <- art3d[, c(16:18)]
colnames(art2dd) <-c ("h","w", "l")

p3d <- plot_ly(art2dd, x=~h , y= ~w , z= ~l) %>%
  add_markers() %>% layout (scene= list(xaxis=list(title="Height cm"),
                                        yaxis=list(title="Width cm"),
                                        zaxis=list(title="Length cm"),
                                        title= "MOMA artwork 3D scatter distribution"))

p3d
p3ds<- plot_ly(x=~art2dd$h , y=art2dd$w, z= art2dd$l, type="mesh3d") %>% layout (scene= list(xaxis=list(title="Height cm"),
                                        yaxis=list(title="Width cm"),
                                        zaxis=list(title="Length cm"),
                                        title= "MOMA artwork 3D distribution"))
p3ds

1.3.3.2 Artwork 2D

as we can see, majority of the art work size is less than 500cm * 500cm. we examine it a little in depth, majority of the size is 20cm*20cm

#2d graph
art2d <- filter(art, Height..cm. !="NA" & art$Width..cm. !="NA" & art$Height..cm. >0)
art2d1 <- art2d[, c(16, 18)]
colnames(art2d1) <-c ("h","w")
art2d2 <- group_by(art2d1, h , w )
art2d3 <- dplyr::summarise(art2d2, count=n())
art2d3$c <- ifelse(art2d3$count <11 , "Less <10", " Greater >=10") 



p2d <- plot_ly (data=art2d3 , x=~w , y=~ h , type="scatter" , mode="markers",
               marker=list( opacity=0.2 , color="rgb(255,64,54)") )  
p2d1 <- p2d %>%  layout (xaxis=list(range=c(0,2000), title="Width cm"),
                         yaxis=list(range=c(0,2000), title="Height cm"),
                                    title="MOMA art work 2-D distribution" )
p2d1
# 2D contour

p2d2 <- plot_ly(x=art2d3$w, y=art2d3$h) %>%
  add_histogram2dcontour(showscale=FALSE, ncontours=20, colorscale='hot', 
                             contours = list(coloring='heatmap'))%>%
      add_markers(x = art2d3$w[1:50000], y = art2d3$h[1:50000], marker=list(size=1), color=I("black"), 
                  opacity=.2) %>%  layout (xaxis=list(range=c(0,100), title="Width cm"),
                         yaxis=list(range=c(0,100), title="Height cm"),
                                    title="MOMA art work 2-D distribution" )


p2d2

1.3.3.3 Cicular Art Work

we can see the range of circular art work is extremely broad, we take the diameter to cacluate the area. From the graph, we can see, majority of the circular art work has diameter less than 30cm. In addition 10cm is the most popular size.

# calculate the area
artcir <- filter(art, Diameter..cm. >0)
artcir$area <- ceiling(pi*(artcir$Diameter..cm./2)^2)
artcir$d <- ceiling(artcir$Diameter..cm.)





#
artcir1 <- group_by(artcir, area , d )
artcir2 <- dplyr::summarise(artcir1, count=n())

pcicr <- plot_ly (artcir2, x=~d , y =~ area, type="scatter" , mode="marker", size=~count, color=~count , colors = 'Paired',
        marker = list(opacity = 0.4, sizemode = 'diameter'))%>%  layout (xaxis=list(range=c(0,60), title="diameter cm"),
                         yaxis=list(range=c(0,3000), title="area cm*cm"),
                                    title="MOMA Circular art work 2-D distribution" )
pcicr

1.4 Impression artists

1.4.1 Vincent Van Gogh

I remember the first VanGogh painting I saw in my life is the starry Night in MOMA. At that Time, I felt a strong sense of directions, almost like a vector field of differential equations. Many years later, I was in museum, I heard a girl saying that if she could go back to meet any person in the history for 10 minutes. She would go back to tell Vincent, tell him : Don’t be sad, because he is going to be one of the most remarkable artists in the history. Everyone treasures his work. At that moment I am very touched. I don’t know which one is more, the girl’s kindness or Van gogh’s painint

van <- filter(art, art$Name=="Vincent van Gogh")

ggplot(data=van, aes(x=Title, y=date1))+geom_bar(stat="identity", aes(fill=Medium))+theme(axis.text.x = element_text(angle = 60, hjust = 1)) +ggtitle("Van Gogh in MOMA") + xlab("Artwork Title") +ylab("Produciton year")+theme(legend.position = "bottom")  

1.4.2 Monet and his friends

I always thought artists will live free, die young. When I ran this dataset, the data actually demonstrated a lot of artists lived a relatively long life , Picasso lived to 92. Monet, Matisse, Renoir lived close to 80 years old. Only Van Gogh died before 40. While we explore the dimensions of the artwork. The graph shows the most famous impressionist preferred smaller size artwork, length <50cm, width<50 cm. Gauguin and Cezanne favored large paintings more than 100cm in length. However Monet’s 600cm by 200cm Water Lily Painting is absolutely the most breathe taking painting. It’s interesting how Beethoven was deaf towards the end of his career and composed Symphony 9, while Monet was almost blind while work on those stunning Water Lily.

monet<- filter(art, art$Name %in% c("Claude Monet", "�??douard Manet", "Paul Cézanne" , "Camille Pissarro",
                                    "Pierre-Auguste Renoir" , "Paul Gauguin","Vincent van Gogh") )

monetn <- filter(age, Name %in% c("Claude Monet", "�??douard Manet", "Paul Cézanne" , "Camille Pissarro",
                                    "Pierre-Auguste Renoir" , "Paul Gauguin","Vincent van Gogh","Pablo Picasso","Henri Matisse"))

mp <- plot_ly(monetn , x= ~Birth.Year , y=~Death.Year , type="scatter", mode="markers",
              size=monetn$age, color=~Name, colors="Paired",
              sizes=c(min(monetn$age), max(monetn$age)),
               marker = list(opacity = 0.5, sizemode = 'diameter'),
              text= ~paste( age, Name, Nationality)) %>% layout(title="Famous Impressionism  Artists")
      mp        
monet2 <- group_by(monet, Name)
monet3 <- dplyr::summarise(monet2, count=n())

# scale of the work

monetm <- merge(monet3, monet, y.by=Name )
mcp <- plot_ly(monetm , x= ~Height..cm. , y=~Width..cm. , type="scatter", mode="markers",
              size=~count, color=~Name, colors="Paired", sizes=c(min(monetm$count),max(monetm$count)),
               marker = list(opacity = 0.5, sizemode = 'diameter'),
              text= ~paste(  Name, date1, count, Title)) %>% layout(title="Size of Famous Impressionism  Artwork")
mcp
## Warning: Ignoring 3 observations