We will merge two dataset artist and art to acquire the personal info about each artists as well as their art works.
moma <- merge(art, artists , y.by=Artist.ID)
head(moma, 5)
## Artist.ID Name Artwork.ID
## 1 1 Robert Arneson 33599
## 2 1 Robert Arneson 64139
## 3 10 Irene Aronson 64320
## 4 100 Ivan Le Lorraine Albright 70450
## 5 100 Ivan Le Lorraine Albright 81066
## Title Date
## 1 Study for Head Bath 1977
## 2 General Nuke 1986
## 3 Moon Night 1951
## 4 Into the World There Came A Soul Called Ida 1940
## 5 The Artist's Father 1935 (cast 1952)
## Medium
## 1 Conté crayon and pencil on paper
## 2 Lithograph
## 3 Etching and aquatint
## 4 Lithograph
## 5 Bronze
## Dimensions
## 1 41 5/8 x 29 7/8" (105.8 x 75.8 cm)
## 2 composition (irreg.): 31 11/16 x 23 5/16" (80.5 x 59.2 cm); sheet: 36 1/4 x 27 15/16" (92 x 71 cm)
## 3 plate: 5 13/16 x 4" (14.7 x 10.1cm); sheet: 13 1/8 x 10" (33.3 x 25.4cm)
## 4 composition: 16 15/16 x 14" (43 x 35.5 cm); sheet: 19 5/16 x 15 3/8" (49 x 39.1 cm)
## 5 15 x 6 1/2 x 4 3/4" (38.1 x 16.5 x 12.1 cm)
## Acquisition.Date Credit Catalogue
## 1 1981-04-28 Gift of the Friends of Contemporary Drawing Y
## 2 1997-05-28 Gift of Landfall Press Y
## 3 1952-11-07 Given Anonymously Y
## 4 1965-06-08 John B. Turner Fund Y
## 5 1952-10-21 Gift of Earle Ludgin Y
## Department Classification Object.Number Diameter..cm.
## 1 Drawings Drawing 67.1981 NA
## 2 Prints & Illustrated Books Print 192.1997 NA
## 3 Prints & Illustrated Books Print 201.1952 NA
## 4 Prints & Illustrated Books Print 521.1965 NA
## 5 Painting & Sculpture Sculpture 172.1952 NA
## Circumference..cm. Height..cm. Length..cm. Width..cm. Depth..cm.
## 1 NA 105.7 NA 75.9 NA
## 2 NA 80.5 NA 59.2 NA
## 3 NA 14.7 NA 10.1 NA
## 4 NA 43.0 NA 35.5 NA
## 5 NA 38.1 NA 16.5 12.1
## Weight..kg. Duration..s. Nationality Gender Birth.Year Death.Year
## 1 NA NA American Male 1930 1992
## 2 NA NA American Male 1930 1992
## 3 NA NA American Female 1918 NA
## 4 NA NA American Male 1897 1983
## 5 NA NA American Male 1897 1983
In this section, we will explore the nationality , age and gender of the artists.
We can see MOMA has majority the collection from American artists, followed by German, French and European artists. Japanese and Argentina or Brazil are the South American representation.
#nationality
country <- group_by(artists, Nationality )
country2 <- dplyr::summarise(country, count=n())
country2 <- country2[order(-country2$count),]
country3 <- filter (country2, count!=2488 & count!=255)
country4<- country3[1:20,]
ggplot(data=country4, aes(x=reorder(Nationality,count), y=count))+geom_bar(stat="identity", aes(fill=Nationality))+theme(axis.text.x = element_text(angle = 60, hjust = 1)) +ggtitle("Top 20 Nationality of Artists in MOMA") + xlab("Nationality") +ylab("Total number of Artists")+theme(legend.position = "none")
We picked 20 artists has the most art work in MOMA, it seems like MOMA adore American and French artists, there are some German artists. Picasso is the only Spain artists made to the top 20 lists. Atget is a photographer, Bourgoies is famous for her large scaled scupltures. I am very suprised to see MOMA has extremely large amount of collections of Picasso and Matisse.
#artists
countryn <- group_by(moma, Artist.ID, Name, Nationality)
countryn2 <- dplyr::summarise(countryn, count=n())
countryn2 <- countryn2[order(-countryn2$count),]
countryn3 <- filter(countryn2, Name !="Unknown photographer" & Name !="Unknown Artist" & count !=1460)
countryn4<- countryn3[1:20,]
ggplot(data=countryn4, aes(x=reorder(Name,count), y=count))+geom_bar(stat="identity", aes(fill=Nationality))+theme(axis.text.x = element_text(angle = 60, hjust = 1)) +ggtitle("Artists of the most artworks in MOMA") + xlab("Artists") +ylab("Total number of artists")+theme(legend.position = "bottom")
#treemap(countryn4, index="Name", vSize="count", type="index",
# title="Artists Treemap", fontsize.title=11)
From the Gender distribution, we can see the majority of the artists are Male 65%, felmale is less than 20%. The unknown category may include an institution or group work.
#Female
artists$s<- ifelse(artists$Gender %in% c("Male", "male"), "Male",
ifelse(artists$Gender %in% c("Female", "female"), "Female",
"Unknown"))
sex <- group_by(artists, s )
sex2 <- dplyr::summarise(sex, count=n())
p1<-plot_ly(sex2, labels=~s, values= ~count, type="pie",
textposition="inside", textinfo="label+percent",
insidetextfont=list(color="#FFFFFF")) %>%
layout(title="MOMA artists Gender distribution")
p1
In this section, we used the non-missing birth date and death date to calculated the artists survival age, then use the histogram graph to show the age distribution. In addition, we divided the artists into 3 groups, died before 40 years older, died between 40 to 65 years old, and live up to 65 years old. From the plot, we can see majority of the artists lived up to 65 years old. B. Efimov from Russian lived to 108 years old.
painter <- filter(artists, Birth.Year >0)
age <- filter(artists, Birth.Year >0 & Death.Year >0)
age$age <- age$Death.Year- age$Birth.Year
age1 <- filter(age, age >14 & age <110)
age1$agegp <- ifelse(age1$age <= 40 & age1$age >14, " <= 40 years",
ifelse(age1$age >40 & age1$age <= 65, "<= 65 years", " >65 years"))
#histogram
page <- plot_ly(age1, x=~age, alpha=0.6, color="orange", type="histogram")%>%
layout(title="MOMA artists Age Distribution", list(yaxis=list(title="Count")))
page
#box
ageg1 <- group_by(age1, age, agegp )
ageg2 <- dplyr::summarise(ageg1, count=n())
page1 <- plot_ly(data=ageg2,x=~agegp, y= ~count, color=~agegp, type="box") %>% layout (scene= list(xaxis=list(title="Age Group "),
yaxis=list(title="Count"),
title= "MOMA artists Age Group Box Plot"))
page1
In this section, we will explore the artworks in MOMA. Such as, the dimensions and durations. What time is the artwork produced and when is the artwork been collected. Furthermore, through which method the museum collected those artworks.
we can see the Louis E.stern collection is the top contributor. Then a lot of artworks are been purchased. In addition, many of the artworks are gifts from various artists or Funds.
credit <- group_by(art, Credit)
credit1 <- dplyr::summarise(credit, count=n())
credit2 <- filter(credit1, count != 3070)
credit2 <- credit2[order(-credit2$count),]
credit3 <- credit2[c(1:20),]
treemap(credit3, index="Credit", vSize="count", type="index",
title="Art Collection Treemap", fontsize.title=11)
In this section, we group all the artworks into 5 groups. Purchased, Gift, Acquired through Fund, Exchange and Other. As we can see, Majority of Artworks are gifts. We are very luckly to have so many generous people and insititutes will to sharing their art collections with public. In addition, we can see the most gift are for books.
credit2 <- na.omit(credit2)
credit2$gc <-as.numeric(0)
credit2$gc<- ifelse (grepl("Purchase", credit2$Credit, ignore.case=TRUE)=="TRUE","Purchase",
ifelse(grepl("gift", credit2$Credit, ignore.case=TRUE)=="TRUE","Gift",
ifelse(grepl("given", credit2$Credit, ignore.case=TRUE)=="TRUE", "Gift",
ifelse(grepl("generrosity", credit2$Credit, ignore.case=TRUE)=="TRUE", "Gift",
ifelse(grepl("partial", credit2$Credit, ignore.case=TRUE)=="TRUE", "Partial Gift Partial Purchase",
ifelse(grepl("exchange", credit2$Credit, ignore.case=TRUE)=="TRUE","Exchange",
ifelse(grepl("fund", credit2$Credit, ignore.case=TRUE)=="TRUE","Acquired through Fund", "Other")))))))
creditg <- group_by(credit2, gc)
creditg1 <- dplyr::summarise(creditg, count=n())
treemap(creditg1, index="gc", vSize="count", type="index",
title="Art Collection Treemap", fontsize.title=11)
art$gc <- ifelse (grepl("Purchase", art$Credit, ignore.case=TRUE)=="TRUE","Purchase",
ifelse(grepl("gift", art$Credit, ignore.case=TRUE)=="TRUE","Gift",
ifelse(grepl("given", art$Credit, ignore.case=TRUE)=="TRUE", "Gift",
ifelse(grepl("generrosity", art$Credit, ignore.case=TRUE)=="TRUE", "Gift",
ifelse(grepl("partial", art$Credit, ignore.case=TRUE)=="TRUE", "Partial Gift Partial Purchase",
ifelse(grepl("exchange", art$Credit, ignore.case=TRUE)=="TRUE","Exchange",
ifelse(grepl("fund", art$Credit, ignore.case=TRUE)=="TRUE","Acquired through Fund", "Other")))))))
artcgc <- group_by(art, gc, Department)
artcgc1 <- dplyr::summarise(artcgc, count=n())
ggplot(artcgc1, aes(y=gc, x=Department, fill = count)) +
geom_tile(colour = "white") +
scale_fill_gradient(low="skyblue", high="Pink") +
labs(y="Credit", x="Department", title="Heatmap of Department and credit" )+theme(axis.text.x = element_text(angle = 60, hjust = 1))
As we can see from the histogram graph, around 1900, we see a local peak for the art productions. It is very interesting, 1900 is around the Impression art time. Then around 1930s, there is another period where a large amount of artworks are produced. In 1966 that’s the absolute max for MOMA art work production. I think it is the time when people recovered from the damage of WWII. the artwork after 2000 are realitively steady. For the acquistion time, MOMA acquired a lot of pieces in 1960s.
#date histogram
art$date1 <- str_extract(art$Date, '(?<!\\d)\\d{4}(?!\\d)')
art$adate <- str_extract(art$Acquisition.Date, '(?<!\\d)\\d{4}(?!\\d)')
arty <- filter(art, date1 !="NA")
aart<- filter(art, adate !="NA")
year <- group_by(art, date1 , adate )
yeart<- filter(year, date1 !="NA" & adate != "NA")
year2 <- yeart[, 23:24]
year2$daten <- as.numeric(year2$date1)
year2$adaten <- as.numeric(year2$adate)
# painting year
py <- plot_ly(x=arty$date1, type="histogram", alpha=0.6, color="green") %>%
layout(yaxis=list(type="linear") , title="MOMA artwork producing Year")
py
#acqusition year
ay <- plot_ly(x=aart$adate, type="histogram", alpha=0.6, color="lightgreen") %>%
layout(yaxis=list(type="linear") , title="MOMA artwork Acquisition Year")
ay
In this section, we can see the 2 dimensional Production vs Collection year graph are consistent with our previous conclusion: 1960s is the golden age for both art productions as well as art collections in MOMA. Furthermore, we noticed around year 2000, the artists reached another high production period.
#scatter plot of making and acqusition
year3 <- group_by(year2, daten , adaten )
year4 <- dplyr::summarise(year3, count=n())
p <- ggplot(year4, aes(daten, adaten, color=count))+ geom_point(aes(size=count))+xlim(1750,2020)+ylim(1925, 2020) + scale_color_gradient(low="blue", high="red")+ labs(x="Year of Artwork Produced", y="Year of Artwork Collected", title="MOMA artwork production vs collection year")
p
## Warning: Removed 2 rows containing missing values (geom_point).
Now we look the dimension of the art works
#3d
art3d <- filter(art, art$Height..cm. !="NA" & art$Width..cm. !="NA" & art$Length..cm. !="NA" & art$Height..cm. >0)
art2dd <- art3d[, c(16:18)]
colnames(art2dd) <-c ("h","w", "l")
p3d <- plot_ly(art2dd, x=~h , y= ~w , z= ~l) %>%
add_markers() %>% layout (scene= list(xaxis=list(title="Height cm"),
yaxis=list(title="Width cm"),
zaxis=list(title="Length cm"),
title= "MOMA artwork 3D scatter distribution"))
p3d
p3ds<- plot_ly(x=~art2dd$h , y=art2dd$w, z= art2dd$l, type="mesh3d") %>% layout (scene= list(xaxis=list(title="Height cm"),
yaxis=list(title="Width cm"),
zaxis=list(title="Length cm"),
title= "MOMA artwork 3D distribution"))
p3ds
as we can see, majority of the art work size is less than 500cm * 500cm. we examine it a little in depth, majority of the size is 20cm*20cm
#2d graph
art2d <- filter(art, Height..cm. !="NA" & art$Width..cm. !="NA" & art$Height..cm. >0)
art2d1 <- art2d[, c(16, 18)]
colnames(art2d1) <-c ("h","w")
art2d2 <- group_by(art2d1, h , w )
art2d3 <- dplyr::summarise(art2d2, count=n())
art2d3$c <- ifelse(art2d3$count <11 , "Less <10", " Greater >=10")
p2d <- plot_ly (data=art2d3 , x=~w , y=~ h , type="scatter" , mode="markers",
marker=list( opacity=0.2 , color="rgb(255,64,54)") )
p2d1 <- p2d %>% layout (xaxis=list(range=c(0,2000), title="Width cm"),
yaxis=list(range=c(0,2000), title="Height cm"),
title="MOMA art work 2-D distribution" )
p2d1
# 2D contour
p2d2 <- plot_ly(x=art2d3$w, y=art2d3$h) %>%
add_histogram2dcontour(showscale=FALSE, ncontours=20, colorscale='hot',
contours = list(coloring='heatmap'))%>%
add_markers(x = art2d3$w[1:50000], y = art2d3$h[1:50000], marker=list(size=1), color=I("black"),
opacity=.2) %>% layout (xaxis=list(range=c(0,100), title="Width cm"),
yaxis=list(range=c(0,100), title="Height cm"),
title="MOMA art work 2-D distribution" )
p2d2
we can see the range of circular art work is extremely broad, we take the diameter to cacluate the area. From the graph, we can see, majority of the circular art work has diameter less than 30cm. In addition 10cm is the most popular size.
# calculate the area
artcir <- filter(art, Diameter..cm. >0)
artcir$area <- ceiling(pi*(artcir$Diameter..cm./2)^2)
artcir$d <- ceiling(artcir$Diameter..cm.)
#
artcir1 <- group_by(artcir, area , d )
artcir2 <- dplyr::summarise(artcir1, count=n())
pcicr <- plot_ly (artcir2, x=~d , y =~ area, type="scatter" , mode="marker", size=~count, color=~count , colors = 'Paired',
marker = list(opacity = 0.4, sizemode = 'diameter'))%>% layout (xaxis=list(range=c(0,60), title="diameter cm"),
yaxis=list(range=c(0,3000), title="area cm*cm"),
title="MOMA Circular art work 2-D distribution" )
pcicr
I remember the first VanGogh painting I saw in my life is the starry Night in MOMA. At that Time, I felt a strong sense of directions, almost like a vector field of differential equations. Many years later, I was in museum, I heard a girl saying that if she could go back to meet any person in the history for 10 minutes. She would go back to tell Vincent, tell him : Don’t be sad, because he is going to be one of the most remarkable artists in the history. Everyone treasures his work. At that moment I am very touched. I don’t know which one is more, the girl’s kindness or Van gogh’s painint
van <- filter(art, art$Name=="Vincent van Gogh")
ggplot(data=van, aes(x=Title, y=date1))+geom_bar(stat="identity", aes(fill=Medium))+theme(axis.text.x = element_text(angle = 60, hjust = 1)) +ggtitle("Van Gogh in MOMA") + xlab("Artwork Title") +ylab("Produciton year")+theme(legend.position = "bottom")
I always thought artists will live free, die young. When I ran this dataset, the data actually demonstrated a lot of artists lived a relatively long life , Picasso lived to 92. Monet, Matisse, Renoir lived close to 80 years old. Only Van Gogh died before 40. While we explore the dimensions of the artwork. The graph shows the most famous impressionist preferred smaller size artwork, length <50cm, width<50 cm. Gauguin and Cezanne favored large paintings more than 100cm in length. However Monet’s 600cm by 200cm Water Lily Painting is absolutely the most breathe taking painting. It’s interesting how Beethoven was deaf towards the end of his career and composed Symphony 9, while Monet was almost blind while work on those stunning Water Lily.
monet<- filter(art, art$Name %in% c("Claude Monet", "�??douard Manet", "Paul Cézanne" , "Camille Pissarro",
"Pierre-Auguste Renoir" , "Paul Gauguin","Vincent van Gogh") )
monetn <- filter(age, Name %in% c("Claude Monet", "�??douard Manet", "Paul Cézanne" , "Camille Pissarro",
"Pierre-Auguste Renoir" , "Paul Gauguin","Vincent van Gogh","Pablo Picasso","Henri Matisse"))
mp <- plot_ly(monetn , x= ~Birth.Year , y=~Death.Year , type="scatter", mode="markers",
size=monetn$age, color=~Name, colors="Paired",
sizes=c(min(monetn$age), max(monetn$age)),
marker = list(opacity = 0.5, sizemode = 'diameter'),
text= ~paste( age, Name, Nationality)) %>% layout(title="Famous Impressionism Artists")
mp
monet2 <- group_by(monet, Name)
monet3 <- dplyr::summarise(monet2, count=n())
# scale of the work
monetm <- merge(monet3, monet, y.by=Name )
mcp <- plot_ly(monetm , x= ~Height..cm. , y=~Width..cm. , type="scatter", mode="markers",
size=~count, color=~Name, colors="Paired", sizes=c(min(monetm$count),max(monetm$count)),
marker = list(opacity = 0.5, sizemode = 'diameter'),
text= ~paste( Name, date1, count, Title)) %>% layout(title="Size of Famous Impressionism Artwork")
mcp
## Warning: Ignoring 3 observations