1.🖼 MOMA Introduction

MOMA is one of the signature NYC tourist places, due to COVID, many people might miss the chance to visit. I decided to do a data analysis on the MOMA artists and the artworks that I like together with GANS to create a cloud tour! Hope you will enjoy this art data journey.

Data Source: MOMA, MOMA website

Color Scheme: Monet, Picasso, Van Gogh

Outline:

  1. Data Cleaning
  2. Overall MOMA artwork EDA
  3. MOMA Cloud Tour by Artists
  4. GANS mix style

Note: the choice of artworks are based on my personal taste, extremely subjective.

Data Cleaning

Merge two datasets artist and art to retrieve the artists’ info such as gender, age, nationality and birth year.

Sample MOMA Dataset

moma <- merge(art, artists , y.by=Artist.ID)
sample <- moma[,c("Name","Title","Date","Medium","Classification","Height..cm.","Nationality","Gender","Birth.Year","Death.Year")]
head(sample, 20)%>% DT::datatable()

2.🧩 MOMA Artwork EDA

Artists Overview

In this section, we will explore the nationality of the artists, age, amount of their works, and their gender. In addition, we will present some of the Impressionism pioneers life span.

Artists Nationality

Excluding the Missing Nationality works, we can see MOMA has majority the collection from American artists, followed by German, French and European artists. Japanese and Argentina or Brazil are Asian and South American representation.

#nationality
country <- group_by(artists, Nationality )
country2 <- dplyr::summarise(country, count=n())
country2 <- country2[order(-country2$count),]
country3 <- filter (country2, count!=2488 & count!=255)
country4<- country3[1:20,]

ggplot(data=country4, aes(x=reorder(Nationality,count), y=count))+geom_bar(stat="identity", aes(fill=Nationality))+theme_minimal()+theme(axis.text.x = element_text(angle = 60, hjust = 1)) +ggtitle("Top 20 Nationality of Artists in MOMA") + xlab("Nationality") +ylab("Total number of Artists")+theme(legend.position = "none")  

country4$country<- ifelse(country4$Nationality %in% c("American"), "USA", 
                      ifelse(country4$Nationality %in% c("German"), "Germany",
                      ifelse(country4$Nationality %in% c("French"), "France",
                      ifelse(country4$Nationality %in% c("British"), "Britian",
                      ifelse(country4$Nationality %in% c("Italian"), "Italy",
                      ifelse(country4$Nationality %in% c("Japanese"), "Japan",
                      ifelse(country4$Nationality %in% c("Swiss"), "Swissland",
                      ifelse(country4$Nationality %in% c("Dutch"), "Netherland",
                       ifelse(country4$Nationality %in% c("Austrian"), "Austria",
                      ifelse(country4$Nationality %in% c("Canadian"), "Canada",
                      ifelse(country4$Nationality %in% c("Russian"), "Russia",
                        ifelse(country4$Nationality %in% c("Brazilian"), "Brazil",
                      ifelse(country4$Nationality %in% c("Spanish"), "Spain",
                      ifelse(country4$Nationality %in% c("Argentine"), "Argentina",
                      ifelse(country4$Nationality %in% c("Swedish"), "Sweden",
                      ifelse(country4$Nationality %in% c("Mexican"), "Mexico",
                      ifelse(country4$Nationality %in% c("Polish"), "Poland",
                      ifelse(country4$Nationality %in% c("Danish"), "Denmark",
                       ifelse(country4$Nationality %in% c("Belgian"), "Belgium",
                               "Czechia"
                  )))))))))))))))))))
n <- joinCountryData2Map(country4, joinCode="NAME", nameJoinColumn="country")
## 16 codes from your data successfully matched countries in the map
## 4 codes from your data failed to match with a country code in the map
## 227 codes from the map weren't represented in your data
mapCountryData(n, nameColumnToPlot="count", mapTitle="Top WildLife Import Country World Map" , colourPalette = "rainbow")

Artists with Most Art Records

We picked 20 artists has the most art records in MOMA, note it does NOT mean the actual artwork, it could be newspaper or replications, the number only meant the record. MOMA adore American and French artists, there are some German artists. Picasso is the only Spain artists made to the top 20 lists. Atget is a photographer, Bourgoies is famous for her large scaled sculptures. I am very surprised to see MOMA has extremely large amount of records and collections of Picasso and Matisse.

This Bar chart color Scheme is inspired by Picasso

#artists
countryn <- group_by(moma, Artist.ID, Name, Nationality)
countryn2 <- dplyr::summarise(countryn, count=n())
countryn2 <- countryn2[order(-countryn2$count),]
countryn3 <- filter(countryn2, Name !="Unknown photographer" & Name !="Unknown Artist" & count !=1460)

countryn4<- countryn3[1:20,]

ggplot(data=countryn4, aes(x=reorder(Name,count), y=count))+geom_bar(stat="identity", aes(fill=Nationality))+ scale_fill_manual(values=c("American"="#de0000","French"="#0547ae","German"="#e5c709","Spanish"="#6c676e"))+ theme_minimal()+ theme(axis.text.x = element_text(angle = 60, hjust = 1)) +ggtitle("Artists of the most artworks in MOMA") + xlab("Artists") +ylab("Total Records")+theme(legend.position = "bottom")  

#treemap(countryn4, index="Name", vSize="count", type="index", 
       #  title="Artists Treemap", fontsize.title=11)
       
      
mmartists <- art[, c(4)]
set.seed(888)
wordcloud(mmartists, max.words=100 ,random.order=FALSE,rot.per=0.35,colors=brewer.pal(4, "Dark2"), main="Artists")

Artists’ Gender

From the Gender distribution, we can see the majority of the artists are Male 65%, female artists are less than 20%. The unknown category may include an institution or group work.

Frida Kahlo

#Female
artists$s<- ifelse(artists$Gender %in% c("Male", "male"), "Male",
                   ifelse(artists$Gender %in% c("Female", "female"), "Female",
                          "Unknown"))
sex <- group_by(artists, s )
sex2 <- dplyr::summarise(sex, count=n())

p1<-plot_ly(sex2, labels=~s, values= ~count, type="pie",
           textposition="inside", textinfo="label+percent",
           insidetextfont=list(color="#FFFFFF"),
           marker=list(colors=(c("#fcd2e4","#9fc05c","#f9cc00")))) %>%
  layout(title="MOMA Artists Gender distribution")
p1

Female Artists Mini Sample

sexf<- subset(sex,sex$Gender=="Female")
sexf1 <- group_by(sexf, Name )
head(sexf1, 20)%>% DT::datatable()

Girls Help Girls

Artemisia Gentileschi

The pioneering painter survived a rape, but scholars are pushing against the idea that her work was defined by it, and celebrating her rich harnessing of motherhood, passion, and ambition. Three painters worked on this subject before, but only in this painting, the maid is not standing and casting a cold eye anymore, she helped to lady to punish the man. Women can understand women, and women can help women!

Artists Age Distribution

In this section, we used the non-missing birth date and death date to calculated the artists survival age, then use the histogram graph to show the age distribution. In addition, we divided the artists into 3 groups, died before 40 years older, died between 40 to 65 years old, and live up to 65 years old. From the plot, we can see majority of the artists lived up to 65 years old. B. Efimov from Russian lived to 108 years old.

painter <- filter(artists, Birth.Year >0)
age <- filter(artists, Birth.Year >0 & Death.Year >0)
age$age <- age$Death.Year- age$Birth.Year
age1 <- filter(age, age >14 & age <110)
age1$agegp <- ifelse(age1$age <= 40 & age1$age >14, " <= 40 years",
                     ifelse(age1$age >40 & age1$age <= 65, "<= 65 years", " >65 years"))

#scatter
page <- plot_ly(age1, x=~age,  alpha=0.6, type="histogram", marker = list(color = "#edbacd"))%>%
  layout(title="MOMA Artists Age Distribution", list(yaxis=list(title="Count")))
page
#box
ageg1 <- group_by(age1, age, agegp )
ageg2 <- dplyr::summarise(ageg1, count=n())
page1 <- plot_ly(data=ageg2,x=~agegp, y= ~count, color=~agegp, type="box",  marker = list(color = c("#edbacd","#496689","#7e9089"))) %>%
   layout(title="MOMA artists Age Group Box Plot")
page1

Impressionism Artists

I always thought artists will live free, die young. When I ran this dataset, the data actually demonstrated a lot of artists lived a relatively long life , Picasso lived to 92. Monet, Matisse, Renoir lived close to 80 years old. Only Van Gogh died before 40. While we explore the dimensions of the artwork. The graph shows the most famous impressionist preferred smaller size artwork, length < 50cm, width < 50 cm. Gauguin and Cezanne favored large paintings more than 100cm in length. However Monet’s 600cm by 200cm Water Lily Painting is absolutely the most breathe taking painting. It’s interesting how Beethoven was deaf towards the end of his career and composed Symphony 9, while Monet was almost blind while work on those stunning Water Lily.

monet<- filter(art, art$Name %in% c("Claude Monet", "Édouard Manet", "Paul Cézanne" , "Camille Pissarro", "Pierre-Auguste Renoir" , "Paul Gauguin","Vincent van Gogh") )

monetn <- filter(age, Name %in% c("Claude Monet", "Édouard Manet", "Paul Cézanne" , "Camille Pissarro", "Pierre-Auguste Renoir" , "Paul Gauguin","Vincent van Gogh","Pablo Picasso","Henri Matisse"))

mp <- plot_ly(monetn , x= ~Birth.Year , y=~Death.Year , type="scatter", mode="markers",
              size=monetn$age, color=~Name, colors="Set3",
              sizes=c(min(monetn$age), max(monetn$age)),
               marker = list(opacity = 0.5, sizemode = 'diameter'),
              text= ~paste( age, Name, Nationality)) %>% layout(title="Famous Impressionism  Artists")
      mp   

Overview of Artworks

In this section, we will explore the artworks in MOMA. Such as, the dimensions and duration. What time is the artwork produced and when is the artwork been collected. Furthermore, through which method the museum collected those artworks.

Collection Method

###Collection Method {.tabset .tabset-fade .tabset-pills}

Top 20 Collection Method

####Top 20 Collection Method we can see the Louis E.stern collection is the top contributor. Then a lot of artworks are been purchased. In addition, many of the artworks are gifts from various artists or Funds.

credit <- group_by(art, Credit)
credit1 <- dplyr::summarise(credit, count=n())
credit2 <- filter(credit1, count != 3070)
credit2 <- credit2[order(-credit2$count),]
credit3 <- credit2[c(1:20),]

treemap(credit3, index="Credit", vSize="count", type="index", 
         title="Art Collection Treemap", fontsize.title=11)

Generalized Collection Method

In this section, we group all the artworks into 5 groups. Purchased, Gift, Acquired through Fund, Exchange and Other. As we can see, Majority of Artworks are gifts. We are very luckily to have so many generous people and institutes will to sharing their art collections with public. In addition, we can see the most gift are for books.

credit2 <- na.omit(credit2)

credit2$gc <-as.numeric(0)
credit2$gc<- ifelse (grepl("Purchase", credit2$Credit,  ignore.case=TRUE)=="TRUE","Purchase",
             ifelse(grepl("gift", credit2$Credit, ignore.case=TRUE)=="TRUE","Gift",
            ifelse(grepl("given", credit2$Credit, ignore.case=TRUE)=="TRUE", "Gift",
            ifelse(grepl("generrosity", credit2$Credit, ignore.case=TRUE)=="TRUE", "Gift",
            ifelse(grepl("partial", credit2$Credit, ignore.case=TRUE)=="TRUE", "Partial Gift Partial Purchase",
            ifelse(grepl("exchange", credit2$Credit, ignore.case=TRUE)=="TRUE","Exchange",
            ifelse(grepl("fund", credit2$Credit, ignore.case=TRUE)=="TRUE","Acquired through Fund", "Other")))))))

creditg <- group_by(credit2, gc)
creditg1 <- dplyr::summarise(creditg, count=n())

treemap(creditg1, index="gc", vSize="count", type="index", 
         title="Art Collection Treemap", fontsize.title=11)

art$gc <- ifelse (grepl("Purchase", art$Credit,  ignore.case=TRUE)=="TRUE","Purchase",
             ifelse(grepl("gift", art$Credit, ignore.case=TRUE)=="TRUE","Gift",
            ifelse(grepl("given", art$Credit, ignore.case=TRUE)=="TRUE", "Gift",
            ifelse(grepl("generrosity", art$Credit, ignore.case=TRUE)=="TRUE", "Gift",
            ifelse(grepl("partial", art$Credit, ignore.case=TRUE)=="TRUE", "Partial Gift Partial Purchase",
            ifelse(grepl("exchange", art$Credit, ignore.case=TRUE)=="TRUE","Exchange",
            ifelse(grepl("fund", art$Credit, ignore.case=TRUE)=="TRUE","Acquired through Fund", "Other")))))))

artcgc <- group_by(art, gc, Department)
artcgc1 <- dplyr::summarise(artcgc, count=n())


 ggplot(artcgc1, aes(y=gc, x=Department, fill = count)) + 
    geom_tile(colour = "white")  +
    scale_fill_gradient(low="skyblue", high="Pink") +
    labs(y="Credit", x="Department", title="Heatmap of Department and credit" )+theme_minimal()+theme(axis.text.x = element_text(angle = 60, hjust = 1))

Artwork Collection Time

0.0.1 Artwork Collection Time

Artwork Timeline

0.0.1.1 Artwork Timeline

As we can see from the histogram graph, around 1900, we see a local peak for the art productions. It is very interesting, 1900 is around the Impression art time. Then around 1930s, there is another period where a large amount of artworks are produced. In 1966 that’s the absolute max for MOMA art work production. I think it is the time when people recovered from the damage of WWII. the artwork after 2000 are relatively steady. For the acquisition time, MOMA acquired a lot of pieces in 1960s.

#date histogram
art$date1 <- str_extract(art$Date, '(?<!\\d)\\d{4}(?!\\d)')
art$adate <- str_extract(art$Acquisition.Date, '(?<!\\d)\\d{4}(?!\\d)')

arty <- filter(art,  date1 !="NA")
aart<- filter(art,  adate !="NA")
year <- group_by(art, date1 , adate )

yeart<- filter(year,  date1 !="NA" & adate != "NA")
year2 <- yeart[, 23:24]
year2$daten <- as.numeric(year2$date1)
year2$adaten <- as.numeric(year2$adate)


# painting year
py <- plot_ly(x=arty$date1, type="histogram", alpha=0.6,mark=list(color = "#edbacd")) %>%
  layout(yaxis=list(type="linear") , title="MOMA artwork producing Year")
py
#acqusition year
ay <- plot_ly(x=aart$adate, type="histogram", alpha=0.6, mark=list(color = "#7e9d89")) %>%
  layout(yaxis=list(type="linear") , title="MOMA artwork Acquisition Year")
ay

Production vs Collection Year

####Production vs Collection Year

In this section, we can see the 2 dimensional Production vs Collection year graph are consistent with our previous conclusion: 1960s is the golden age for both art productions as well as art collections in MOMA. Furthermore, we noticed around year 2000, the artists reached another high production period.

#scatter plot of making and acqusition
year3 <- group_by(year2, daten , adaten )
year4 <- dplyr::summarise(year3, count=n())

p <- ggplot(year4, aes(daten, adaten, color=count))+ geom_point(aes(size=count))+xlim(1750,2020)+ylim(1925, 2020) + scale_color_gradient(low="#a3b3c9", high="#edbacd")+ theme_minimal()+labs(x="Year of Artwork Produced", y="Year of Artwork Collected", title="MOMA artwork production vs collection year")
p

Dimension of Art work

Artwork Dimension 3D

Now we look the dimension of the art works

#3d
art3d <- filter(art, art$Height..cm. !="NA" & art$Width..cm. !="NA" & art$Length..cm. !="NA" & art$Height..cm. >0)

art2dd <- art3d[, c(16:18)]
colnames(art2dd) <-c ("h","w", "l")

p3d <- plot_ly(art2dd, x=~h , y= ~w , z= ~l) %>%
  add_markers() %>% layout (scene= list(xaxis=list(title="Height cm"),
                                        yaxis=list(title="Width cm"),
                                        zaxis=list(title="Length cm"),
                                        title= "MOMA artwork 3D scatter distribution"))

p3d
p3ds<- plot_ly(x=~art2dd$h , y=art2dd$w, z= art2dd$l, type="mesh3d") %>% layout (scene= list(xaxis=list(title="Height cm"),
                                        yaxis=list(title="Width cm"),
                                        zaxis=list(title="Length cm"),
                                        title= "MOMA artwork 3D distribution"))
#p3ds

Artwork 2D map

as we can see, majority of the art work size is less than 500cm * 500cm. we examine it a little in depth, majority of the size is 20cm*20cm

#2d graph
art2d <- filter(art, Height..cm. !="NA" & art$Width..cm. !="NA" & art$Height..cm. >0)
art2d1 <- art2d[, c(16, 18)]
colnames(art2d1) <-c ("h","w")
art2d2 <- group_by(art2d1, h , w )
art2d3 <- dplyr::summarise(art2d2, count=n())
## `summarise()` has grouped output by 'h'. You can override using the `.groups` argument.
art2d3$c <- ifelse(art2d3$count <11 , "Less <10", " Greater >=10") 



p2d <- plot_ly (data=art2d3 , x=~w , y=~ h , type="scatter" , mode="markers",
               marker=list( opacity=0.2 , color="rgb(255,64,54)") )  
p2d1 <- p2d %>%  layout (xaxis=list(range=c(0,2000), title="Width cm"),
                         yaxis=list(range=c(0,2000), title="Height cm"),
                                    title="MOMA art work 2-D distribution" )
#p2d1

# 2D contour

p2d2 <- plot_ly(x=art2d3$w, y=art2d3$h) %>%
  add_histogram2dcontour(showscale=FALSE, ncontours=20, colorscale='hot', 
                             contours = list(coloring='heatmap'))%>%
      add_markers(x = art2d3$w[1:50000], y = art2d3$h[1:50000], marker=list(size=1), color=I("black"), 
                  opacity=.2) %>%  layout (xaxis=list(range=c(0,100), title="Width cm"),
                         yaxis=list(range=c(0,100), title="Height cm"),
                                    title="MOMA art work 2-D distribution" )


p2d2

Circular Art Work

we can see the range of circular art work is extremely broad, we take the diameter to calculate the area.

# calculate the area
artcir <- filter(art, Diameter..cm. >0)
artcir$area <- ceiling(pi*(artcir$Diameter..cm./2)^2)
artcir$d <- ceiling(artcir$Diameter..cm.)

#
artcir1 <- group_by(artcir, area , d )
artcir2 <- dplyr::summarise(artcir1, count=n())

pcicr <- plot_ly (artcir2, x=~d , y =~ area, type="scatter" , mode="marker", size=~count, color=~count , colors = 'Set3',
        marker = list(opacity = 0.4, sizemode = 'diameter'))%>%  layout (xaxis=list(range=c(0,60), title="diameter cm"),
                         yaxis=list(range=c(0,3000), title="area cm*cm"),
                                    title="MOMA Circular art work 2-D distribution" )
pcicr

Impressionism Art Work

While we explore the dimensions of the artwork. The graph shows the most famous impressionist preferred smaller size artwork, length < 50cm, width < 50 cm. Gauguin and Cezanne favored large paintings more than 100cm in length. However Monet’s 600cm by 200cm Water Lily Painting is absolutely the most breathe taking painting.

# scale of the work
monet2 <- group_by(monet, Name)
monet3 <- dplyr::summarise(monet2, count=n())

monetm <- merge(monet3, monet, y.by=Name )
mcp <- plot_ly(monetm , x= ~Height..cm. , y=~Width..cm. , type="scatter", mode="markers", size=~count, color=~Name, colors="Set3", sizes=c(min(monetm$count),max(monetm$count)),
               marker = list(opacity = 0.5, sizemode = 'diameter'),
              text= ~paste( Name, date1, count, Title)) %>% layout(title="Size of Famous Impressionism  Artwork")
#mcp

  1. 🎨 MOMA Cloud Tour

In this section, I will present some of my favorite artists and artworks, hope you like it.

Pain and Passionate: Frida Kahlo

I love her bold color, wild imagination, and dignified self image!

My painting carries with it the message of pain

Starry Night: Vincent Van Gogh

Starry, starry night
Paint your palette blue and gray
Look out on a summer's day
With eyes that know the darkness in my soul
Now, I understand what you tried to say to me
And how you suffered for your sanity
And how you tried to set them free
They would not listen, they did not know how
Perhaps they'll listen now

I remember the first VanGogh painting I saw in my life is the starry Night in MOMA. At that Time, I felt a strong sense of directions, almost like a vector field of differential equations. Many years later, I was in museum, I heard a girl saying that if she could go back to meet any person in the history for 10 minutes. She would go back to tell Vincent, tell him : Don’t be sad, you are going to be one of the most remarkable artists in the history. Everyone loves you and treasures your painting. At that moment I am very touched. not only by Van Gogh’s painting but also by the kindness and the warm heart of that girl.

van <- filter(art, art$Name=="Vincent van Gogh")

ggplot(data=van, aes(x=Title, y=date1))+geom_bar(stat="identity", aes(fill=Medium))+ scale_fill_manual(values=c("Etching"="#204870", "Lithograph"="#F6ED76","Oil on canvas"="#182030")) +ggtitle("Van Gogh in MOMA") + xlab("Artwork Title") +ylab("Produciton year")+theme_minimal()+theme(axis.text.x = element_text(angle = 60, hjust = 1))+theme(legend.position = "bottom")  

Ability to Change: Mark Rothko

I love Rothko’s work for his power of ability to change, his earlier work is quite messy and dark in my opinion, later he transferred his work to have the sytle of warm and simple ! He inspired me to work on the change of my life, and it will succeed.

  1. 🎨 MOMA GANS Mix Style

  1. 💌 Summary & Reference:

[1] 猎罪图鉴 eposide 12, eposide 20

[2] 张法中西方美术史

[3] https://www.moma.org/collection/works/

[4] https://www.newyorker.com/magazine/2020/10/05/a-fuller-picture-of-artemisia-gentileschi

[5]