arte_MOMA <- read.csv2("C:/Estatistica/Base_de_dados-master/arte_MOMA.csv")
How many paintings are there at MoMA? How many variables does the dataset contain?
str(arte_MOMA)
## 'data.frame': 2253 obs. of 24 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ title : chr "Rope and People, I" "Fire in the Evening" "Portrait of an Equilibrist" "Guitar" ...
## $ artist : chr "Joan Miró" "Paul Klee" "Paul Klee" "Pablo Picasso" ...
## $ artist_bio : chr "(Spanish, 1893-1983)" "(German, born Switzerland. 1879-1940)" "(German, born Switzerland. 1879-1940)" "(Spanish, 1881-1973)" ...
## $ artist_birth_year: int 1893 1879 1879 1881 1880 1879 1943 1880 1839 1894 ...
## $ artist_death_year: int 1983 1940 1940 1973 1946 1953 1977 1950 1906 1956 ...
## $ num_artists : int 1 1 1 1 1 1 1 1 1 1 ...
## $ n_female_artists : int 0 0 0 0 0 0 0 0 0 0 ...
## $ n_male_artists : int 1 1 1 1 1 1 1 1 1 1 ...
## $ artist_gender : chr "Male" "Male" "Male" "Male" ...
## $ year_acquired : int 1936 1970 1966 1955 1939 1968 1997 1931 1934 1941 ...
## $ year_created : int 1935 1929 1927 1919 1925 1919 1970 1929 1885 1930 ...
## $ circumference_cm : logi NA NA NA NA NA NA ...
## $ depth_cm : num NA NA NA NA NA NA NA NA NA NA ...
## $ diameter_cm : logi NA NA NA NA NA NA ...
## $ height_cm : num 104.8 33.8 60.3 215.9 50.8 ...
## $ length_cm : logi NA NA NA NA NA NA ...
## $ width_cm : num 74.6 33.3 36.8 78.7 54 ...
## $ seat_height_cm : logi NA NA NA NA NA NA ...
## $ purchase : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ gift : logi TRUE FALSE FALSE TRUE TRUE FALSE ...
## $ exchange : logi FALSE FALSE FALSE FALSE TRUE FALSE ...
## $ classification : chr "Painting" "Painting" "Painting" "Painting" ...
## $ department : chr "Painting & Sculpture" "Painting & Sculpture" "Painting & Sculpture" "Painting & Sculpture" ...
A: There are 2253 paintings and a total of 24 variables.
Which was the first painting acquired by MoMA? In which year? By which artist? What is its title?
A: House by the Railroad, acquired in 1930, by Edward Hopper.
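The answer above is given without the code that produced it. A minimal sketch of one way to look it up in base R, assuming `year_acquired` is still an integer column (as in the `str()` output). The helper name `earliest_rows` and the toy data frame are hypothetical, introduced only for illustration:

```r
# Hypothetical helper: return every row whose year equals the minimum,
# so ties are kept instead of being silently dropped.
earliest_rows <- function(df, year_col) {
  years <- df[[year_col]]
  keep <- !is.na(years) & years == min(years, na.rm = TRUE)
  df[keep, c("title", "artist", year_col), drop = FALSE]
}

# Toy data (made-up values) using the same column names as arte_MOMA
toy <- data.frame(
  title = c("A", "B", "C"),
  artist = c("Artist 1", "Artist 2", "Artist 3"),
  year_acquired = c(1950L, 1930L, NA),
  stringsAsFactors = FALSE
)

first_acquired <- earliest_rows(toy, "year_acquired")
print(first_acquired)

# On the real data: earliest_rows(arte_MOMA, "year_acquired")
```

Returning all rows that share the minimum year makes it visible when more than one painting was acquired in the first year.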
Which is the oldest painting in the collection? From which year? By which artist? What is its title?
A: Landscape at Daybreak, created in 1872, by Odilon Redon.
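One way to check this kind of answer is to sort the rows by creation year and inspect the first few. The sketch below assumes `year_created` is still an integer column; the toy data frame holds made-up values and is only there to make the example runnable:

```r
# Toy data (made-up values) using the same column names as arte_MOMA
toy <- data.frame(
  title = c("A", "B", "C", "D"),
  artist = c("Artist 1", "Artist 2", "Artist 3", "Artist 4"),
  year_created = c(1920L, 1880L, NA, 1900L),
  stringsAsFactors = FALSE
)

# order() sorts ascending and places NA last by default,
# so the oldest works come first
oldest_first <- toy[order(toy$year_created), c("title", "artist", "year_created")]
oldest <- head(oldest_first, 3)
print(oldest)

# On the real data:
# head(arte_MOMA[order(arte_MOMA$year_created),
#                c("title", "artist", "year_created")])
```

Looking at several of the oldest rows, rather than only the minimum, also shows whether other works come close to the earliest year.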
How many distinct artists are there?
arte_MOMA$artist <- as.factor(arte_MOMA$artist)
str(arte_MOMA$artist)
## Factor w/ 989 levels "A. E. Gallatin",..: 452 728 728 712 94 278 131 752 723 244 ...
A: 989 artists.
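Converting the column to a factor and reading the number of levels works. The same count can also be obtained without changing the column type, as in this small sketch (the vector `toy_artists` is made up for illustration):

```r
# Toy vector standing in for arte_MOMA$artist (made-up values)
toy_artists <- c("Artist 1", "Artist 2", "Artist 1", NA)

# Drop missing values, then count the unique names
n_distinct_artists <- length(unique(toy_artists[!is.na(toy_artists)]))
print(n_distinct_artists)

# On the real data: length(unique(arte_MOMA$artist))
```

According to the `diagnose()` output, `artist` has no missing values in this dataset, so the `NA` filter is only a precaution.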
Which artist has the most paintings in the collection?
library(dlookr)
## Loading required package: mice
##
## Attaching package: 'mice'
## The following objects are masked from 'package:base':
##
## cbind, rbind
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
##
## Attaching package: 'dlookr'
## The following object is masked from 'package:base':
##
## transform
diagnose(arte_MOMA)
## # A tibble: 24 x 6
## variables types missing_count missing_percent unique_count unique_rate
## <chr> <chr> <int> <dbl> <int> <dbl>
## 1 X integer 0 0 2253 1
## 2 title charac~ 0 0 2015 0.894
## 3 artist factor 0 0 989 0.439
## 4 artist_bio charac~ 1 0.0444 859 0.381
## 5 artist_birth_~ integer 6 0.266 132 0.0586
## 6 artist_death_~ integer 629 27.9 102 0.0453
## 7 num_artists integer 1 0.0444 5 0.00222
## 8 n_female_arti~ integer 0 0 3 0.00133
## 9 n_male_artists integer 0 0 5 0.00222
## 10 artist_gender charac~ 10 0.444 3 0.00133
## # ... with 14 more rows
diagnose_category(arte_MOMA, artist)
## # A tibble: 10 x 6
## variables levels N freq ratio rank
## * <chr> <fct> <int> <int> <dbl> <int>
## 1 artist Pablo Picasso 2253 55 2.44 1
## 2 artist Henri Matisse 2253 32 1.42 2
## 3 artist On Kawara 2253 32 1.42 3
## 4 artist Jacob Lawrence 2253 30 1.33 4
## 5 artist Batiste Madalena 2253 25 1.11 5
## 6 artist Jean Dubuffet 2253 25 1.11 6
## 7 artist Odilon Redon 2253 25 1.11 7
## 8 artist Ben Vautier 2253 24 1.07 8
## 9 artist Frank Stella 2253 23 1.02 9
## 10 artist Philip Guston 2253 23 1.02 10
A: Pablo Picasso, with 55 paintings.
How many paintings by this artist are in the collection?
A: 55 paintings.
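The ranking reported by `diagnose_category()` can be cross-checked with base R alone. A minimal sketch, using a made-up vector in place of `arte_MOMA$artist`:

```r
# Toy vector standing in for arte_MOMA$artist (made-up values)
toy_artists <- c("Artist 1", "Artist 2", "Artist 1",
                 "Artist 3", "Artist 1", "Artist 2")

# Frequency of each artist, most frequent first
counts <- sort(table(toy_artists), decreasing = TRUE)
print(head(counts, 2))

# Name and count of the top artist
top_name <- names(counts)[1]
top_count <- counts[[1]]

# On the real data: head(sort(table(arte_MOMA$artist), decreasing = TRUE), 10)
```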
How many paintings are by male artists and how many by female artists?
diagnose_category(arte_MOMA, artist_gender)
## # A tibble: 3 x 6
## variables levels N freq ratio rank
## * <chr> <chr> <int> <int> <dbl> <int>
## 1 artist_gender Male 2253 1991 88.4 1
## 2 artist_gender Female 2253 252 11.2 2
## 3 artist_gender <NA> 2253 10 0.444 3
A: 1991 paintings are by male artists, 252 are by female artists, and 10 have no gender specified.
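The three frequencies add up to the number of rows (1991 + 252 + 10 = 2253), because each row is one painting. The same breakdown, including the missing values, can be sketched in base R as follows (the vector `toy_gender` is made up for illustration):

```r
# Toy vector standing in for arte_MOMA$artist_gender (made-up values)
toy_gender <- c("Male", "Male", "Female", NA, "Male")

# useNA = "ifany" keeps the missing values as their own category
gender_counts <- table(toy_gender, useNA = "ifany")
gender_pct <- round(100 * prop.table(gender_counts), 1)

print(gender_counts)
print(gender_pct)

# On the real data: table(arte_MOMA$artist_gender, useNA = "ifany")
```

Without `useNA`, `table()` drops the `NA` entries and the counts no longer sum to the number of rows.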
How many artists of each gender are there?
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
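# One row per (gender, artist) pair, then the number of artists per gender
arte_MOMA %>%
  count(artist_gender, artist) %>%
  count(artist_gender) %>%
  mutate(n = paste(n, "art")) %>%  # label the counts; paste() already returns character
  table()                          # table() drops the row with missing gender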
## n
## artist_gender 143 art 837 art 9 art
## Female 1 0 0
## Male 0 1 0
A: There are 837 male artists and 143 female artists; the remaining 9 artists have no gender recorded.
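The table above hides the group with missing gender, because `table()` excludes `NA` by default. A base R sketch that keeps it visible, assuming each artist appears with a single gender value (the toy data frame is made up for illustration):

```r
# Toy data (made-up values) using the same column names as arte_MOMA
toy <- data.frame(
  artist = c("Artist 1", "Artist 1", "Artist 2", "Artist 3", "Artist 4"),
  artist_gender = c("Male", "Male", "Female", NA, "Male"),
  stringsAsFactors = FALSE
)

# Keep one row per (gender, artist) pair, then count artists per gender
pairs <- unique(toy[, c("artist_gender", "artist")])
artists_per_gender <- table(pairs$artist_gender, useNA = "ifany")
print(artists_per_gender)

# On the real data:
# pairs <- unique(arte_MOMA[, c("artist_gender", "artist")])
# table(pairs$artist_gender, useNA = "ifany")
```

In the output above the three groups sum to 143 + 837 + 9 = 989, which agrees with the number of distinct artists found earlier.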
In which year were the most paintings acquired?
arte_MOMA$year_acquired <- as.factor(arte_MOMA$year_acquired)
diagnose_category(arte_MOMA, year_acquired)
## # A tibble: 10 x 6
## variables levels N freq ratio rank
## * <chr> <fct> <int> <int> <dbl> <int>
## 1 year_acquired 1985 2253 86 3.82 1
## 2 year_acquired 1942 2253 71 3.15 2
## 3 year_acquired 1979 2253 71 3.15 3
## 4 year_acquired 1991 2253 67 2.97 4
## 5 year_acquired 2005 2253 67 2.97 5
## 6 year_acquired 1967 2253 65 2.89 6
## 7 year_acquired 2008 2253 55 2.44 7
## 8 year_acquired 1961 2253 45 2.00 8
## 9 year_acquired 1969 2253 45 2.00 9
## 10 year_acquired 1956 2253 42 1.86 10
A: In 1985, with a total of 86 paintings.
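Overwriting `year_acquired` with a factor means later numeric operations on that column, such as `min()`, no longer work without converting back. A sketch of an alternative that leaves the column untouched and also reports ties (the vector `toy_years` is made up for illustration):

```r
# Toy vector standing in for arte_MOMA$year_acquired (made-up values)
toy_years <- c(2001L, 1990L, 2001L, NA, 1995L, 2001L, 1990L)

# table() ignores NA by default and counts each year
year_counts <- table(toy_years)

# Keep every year that reaches the maximum count, so ties are reported
peak <- year_counts[year_counts == max(year_counts)]
peak_years <- as.integer(names(peak))

print(peak)

# On the real data: replace toy_years with the original integer column
```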
In which year were the most paintings created?
arte_MOMA$year_created <- as.factor(arte_MOMA$year_created)
diagnose_category(arte_MOMA, year_created)
## # A tibble: 11 x 6
## variables levels N freq ratio rank
## * <chr> <fct> <int> <int> <dbl> <int>
## 1 year_created 1977 2253 57 2.53 1
## 2 year_created 1940 2253 56 2.49 2
## 3 year_created 1964 2253 56 2.49 3
## 4 year_created 1961 2253 50 2.22 4
## 5 year_created 1962 2253 49 2.17 5
## 6 year_created 1963 2253 44 1.95 6
## 7 year_created 1959 2253 42 1.86 7
## 8 year_created 1968 2253 40 1.78 8
## 9 year_created 1960 2253 39 1.73 9
## 10 year_created 1914 2253 37 1.64 10
## 11 year_created 1950 2253 37 1.64 11
A: In 1977, with a total of 57 paintings.