Katarzyna Smoter

Field of study: Geoinformation

Faculty: Mining Surveying and Environmental Engineering


1. Loading the required libraries

To work with the data in R, I first loaded the required libraries.

library(ggplot2)
library(tidyverse)
library(Hmisc)
library(pastecs)
library(psych)
library(doBy)
library(sm)
library(ggpubr)

2. Basic data operations

For the analyses I chose the “diamonds” dataset from the ggplot2 package. I first checked which columns the “diamonds” table contains. I then took the categorical columns (stored as ordered factors) and checked which values they contain and how often each value occurs. I also chose “cut” as the main column for the later analyses.

str(diamonds)
## tibble [53,940 x 10] (S3: tbl_df/tbl/data.frame)
##  $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
names(diamonds)
##  [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"  
##  [8] "x"       "y"       "z"
unique(diamonds$cut)
## [1] Ideal     Premium   Good      Very Good Fair     
## Levels: Fair < Good < Very Good < Premium < Ideal
table(diamonds$cut)
## 
##      Fair      Good Very Good   Premium     Ideal 
##      1610      4906     12082     13791     21551
head(diamonds, 2)
## # A tibble: 2 x 10
##   carat cut     color clarity depth table price     x     y     z
##   <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium E     SI1      59.8    61   326  3.89  3.84  2.31
unique(diamonds$color)
## [1] E I J H F G D
## Levels: D < E < F < G < H < I < J
table(diamonds$color)
## 
##     D     E     F     G     H     I     J 
##  6775  9797  9542 11292  8304  5422  2808
unique(diamonds$clarity)
## [1] SI2  SI1  VS1  VS2  VVS2 VVS1 I1   IF  
## Levels: I1 < SI2 < SI1 < VS2 < VS1 < VVS2 < VVS1 < IF
table(diamonds$clarity)
## 
##    I1   SI2   SI1   VS2   VS1  VVS2  VVS1    IF 
##   741  9194 13065 12258  8171  5066  3655  1790

3. Adding a column

Next I added a new column containing the values “kod1” and “kod2” in alternation, and converted it to a factor.

diamonds$kod=rep(c("kod1", "kod2"), 26970)
diamonds$kod=as.factor(diamonds$kod)
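
The repeat count 26970 works only because the table has exactly 53,940 rows. A minimal sketch of a variant that does not depend on the row count, shown on a 10-row sample rebuilt from the values printed by str() above (the name sample_df is hypothetical and used only for this illustration):

```r
# 10-row sample rebuilt from the first values printed by str(diamonds);
# sample_df is a hypothetical name, not an object from the report.
sample_df <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23),
  price = c(326L, 326L, 327L, 334L, 335L, 336L, 336L, 337L, 337L, 338L)
)

# length.out recycles the two-value pattern to however many rows there are
sample_df$kod <- factor(rep(c("kod1", "kod2"), length.out = nrow(sample_df)))

table(sample_df$kod)
```

With length.out the same line keeps working if rows are filtered out beforehand or if the row count is odd.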

4. Basic statistics

I also computed basic descriptive statistics.

summary(diamonds)
##      carat               cut        color        clarity          depth      
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065   Min.   :43.00  
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258   1st Qu.:61.00  
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194   Median :61.80  
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171   Mean   :61.75  
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066   3rd Qu.:62.50  
##  Max.   :5.0100                     I: 5422   VVS1   : 3655   Max.   :79.00  
##                                     J: 2808   (Other): 2531                  
##      table           price             x                y         
##  Min.   :43.00   Min.   :  326   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710   1st Qu.: 4.720  
##  Median :57.00   Median : 2401   Median : 5.700   Median : 5.710  
##  Mean   :57.46   Mean   : 3933   Mean   : 5.731   Mean   : 5.735  
##  3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540   3rd Qu.: 6.540  
##  Max.   :95.00   Max.   :18823   Max.   :10.740   Max.   :58.900  
##                                                                   
##        z            kod       
##  Min.   : 0.000   kod1:26970  
##  1st Qu.: 2.910   kod2:26970  
##  Median : 3.530               
##  Mean   : 3.539               
##  3rd Qu.: 4.040               
##  Max.   :31.800               
## 

5. Creating plots

I also plotted every column of the table.

length(diamonds)
## [1] 11
nazwa <- names(diamonds)
par(mfrow = c(4, 3))
for (i in seq_along(diamonds)) {
  # [[i]] extracts the column as a vector, so plot() dispatches on its type
  plot(diamonds[[i]], ylab = nazwa[i])
}

par(mfrow=c(1,1))


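6. sapply and describe

I computed statistics with sapply, describe, stat.desc, and describeBy. The describe output below comes from the psych package, which masks Hmisc::describe because psych is loaded later.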

sapply(diamonds, mean, na.rm=TRUE)
##        carat          cut        color      clarity        depth        table 
##    0.7979397           NA           NA           NA   61.7494049   57.4571839 
##        price            x            y            z          kod 
## 3932.7997219    5.7311572    5.7345260    3.5387338           NA
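
The NA entries above appear because mean() is not defined for factor columns; R also issues a warning for each of them. A minimal sketch that restricts the call to numeric columns, again on a 10-row sample rebuilt from the str() output (sample_df, is_num, and num_means are illustrative names, not objects from the report):

```r
# 10-row sample rebuilt from the first values printed by str(diamonds)
sample_df <- data.frame(
  carat = c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23),
  cut   = factor(c("Ideal", "Premium", "Good", "Premium", "Good",
                   "Very Good", "Very Good", "Very Good", "Fair", "Very Good"),
                 levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                 ordered = TRUE),
  depth = c(61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4)
)

# Logical mask of numeric columns; the factor column "cut" is skipped
is_num <- sapply(sample_df, is.numeric)

# Means of the numeric columns only, so no NA and no warnings
num_means <- sapply(sample_df[is_num], mean, na.rm = TRUE)
num_means
```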
describe(diamonds)
##          vars     n    mean      sd  median trimmed     mad   min      max
## carat       1 53940    0.80    0.47    0.70    0.73    0.47   0.2     5.01
## cut*        2 53940    3.90    1.12    4.00    4.04    1.48   1.0     5.00
## color*      3 53940    3.59    1.70    4.00    3.55    1.48   1.0     7.00
## clarity*    4 53940    4.05    1.65    4.00    3.91    1.48   1.0     8.00
## depth       5 53940   61.75    1.43   61.80   61.78    1.04  43.0    79.00
## table       6 53940   57.46    2.23   57.00   57.32    1.48  43.0    95.00
## price       7 53940 3932.80 3989.44 2401.00 3158.99 2475.94 326.0 18823.00
## x           8 53940    5.73    1.12    5.70    5.66    1.38   0.0    10.74
## y           9 53940    5.73    1.14    5.71    5.66    1.36   0.0    58.90
## z          10 53940    3.54    0.71    3.53    3.49    0.85   0.0    31.80
## kod*       11 53940    1.50    0.50    1.50    1.50    0.74   1.0     2.00
##             range  skew kurtosis    se
## carat        4.81  1.12     1.26  0.00
## cut*         4.00 -0.72    -0.40  0.00
## color*       6.00  0.19    -0.87  0.01
## clarity*     7.00  0.55    -0.39  0.01
## depth       36.00 -0.08     5.74  0.01
## table       52.00  0.80     2.80  0.01
## price    18497.00  1.62     2.18 17.18
## x           10.74  0.38    -0.62  0.00
## y           58.90  2.43    91.20  0.00
## z           31.80  1.52    47.08  0.00
## kod*         1.00  0.00    -2.00  0.00
stat.desc(diamonds)
##                     carat cut color clarity        depth        table
## nbr.val      5.394000e+04  NA    NA      NA 5.394000e+04 5.394000e+04
## nbr.null     0.000000e+00  NA    NA      NA 0.000000e+00 0.000000e+00
## nbr.na       0.000000e+00  NA    NA      NA 0.000000e+00 0.000000e+00
## min          2.000000e-01  NA    NA      NA 4.300000e+01 4.300000e+01
## max          5.010000e+00  NA    NA      NA 7.900000e+01 9.500000e+01
## range        4.810000e+00  NA    NA      NA 3.600000e+01 5.200000e+01
## sum          4.304087e+04  NA    NA      NA 3.330763e+06 3.099241e+06
## median       7.000000e-01  NA    NA      NA 6.180000e+01 5.700000e+01
## mean         7.979397e-01  NA    NA      NA 6.174940e+01 5.745718e+01
## SE.mean      2.040954e-03  NA    NA      NA 6.168448e-03 9.621063e-03
## CI.mean.0.95 4.000286e-03  NA    NA      NA 1.209021e-02 1.885736e-02
## var          2.246867e-01  NA    NA      NA 2.052404e+00 4.992948e+00
## std.dev      4.740112e-01  NA    NA      NA 1.432621e+00 2.234491e+00
## coef.var     5.940439e-01  NA    NA      NA 2.320057e-02 3.888966e-02
##                     price            x            y            z kod
## nbr.val      5.394000e+04 5.394000e+04 5.394000e+04 5.394000e+04  NA
## nbr.null     0.000000e+00 8.000000e+00 7.000000e+00 2.000000e+01  NA
## nbr.na       0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00  NA
## min          3.260000e+02 0.000000e+00 0.000000e+00 0.000000e+00  NA
## max          1.882300e+04 1.074000e+01 5.890000e+01 3.180000e+01  NA
## range        1.849700e+04 1.074000e+01 5.890000e+01 3.180000e+01  NA
## sum          2.121352e+08 3.091386e+05 3.093203e+05 1.908793e+05  NA
## median       2.401000e+03 5.700000e+00 5.710000e+00 3.530000e+00  NA
## mean         3.932800e+03 5.731157e+00 5.734526e+00 3.538734e+00  NA
## SE.mean      1.717736e+01 4.829974e-03 4.917698e-03 3.038533e-03  NA
## CI.mean.0.95 3.366776e+01 9.466787e-03 9.638727e-03 5.955549e-03  NA
## var          1.591563e+07 1.258347e+00 1.304472e+00 4.980109e-01  NA
## std.dev      3.989440e+03 1.121761e+00 1.142135e+00 7.056988e-01  NA
## coef.var     1.014402e+00 1.957302e-01 1.991681e-01 1.994213e-01  NA
describeBy(diamonds, diamonds$kod)
## 
##  Descriptive statistics by group 
## group: kod1
##          vars     n    mean      sd  median trimmed     mad   min      max
## carat       1 26970    0.80    0.48    0.70    0.74    0.47   0.2     4.50
## cut*        2 26970    3.90    1.12    4.00    4.04    1.48   1.0     5.00
## color*      3 26970    3.58    1.70    4.00    3.54    1.48   1.0     7.00
## clarity*    4 26970    4.04    1.65    4.00    3.91    1.48   1.0     8.00
## depth       5 26970   61.75    1.43   61.80   61.79    1.04  43.0    79.00
## table       6 26970   57.45    2.23   57.00   57.31    1.48  43.0    95.00
## price       7 26970 3932.63 3989.23 2401.00 3158.89 2475.94 326.0 18818.00
## x           8 26970    5.73    1.12    5.70    5.66    1.36   0.0    10.23
## y           9 26970    5.73    1.11    5.71    5.66    1.36   0.0    10.16
## z          10 26970    3.54    0.72    3.53    3.50    0.85   0.0    31.80
## kod*       11 26970    1.00    0.00    1.00    1.00    0.00   1.0     1.00
##             range  skew kurtosis    se
## carat        4.30  1.13     1.34  0.00
## cut*         4.00 -0.71    -0.40  0.01
## color*       6.00  0.19    -0.87  0.01
## clarity*     7.00  0.55    -0.39  0.01
## depth       36.00  0.05     5.54  0.01
## table       52.00  0.86     4.12  0.01
## price    18492.00  1.62     2.18 24.29
## x           10.23  0.39    -0.62  0.01
## y           10.16  0.39    -0.65  0.01
## z           31.80  2.61    89.38  0.00
## kod*         0.00   NaN      NaN  0.00
## ------------------------------------------------------------ 
## group: kod2
##          vars     n    mean      sd  median trimmed     mad   min      max
## carat       1 26970    0.80    0.47    0.70    0.73    0.47   0.2     5.01
## cut*        2 26970    3.91    1.12    4.00    4.04    1.48   1.0     5.00
## color*      3 26970    3.60    1.70    4.00    3.56    1.48   1.0     7.00
## clarity*    4 26970    4.06    1.65    4.00    3.92    1.48   1.0     8.00
## depth       5 26970   61.74    1.43   61.80   61.78    1.04  43.0    79.00
## table       6 26970   57.46    2.23   57.00   57.32    1.48  44.0    79.00
## price       7 26970 3932.97 3989.72 2401.00 3159.10 2475.94 326.0 18823.00
## x           8 26970    5.73    1.12    5.69    5.66    1.36   0.0    10.74
## y           9 26970    5.73    1.17    5.71    5.66    1.36   0.0    58.90
## z          10 26970    3.54    0.70    3.52    3.49    0.85   0.0     8.06
## kod*       11 26970    2.00    0.00    2.00    2.00    0.00   2.0     2.00
##             range  skew kurtosis    se
## carat        4.81  1.10     1.17  0.00
## cut*         4.00 -0.72    -0.39  0.01
## color*       6.00  0.19    -0.87  0.01
## clarity*     7.00  0.55    -0.40  0.01
## depth       36.00 -0.21     5.93  0.01
## table       35.00  0.73     1.49  0.01
## price    18497.00  1.62     2.18 24.29
## x           10.74  0.37    -0.62  0.01
## y           58.90  4.20   166.19  0.01
## z            8.06  0.33    -0.38  0.00
## kod*         0.00   NaN      NaN  0.00
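# A third positional argument is matched to `mat`, not to a second grouping variable, so request the matrix output explicitly
describeBy(diamonds, diamonds$kod, mat = TRUE)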
##           item group1 vars     n         mean           sd  median      trimmed
## carat1       1   kod1    1 26970    0.7984854    0.4751665    0.70    0.7351863
## carat2       2   kod2    1 26970    0.7973941    0.4728613    0.70    0.7347446
## cut*1        3   kod1    2 26970    3.9030033    1.1165241    4.00    4.0410178
## cut*2        4   kod2    2 26970    3.9051910    1.1166954    4.00    4.0438450
## color*1      5   kod1    3 26970    3.5845384    1.7024558    4.00    3.5409251
## color*2      6   kod2    3 26970    3.6038561    1.6997294    4.00    3.5644234
## clarity*1    7   kod1    4 26970    4.0446793    1.6461309    4.00    3.9078142
## clarity*2    8   kod2    4 26970    4.0573600    1.6481468    4.00    3.9217649
## depth1       9   kod1    5 26970   61.7541861    1.4316223   61.80   61.7865406
## depth2      10   kod2    5 26970   61.7446237    1.4336302   61.80   61.7826752
## table1      11   kod1    6 26970   57.4534557    2.2340739   57.00   57.3145532
## table2      12   kod2    6 26970   57.4609121    2.2349424   57.00   57.3221496
## price1      13   kod1    7 26970 3932.6286244 3989.2349561 2401.00 3158.8863089
## price2      14   kod2    7 26970 3932.9708194 3989.7184613 2401.00 3159.0983964
## x1          15   kod1    8 26970    5.7324138    1.1224196    5.70    5.6608282
## x2          16   kod2    8 26970    5.7299006    1.1211209    5.69    5.6593252
## y1          17   kod1    9 26970    5.7343367    1.1136608    5.71    5.6633060
## y2          18   kod2    9 26970    5.7347152    1.1699364    5.71    5.6621825
## z1          19   kod1   10 26970    3.5403089    0.7155786    3.53    3.4955223
## z2          20   kod2   10 26970    3.5371587    0.6956885    3.52    3.4942445
## kod*1       21   kod1   11 26970    1.0000000    0.0000000    1.00    1.0000000
## kod*2       22   kod2   11 26970    2.0000000    0.0000000    2.00    2.0000000
##                   mad   min      max    range        skew    kurtosis
## carat1       0.474432   0.2     4.50     4.30  1.13331065   1.3448172
## carat2       0.474432   0.2     5.01     4.81  1.09949928   1.1651945
## cut*1        1.482600   1.0     5.00     4.00 -0.71355322  -0.4033855
## cut*2        1.482600   1.0     5.00     4.00 -0.72068935  -0.3930713
## color*1      1.482600   1.0     7.00     6.00  0.19356737  -0.8673174
## color*2      1.482600   1.0     7.00     6.00  0.18518525  -0.8664942
## clarity*1    1.482600   1.0     8.00     7.00  0.54995798  -0.3887358
## clarity*2    1.482600   1.0     8.00     7.00  0.55281547  -0.4014077
## depth1       1.037820  43.0    79.00    36.00  0.04580134   5.5376313
## depth2       1.037820  43.0    79.00    36.00 -0.20981285   5.9343078
## table1       1.482600  43.0    95.00    52.00  0.86153647   4.1178222
## table2       1.482600  44.0    79.00    35.00  0.73220034   1.4872204
## price1    2475.942000 326.0 18818.00 18492.00  1.61820156   2.1767756
## price2    2475.942000 326.0 18823.00 18497.00  1.61831892   2.1772216
## x1           1.363992   0.0    10.23    10.23  0.38772844  -0.6217964
## x2           1.363992   0.0    10.74    10.74  0.36952271  -0.6150674
## y1           1.363992   0.0    10.16    10.16  0.38702787  -0.6525155
## y2           1.363992   0.0    58.90    58.90  4.19533118 166.1948031
## z1           0.845082   0.0    31.80    31.80  2.61212158  89.3848321
## z2           0.845082   0.0     8.06     8.06  0.33497846  -0.3836541
## kod*1        0.000000   1.0     1.00     0.00         NaN         NaN
## kod*2        0.000000   2.0     2.00     0.00         NaN         NaN
##                     se
## carat1     0.002893379
## carat2     0.002879342
## cut*1      0.006798727
## cut*2      0.006799770
## color*1    0.010366577
## color*2    0.010349975
## clarity*1  0.010023604
## clarity*2  0.010035879
## depth1     0.008717420
## depth2     0.008729647
## table1     0.013603700
## table2     0.013608989
## price1    24.291209674
## price2    24.294153830
## x1         0.006834626
## x2         0.006826718
## y1         0.006781292
## y2         0.007123965
## z1         0.004357294
## z2         0.004236179
## kod*1      0.000000000
## kod*2      0.000000000

7. summaryBy

I computed grouped statistics (mean and standard deviation) with summaryBy from the doBy package.

summaryBy(depth~cut, data=diamonds,
          FUN=function(x){c(m=mean(x), s=sd(x))})
## # A tibble: 5 x 3
##   cut       depth.m depth.s
##   <ord>       <dbl>   <dbl>
## 1 Fair         64.0   3.64 
## 2 Good         62.4   2.17 
## 3 Very Good    61.8   1.38 
## 4 Premium      61.3   1.16 
## 5 Ideal        61.7   0.719
summaryBy(depth~cut+kod, data=diamonds,
          FUN=function(x){c(m=mean(x), s=sd(x))})
## # A tibble: 10 x 4
##    cut       kod   depth.m depth.s
##    <ord>     <fct>   <dbl>   <dbl>
##  1 Fair      kod1     64.2   3.54 
##  2 Fair      kod2     63.9   3.74 
##  3 Good      kod1     62.4   2.16 
##  4 Good      kod2     62.4   2.18 
##  5 Very Good kod1     61.8   1.39 
##  6 Very Good kod2     61.8   1.37 
##  7 Premium   kod1     61.3   1.16 
##  8 Premium   kod2     61.3   1.16 
##  9 Ideal     kod1     61.7   0.708
## 10 Ideal     kod2     61.7   0.729
summaryBy(depth+table~cut+kod, data=diamonds,
                    FUN=function(x){c(m=mean(x), s=sd(x))})
## # A tibble: 10 x 6
##    cut       kod   depth.m depth.s table.m table.s
##    <ord>     <fct>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 Fair      kod1     64.2   3.54     58.9    4.04
##  2 Fair      kod2     63.9   3.74     59.2    3.85
##  3 Good      kod1     62.4   2.16     58.6    2.85
##  4 Good      kod2     62.4   2.18     58.8    2.85
##  5 Very Good kod1     61.8   1.39     58.0    2.14
##  6 Very Good kod2     61.8   1.37     57.9    2.10
##  7 Premium   kod1     61.3   1.16     58.7    1.48
##  8 Premium   kod2     61.3   1.16     58.7    1.48
##  9 Ideal     kod1     61.7   0.708    56.0    1.25
## 10 Ideal     kod2     61.7   0.729    55.9    1.24
summaryBy(.~cut+kod, data=diamonds,
          FUN=function(x){c(m=mean(x), s=sd(x))})
## # A tibble: 10 x 16
##    cut   kod   carat.m carat.s depth.m depth.s table.m table.s price.m price.s
##    <ord> <fct>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 Fair  kod1    1.06    0.519    64.2   3.54     58.9    4.04   4450.   3626.
##  2 Fair  kod2    1.04    0.514    63.9   3.74     59.2    3.85   4268.   3494.
##  3 Good  kod1    0.845   0.454    62.4   2.16     58.6    2.85   3927.   3717.
##  4 Good  kod2    0.853   0.454    62.4   2.18     58.8    2.85   3931.   3646.
##  5 Very~ kod1    0.802   0.458    61.8   1.39     58.0    2.14   3946.   3903.
##  6 Very~ kod2    0.810   0.461    61.8   1.37     57.9    2.10   4018.   3969.
##  7 Prem~ kod1    0.893   0.515    61.3   1.16     58.7    1.48   4570.   4323.
##  8 Prem~ kod2    0.891   0.516    61.3   1.16     58.7    1.48   4599.   4375.
##  9 Ideal kod1    0.706   0.437    61.7   0.708    56.0    1.25   3481.   3838.
## 10 Ideal kod2    0.699   0.428    61.7   0.729    55.9    1.24   3434.   3778.
## # ... with 6 more variables: x.m <dbl>, x.s <dbl>, y.m <dbl>, y.s <dbl>,
## #   z.m <dbl>, z.s <dbl>
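
For comparison, the same kind of grouped mean and standard deviation can be computed without doBy. The sketch below uses base R aggregate() on a 10-row sample rebuilt from the str() output. It is an alternative technique, not the summaryBy call used above, and sample_df and depth_by_cut are illustrative names.

```r
# 10-row sample rebuilt from the first values printed by str(diamonds)
sample_df <- data.frame(
  cut   = factor(c("Ideal", "Premium", "Good", "Premium", "Good",
                   "Very Good", "Very Good", "Very Good", "Fair", "Very Good"),
                 levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                 ordered = TRUE),
  depth = c(61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, 59.4)
)

# FUN returns two named values, so the result holds a two-column matrix "depth"
# with columns m (mean) and s (standard deviation).
# In this tiny sample Fair and Ideal have one row each, so their s is NA.
depth_by_cut <- aggregate(depth ~ cut, data = sample_df,
                          FUN = function(x) c(m = mean(x), s = sd(x)))
depth_by_cut
```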

8. Histograms

I also created histograms of the carat values, the second one with a fitted normal curve overlaid.

hist(diamonds$carat,
     main="Number of diamonds by carat",
     xlab="carat", ylab="count")

x <- diamonds$carat
h <- hist(x, breaks = seq(0, 5.5, 0.1))            # 0.1-carat bins
xfit <- seq(min(x), max(x), length.out = 150)
yfit <- dnorm(xfit, mean = mean(x), sd = sd(x))    # fitted normal density
yfit <- yfit * diff(h$mids[1:2]) * length(x)       # rescale density to counts
lines(xfit, yfit, col = "blue", lwd = 2)
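
The rescaling step multiplies the normal density by the bin width and the number of observations, because a count histogram has total bar area equal to bin width times n, while a density has area 1. A minimal sketch of that reasoning on simulated data (the values are a made-up stand-in, not the diamonds data, and all variable names are illustrative):

```r
set.seed(1)                                  # reproducible simulated data
x <- rnorm(1000, mean = 0.8, sd = 0.47)      # simulated stand-in, not the diamonds data
h <- hist(x, breaks = 20, plot = FALSE)      # equal-width bins, counts per bin
binwidth <- diff(h$breaks[1:2])              # equals diff(h$mids[1:2]) for equal bins

xfit <- seq(min(x), max(x), length.out = 150)
dens <- dnorm(xfit, mean = mean(x), sd = sd(x))   # normal density, total area 1
yfit <- dens * binwidth * length(x)               # same area as the count histogram

plot(h, main = "Counts with rescaled normal curve")
lines(xfit, yfit, col = "blue", lwd = 2)

# Total bar area of the count histogram: one observation contributes one bin width
area_hist <- sum(h$counts * binwidth)
area_hist
```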


9. Kernel density estimate

I computed and plotted a kernel density estimate of carat, first for the whole dataset and then separately for each cut.

d <- density(diamonds$carat)
plot(d)
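
The object returned by density() can also be inspected directly. A minimal sketch on the ten carat values printed by str() above (carat_sample and area are illustrative names), showing the bandwidth chosen by default and checking that the estimate integrates to approximately 1:

```r
# First ten carat values as printed by str(diamonds)
carat_sample <- c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23)

d <- density(carat_sample)   # Gaussian kernel, default bandwidth rule bw.nrd0
d$bw                         # bandwidth actually used

# Riemann-sum approximation of the area under the estimated curve;
# d$x is an equally spaced grid, so one spacing value is enough
area <- sum(d$y) * diff(d$x[1:2])
area
```

A smaller bandwidth (for example via the adjust argument of density()) gives a wigglier curve, and a larger one gives a smoother curve.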

library(sm)
cut.f <- diamonds$cut
sm.density.compare(diamonds$carat, cut.f, xlab = "carat")
title(main = "Carat by cut")
# one fill colour per cut level (the original range produced one colour too many)
colfill <- 2:(1 + length(levels(cut.f)))
legend(4, 1.5, levels(cut.f), fill = colfill)


11. ggboxplot

I created box plots of carat, for the whole dataset and split by cut.

ggboxplot(diamonds, y="carat", width=0.5, ylab="karat")

ggboxplot(diamonds, y="carat", x="cut", color="cut", width=0.5, ylab="karat")


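12. gghistogram

I created histograms of carat with gghistogram, adding the mean line, a rug, colouring by cut, and a density curve step by step.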

gghistogram(diamonds, x="carat", add="mean")

gghistogram(diamonds, x="carat", bins=9, add="mean")

gghistogram(diamonds, x="carat", add="mean", fill="lightgrey")

gghistogram(diamonds, x="carat", add="mean", fill="lightgrey", rug=TRUE)

gghistogram(diamonds, x = "carat", add ="mean", rug=TRUE, color = "cut",
            palette = c("#00AFBB", "#E7B800","red", "pink", "purple"))

gghistogram(diamonds, x = "carat", add = "mean",rug=TRUE,color = "cut",
            fill="cut", palette = c("#00AFBB", "#E7B800","red", "pink", "purple"))

gghistogram(diamonds, x = "carat", add = "mean",rug=TRUE,color = "cut",
            fill="cut", palette = c("#00AFBB", "#E7B800","red", "pink", "purple"),add_density = TRUE)


13. Cumulative distribution plots

I plotted the empirical cumulative distribution function (ECDF) of carat.

ggecdf(diamonds, x="carat")
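
The curve drawn by an ECDF plot gives, for each value, the share of observations that are less than or equal to it. A minimal sketch with base R ecdf() on the ten carat values printed by str() above (carat_sample and F_carat are illustrative names; this is base R, not the ggpubr call used above):

```r
# First ten carat values as printed by str(diamonds)
carat_sample <- c(0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, 0.23)

# ecdf() returns a step function: F_carat(v) is the share of values <= v
F_carat <- ecdf(carat_sample)

F_carat(0.23)   # share of the sample with carat <= 0.23
F_carat(0.31)   # the largest value in the sample, so the share is 1
F_carat(0.20)   # below the smallest value, so the share is 0
```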


14. Quantile-quantile (QQ) plot

I created a quantile-quantile plot of carat.

ggqqplot(diamonds, x="carat")