Visualization

Today’s class

  • A little bit about data visualization
  • Time to work on your plot or final project
  • This Friday: we will review a Bayesian model (optional)

Data visualization

  • My favorite part of statistics and data analysis: it is fun!

  • You can be creative, but be smart and ethical.

  • Plots are easy to “manipulate”

Misleading plots

Incorrect and misleading plots

  • Pie chart whose percentages add up to more than 100%

  • Avoid pie charts. Grouped barplots are often the best option for “parts of a whole”. The segments of a stacked barplot can be hard to compare.
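A minimal base-R sketch of a grouped barplot for “parts of a whole”. It uses the built-in `mtcars` data as a stand-in (not data from these slides): the share of cars with 4, 6, or 8 cylinders within each gear group. `beside = TRUE` is what places the bars side by side instead of stacking them.

```r
# Cars per cylinder count (rows) within each gear count (columns)
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# "Parts of a whole": proportions within each gear group (each column sums to 1)
props <- prop.table(counts, margin = 2)

# beside = TRUE draws grouped bars; beside = FALSE would stack them
barplot(props, beside = TRUE, legend.text = TRUE,
        args.legend = list(title = "Cylinders", x = "topright"),
        xlab = "Number of gears", ylab = "Proportion of cars")
```

Because every group is drawn from the same baseline, the parts can be compared directly, which is exactly what a pie or a stacked bar makes hard.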

Barplot

Plotting

  • A point estimate with its confidence interval (CI)

  • The same idea as a line with a CI band (when x is continuous)
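A minimal base-R sketch of a point estimate with its CI, again on the built-in `mtcars` data as a stand-in: mean mpg per number of cylinders, with a t-based 95% CI drawn as error bars. Using `t.test()` for the interval is an assumption for illustration. For a fitted model, the estimates and CIs would come from the model instead.

```r
# Mean mpg and a t-based 95% CI for each number of cylinders
groups <- split(mtcars$mpg, mtcars$cyl)
est <- sapply(groups, mean)
ci  <- sapply(groups, function(x) t.test(x)$conf.int)  # 2 x k matrix: lower, upper

x <- seq_along(est)
plot(x, est, pch = 19, xlim = c(0.5, length(est) + 0.5), ylim = range(ci),
     xaxt = "n", xlab = "Cylinders", ylab = "Mean mpg (95% CI)")
axis(1, at = x, labels = names(est))

# Vertical bars from lower to upper CI limit, with flat caps at both ends
arrows(x, ci[1, ], x, ci[2, ], angle = 90, code = 3, length = 0.05)
```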

Plots

  • Distribution

  • Correlation

  • Ranking

  • Part of a whole

  • Evolution (time series, line plot)

  • Map

  • Flows

Distribution plots
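Three common distribution plots side by side, as a base-R sketch on the built-in `mtcars` data (a stand-in, not the data from the slides):

```r
op <- par(mfrow = c(1, 3))  # three panels in one row

# Histogram: counts of observations in bins
hist(mtcars$mpg, main = "Histogram", xlab = "mpg")

# Density: a smoothed version of the histogram
plot(density(mtcars$mpg), main = "Density", xlab = "mpg")

# Boxplot: median, quartiles, whiskers and outliers for each group
boxplot(mpg ~ cyl, data = mtcars, main = "Boxplot",
        xlab = "Cylinders", ylab = "mpg")

par(op)  # restore the previous layout
```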

Correlation

  • Scatterplot

  • Heatmap

  • Correlogram

  • Bubble
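A scatterplot and a bubble chart differ only in whether point size carries a third variable. A base-R sketch on the built-in `mtcars` data (stand-in data):

```r
# Scatterplot of weight vs fuel efficiency. As a bubble chart, point size
# encodes a third variable (horsepower); sqrt() makes bubble AREA proportional to hp
plot(mtcars$wt, mtcars$mpg,
     pch = 21, bg = adjustcolor("steelblue", alpha.f = 0.5),
     cex = 3 * sqrt(mtcars$hp / max(mtcars$hp)),
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Pearson correlation summarizing the linear association shown in the plot
r <- cor(mtcars$wt, mtcars$mpg)
```

Scaling the area rather than the radius matters: doubling the radius quadruples the ink, which exaggerates differences.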

Scatterplot

Combining scatterplot and distribution

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
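The rows above are the first rows of the built-in `mtcars` data. Packages such as ggExtra (`ggMarginal()`) add marginal distributions in one call; the sketch below does it with base R only, using `layout()` so the marginal histograms share the scatterplot's axis limits.

```r
hx <- hist(mtcars$wt,  plot = FALSE)  # bins for the x variable
hy <- hist(mtcars$mpg, plot = FALSE)  # bins for the y variable
xl <- range(hx$breaks)
yl <- range(hy$breaks)

op <- par(no.readonly = TRUE)
# Panel 1 = scatterplot (bottom left), 2 = top histogram, 3 = right histogram
layout(matrix(c(2, 0, 1, 3), nrow = 2, byrow = TRUE),
       widths = c(3, 1), heights = c(1, 3))

par(mar = c(4, 4, 0.5, 0.5))
plot(mtcars$wt, mtcars$mpg, xlim = xl, ylim = yl,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Top: histogram of x, sharing the scatterplot's x limits and left margin
par(mar = c(0, 4, 0.5, 0.5))
plot(NULL, xlim = xl, ylim = c(0, max(hx$counts)), axes = FALSE, ann = FALSE)
rect(head(hx$breaks, -1), 0, tail(hx$breaks, -1), hx$counts, col = "grey")

# Right: histogram of y drawn sideways, sharing the y limits and bottom margin
par(mar = c(4, 0, 0.5, 0.5))
plot(NULL, xlim = c(0, max(hy$counts)), ylim = yl, axes = FALSE, ann = FALSE)
rect(0, head(hy$breaks, -1), hy$counts, tail(hy$breaks, -1), col = "grey")

par(op)  # restore the original graphics settings
```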

Heatmap
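A minimal heatmap with base R's `heatmap()` (stats package), on the built-in `mtcars` data as a stand-in. Columns are standardized first so variables measured on very different scales (e.g. `disp` vs `wt`) get comparable colors.

```r
# Standardize each column (mean 0, sd 1) so scales are comparable
m <- scale(as.matrix(mtcars))

# heatmap() also reorders rows and columns by hierarchical clustering (dendrograms)
heatmap(m, scale = "none", margins = c(5, 8))
```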

Correlogram

   species tars1 tars2 head aede1 aede2 aede3
1 Concinna   191   131   53   150    15   104
2 Concinna   185   134   50   147    13   105
3 Concinna   200   137   52   144    14   102
4 Concinna   173   127   50   144    16    97
5 Concinna   171   118   49   153    13   106
6 Concinna   160   118   47   140    15    99
      species       tars1           tars2            head           aede1      
 Concinna :21   Min.   :122.0   Min.   :107.0   Min.   :43.00   Min.   :116.0  
 Heikert. :31   1st Qu.:148.0   1st Qu.:118.2   1st Qu.:49.00   1st Qu.:125.5  
 Heptapot.:22   Median :185.5   Median :123.0   Median :50.50   Median :136.5  
                Mean   :177.3   Mean   :124.0   Mean   :50.35   Mean   :134.8  
                3rd Qu.:198.2   3rd Qu.:130.0   3rd Qu.:52.00   3rd Qu.:142.8  
                Max.   :242.0   Max.   :146.0   Max.   :58.00   Max.   :157.0  
     aede2           aede3       
 Min.   : 8.00   Min.   : 55.00  
 1st Qu.:11.00   1st Qu.: 85.25  
 Median :14.00   Median : 98.50  
 Mean   :12.99   Mean   : 95.38  
 3rd Qu.:15.00   3rd Qu.:106.00  
 Max.   :16.00   Max.   :123.00  
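A base-R correlogram sketch, run on the built-in `mtcars` data rather than the beetle measurements shown above. `pairs()` draws scatterplots in the lower panels, and a small custom panel function (hypothetical name `panel_cor`) prints the correlation coefficient in the upper panels.

```r
vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# Upper panels: print the correlation coefficient instead of a scatterplot
panel_cor <- function(x, y, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))   # restore the panel coordinates afterwards
  par(usr = c(0, 1, 0, 1))  # simple 0-1 coordinates for placing the text
  text(0.5, 0.5, formatC(cor(x, y), digits = 2, format = "f"))
}

# Lower panels: scatterplots (default); diagonal: variable names
pairs(vars, upper.panel = panel_cor)

cm <- cor(vars)  # the same coefficients as a matrix
```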

Today

I have an activity for you to do… but first…

Why does my model plot look like this?

What is happening?

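# Logistic regression for the 0/1 response Prevalence. Area must be a factor
# (e.g. cod$Area <- factor(cod$Area)) so that glm() treats it as categorical
codmodel <- glm(Prevalence ~ Length + Depth + Weight + Area,
                data = cod, family = binomial(link = "logit"))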
  • Length (Continuous)

  • Weight (Continuous)

  • Depth (Continuous)

  • Area (Categorical)

Multiple continuous predictors

Categorical → color, line type, or grid (facets)

Continuous → problem: we only have one x axis! So all but one continuous predictor has to be shown as if it were categorical, fixed at a few representative values

Steps to solve this

  • Choose one continuous predictor for the x axis (Length) and note its minimum and maximum

  • For all the other continuous predictors, obtain their 25th, 50th, and 75th percentiles

summary(cod$Depth)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   50.0   110.0   180.0   176.2   235.0   293.0 
summary(cod$Weight)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   34.0   765.5  1432.0  1704.3  2222.5  9990.0       6 
summary(cod$Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  17.00   44.00   54.00   53.45   62.00  101.00       6 

Depth: 110, 180, 235

Weight: 765, 1432, 2222

Length: 17 to 101
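The same numbers can be pulled out directly with `quantile()` and `range()`. Since `cod` is not available here, this sketch uses a hypothetical stand-in vector (the `mtcars` weights plus one missing value). The point it shows: like `cod$Weight` and `cod$Length`, which have 6 NA's, the vector contains missing values, so `na.rm = TRUE` is required.

```r
# Hypothetical stand-in for cod$Weight: mtcars weights with one missing value
w <- c(mtcars$wt, NA)

# quantile() stops with an error on NA unless na.rm = TRUE
# (summary() instead drops them and reports an NA's count)
q <- quantile(w, probs = c(0.25, 0.50, 0.75), na.rm = TRUE)

# Min and max, for the variable that goes on the x axis
rng <- range(w, na.rm = TRUE)
```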

Create a new data frame

Usually when we use predict(), we place the predicted values in the same data frame we used to fit the model:

# Fitted values (logit scale) and standard errors for every row of cod
predictedmodel <- predict(codmodel, cod, se.fit = TRUE)
  Sample Intensity Prevalence Year Depth Weight Length Sex Stage Age Area   WL2
1      1         0          0 1999   220    148     26   0     0   0    2 19.24
2      2         0          0 1999   220    144     26   0     0   0    2 18.72
3      3         0          0 1999   220    146     27   0     0   0    2 19.71
4      4         0          0 1999   220    138     26   0     0   0    2 17.94
5      5         0          0 1999   220     40     17   0     0   0    2  3.40
6      6         0          0 1999   220     68     20   0     0   0    2  6.80
       fit2        lwr       upr
1 0.2102810 0.14528646 0.2943383
2 0.2106938 0.14566394 0.2947397
3 0.2214005 0.15589886 0.3044953
4 0.2113142 0.14623127 0.2953432
5 0.1374822 0.08001811 0.2260726
6 0.1597150 0.09888745 0.2476748
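# Every combination: Length 17, 22, ..., 97 (seq() stops before passing 101) x 3 Depths x 3 Weights x each Area
newdataframe <- expand.grid(Length = seq(17, 101, by = 5), Depth = c(110, 180, 235), Weight = c(765, 1432, 2222), Area = unique(cod$Area))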

New data frame with the groups of interest!
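A quick sanity check of the grid before predicting. Because `cod` is not available here, this sketch assumes 4 Area levels (a hypothetical value; the slides take the levels from `unique(cod$Area)`). It also confirms that `seq(17, 101, by = 5)` ends at 97, not 101.

```r
# Same grid as above, with a hypothetical Area factor of 4 levels
newgrid <- expand.grid(Length = seq(17, 101, by = 5),
                       Depth  = c(110, 180, 235),
                       Weight = c(765, 1432, 2222),
                       Area   = factor(1:4))

n_rows <- nrow(newgrid)            # 17 lengths x 3 depths x 3 weights x 4 areas
max_length <- max(newgrid$Length)  # last value of the Length sequence
```

`expand.grid()` varies its first argument fastest, which is why the `head()` output of the grid shows Length changing while Depth, Weight, and Area stay fixed.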

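# Predict on the link (logit) scale for the new grid, with standard errors
predictedmodel <- predict(codmodel, newdataframe, se.fit = TRUE)
# Back-transform to probabilities. The 95% CI is built on the logit scale
# first, so after plogis() it always stays between 0 and 1
newdataframe$fit2 <- plogis(predictedmodel$fit)
newdataframe$lwr <- plogis(predictedmodel$fit - 1.96 * predictedmodel$se.fit)
newdataframe$upr <- plogis(predictedmodel$fit + 1.96 * predictedmodel$se.fit)

head(newdataframe)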
  Length Depth Weight Area       fit2        lwr        upr
1     17   110    765    2 0.03825608 0.01805071 0.07925324
2     22   110    765    2 0.05204935 0.02749554 0.09635740
3     27   110    765    2 0.07045148 0.04142088 0.11733814
4     32   110    765    2 0.09470970 0.06134546 0.14344682
5     37   110    765    2 0.12618675 0.08851427 0.17678323
6     42   110    765    2 0.16620464 0.12293428 0.22086986
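The whole workflow end to end, as a self-contained sketch. Since the `cod` data are not included here, it runs on **simulated, hypothetical** data with the same column names (the coefficients used to simulate are made up). The key step is computing the CI on the link scale and only then applying `plogis()`. A symmetric ±1.96 SE interval on the probability scale could fall below 0 or above 1, while the back-transformed logit interval cannot.

```r
set.seed(1)
n <- 400

# Simulated stand-in for `cod` (hypothetical values, NOT the real data)
sim <- data.frame(
  Length = runif(n, 17, 101),
  Depth  = runif(n, 50, 293),
  Weight = runif(n, 34, 5000),
  Area   = factor(sample(1:4, n, replace = TRUE))
)
eta <- -3 + 0.04 * sim$Length - 0.002 * sim$Depth + 0.5 * (sim$Area == "2")
sim$Prevalence <- rbinom(n, size = 1, prob = plogis(eta))

fit <- glm(Prevalence ~ Length + Depth + Weight + Area,
           data = sim, family = binomial(link = "logit"))

# x-axis variable over its range; the others at their 25th/50th/75th percentiles
grid <- expand.grid(
  Length = seq(17, 101, by = 5),
  Depth  = unname(quantile(sim$Depth,  c(0.25, 0.5, 0.75))),
  Weight = unname(quantile(sim$Weight, c(0.25, 0.5, 0.75))),
  Area   = levels(sim$Area)
)

# Predict on the link (logit) scale, where +/- 1.96 SE is a reasonable CI
p <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Back-transform estimate and 95% limits with the inverse logit
grid$fit2 <- plogis(p$fit)
grid$lwr  <- plogis(p$fit - 1.96 * p$se.fit)
grid$upr  <- plogis(p$fit + 1.96 * p$se.fit)
```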

PLOT!

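library(ggplot2)

# One line (with its 95% CI ribbon) per Area; facet rows = Depth, columns = Weight
ggplot(newdataframe, aes(x = Length, y = fit2, col = Area, fill = Area)) +
  # Raw 0/1 data would add a facet for every observed Depth/Weight, so it stays off:
  # geom_point(data = cod, aes(y = Prevalence)) +
  geom_line() +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.15) +
  xlab("Length") +
  ylab("Prevalence") +
  theme_bw() +
  facet_grid(Depth ~ Weight) +
  # Secondary axes with no breaks act as titles for the facet rows and columns
  scale_y_continuous(sec.axis = sec_axis(~ ., name = "Depth", breaks = NULL, labels = NULL)) +
  scale_x_continuous(sec.axis = sec_axis(~ ., name = "Weight", breaks = NULL, labels = NULL))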
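An alternative to the secondary-axis trick for naming the facet rows and columns is ggplot2's `label_both` labeller, which writes "variable: value" into each strip (e.g. "Depth: 110"). A minimal sketch on the built-in `mtcars` data (not the cod predictions), assuming ggplot2 is installed:

```r
library(ggplot2)

# label_both prints "variable: value" in each strip, e.g. "cyl: 4" and "gear: 3"
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(cyl ~ gear, labeller = label_both) +
  theme_bw()

print(p)
```

For the cod plot, the same idea would be `facet_grid(Depth ~ Weight, labeller = label_both)`, which makes the two `sec_axis()` lines unnecessary.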

Today

Go to https://r-graph-gallery.com/ and play with the plots. Look at them, read about what they show, and replicate them (they provide the full code!). Then modify them: colors, labels, etc.

Alternative: finish your own plot from the previous class

Alternative: work on your project