Visualization

Today’s class

A little bit about data visualization
Time to work on your plot or final project
This Friday: we will review a Bayesian model (optional)

Data visualization

My favorite part of stats, data analysis, etc.: it is fun
You can be creative, but be smart and ethical.
Plots are easy to “manipulate”

Misleading plots

Incorrect and misleading plots

Pie chart whose slices add up to more than 100%
Avoid pie charts. Grouped barplots are often the best option for “parts of a whole”. In stacked barplots the segments do not share a baseline, so they are hard to compare.
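A minimal sketch of the grouped-barplot alternative. The `diet` data frame and its numbers are made up for illustration; only the ggplot2 calls are real.

```r
library(ggplot2)

# Hypothetical data: share (%) of three diet items in two seasons (made-up numbers)
diet <- data.frame(
  season = rep(c("Summer", "Winter"), each = 3),
  item   = rep(c("Fish", "Crustaceans", "Other"), times = 2),
  share  = c(50, 30, 20, 35, 45, 20)
)

# Parts of a whole should add up to 100 within each group; check before plotting
totals <- tapply(diet$share, diet$season, sum)
stopifnot(all(totals == 100))

# Grouped (dodged) bars: every bar starts at zero, so heights are directly comparable
p_bar <- ggplot(diet, aes(x = item, y = share, fill = season)) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "Share (%)", fill = "Season") +
  theme_bw()
p_bar
```

Switching to `position = "stack"` gives the stacked version, which makes it easy to see why the middle segments are harder to compare.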

Barplot

Plotting

Estimate (point) and CI
Similar to a line and CI
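One possible way to draw a point estimate with its CI, sketched with the built-in mtcars data. The t-based 95% interval and the object names (`ci_by_cyl`, `p_ci`) are choices made for this example, not something taken from the slides.

```r
library(ggplot2)

# Mean mpg and a t-based 95% CI for each cylinder class in mtcars
groups <- split(mtcars$mpg, mtcars$cyl)
n      <- sapply(groups, length)
m      <- sapply(groups, mean)
se     <- sapply(groups, sd) / sqrt(n)
half   <- qt(0.975, df = n - 1) * se   # half-width of the interval

ci_by_cyl <- data.frame(
  cyl      = factor(names(groups)),
  estimate = m,
  lwr      = m - half,
  upr      = m + half
)

# Point = estimate, vertical segment = confidence interval
p_ci <- ggplot(ci_by_cyl, aes(x = cyl, y = estimate)) +
  geom_pointrange(aes(ymin = lwr, ymax = upr)) +
  labs(x = "Cylinders", y = "Mean mpg (95% CI)") +
  theme_bw()
p_ci
```

For a continuous x variable the same idea becomes a line with a ribbon (`geom_line()` plus `geom_ribbon()`).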

Plots

Distribution
Correlation
Ranking
Part of a whole
Evolution (time series, line plot)
Map
Flows

Distribution plots

Correlation

Scatterplot
Heatmap
Correlogram
Bubble

Scatterplot

Combining scatterplot and distribution

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
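A sketch of one way to combine a scatterplot with the distribution of each variable, using the mtcars data shown above. It assumes the ggExtra package is installed; the slides do not say which package produced the original figure.

```r
library(ggplot2)
library(ggExtra)  # assumption: ggExtra is available; it provides ggMarginal()

# Base scatterplot: car weight against fuel efficiency
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_bw()

# Add a histogram of each variable in the plot margins
p_marginal <- ggMarginal(p_scatter, type = "histogram")
p_marginal
```

Other values of `type`, such as `"density"` or `"boxplot"`, change what is drawn in the margins.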

Heatmap

Correlogram

   species tars1 tars2 head aede1 aede2 aede3
1 Concinna   191   131   53   150    15   104
2 Concinna   185   134   50   147    13   105
3 Concinna   200   137   52   144    14   102
4 Concinna   173   127   50   144    16    97
5 Concinna   171   118   49   153    13   106
6 Concinna   160   118   47   140    15    99

      species       tars1           tars2            head           aede1      
 Concinna :21   Min.   :122.0   Min.   :107.0   Min.   :43.00   Min.   :116.0  
 Heikert. :31   1st Qu.:148.0   1st Qu.:118.2   1st Qu.:49.00   1st Qu.:125.5  
 Heptapot.:22   Median :185.5   Median :123.0   Median :50.50   Median :136.5  
                Mean   :177.3   Mean   :124.0   Mean   :50.35   Mean   :134.8  
                3rd Qu.:198.2   3rd Qu.:130.0   3rd Qu.:52.00   3rd Qu.:142.8  
                Max.   :242.0   Max.   :146.0   Max.   :58.00   Max.   :157.0  
     aede2           aede3       
 Min.   : 8.00   Min.   : 55.00  
 1st Qu.:11.00   1st Qu.: 85.25  
 Median :14.00   Median : 98.50  
 Mean   :12.99   Mean   : 95.38  
 3rd Qu.:15.00   3rd Qu.:106.00  
 Max.   :16.00   Max.   :123.00
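A possible way to build a correlogram for these data. This sketch assumes the data frame printed above is the `flea` data set shipped with the GGally package, and it plots only the first three measurements to keep the figure readable.

```r
library(GGally)  # assumption: the flea data above come from GGally

data(flea, package = "GGally")

# Pairwise Pearson correlations between the six numeric measurements
flea_cor <- cor(flea[, -1])   # drop the species column
round(flea_cor, 2)

# Correlogram: scatterplots, densities and correlations, coloured by species
ggpairs(flea, columns = 2:4, mapping = ggplot2::aes(colour = species))
```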

Today

I have an activity for you to do… but before…

Why does my model plot look like this?

What is happening?

codmodel <- glm(Prevalence ~ Length + Depth + Weight + Area,
                data = cod, family = binomial(link = "logit"))

Length (Continuous)
Weight (Continuous)
Depth (Continuous)
Area (Categorical)

Multiple continuous

Categorical -> Color, line type, or grid (facet)

Continuous -> Issue: we only have one x axis, so every other continuous predictor has to be plotted as if it were categorical (fixed at a few representative values)

Steps to solve this

Choose one continuous predictor to put on the x axis (Length). Note its min and max.
For all the other continuous predictors, obtain the 25th, 50th and 75th percentiles.

summary(cod$Depth)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   50.0   110.0   180.0   176.2   235.0   293.0

summary(cod$Weight)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   34.0   765.5  1432.0  1704.3  2222.5  9990.0       6

summary(cod$Length)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  17.00   44.00   54.00   53.45   62.00  101.00       6

Depth: 110, 180, 235

Weight: 765, 1432, 2222

Length: 17 to 101
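The percentiles can also be pulled out programmatically instead of being read off `summary()`. A minimal sketch: `cod_sim` is a simulated stand-in with made-up values, since the real cod data are not reproduced here.

```r
# Simulated stand-in for the cod data (values are made up; only the column names match)
set.seed(1)
cod_sim <- data.frame(
  Depth  = runif(200, 50, 293),
  Weight = runif(200, 34, 5000)
)

# 25th, 50th and 75th percentiles of each predictor that is NOT on the x axis
probs    <- c(0.25, 0.50, 0.75)
depth_q  <- quantile(cod_sim$Depth,  probs = probs, na.rm = TRUE)
weight_q <- quantile(cod_sim$Weight, probs = probs, na.rm = TRUE)

depth_q
weight_q
```

`na.rm = TRUE` matters for columns such as Weight that contain missing values; without it `quantile()` stops with an error.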

Create a new data frame

Usually, when we use predict(), we add the predicted values to the original data frame:

predictedmodel <- predict(codmodel,cod,se.fit=TRUE)

  Sample Intensity Prevalence Year Depth Weight Length Sex Stage Age Area   WL2
1      1         0          0 1999   220    148     26   0     0   0    2 19.24
2      2         0          0 1999   220    144     26   0     0   0    2 18.72
3      3         0          0 1999   220    146     27   0     0   0    2 19.71
4      4         0          0 1999   220    138     26   0     0   0    2 17.94
5      5         0          0 1999   220     40     17   0     0   0    2  3.40
6      6         0          0 1999   220     68     20   0     0   0    2  6.80
       fit2        lwr       upr
1 0.2102810 0.14528646 0.2943383
2 0.2106938 0.14566394 0.2947397
3 0.2214005 0.15589886 0.3044953
4 0.2113142 0.14623127 0.2953432
5 0.1374822 0.08001811 0.2260726
6 0.1597150 0.09888745 0.2476748

newdataframe <- expand.grid(Length = seq(17, 101, by = 5), Depth = c(110, 180, 235),
                            Weight = c(765, 1432, 2222), Area = unique(cod$Area))

New data frame with the groups of interest!
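To see what `expand.grid()` does, here is a small sketch with fewer values. The numbers and area labels are illustrative only.

```r
# Every combination of the supplied values becomes one row
small_grid <- expand.grid(
  Length = c(20, 60, 100),
  Depth  = c(110, 180, 235),
  Area   = c("1", "2")        # hypothetical area labels
)

nrow(small_grid)   # 3 * 3 * 2 = 18 rows
head(small_grid)

# Note: seq() stops at the last value that does not exceed the upper limit
range(seq(17, 101, by = 5))   # 17 to 97, so the maximum length (101) is not in the grid
```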

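# Predictions on the link (logit) scale, with their standard errors
predictedmodel <- predict(codmodel, newdataframe, se.fit = TRUE)
cod2 <- cod
# Back-transform the fit and the 95% interval to the probability scale
newdataframe$fit2 <- plogis(predictedmodel$fit)
newdataframe$lwr  <- plogis(predictedmodel$fit - 1.96 * predictedmodel$se.fit)
newdataframe$upr  <- plogis(predictedmodel$fit + 1.96 * predictedmodel$se.fit)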

head(newdataframe)

  Length Depth Weight Area       fit2        lwr        upr
1     17   110    765    2 0.03825608 0.01805071 0.07925324
2     22   110    765    2 0.05204935 0.02749554 0.09635740
3     27   110    765    2 0.07045148 0.04142088 0.11733814
4     32   110    765    2 0.09470970 0.06134546 0.14344682
5     37   110    765    2 0.12618675 0.08851427 0.17678323
6     42   110    765    2 0.16620464 0.12293428 0.22086986
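Why build the interval on the logit scale and only then apply `plogis()`? Doing so keeps the bounds between 0 and 1. A self-contained sketch on simulated data (the coefficients -3 and 0.05 and the object names are made up for this example):

```r
# Simulated data: prevalence increasing with fish length (made-up parameters)
set.seed(42)
sim <- data.frame(Length = runif(300, 17, 101))
sim$Prevalence <- rbinom(300, size = 1, prob = plogis(-3 + 0.05 * sim$Length))

fit <- glm(Prevalence ~ Length, data = sim, family = binomial(link = "logit"))

# Predict on the link (logit) scale for a grid of lengths
new_lengths <- data.frame(Length = seq(17, 101, by = 5))
pred <- predict(fit, newdata = new_lengths, type = "link", se.fit = TRUE)

# Interval on the logit scale, back-transformed to probabilities
new_lengths$fit2 <- plogis(pred$fit)
new_lengths$lwr  <- plogis(pred$fit - 1.96 * pred$se.fit)
new_lengths$upr  <- plogis(pred$fit + 1.96 * pred$se.fit)

# The point estimate matches what predict() returns on the response scale
resp <- predict(fit, newdata = new_lengths, type = "response")
all.equal(as.numeric(new_lengths$fit2), as.numeric(resp))

head(new_lengths)
```

For `predict()` on a glm, `type = "link"` is the default, which is why the code in the slides works without stating it.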

PLOT!

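library(ggplot2)

# Area is categorical; store it as a factor so each area gets its own line and colour
ggplot(newdataframe, aes(x = Length, y = fit2, col = Area)) +
  # geom_point(data = cod, aes(y = Prevalence)) +   # optional: overlay the raw 0/1 observations
  geom_line() +
  geom_ribbon(alpha = 0.15, aes(ymin = lwr, ymax = upr)) +
  xlab("Length") +
  ylab("Prevalence") +
  theme_bw() +
  facet_grid(Depth ~ Weight) +   # rows = Depth values, columns = Weight values
  # Empty secondary axes are used only to print a title for the facet rows and columns
  scale_y_continuous(sec.axis = sec_axis(~ ., name = "Depth", breaks = NULL, labels = NULL)) +
  scale_x_continuous(sec.axis = sec_axis(~ ., name = "Weight", breaks = NULL, labels = NULL))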
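An alternative way to label the facets, offered only as a sketch: `labeller = label_both` prints the variable name in every strip, so the empty secondary axes are not needed. The `demo` grid and its `fit2` curve are made up and are not model output.

```r
library(ggplot2)

# Hypothetical prediction grid (fit2 is a made-up curve, not a fitted model)
demo <- expand.grid(
  Length = seq(20, 100, by = 10),
  Depth  = c(110, 180, 235),
  Weight = c(765, 1432, 2222)
)
demo$fit2 <- plogis(-4 + 0.04 * demo$Length + 0.003 * demo$Depth)

# Strips read "Depth: 110", "Weight: 765", and so on
p_facets <- ggplot(demo, aes(x = Length, y = fit2)) +
  geom_line() +
  facet_grid(Depth ~ Weight, labeller = label_both) +
  labs(x = "Length", y = "Prevalence") +
  theme_bw()
p_facets
```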

Today

Go to https://r-graph-gallery.com/ and explore the plots. Read about what each one shows, then replicate one: the site provides the full code. After that, modify it (colors, labels, etc.).

Alternative: finish your own plot from the previous class

Alternative: work on your project