DATASET ANALYSIS REPORT GROUP 5

#Group 5

2602074343 - Crecia Viona
2602103431 - Keisha Adara
2602104516 - Aurellia Clarina Tahara
2602185154 - Canisius Aldhi

Institution : Universitas Bina Nusantara, Kampus Alam Sutera

Course : Visual Analytics and Application

Dataset : Beaver1, Austres, Ldeaths

#DATASET 1 : BEAVER 1

Beaver1 is a dataset containing records of the body temperature of a female beaver over a period of time. This data comes from a study that observed the body temperature behavior of beavers as part of a study of thermoregulation in small mammals. In this dataset, each observation records the body temperature of the beaver at a certain time interval, along with information about the animal’s activity—whether or not the beaver was active at the time the temperature was measured.

In more detail, the beaver1 dataset contains four variables: the day of observation (day), the time of day in hhmm form (time, so that 840 means 08:40), the body temperature in degrees Celsius (temp), and an activity indicator (activ), where a value of “1” indicates that the beaver is active and a value of “0” indicates that it is inactive. This dataset is useful for researchers who want to understand how a beaver’s body temperature changes with its activity, and it can also help in exploring basic concepts such as body temperature fluctuation patterns and responses to physical activity.
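Because time is stored as a clock value, it helps to decode it before treating it as a duration. The following is a minimal sketch, assuming the hhmm convention suggested by the printed values (840, 850, 900, ...); the names hour, minute, and elapsed_min are illustrative and are not columns of the dataset.

```r
# Sketch: decode the hhmm clock values in beaver1 and check the sampling interval.
# hour, minute and elapsed_min are illustrative names, not dataset columns.
b <- datasets::beaver1

b$hour   <- b$time %/% 100   # 840 -> 8, 2350 -> 23
b$minute <- b$time %% 100    # 840 -> 40, 2350 -> 50

# Minutes since midnight of the first recorded day
b$elapsed_min <- (b$day - min(b$day)) * 24 * 60 + b$hour * 60 + b$minute

# Gaps between consecutive readings, in minutes
table(diff(b$elapsed_min))
```

A table of the gaps shows whether the readings are evenly spaced and whether any reading is missing.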

library(ggplot2)
library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

library(ggplotlyExtra)

beaver1

##     day time  temp activ
## 1   346  840 36.33     0
## 2   346  850 36.34     0
## 3   346  900 36.35     0
## 4   346  910 36.42     0
## 5   346  920 36.55     0
## 6   346  930 36.69     0
## 7   346  940 36.71     0
## 8   346  950 36.75     0
## 9   346 1000 36.81     0
## 10  346 1010 36.88     0
## 11  346 1020 36.89     0
## 12  346 1030 36.91     0
## 13  346 1040 36.85     0
## 14  346 1050 36.89     0
## 15  346 1100 36.89     0
## 16  346 1110 36.67     0
## 17  346 1120 36.50     0
## 18  346 1130 36.74     0
## 19  346 1140 36.77     0
## 20  346 1150 36.76     0
## 21  346 1200 36.78     0
## 22  346 1210 36.82     0
## 23  346 1220 36.89     0
## 24  346 1230 36.99     0
## 25  346 1240 36.92     0
## 26  346 1250 36.99     0
## 27  346 1300 36.89     0
## 28  346 1310 36.94     0
## 29  346 1320 36.92     0
## 30  346 1330 36.97     0
## 31  346 1340 36.91     0
## 32  346 1350 36.79     0
## 33  346 1400 36.77     0
## 34  346 1410 36.69     0
## 35  346 1420 36.62     0
## 36  346 1430 36.54     0
## 37  346 1440 36.55     0
## 38  346 1450 36.67     0
## 39  346 1500 36.69     0
## 40  346 1510 36.62     0
## 41  346 1520 36.64     0
## 42  346 1530 36.59     0
## 43  346 1540 36.65     0
## 44  346 1550 36.75     0
## 45  346 1600 36.80     0
## 46  346 1610 36.81     0
## 47  346 1620 36.87     0
## 48  346 1630 36.87     0
## 49  346 1640 36.89     0
## 50  346 1650 36.94     0
## 51  346 1700 36.98     0
## 52  346 1710 36.95     0
## 53  346 1720 37.00     0
## 54  346 1730 37.07     1
## 55  346 1740 37.05     0
## 56  346 1750 37.00     0
## 57  346 1800 36.95     0
## 58  346 1810 37.00     0
## 59  346 1820 36.94     0
## 60  346 1830 36.88     0
## 61  346 1840 36.93     0
## 62  346 1850 36.98     0
## 63  346 1900 36.97     0
## 64  346 1910 36.85     0
## 65  346 1920 36.92     0
## 66  346 1930 36.99     0
## 67  346 1940 37.01     0
## 68  346 1950 37.10     1
## 69  346 2000 37.09     0
## 70  346 2010 37.02     0
## 71  346 2020 36.96     0
## 72  346 2030 36.84     0
## 73  346 2040 36.87     0
## 74  346 2050 36.85     0
## 75  346 2100 36.85     0
## 76  346 2110 36.87     0
## 77  346 2120 36.89     0
## 78  346 2130 36.86     0
## 79  346 2140 36.91     0
## 80  346 2150 37.53     1
## 81  346 2200 37.23     0
## 82  346 2210 37.20     0
## 83  346 2230 37.25     1
## 84  346 2240 37.20     0
## 85  346 2250 37.21     0
## 86  346 2300 37.24     1
## 87  346 2310 37.10     0
## 88  346 2320 37.20     0
## 89  346 2330 37.18     0
## 90  346 2340 36.93     0
## 91  346 2350 36.83     0
## 92  347    0 36.93     0
## 93  347   10 36.83     0
## 94  347   20 36.80     0
## 95  347   30 36.75     0
## 96  347   40 36.71     0
## 97  347   50 36.73     0
## 98  347  100 36.75     0
## 99  347  110 36.72     0
## 100 347  120 36.76     0
## 101 347  130 36.70     0
## 102 347  140 36.82     0
## 103 347  150 36.88     0
## 104 347  200 36.94     0
## 105 347  210 36.79     0
## 106 347  220 36.78     0
## 107 347  230 36.80     0
## 108 347  240 36.82     0
## 109 347  250 36.84     0
## 110 347  300 36.86     0
## 111 347  310 36.88     0
## 112 347  320 36.93     0
## 113 347  330 36.97     0
## 114 347  340 37.15     1

str(beaver1)

## 'data.frame':    114 obs. of  4 variables:
##  $ day  : num  346 346 346 346 346 346 346 346 346 346 ...
##  $ time : num  840 850 900 910 920 930 940 950 1000 1010 ...
##  $ temp : num  36.3 36.3 36.4 36.4 36.5 ...
##  $ activ: num  0 0 0 0 0 0 0 0 0 0 ...

#GRAPHICS (BEAVER 1)

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

#STACK BARPLOT
df_count <- beaver1 %>%
  mutate(temp_group = cut(temp, breaks = seq(floor(min(temp)), ceiling(max(temp)), by = 0.5))) %>%
  count(temp_group, activ)

plot_ly(
  data = df_count, 
  x = ~temp_group, 
  y = ~n, 
  color = ~factor(activ, labels = c("Inactive", "Active")), 
  type = "bar"
) %>% 
  layout(
    barmode = "stack",
    xaxis = list(title = "Temperature (°C)"),
    yaxis = list(title = "Total"),
    title = "Distribution of Beaver Activity by Temperature"
  )

## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

The graph shows the distribution of the beaver’s activity by body temperature as a stacked bar plot. The cut() call splits temperature into 0.5°C bins between 36 and 38, which gives the ranges (36, 36.5], (36.5, 37], (37, 37.5], and (37.5, 38] on the x-axis. Each bar is divided into two categories, “Active” and “Inactive”, which are displayed in different colors for easy identification.

Most readings fall in the (36.5, 37] range: 92 of the 114 readings, all of them “Inactive”. The (36, 36.5] range holds only 5 readings, and all of them are also “Inactive”.

All 6 “Active” readings occur above 37°C. Five of them are in the (37, 37.5] range, alongside 11 inactive readings, and one is in the (37.5, 38] range, which contains only that single reading (37.53°C). Overall, the “Inactive” state dominates the dataset, and activity is associated with higher body temperature. With only 6 active readings, the graph shows an association; it cannot establish whether temperature drives activity or the reverse.
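The counts behind the stacked bars can be checked with a plain cross-tabulation. This is a minimal sketch in base R that reuses the same breaks as the cut() call in the plotting code; the object names are illustrative.

```r
# Sketch: cross-tabulate temperature bins against activity for beaver1.
b <- datasets::beaver1

# Same 0.5-degree breaks as the stacked bar plot
breaks     <- seq(floor(min(b$temp)), ceiling(max(b$temp)), by = 0.5)
temp_group <- cut(b$temp, breaks = breaks)

# Label the activity indicator so the table is readable
activity <- factor(b$activ, levels = c(0, 1), labels = c("Inactive", "Active"))

counts <- table(temp_group, activity)
addmargins(counts)   # adds row and column totals
```

Reading the table next to the plot makes it easier to spot bins that are too small to show clearly as bars.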

#BOXPLOT MULTIVARIATE
library(plotly)
plot_ly(data = beaver1, 
        y = ~temp, 
        x = ~factor(activ),  
        type = "box", 
        color = ~factor(activ), 
        colors = c("blue", "red"), 
        boxmean = TRUE) %>%
  layout(
    title = "Boxplot of Beaver Body Temperature by Activity",
    xaxis = list(title = "Activity (0: Inactive, 1: Active)"),
    yaxis = list(title = "Temperature (°C)"),
    showlegend = FALSE  
  )

The figure above shows a boxplot of the beaver’s body temperature by activity. Readings taken while the beaver was active (1), shown in red, are higher than readings taken while it was inactive (0), shown in blue: the active median is about 37.2°C, compared with about 36.9°C for the inactive readings. This is consistent with physical activity raising metabolism and body temperature, although the plot alone does not establish the cause.

The inactive readings have a wider spread, with a fairly wide interquartile range and outliers on both sides, at low temperatures around 36.3°C to 36.4°C and at high temperatures above 37.2°C. The active readings cover a narrower range, from about 37.1°C to 37.5°C, and are concentrated around their median. Because there are only 6 active readings compared with 108 inactive ones, the narrower spread should be read with caution; it does not by itself show that temperature is more stable during activity.
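The values read off the boxplot can be cross-checked numerically. The sketch below computes group sizes and a few summary statistics per activity state in base R; the names state and stats are illustrative.

```r
# Sketch: summary statistics of beaver1 body temperature by activity state.
b <- datasets::beaver1
b$state <- factor(b$activ, levels = c(0, 1), labels = c("Inactive", "Active"))

# One row per state: group size, mean, median and interquartile range
by_state <- split(b$temp, b$state)
stats <- t(sapply(by_state, function(x) {
  c(n = length(x), mean = mean(x), median = median(x), iqr = IQR(x))
}))

round(stats, 2)
```

Including the group size next to each statistic is a deliberate choice: it shows at a glance how little data supports the summary for the smaller group.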

#SCATTERPLOT
library(plotly)
plot_ly(data = beaver1, 
        x = ~time,            
        y = ~temp,            
        type = "scatter", 
        mode = "markers",     
        color = ~factor(activ),  
        colors = c("blue", "red"),  
        symbol = ~factor(activ), 
        marker = list(size = 8)) %>% 
  layout(
    title = "Scatter Plot of Beaver Body Temperature Based on Activity",
    xaxis = list(title = "Time"),
    yaxis = list(title = "Temperature (°C)"),
    showlegend = TRUE
  )

The graph plots the beaver’s body temperature against time, with activity distinguished by the color and shape of the symbol: blue dots for inactive readings (0) and red triangles for active readings (1). The vertical axis shows body temperature in degrees Celsius, while the horizontal axis shows the time value, which is a clock time in hhmm form. Inactive readings mostly range between 36.6°C and 37°C, although some points fall outside this range. All active readings (red triangles) lie above 37°C. This suggests a link between activity and higher body temperature.

In addition, body temperature fluctuates over the observation period, and the higher readings often coincide with periods of activity. This is consistent with increased metabolism during activity, although the data alone cannot confirm a direct effect. Overall, this scatterplot indicates that the beaver’s activity is associated with higher body temperature, especially in the late evening: three of the six active readings fall between 21:50 and 23:00.

#LINE CHARTS
library(plotly)
fig <- plot_ly(data = beaver1, 
               x = ~time,           
               y = ~temp,           
               type = 'scatter', 
               mode = 'lines',      
               line = list(color = 'rgb(255, 99, 71)', width = 3)) 
fig

The line chart plots the beaver’s body temperature against the time value (x = time, y = temp). Because time is a clock time and the recording runs past midnight, the readings from day 347 (00:00 to 03:40) are placed at the left edge of the axis, before the readings from day 346 that begin at 08:40. The left part of the chart therefore shows the end of the recording and not its beginning.

Read in recording order, the temperature starts at about 36.3°C, rises to around 36.9°C by late morning, and then moves up and down without a fixed pattern. There are sharp rises and falls over short periods at several points, most notably the jump to 37.53°C at 21:50, which coincides with an active reading. Despite this variation, the temperature generally stays between about 36.8°C and 36.9°C, and there is no overall downward trend: the first reading (36.33°C) is lower than the last (37.15°C). Since the data cover less than one full day, they can show changes from hour to hour but not a daily cycle.
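One way to draw the line in true recording order is to plot temperature against elapsed time instead of the raw clock value. The sketch below assumes the hhmm convention for time; elapsed_hr is an illustrative name, and the plotting step is skipped if plotly is not installed.

```r
# Sketch: line chart of beaver1 temperature in chronological order.
b <- datasets::beaver1

# Hours since midnight of the first recorded day, decoded from day and hhmm time
b$elapsed_hr <- ((b$day - min(b$day)) * 1440 +
                 (b$time %/% 100) * 60 + b$time %% 100) / 60
b <- b[order(b$elapsed_hr), ]

if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(b, x = ~elapsed_hr, y = ~temp,
                         type = "scatter", mode = "lines")
  fig <- plotly::layout(fig,
                        xaxis = list(title = "Hours since midnight of first day"),
                        yaxis = list(title = "Temperature (°C)"))
  # Print fig in an interactive session or R Markdown chunk to render it
}
```

With this axis the readings taken after midnight appear at the right-hand end of the chart, where they belong in time.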

#DATASET 2 : AUSTRES

Austres is a dataset that records the estimated population of Australia in thousands from 1971 to 1994, recorded quarterly. This dataset serves as an example of time series data and is often used to teach population trend analysis and various methods of time series analysis.

The Austres data covers Australia’s population growth patterns over two decades, allowing us to see how the country’s population has evolved over time. With this data, R users can learn how to visualize and analyze seasonal trends and long-term population growth. Austres is also often used in population forecasting exercises or in simulations involving demographic growth projections.
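For analysis by year or quarter, the time series can be turned into a long data frame without typing any values by hand. This is a minimal sketch using the base R functions time() and cycle(); the column names year, quarter, and population are illustrative.

```r
# Sketch: convert the austres time series into a long data frame.
a <- datasets::austres

austres_long <- data.frame(
  year       = as.integer(floor(time(a) + 1e-8)),  # small offset guards against rounding
  quarter    = as.integer(cycle(a)),               # position within the year, 1 to 4
  population = as.numeric(a)                       # residents, in thousands
)

head(austres_long)
table(austres_long$quarter)   # number of observations per quarter
```

Here cycle() reads the quarter from the series itself, so no value has to be assigned to a quarter manually, and each observation appears exactly once.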

library(ggplot2)
library(plotly)
library(ggplotlyExtra)

data("austres")
str(austres)

##  Time-Series [1:89] from 1971 to 1993: 13067 13130 13198 13254 13304 ...

#GRAPHICS (AUSTRES)

library(dplyr)
#HISTOGRAM QUARTERLY
library(plotly)
austres_data <- data.frame(
  Year = rep(1971:1993, each = 4),  
  Qtr1 = c(13067.3, 13254.2, 13459.2, 13669.5, 13862.6, 14004.7, 14155.6, 14330.3, 14478.4, 14646.4, 14874.4, 15121.7, 15346.2, 15531.5, 15736.7, 15961.5, 16203.0, 16478.3, 16777.2, 17026.3, 17239.4, 17447.3, 17627.1),
  Qtr2 = c(13130.5, 13303.7, 13504.5, 13722.6, 13893.0, 14033.1, 14192.2, 14359.3, 14515.7, 14695.4, 14923.3, 15184.2, 15393.5, 15579.4, 15788.3, 16018.3, 16263.3, 16538.2, 16833.1, 17085.4, 17292.0, 17482.6, 17661.5),
  Qtr3 = c(13198.4, 13353.9, 13552.6, 13772.1, 13926.8, 14066.0, 14231.7, 14396.6, 14554.9, 14746.6, 14988.7, 15239.3, 15439.0, 15628.5, 15839.7, 16076.9, 16327.9, 16621.6, 16891.6, 17106.9, 17354.2, 17526.0, NA),
  Qtr4 = c(NA, 13409.3, 13614.3, 13832.0, 13968.9, 14110.1, 14281.5, 14430.8, 14602.5, 14807.4, 15054.1, 15288.9, 15483.5, 15677.3, 15900.6, 16139.0, 16398.9, 16697.0, 16956.8, 17169.4, 17414.2, 17568.7, NA)
)
#QUARTER 1
plot_ly(data = austres_data, 
        x = ~Qtr1,               
        type = "histogram",      
        nbinsx = 20) %>%         
  layout(
    title = "Histogram Quarter 1",
    xaxis = list(title = "Quarter1"),   
    yaxis = list(title = "Frequency"),     
    bargap = 0.1                          
  )

In this histogram, the x-axis shows the population of Australia in thousands, with values ranging from about 13,000 to 18,000, and the y-axis shows the number of observations in each bin. The highest frequency, about 16, is in the range around 14,000. These frequencies are inflated: austres_data combines a Year column of 92 entries with quarter columns of 23 entries, so R recycles each quarterly value four times, and a frequency of 16 corresponds to 4 distinct years. The bar heights are uneven, but because the population increases in every quarter of the series, uneven bars reflect how long the population stayed within each range, not fluctuations in the population itself.

#HISTOGRAM QUARTER 2
plot_ly(data = austres_data, 
        x = ~Qtr2,              
        type = "histogram",     
        nbinsx = 20,            
        marker = list(color = 'rgba(255, 100, 102, 0.7)')) %>% 
  layout(
    title = "Histogram Quarter 2",
    xaxis = list(title = "Quarter2"),   
    yaxis = list(title = "Frequency"),    
    bargap = 0.1                          
  )

In the Quarter 2 histogram above, the distribution looks more even than in Quarter 1. Frequencies are fairly high and consistent across the range of about 14,000 to 17,000, with the highest frequency of around 12 near 14,000 and near 17,000. Frequencies are lower around 15,000 and 16,000. Since the series only increases, this means the population passed through those ranges more quickly; it does not indicate a decrease in the population.

#HISTOGRAM QUARTER 3
plot_ly(data = austres_data, 
        x = ~Qtr3,             
        type = "histogram",    
        nbinsx = 20,           
        marker = list(color = 'rgba(102, 204, 255, 0.7)')) %>% 
  layout(
    title = "Histogram Quarter 3",
    xaxis = list(title = "Quarter 3"),   
    yaxis = list(title = "Frequency"),    
    bargap = 0.1                          
  )

## Warning: Ignoring 4 observations

In the Quarter 3 histogram above, the range of 14,000 to 15,000 has the highest frequency, which is 12. This shows that third-quarter population estimates fall in this range more often than in any other. At higher value ranges the frequency decreases, especially for values above 17,000, where the series ends and few observations are available.

#HISTOGRAM QUARTER 4
plot_ly(data = austres_data, 
        x = ~Qtr4,              
        type = "histogram",    
        nbinsx = 20,           
        marker = list(color = 'rgba(0, 128, 0, 0.7)')) %>% 
  layout(
    title = "Histogram Quarter 4",
    xaxis = list(title = "Quarter 4"),   
    yaxis = list(title = "Frequency"),    
    bargap = 0.1                          
  )

## Warning: Ignoring 8 observations

Based on the histogram of Quarter 4 above, population values of about 14,000 to 15,000 thousand (14 to 15 million people) are the most frequent, with a frequency of 12. This means that, among the fourth-quarter estimates, more fall in this range than in any other; it does not mean that the population stayed in this range throughout the period. Values approaching 13,000 and 18,000 appear less often, with a frequency of around 4, because they lie at the start and end of the series. Overall, the distribution for Quarter 4 is fairly uniform, with no values that differ greatly from the rest.
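Since the histograms summarize the levels of a series that rises throughout, it can be more direct to look at the changes themselves. The sketch below is one hedged way to do that with base R's diff(); the object names are illustrative.

```r
# Sketch: quarterly and year-on-year population change in austres.
a <- datasets::austres

quarterly_change <- diff(a)           # thousands of residents added per quarter
annual_change    <- diff(a, lag = 4)  # change against the same quarter one year earlier

all(quarterly_change > 0)             # TRUE if the population rose in every quarter
summary(as.numeric(annual_change))    # spread of year-on-year growth, in thousands
```

Periods with larger annual changes are the ones where the population crossed a given range faster, which is what shows up as shorter bars in a histogram of levels.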

#DATASET 3 : LDEATHS

Ldeaths is a time series dataset that records the monthly number of deaths from respiratory diseases (bronchitis, emphysema, and asthma) in the UK, from 1974 to 1979. The values are monthly death counts, ranging from 1,300 to 3,891 in the table below, and they provide insight into patterns of mortality that may be related to seasonal conditions, particularly during the winter months when respiratory diseases tend to increase.

Ldeaths allows R users to analyze seasonal trends in respiratory deaths, such as whether there is a significant increase in winter or other fluctuations seen each year. This dataset is often used in time series analysis exercises to understand seasonal patterns, identify long-term trends, or make predictions about future death rates.
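The seasonal pattern described here can be summarized by averaging each calendar month across the six years. This is a minimal sketch using cycle() and tapply() from base R; monthly_mean is an illustrative name.

```r
# Sketch: average ldeaths by calendar month to expose the seasonal profile.
d <- datasets::ldeaths

# cycle() gives the month position (1 to 12) of every observation
monthly_mean <- tapply(as.numeric(d), as.integer(cycle(d)), mean)
names(monthly_mean) <- month.abb

round(monthly_mean)
names(which.max(monthly_mean))   # month with the highest average count
names(which.min(monthly_mean))   # month with the lowest average count
```

On the ldeaths values, January has the highest monthly average and September the lowest, which matches the winter peak described above.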

library(ggplot2)
library(plotly)
library(ggplotlyExtra)

ldeaths

##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1974 3035 2552 2704 2554 2014 1655 1721 1524 1596 2074 2199 2512
## 1975 2933 2889 2938 2497 1870 1726 1607 1545 1396 1787 2076 2837
## 1976 2787 3891 3179 2011 1636 1580 1489 1300 1356 1653 2013 2823
## 1977 3102 2294 2385 2444 1748 1554 1498 1361 1346 1564 1640 2293
## 1978 2815 3137 2679 1969 1870 1633 1529 1366 1357 1570 1535 2491
## 1979 3084 2605 2573 2143 1693 1504 1461 1354 1333 1492 1781 1915

# ldeaths is already available from the datasets package (it printed above),
# so no data() call is needed; data("ldeaths") only produces a "not found" warning.

str(ldeaths)

##  Time-Series [1:72] from 1974 to 1980: 3035 2552 2704 2554 2014 ...

#BARPLOT FOR LDEATHS

ldeaths_df <- as.data.frame(ldeaths)
str(ldeaths_df)

## 'data.frame':    72 obs. of  1 variable:
##  $ x: Time-Series  from 1974 to 1980: 3035 2552 2704 2554 2014 ...

colnames(ldeaths_df) <- "Deaths"

ldeaths_df$Year <- rep(1974:1979, each = 12)
ldeaths_df$Month <- rep(1:12, times = 6)
head(ldeaths_df)

##   Deaths Year Month
## 1   3035 1974     1
## 2   2552 1974     2
## 3   2704 1974     3
## 4   2554 1974     4
## 5   2014 1974     5
## 6   1655 1974     6

library(dplyr)
ldeaths_yearly <- ldeaths_df %>%
  group_by(Year) %>%
  summarise(TotalDeaths = sum(Deaths))

library(ggplot2)
ggplot(ldeaths_yearly, aes(x = factor(Year), y = TotalDeaths, fill = factor(Year))) +
  geom_bar(stat = "identity") +
  labs(title = "Total Number of Deaths Per Year (1974-1979)", 
       x = "Years", 
       y = "Total Number of Deaths") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3")

The graph shows the annual death toll from 1974 to 1979. Each colored bar represents one year, and the height of the bar is the sum of the twelve monthly counts for that year. 1974 had the highest total (26,140), followed closely by 1975 (26,101) and 1976 (25,718).

The totals for 1977 (23,229), 1978 (23,951), and 1979 (22,938) are lower, with 1979 the lowest of the six years. The annual totals therefore fall into two groups: roughly 26,000 deaths per year in 1974 to 1976 and roughly 23,000 per year in 1977 to 1979, a drop of about 10 percent. Within each group the differences from year to year are small, and no single year shows a sharp spike.

In terms of visualization, the colored bars make it easy to identify the years with the highest and lowest totals at a glance. Annual totals do hide the variation within each year, however: the monthly table shows winter peaks such as 3,891 deaths in February 1976 and late-summer lows near 1,300.

Overall, the graph shows no sign of a sudden health crisis during the six-year period, and the totals in the last three years are lower than in the first three. This may indicate successful efforts to maintain or improve health conditions during this time, but the dataset does not record the cause, so that interpretation remains tentative.
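The annual totals can also be computed directly from the time series, without building a data frame first. The sketch below uses aggregate() on the ts object with nfrequency = 1, which sums the twelve months of each year; the names yearly and totals are illustrative.

```r
# Sketch: annual totals of ldeaths straight from the time series.
d <- datasets::ldeaths

# Collapse the monthly series to one value per year by summing
yearly <- aggregate(d, nfrequency = 1, FUN = sum)
totals <- setNames(as.numeric(yearly), as.integer(time(yearly)))

totals
round(100 * (totals / totals[1] - 1), 1)   # percent change relative to the first year
```

Expressing each year as a percent change from the first year makes the size of the differences between bars easier to judge than bar heights alone.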



2024-11-13