Here I created a new data frame to test whether the code works, then copied the resulting month column into the original joined data frame. The age column is in days, and the month column is in months.
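machine.age <- join %>%
  select(datapoint_time, machine_commissioning, machine_id) %>%
  group_by(machine_id)

# as.vector() drops the Date class, leaving days since 1970-01-01
machine.age$dt <- as.vector(machine.age$datapoint_time)
machine.age$mt <- as.vector(machine.age$machine_commissioning)

# Age in days. Note: inside mutate(), na.rm = TRUE does not drop NAs;
# it adds a constant column named na.rm (it shows up in the table output).
machine.age <- machine.age %>%
  group_by(machine_id) %>%
  mutate(age = dt - mt, na.rm = TRUE)

# Age in months, using 30.417 days (365 / 12) per month
machine.age <- machine.age %>%
  group_by(machine_id) %>%
  mutate(month = round(age / 30.417))

join$month <- machine.age$month

library(knitr)
kable(machine.age[1:5, ], caption = "New data frame to calculate age in days and months")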
New data frame to calculate age in days and months
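| datapoint_time | machine_commissioning | machine_id   | dt    | mt    | age  | na.rm | month |
|----------------|-----------------------|--------------|-------|-------|------|-------|-------|
| 2020-01-02     | 2014-06-20            | 201406209290 | 18263 | 16241 | 2022 | TRUE  | 66    |
| 2020-01-02     | 2007-09-14            | 200709147860 | 18263 | 13770 | 4493 | TRUE  | 148   |
| 2020-01-03     | 2007-06-17            | 200706173882 | 18264 | 13681 | 4583 | TRUE  | 151   |
| 2020-01-06     | 2012-01-28            | 201201287469 | 18267 | 15367 | 2900 | TRUE  | 95    |
| 2020-01-07     | 2009-03-20            | 200903202799 | 18268 | 14323 | 3945 | TRUE  | 130   |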
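The dt and mt columns are plain day counts since 1970-01-01, which is why subtracting them gives the age in days. As a minimal sketch, assuming both source columns are of class Date, the same numbers can be obtained by subtracting the dates directly. The `example` data frame is hypothetical and only reuses the first two rows of the table.

```r
# Hypothetical two-row sample copied from the first two table rows
example <- data.frame(
  datapoint_time        = as.Date(c("2020-01-02", "2020-01-02")),
  machine_commissioning = as.Date(c("2014-06-20", "2007-09-14"))
)

# difftime() on Date objects returns a time difference; convert it to plain days
example$age <- as.numeric(
  difftime(example$datapoint_time, example$machine_commissioning, units = "days")
)

# 30.417 is the average month length (365 / 12) used in the original code
example$month <- round(example$age / 30.417)

print(example)
```

This should reproduce the age values 2022 and 4493 and the month values 66 and 148 shown in the table, without the intermediate dt and mt columns.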
Scatterplot of age and cycles per month
Interesting questions:
Do failures depend on age or usage?
Are machines failing early or only after heavy use?
# Cycles per month of machine age
join <- join %>%
  mutate(cycle_per_month = customer_cycles_amount / month)

# Scatterplot with a smoothed trend line, drawn in the classic theme
ggplot(join, aes(month, cycle_per_month)) +
  geom_point(size = 1) +
  geom_smooth() +
  labs(
    title = "Age and Usage Frequency",
    x = "Age in months",
    y = "Cycles per month"
  ) +
  theme_classic()
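Because month is rounded, any row recorded within roughly the first 15 days after commissioning has month equal to 0, and dividing by it returns Inf (or NaN when the cycle count is also 0). The sketch below shows one possible way to mark such rows as NA before plotting. The `toy` data frame and its values are hypothetical, and whether these rows should be dropped at all is an assumption to check against the real data.

```r
# Hypothetical sample: one normal row, two rows with month = 0, one with a missing month
toy <- data.frame(
  customer_cycles_amount = c(1200, 300, 0, 50),
  month                  = c(66, 0, 0, NA)
)

# Compute cycles per month only where month is known and positive; otherwise NA
toy$cycle_per_month <- ifelse(
  !is.na(toy$month) & toy$month > 0,
  toy$customer_cycles_amount / toy$month,
  NA_real_
)

# Keep only rows with a usable value for plotting
toy_clean <- toy[!is.na(toy$cycle_per_month), ]

print(toy_clean)
```

Only the first row survives the filter here, so a plot built on `toy_clean` would not contain infinite or undefined points.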