Explore data(wooldridge)
Source: https://quarto.org.
Make sure the “wooldridge” package is installed and loaded.
install.packages("wooldridge")
library(wooldridge)
#Module 1-7
data("wage1")
df <- wage1
Data summary
summary(df)
wage educ exper tenure
Min. : 0.530 Min. : 0.00 Min. : 1.00 Min. : 0.000
1st Qu.: 3.330 1st Qu.:12.00 1st Qu.: 5.00 1st Qu.: 0.000
Median : 4.650 Median :12.00 Median :13.50 Median : 2.000
Mean : 5.896 Mean :12.56 Mean :17.02 Mean : 5.105
3rd Qu.: 6.880 3rd Qu.:14.00 3rd Qu.:26.00 3rd Qu.: 7.000
Max. :24.980 Max. :18.00 Max. :51.00 Max. :44.000
nonwhite female married numdep
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.000
Median :0.0000 Median :0.0000 Median :1.0000 Median :1.000
Mean :0.1027 Mean :0.4791 Mean :0.6084 Mean :1.044
3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:2.000
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :6.000
smsa northcen south west
Min. :0.0000 Min. :0.000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
Median :1.0000 Median :0.000 Median :0.0000 Median :0.0000
Mean :0.7224 Mean :0.251 Mean :0.3555 Mean :0.1692
3rd Qu.:1.0000 3rd Qu.:0.750 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :1.0000 Max. :1.000 Max. :1.0000 Max. :1.0000
construc ndurman trcommpu trade
Min. :0.00000 Min. :0.0000 Min. :0.00000 Min. :0.0000
1st Qu.:0.00000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.0000
Median :0.00000 Median :0.0000 Median :0.00000 Median :0.0000
Mean :0.04563 Mean :0.1141 Mean :0.04373 Mean :0.2871
3rd Qu.:0.00000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:1.0000
Max. :1.00000 Max. :1.0000 Max. :1.00000 Max. :1.0000
services profserv profocc clerocc
Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
Mean :0.1008 Mean :0.2586 Mean :0.3669 Mean :0.1673
3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
servocc lwage expersq tenursq
Min. :0.0000 Min. :-0.6349 Min. : 1.0 Min. : 0.00
1st Qu.:0.0000 1st Qu.: 1.2030 1st Qu.: 25.0 1st Qu.: 0.00
Median :0.0000 Median : 1.5369 Median : 182.5 Median : 4.00
Mean :0.1407 Mean : 1.6233 Mean : 473.4 Mean : 78.15
3rd Qu.:0.0000 3rd Qu.: 1.9286 3rd Qu.: 676.0 3rd Qu.: 49.00
Max. :1.0000 Max. : 3.2181 Max. :2601.0 Max. :1936.00
Data summary: The wage1 dataset contains 526 observations on 24 variables, including wage (wage), education (educ), work experience (exper), and job tenure (tenure). From the summary, the mean wage is about $5.90, above the median of $4.65.
Data structure
str(df)
'data.frame': 526 obs. of 24 variables:
$ wage : num 3.1 3.24 3 6 5.3 ...
$ educ : int 11 12 11 8 12 16 18 12 12 17 ...
$ exper : int 2 22 2 44 7 9 15 5 26 22 ...
$ tenure : int 0 2 0 28 2 8 7 3 4 21 ...
$ nonwhite: int 0 0 0 0 0 0 0 0 0 0 ...
$ female : int 1 1 0 0 0 0 0 1 1 0 ...
$ married : int 0 1 0 1 1 1 0 0 0 1 ...
$ numdep : int 2 3 2 0 1 0 0 0 2 0 ...
$ smsa : int 1 1 0 1 0 1 1 1 1 1 ...
$ northcen: int 0 0 0 0 0 0 0 0 0 0 ...
$ south : int 0 0 0 0 0 0 0 0 0 0 ...
$ west : int 1 1 1 1 1 1 1 1 1 1 ...
$ construc: int 0 0 0 0 0 0 0 0 0 0 ...
$ ndurman : int 0 0 0 0 0 0 0 0 0 0 ...
$ trcommpu: int 0 0 0 0 0 0 0 0 0 0 ...
$ trade : int 0 0 1 0 0 0 1 0 1 0 ...
$ services: int 0 1 0 0 0 0 0 0 0 0 ...
$ profserv: int 0 0 0 0 0 1 0 0 0 0 ...
$ profocc : int 0 0 0 0 0 1 1 1 1 1 ...
$ clerocc : int 0 0 0 1 0 0 0 0 0 0 ...
$ servocc : int 0 1 0 0 0 0 0 0 0 0 ...
$ lwage : num 1.13 1.18 1.1 1.79 1.67 ...
$ expersq : int 4 484 4 1936 49 81 225 25 676 484 ...
$ tenursq : int 0 4 0 784 4 64 49 9 16 441 ...
- attr(*, "time.stamp")= chr "25 Jun 2011 23:03"
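Data structure: All 24 variables are stored as numeric (num) or integer (int) columns; none is a factor. Categorical attributes such as female, married, and the region and industry indicators are coded as 0/1 dummy variables.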
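Because the dummies are plain integers, plots and summaries will not label their categories. The following is a minimal sketch of how one might confirm the column types and recode two of the dummies as labelled factors. The name df_labeled and the label strings are illustrative assumptions, not part of the original analysis.

```r
library(wooldridge)
data("wage1")

# Count columns by storage class: only "integer" and "numeric" are expected
class_counts <- table(sapply(wage1, class))
print(class_counts)

# Recode two 0/1 dummies as labelled factors (labels are illustrative)
df_labeled <- transform(
  wage1,
  female  = factor(female,  levels = c(0, 1), labels = c("male", "female")),
  married = factor(married, levels = c(0, 1), labels = c("not married", "married"))
)
str(df_labeled[, c("female", "married")])
```

With factors in place, a call such as summary(df_labeled$female) reports counts per category instead of a mean.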
Data visualization
library(ggplot2)
Histogram of the wage variable
ggplot(df, aes(x = wage)) +
  # after_stat(density) replaces ..density.., which is deprecated since ggplot2 3.4.0
  geom_histogram(aes(y = after_stat(density)), binwidth = 1, fill = "blue", color = "black") +
  labs(title = "Wage Distribution", x = "Wage", y = "Density") +
  geom_density(alpha = .2, fill = "#FF6666") +
  # Normal curve with the sample mean and standard deviation, for reference
  stat_function(fun = dnorm, args = list(mean = mean(df$wage, na.rm = TRUE), sd = sd(df$wage, na.rm = TRUE)), col = "red")
Scatter plot of wage against educ
ggplot(df, aes(x = educ, y = wage)) +
  geom_point(color = "blue") +
  labs(title = "Relationship between Education and Wage", x = "Education (years)", y = "Wage")
Boxplot of wage by education level
ggplot(df, aes(x = factor(educ), y = wage)) +
  geom_boxplot(fill = "orange", color = "black") +
  labs(title = "Wage Distribution by Education", x = "Education (years)", y = "Wage")
Visualization: The histogram shows that wages are right-skewed rather than normally distributed: the mean ($5.90) lies above the median ($4.65) and the right tail reaches $24.98, so the overlaid normal curve serves only as a reference. The scatter plot shows a positive relationship between education and wage, while the boxplot shows the wage distribution at each education level.
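The shape of the wage distribution can also be checked numerically. The sketch below compares the mean with the median, computes a simple moment-based skewness, and runs a Shapiro-Wilk normality test with a Q-Q plot. The skewness formula is one common variant chosen here for illustration, and the variable names are assumptions.

```r
library(wooldridge)
data("wage1")
wage <- wage1$wage

# A mean above the median points to a long right tail
center <- c(mean = mean(wage), median = median(wage))
print(center)

# Moment-based sample skewness: positive values indicate right skew
skewness <- mean((wage - mean(wage))^3) / sd(wage)^3
print(skewness)

# Shapiro-Wilk test (null hypothesis: the data come from a normal distribution)
sw <- shapiro.test(wage)
print(sw)

# Normal Q-Q plot: points bending away from the line signal non-normality
qqnorm(wage, main = "Normal Q-Q Plot of wage")
qqline(wage, col = "red")
```

If the skew matters for later modelling, the dataset's lwage column (the log of wage) is a natural alternative to examine.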
Estimating the mean wage
mean_wage <- mean(df$wage, na.rm=TRUE)
mean_wage
[1] 5.896103
Confidence interval for the mean wage
ci <- t.test(df$wage)$conf.int
ci
[1] 5.579768 6.212437
attr(,"conf.level")
[1] 0.95
Mean wage estimate: The mean wage is about $5.90. The 95% confidence interval for the mean wage runs from $5.58 to $6.21.
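To see where the interval comes from, it can be rebuilt by hand as the sample mean plus or minus the t critical value times the standard error. This is a minimal sketch; the variable names are illustrative.

```r
library(wooldridge)
data("wage1")
wage <- wage1$wage

n      <- length(wage)
x_bar  <- mean(wage)
se     <- sd(wage) / sqrt(n)       # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value

ci_manual <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
print(round(ci_manual, 2))

# The manual interval should agree with the one reported by t.test()
ci_ttest <- as.numeric(t.test(wage)$conf.int)
print(all.equal(unname(ci_manual), ci_ttest))
```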
Hypothesis test
t_test_result <- t.test(df$wage, mu=5)
t_test_result
One Sample t-test
data: df$wage
t = 5.5649, df = 525, p-value = 4.186e-08
alternative hypothesis: true mean is not equal to 5
95 percent confidence interval:
5.579768 6.212437
sample estimates:
mean of x
5.896103
Hypothesis test result: The t-test gives t = 5.5649 on 525 degrees of freedom with a very small p-value (4.186e-08 < 0.05), so we reject the null hypothesis (H0: the mean wage equals $5). The mean wage differs significantly from $5.
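The test statistic and p-value can be reproduced from their definitions, which makes the link between the t value, the degrees of freedom, and the two-sided p-value explicit. A short sketch, with mu0 and the other names chosen for illustration:

```r
library(wooldridge)
data("wage1")
wage <- wage1$wage

mu0 <- 5                    # hypothesised mean under H0
n   <- length(wage)

# t = (sample mean - hypothesised mean) / standard error
t_stat <- (mean(wage) - mu0) / (sd(wage) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

print(c(t = t_stat, df = n - 1, p_value = p_value))
```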
#Module 8
Analysis of variance (ANOVA) is used to check whether mean wages differ significantly across education levels.
anova_result <- aov(wage ~ factor(educ), data = df)
summary(anova_result)
Df Sum Sq Mean Sq F value Pr(>F)
factor(educ) 17 1604 94.33 8.623 <2e-16 ***
Residuals 508 5557 10.94
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(anova_result)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = wage ~ factor(educ), data = df)
$`factor(educ)`
diff lwr upr p adj
2-0 0.21999991 -13.98553674 14.425537 1.0000000
3-0 -0.61000001 -14.81553667 13.595537 1.0000000
4-0 -0.36000009 -10.94818196 10.228182 1.0000000
5-0 -0.63000000 -14.83553665 13.575537 1.0000000
6-0 0.45499988 -9.01535788 9.925358 1.0000000
7-0 0.85749984 -9.18733146 10.902331 1.0000000
8-0 1.50818178 -7.05807914 10.074443 1.0000000
9-0 -0.25411772 -8.92472183 8.416486 1.0000000
10-0 0.30566659 -8.16487891 8.776212 1.0000000
11-0 0.65551716 -7.82415116 9.135185 1.0000000
12-0 1.84136356 -6.40152485 10.084252 0.9999983
13-0 2.06897431 -6.34026366 10.478212 0.9999928
14-0 2.70169803 -5.65318603 11.056582 0.9996713
15-0 2.79142850 -5.79181213 11.374669 0.9996456
16-0 4.51161757 -3.80969013 12.832925 0.9103100
17-0 7.81333323 -1.04537529 16.672042 0.1625694
18-0 7.14894720 -1.47348652 15.771381 0.2520636
3-2 -0.82999992 -17.23314074 15.573141 1.0000000
4-2 -0.58000000 -13.97310840 12.813108 1.0000000
5-2 -0.84999990 -17.25314073 15.553141 1.0000000
6-2 0.23499997 -12.29310577 12.763106 1.0000000
7-2 0.63749993 -12.33032151 13.605321 1.0000000
8-2 1.28818187 -10.57126935 13.147633 1.0000000
9-2 -0.47411763 -12.40915566 11.460920 1.0000000
10-2 0.08566668 -11.70483366 11.876167 1.0000000
11-2 0.43551725 -11.36153883 12.232573 1.0000000
12-2 1.62136365 -10.00666139 13.249389 1.0000000
13-2 1.84897440 -9.89755872 13.595508 1.0000000
14-2 2.48169812 -9.22598503 14.189381 0.9999992
15-2 2.57142859 -9.30029314 14.443150 0.9999989
16-2 4.29161766 -7.39212827 15.975364 0.9983383
17-2 7.59333332 -4.47905144 19.665718 0.7490359
18-2 6.92894729 -4.97114181 18.829036 0.8472369
4-3 0.24999992 -13.14310848 13.643108 1.0000000
5-3 -0.01999998 -16.42314080 16.383141 1.0000000
6-3 1.06499990 -11.46310584 13.593106 1.0000000
7-3 1.46749985 -11.50032159 14.435321 1.0000000
8-3 2.11818179 -9.74126943 13.977633 0.9999999
9-3 0.35588229 -11.57915574 12.290920 1.0000000
10-3 0.91566660 -10.87483374 12.706167 1.0000000
11-3 1.26551718 -10.53153890 13.062573 1.0000000
12-3 2.45136358 -9.17666147 14.079389 0.9999993
13-3 2.67897432 -9.06755880 14.425507 0.9999977
14-3 3.31169805 -8.39598511 15.019381 0.9999469
15-3 3.40142852 -8.47029322 15.273150 0.9999364
16-3 5.12161759 -6.56212835 16.805364 0.9875281
17-3 8.42333325 -3.64905152 20.495718 0.5722838
18-3 7.75894721 -4.14114188 19.659036 0.6927919
5-4 -0.26999990 -13.66310830 13.123108 1.0000000
6-4 0.81499998 -7.38657043 9.016570 1.0000000
7-4 1.21749993 -7.64120859 10.076208 1.0000000
8-4 1.86818187 -5.27036889 9.006733 0.9999824
9-4 0.10588237 -7.15755047 7.369315 1.0000000
10-4 0.66566668 -6.35773862 7.689072 1.0000000
11-4 1.01551726 -6.01888790 8.049922 1.0000000
12-4 2.20136366 -4.54573129 8.948459 0.9996303
13-4 2.42897440 -4.52036831 9.378317 0.9991029
14-4 3.06169812 -3.82177255 9.945169 0.9854780
15-4 3.15142860 -4.00748897 10.310346 0.9869624
16-4 4.87161767 -1.97106037 11.714296 0.5339395
17-4 8.17333333 0.68635813 15.660309 0.0168674
18-4 7.50894729 0.30308550 14.714809 0.0308996
6-5 1.08499988 -11.44310586 13.613106 1.0000000
7-5 1.48749983 -11.48032161 14.455321 1.0000000
8-5 2.13818177 -9.72126945 13.997633 0.9999999
9-5 0.37588227 -11.55915576 12.310920 1.0000000
10-5 0.93566658 -10.85483376 12.726167 1.0000000
11-5 1.28551716 -10.51153892 13.082573 1.0000000
12-5 2.47136356 -9.15666149 14.099389 0.9999992
13-5 2.69897430 -9.04755882 14.445507 0.9999974
14-5 3.33169803 -8.37598513 15.039381 0.9999422
15-5 3.42142850 -8.45029324 15.293150 0.9999310
16-5 5.14161757 -6.54212836 16.825364 0.9870071
17-5 8.44333323 -3.62905154 20.515718 0.5678143
18-5 7.77894719 -4.12114190 19.679036 0.6885327
7-6 0.40249995 -7.08447525 7.889475 1.0000000
8-6 1.05318189 -4.28882034 6.395184 0.9999997
9-6 -0.70911760 -6.21688826 4.798653 1.0000000
10-6 -0.14933330 -5.33646188 5.037795 1.0000000
11-6 0.20051728 -5.00149549 5.402530 1.0000000
12-6 1.38636368 -3.42002490 6.192752 0.9999302
13-6 1.61397442 -3.47242362 6.700372 0.9997441
14-6 2.24669815 -2.74932522 7.242722 0.9837400
15-6 2.33642862 -3.03275956 7.705617 0.9884564
16-6 4.05661769 -0.88305069 8.996286 0.2677057
17-6 7.35833335 1.55894730 13.157719 0.0014383
18-6 6.69394731 1.26232462 12.125570 0.0024727
8-7 0.65068194 -5.65391539 6.955279 1.0000000
9-7 -1.11161756 -7.55727479 5.334040 1.0000000
10-7 -0.55183325 -6.72575117 5.622085 1.0000000
11-7 -0.20198267 -6.38841107 5.984446 1.0000000
12-7 0.98386372 -4.87380908 6.841537 1.0000000
13-7 1.21147447 -4.87805791 7.301007 0.9999997
14-7 1.84419819 -4.17005231 7.858449 0.9998384
15-7 1.93392867 -4.39372028 8.261578 0.9998454
16-7 3.65411773 -2.31340128 9.621637 0.7867720
17-7 6.95583340 0.25927920 13.652388 0.0321396
18-7 6.29144736 -0.08926456 12.672159 0.0582831
9-8 -1.76229950 -5.50778843 1.983189 0.9744517
10-8 -1.20251519 -4.45819525 2.053165 0.9982220
11-8 -0.85266461 -4.13200701 2.426678 0.9999840
12-8 0.33318178 -2.27344840 2.939812 1.0000000
13-8 0.56079253 -2.53187539 3.653460 0.9999999
14-8 1.19351625 -1.74814982 4.135182 0.9946413
15-8 1.28324673 -2.25530261 4.821796 0.9985795
16-8 3.00343580 0.15853273 5.848339 0.0262586
17-8 6.30515146 2.14269685 10.467606 0.0000241
18-8 5.64076542 2.00818027 9.273351 0.0000120
10-9 0.55978431 -2.96129540 4.080864 1.0000000
11-9 0.90963488 -2.63333509 4.452605 0.9999867
12-9 2.09548128 -0.83591268 5.026875 0.5261413
13-9 2.32309203 -1.04783433 5.694018 0.5953861
14-9 2.95581575 -0.27713166 6.188763 0.1209084
15-9 3.04554622 -0.73861565 6.829708 0.3021687
16-9 4.76573529 1.62057661 7.910894 0.0000238
17-9 8.06745095 3.69428528 12.440617 0.0000000
18-9 7.40306492 3.53082725 11.275303 0.0000000
11-10 0.34985058 -2.67064633 3.370347 1.0000000
12-10 1.53569698 -0.73671054 3.808104 0.6310395
13-10 1.76330772 -1.05341197 4.580027 0.7559638
14-10 2.39603145 -0.25400898 5.046072 0.1329766
15-10 2.48576192 -0.81433602 5.785860 0.4248850
16-10 4.20595099 1.66374949 6.748152 0.0000017
17-10 7.50766665 3.54593176 11.469402 0.0000000
18-10 6.84328061 3.44254689 10.244014 0.0000000
12-11 1.18584640 -1.12033439 3.492027 0.9422109
13-11 1.41345715 -1.43057938 4.257494 0.9572551
14-11 2.04618087 -0.63287647 4.725238 0.3982918
15-11 2.13591134 -1.18753271 5.459355 0.7160768
16-11 3.85610041 1.28366531 6.428536 0.0000319
17-11 7.15781607 3.17661308 11.139019 0.0000001
18-11 6.49343003 3.07003643 9.916824 0.0000000
13-12 0.22761075 -1.80437796 2.259599 1.0000000
14-12 0.86033447 -0.93348340 2.654152 0.9692863
15-12 0.95006494 -1.71183563 3.611966 0.9988340
16-12 2.67025401 1.03996057 4.300547 0.0000024
17-12 5.97196967 2.52372206 9.420217 0.0000004
18-12 5.30758363 2.52189555 8.093272 0.0000000
14-13 0.63272372 -1.81428598 3.079733 0.9999852
15-13 0.72245420 -2.41693884 3.861847 0.9999973
16-13 2.44264326 0.11285129 4.772435 0.0286504
17-13 5.74435892 1.91545758 9.573260 0.0000312
18-13 5.07997289 1.83495639 8.324989 0.0000095
15-14 0.08973047 -2.90102078 3.080482 1.0000000
16-14 1.80991954 -0.31534164 3.935181 0.2106537
17-14 5.11163520 1.40363170 8.819639 0.0002535
18-14 4.44724916 1.34580813 7.548690 0.0000999
16-15 1.72018907 -1.17543991 4.615818 0.8247757
17-15 5.02190473 0.82461726 9.219192 0.0041480
18-15 4.35751869 0.68507147 8.029966 0.0047696
17-16 3.30171566 -0.33000076 6.933432 0.1269268
18-16 2.63732962 -0.37248942 5.647149 0.1709978
18-17 -0.66438604 -4.94125002 3.612478 1.0000000
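ANOVA result: The F-test (F = 8.623 on 17 and 508 degrees of freedom, p < 2e-16) rejects the hypothesis that mean wages are equal across education levels at the 5% level. In the Tukey comparisons, the significant differences (adjusted p < 0.05) all involve 16, 17, or 18 years of education against a lower level, for example 16-12 with a difference of 2.67 (adjusted p = 0.0000024). The very wide intervals for levels such as 0, 2, 3, and 5 suggest that few observations fall in those groups.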
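With 18 education levels, the Tukey table contains 153 pairwise comparisons, many of them between sparsely populated groups. One possible way to get a more readable comparison is to group education into a few bands first. The cut points and band labels below are hypothetical choices made only for illustration.

```r
library(wooldridge)
data("wage1")

# Observations per education level: sparse levels lead to wide Tukey intervals
print(table(wage1$educ))

# Hypothetical bands: below 12 years, exactly 12, 13 to 15, and 16 or more
wage1$educ_band <- cut(
  wage1$educ,
  breaks = c(-Inf, 11, 12, 15, Inf),
  labels = c("<12", "12", "13-15", "16+")
)

anova_band <- aov(wage ~ educ_band, data = wage1)
print(summary(anova_band))

# Four bands give only six pairwise comparisons
tukey_band <- TukeyHSD(anova_band)
print(tukey_band)
```

Different cut points would give different results, so the bands should follow whatever grouping is meaningful for the question at hand.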
#Module 9
Linear regression is used to examine the relationship between wage (wage) and several independent variables: education (educ), work experience (exper), and job tenure (tenure).
Linear regression
reg_model <- lm(wage ~ educ + exper + tenure, data = df)
summary(reg_model)
Call:
lm(formula = wage ~ educ + exper + tenure, data = df)
Residuals:
Min 1Q Median 3Q Max
-7.6068 -1.7747 -0.6279 1.1969 14.6536
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.87273 0.72896 -3.941 9.22e-05 ***
educ 0.59897 0.05128 11.679 < 2e-16 ***
exper 0.02234 0.01206 1.853 0.0645 .
tenure 0.16927 0.02164 7.820 2.93e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.084 on 522 degrees of freedom
Multiple R-squared: 0.3064, Adjusted R-squared: 0.3024
F-statistic: 76.87 on 3 and 522 DF, p-value: < 2.2e-16
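Regression result: The regression shows that education and tenure are significantly associated with wage (p < 0.001 for both), while work experience is significant only at the 10% level (p = 0.0645). Holding the other variables fixed, one more year of education is associated with a wage about $0.60 higher, one more year of tenure with about $0.17, and one more year of experience with about $0.02. The model explains about 31% of the variation in wage (R-squared = 0.3064). Visualization: A regression plot of wage against education shows the fitted line that models the relationship between education and wage.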
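The regression plot described above could be drawn as in the sketch below, which also extracts the p-values and 95% confidence intervals of the coefficients. Note that geom_smooth() here fits a simple regression of wage on educ alone, so its slope will not match the educ coefficient from the multiple regression exactly. The object names are illustrative.

```r
library(wooldridge)
library(ggplot2)
data("wage1")

reg_model <- lm(wage ~ educ + exper + tenure, data = wage1)

# p-values and 95% confidence intervals for each coefficient
p_values <- summary(reg_model)$coefficients[, "Pr(>|t|)"]
print(round(p_values, 4))
print(confint(reg_model))

# Scatter plot with the fitted line from a simple regression of wage on educ
p <- ggplot(wage1, aes(x = educ, y = wage)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "red") +
  labs(title = "Wage against Education with Fitted Line",
       x = "Education (years)", y = "Wage")
print(p)
```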
#Module 10
Forecasting requires time series data. If a dataset has a variable observed over time, we can use methods such as ARIMA or ETS.
The wage1 dataset is not suitable for forecasting because it is not a time series. We therefore pick a more suitable dataset from the “wooldridge” package, such as intdef, which contains time series data.
Load the dataset
data("intdef")
# intdef holds one observation per year, so the series is annual (frequency = 1)
ts_data <- ts(intdef$inf, start = min(intdef$year), frequency = 1)
Plot the time series
plot(ts_data, main = paste("Inflation,", min(intdef$year), "to", max(intdef$year)), xlab = "Year", ylab = "Inflation")
Forecasting model using ARIMA
library(forecast)
arima_model <- auto.arima(ts_data)
forecast_values <- forecast(arima_model, h = 20)
Plot the forecast
plot(forecast_values, main = "Inflation Forecast", xlab = "Year", ylab = "Inflation")
Forecast result: An ARIMA model is used to model and predict inflation from the historical data. The forecast plot shows the predicted inflation values for the next 20 periods.
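A forecast is easier to trust when it has been checked against data the model has not seen. The sketch below holds out the last five observations, fits the model on the rest, and compares the forecasts with the held-out values. It assumes intdef has a year column with consecutive annual observations; the five-year holdout and the object names are illustrative choices.

```r
library(wooldridge)
library(forecast)
data("intdef")

# Confirm the span of the observations
print(range(intdef$year))

# Annual series: one observation per year
inf_ts <- ts(intdef$inf, start = min(intdef$year), frequency = 1)

# Hold out the last h years for evaluation (h = 5 is an illustrative choice)
h         <- 5
end_train <- max(intdef$year) - h
train     <- window(inf_ts, end = end_train)
test      <- window(inf_ts, start = end_train + 1)

fit <- auto.arima(train)
fc  <- forecast(fit, h = h)

# Error measures (RMSE, MAE, ...) on the training set and the held-out set
print(accuracy(fc, test))
```

A large gap between training and test errors would suggest that the 20-period forecast should be read with caution.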