The fundamentals dataset contains financial data for companies on the New York Stock Exchange (NYSE), as published in their annual financial statements. Each record refers to one specific company (identified by its ticker symbol) and one specific fiscal year.
The dataset is available on Kaggle (https://www.kaggle.com/datasets/dgawlik/nyse?resource=download).
## Reading the dataset
exchange_data <- read.csv("/Users/mariakommata/Downloads/archive/fundamentals.csv")
| Variable | Type |
|---|---|
| Ticker Symbol | Character (stock symbol) |
| Period Ending | Date (end of period) |
| Total Revenue | Numeric (total revenue) |
| Net Income | Numeric (net income) |
| Earnings Per Share | Numeric (earnings per share) |
| Profit Margin | Numeric (net income / revenue) |
| Total Assets | Numeric (total assets) |
| Total Liabilities | Numeric (total liabilities) |
| Total Equity | Numeric (shareholders' equity) |
| Cash Ratio | Numeric (liquidity ratio) |
| Current Ratio | Numeric (current ratio) |
| Quick Ratio | Numeric (quick ratio) |
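
By default `read.csv()` rewrites column headers into syntactic R names, which is why `Ticker Symbol` appears as `Ticker.Symbol` in the rest of the analysis. It also reads dates as plain character strings, so `Period.Ending` needs an explicit conversion before it behaves like a date. The sketch below shows that conversion on a small made-up data frame; the toy values and the ISO `YYYY-MM-DD` date format are assumptions, not taken from the file.

```r
# Toy stand-in for exchange_data (values are invented for illustration only)
toy <- data.frame(
  Ticker.Symbol = c("AAA", "BBB"),
  Period.Ending = c("2014-12-31", "2015-06-30"),
  Total.Revenue = c(1.2e9, 3.4e9),
  Net.Income    = c(1.0e8, -2.0e7),
  stringsAsFactors = FALSE
)

# Convert the character column to Date (assumes ISO-formatted strings)
toy$Period.Ending <- as.Date(toy$Period.Ending, format = "%Y-%m-%d")

# Inspect the resulting column types
str(toy)
```

If the real file uses a different date layout, only the `format` argument would need to change.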
summary(exchange_data)
## X Ticker.Symbol Period.Ending Accounts.Payable
## Min. : 0 Length:1781 Length:1781 Min. :0.000e+00
## 1st Qu.: 445 Class :character Class :character 1st Qu.:5.160e+08
## Median : 890 Mode :character Mode :character Median :1.334e+09
## Mean : 890 Mean :4.673e+09
## 3rd Qu.:1335 3rd Qu.:3.246e+09
## Max. :1780 Max. :2.069e+11
##
## Accounts.Receivable Add.l.income.expense.items After.Tax.ROE
## Min. :-6.452e+09 Min. :-6.768e+09 Min. : 0.0
## 1st Qu.:-1.040e+08 1st Qu.:-2.638e+06 1st Qu.: 10.0
## Median :-1.830e+07 Median : 2.000e+06 Median : 16.0
## Mean :-6.353e+07 Mean : 6.909e+07 Mean : 43.6
## 3rd Qu.: 7.816e+06 3rd Qu.: 3.359e+07 3rd Qu.: 26.0
## Max. : 2.266e+10 Max. : 1.416e+10 Max. :5789.0
##
## Capital.Expenditures Capital.Surplus Cash.Ratio
## Min. :-3.798e+10 Min. :-7.215e+08 Min. : 0.00
## 1st Qu.:-1.151e+09 1st Qu.: 4.791e+08 1st Qu.: 17.00
## Median :-3.580e+08 Median : 1.997e+09 Median : 41.00
## Mean :-1.252e+09 Mean : 5.352e+09 Mean : 74.46
## 3rd Qu.:-1.291e+08 3rd Qu.: 5.735e+09 3rd Qu.: 90.00
## Max. : 5.000e+06 Max. : 1.083e+11 Max. :1041.00
## NA's :299
## Cash.and.Cash.Equivalents Changes.in.Inventories Common.Stocks
## Min. :2.100e+04 Min. :-5.562e+09 Min. :0.000e+00
## 1st Qu.:3.088e+08 1st Qu.:-5.400e+07 1st Qu.:1.628e+06
## Median :8.626e+08 Median : 0.000e+00 Median :7.725e+06
## Mean :8.521e+09 Mean :-6.788e+07 Mean :1.609e+09
## 3rd Qu.:2.310e+09 3rd Qu.: 0.000e+00 3rd Qu.:2.970e+08
## Max. :7.281e+11 Max. : 3.755e+09 Max. :1.581e+11
##
## Cost.of.Revenue Current.Ratio Deferred.Asset.Charges
## Min. :0.000e+00 Min. : 17.0 Min. :0.000e+00
## 1st Qu.:1.194e+09 1st Qu.: 109.0 1st Qu.:0.000e+00
## Median :3.685e+09 Median : 152.0 Median :0.000e+00
## Mean :1.235e+10 Mean : 186.8 Mean :5.908e+08
## 3rd Qu.:9.801e+09 3rd Qu.: 226.0 3rd Qu.:1.471e+08
## Max. :3.651e+11 Max. :1197.0 Max. :3.686e+10
## NA's :299
## Deferred.Liability.Charges Depreciation
## Min. :0.000e+00 Min. :-4.480e+08
## 1st Qu.:0.000e+00 1st Qu.: 1.799e+08
## Median :2.060e+08 Median : 4.280e+08
## Mean :1.611e+09 Mean : 1.084e+09
## 3rd Qu.:1.083e+09 3rd Qu.: 1.047e+09
## Max. :5.618e+10 Max. : 2.952e+10
##
## Earnings.Before.Interest.and.Tax Earnings.Before.Tax Effect.of.Exchange.Rate
## Min. :-2.793e+10 Min. :-2.823e+10 Min. :-3.067e+09
## 1st Qu.: 5.852e+08 1st Qu.: 4.900e+08 1st Qu.:-2.000e+07
## Median : 1.139e+09 Median : 9.601e+08 Median :-6.000e+05
## Mean : 2.710e+09 Mean : 2.375e+09 Mean :-3.849e+07
## 3rd Qu.: 2.586e+09 3rd Qu.: 2.255e+09 3rd Qu.: 0.000e+00
## Max. : 7.905e+10 Max. : 7.873e+10 Max. : 1.160e+09
##
## Equity.Earnings.Loss.Unconsolidated.Subsidiary Fixed.Assets
## Min. :-1.633e+09 Min. :0.000e+00
## 1st Qu.: 0.000e+00 1st Qu.:5.920e+08
## Median : 0.000e+00 Median :2.089e+09
## Mean : 9.134e+07 Mean :8.534e+09
## 3rd Qu.: 0.000e+00 3rd Qu.:9.231e+09
## Max. : 1.501e+10 Max. :2.527e+11
##
## Goodwill Gross.Margin Gross.Profit Income.Tax
## Min. :0.000e+00 Min. : 0.00 Min. :-1.265e+10 Min. :-8.013e+09
## 1st Qu.:1.222e+08 1st Qu.: 29.00 1st Qu.: 1.582e+09 1st Qu.: 1.030e+08
## Median :1.260e+09 Median : 43.00 Median : 2.991e+09 Median : 2.689e+08
## Mean :3.930e+09 Mean : 46.76 Mean : 7.189e+09 Mean : 6.694e+08
## 3rd Qu.:4.091e+09 3rd Qu.: 64.00 3rd Qu.: 6.944e+09 3rd Qu.: 6.264e+08
## Max. :1.046e+11 Max. :100.00 Max. : 1.495e+11 Max. : 3.104e+10
##
## Intangible.Assets Interest.Expense Inventory
## Min. :0.000e+00 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.000e+00 1st Qu.:3.005e+07 1st Qu.:0.000e+00
## Median :3.180e+08 Median :1.223e+08 Median :3.804e+08
## Mean :1.964e+09 Mean :3.263e+08 Mean :1.467e+09
## 3rd Qu.:1.474e+09 3rd Qu.:3.200e+08 3rd Qu.:1.467e+09
## Max. :1.207e+11 Max. :2.061e+10 Max. :4.726e+10
##
## Investments Liabilities Long.Term.Debt
## Min. :-1.651e+11 Min. :-4.017e+10 Min. :0.000e+00
## 1st Qu.:-2.150e+08 1st Qu.:-5.484e+07 1st Qu.:1.107e+09
## Median :-9.700e+04 Median : 2.700e+07 Median :3.346e+09
## Mean :-9.679e+08 Mean : 1.790e+08 Mean :8.477e+09
## 3rd Qu.: 9.000e+06 3rd Qu.: 1.777e+08 3rd Qu.:7.781e+09
## Max. : 3.835e+10 Max. : 3.710e+10 Max. :4.292e+11
##
## Long.Term.Investments Minority.Interest Misc..Stocks
## Min. :0.000e+00 Min. :-1.050e+08 Min. :-151000000
## 1st Qu.:0.000e+00 1st Qu.: 0.000e+00 1st Qu.: 0
## Median :9.260e+07 Median : 1.000e+06 Median : 0
## Mean :2.321e+10 Mean : 4.167e+08 Mean : 42436180
## 3rd Qu.:1.488e+09 3rd Qu.: 8.500e+07 3rd Qu.: 0
## Max. :1.652e+12 Max. : 6.319e+10 Max. :3713000000
##
## Net.Borrowings Net.Cash.Flow Net.Cash.Flow.Operating
## Min. :-9.909e+10 Min. :-4.293e+10 Min. :-1.606e+10
## 1st Qu.:-7.340e+07 1st Qu.:-1.550e+08 1st Qu.: 6.642e+08
## Median : 1.063e+08 Median : 1.000e+07 Median : 1.237e+09
## Mean : 5.155e+08 Mean : 5.273e+07 Mean : 3.258e+09
## 3rd Qu.: 7.810e+08 3rd Qu.: 2.457e+08 3rd Qu.: 3.049e+09
## Max. : 4.971e+10 Max. : 5.044e+10 Max. : 1.080e+11
##
## Net.Cash.Flows.Financing Net.Cash.Flows.Investing Net.Income
## Min. :-1.875e+11 Min. :-1.656e+11 Min. :-2.353e+10
## 1st Qu.:-1.092e+09 1st Qu.:-2.296e+09 1st Qu.: 3.528e+08
## Median :-3.541e+08 Median :-7.568e+08 Median : 6.861e+08
## Mean :-4.576e+08 Mean :-2.718e+09 Mean : 1.706e+09
## 3rd Qu.: 1.279e+08 3rd Qu.:-2.560e+08 3rd Qu.: 1.697e+09
## Max. : 1.182e+11 Max. : 1.070e+11 Max. : 5.339e+10
##
## Net.Income.Adjustments Net.Income.Applicable.to.Common.Shareholders
## Min. :-5.810e+10 Min. :-2.312e+10
## 1st Qu.:-7.200e+06 1st Qu.: 3.512e+08
## Median : 8.895e+07 Median : 6.820e+08
## Mean : 2.198e+08 Mean : 1.688e+09
## 3rd Qu.: 3.431e+08 3rd Qu.: 1.679e+09
## Max. : 1.722e+10 Max. : 5.339e+10
##
## Net.Income.Cont..Operations Net.Receivables Non.Recurring.Items
## Min. :-2.276e+10 Min. :0.000e+00 Min. :-2.524e+09
## 1st Qu.: 3.534e+08 1st Qu.:4.336e+08 1st Qu.: 0.000e+00
## Median : 6.851e+08 Median :1.083e+09 Median : 0.000e+00
## Mean : 1.748e+09 Mean :3.242e+09 Mean : 2.185e+08
## 3rd Qu.: 1.673e+09 3rd Qu.:2.383e+09 3rd Qu.: 5.000e+07
## Max. : 5.989e+10 Max. :9.282e+10 Max. : 2.090e+10
##
## Operating.Income Operating.Margin Other.Assets Other.Current.Assets
## Min. :-2.791e+10 Min. : 0.00 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.: 5.259e+08 1st Qu.: 9.00 1st Qu.:1.070e+08 1st Qu.:5.034e+07
## Median : 1.021e+09 Median : 15.00 Median :4.110e+08 Median :1.837e+08
## Mean : 2.269e+09 Mean : 18.18 Mean :4.860e+09 Mean :6.071e+08
## 3rd Qu.: 2.260e+09 3rd Qu.: 23.00 3rd Qu.:1.385e+09 3rd Qu.:5.480e+08
## Max. : 7.123e+10 Max. :437.00 Max. :3.256e+11 Max. :3.509e+10
##
## Other.Current.Liabilities Other.Equity Other.Financing.Activities
## Min. :0.000e+00 Min. :-2.961e+10 Min. :-9.504e+10
## 1st Qu.:0.000e+00 1st Qu.:-5.522e+08 1st Qu.:-1.900e+07
## Median :1.287e+08 Median :-9.500e+07 Median : 0.000e+00
## Mean :1.501e+10 Mean :-6.208e+08 Mean : 4.844e+08
## 3rd Qu.:8.710e+08 3rd Qu.: 0.000e+00 3rd Qu.: 0.000e+00
## Max. :1.363e+12 Max. : 3.678e+10 Max. : 8.964e+10
##
## Other.Investing.Activities Other.Liabilities Other.Operating.Activities
## Min. :-5.672e+10 Min. :0.000e+00 Min. :-3.367e+10
## 1st Qu.:-2.530e+08 1st Qu.:1.790e+08 1st Qu.:-8.400e+07
## Median :-1.400e+07 Median :6.960e+08 Median :-8.959e+06
## Mean :-4.054e+08 Mean :9.077e+09 Mean : 7.145e+06
## 3rd Qu.: 5.000e+07 3rd Qu.:2.587e+09 3rd Qu.: 2.650e+07
## Max. : 1.160e+10 Max. :7.662e+11 Max. : 8.751e+10
##
## Other.Operating.Items Pre.Tax.Margin Pre.Tax.ROE Profit.Margin
## Min. :-8.716e+07 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000e+00 1st Qu.: 8.00 1st Qu.: 13.00 1st Qu.: 6.00
## Median : 7.173e+07 Median : 14.00 Median : 22.00 Median : 10.00
## Mean : 8.688e+08 Mean : 17.75 Mean : 59.65 Mean : 13.96
## 3rd Qu.: 6.080e+08 3rd Qu.: 22.00 3rd Qu.: 36.00 3rd Qu.: 17.00
## Max. : 5.487e+10 Max. :442.00 Max. :9089.00 Max. :369.00
##
## Quick.Ratio Research.and.Development Retained.Earnings
## Min. : 10.00 Min. :0.000e+00 Min. :-1.990e+10
## 1st Qu.: 77.25 1st Qu.:0.000e+00 1st Qu.: 1.100e+09
## Median : 115.00 Median :0.000e+00 Median : 3.337e+09
## Mean : 146.95 Mean :3.503e+08 Mean : 9.207e+09
## 3rd Qu.: 180.00 3rd Qu.:6.541e+07 3rd Qu.: 9.012e+09
## Max. :1197.00 Max. :1.274e+10 Max. : 4.124e+11
## NA's :299
## Sale.and.Purchase.of.Stock Sales..General.and.Admin.
## Min. :-5.885e+10 Min. :-4.870e+08
## 1st Qu.:-7.495e+08 1st Qu.: 5.598e+08
## Median :-2.102e+08 Median : 1.338e+09
## Mean :-7.652e+08 Mean : 3.981e+09
## 3rd Qu.: 2.385e+06 3rd Qu.: 3.430e+09
## Max. : 5.410e+09 Max. : 9.704e+10
##
## Short.Term.Debt...Current.Portion.of.Long.Term.Debt Short.Term.Investments
## Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:4.278e+06 1st Qu.:0.000e+00
## Median :2.131e+08 Median :0.000e+00
## Mean :3.055e+09 Mean :1.124e+09
## 3rd Qu.:9.560e+08 3rd Qu.:2.550e+08
## Max. :3.240e+11 Max. :1.067e+11
##
## Total.Assets Total.Current.Assets Total.Current.Liabilities
## Min. :2.705e+06 Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:6.553e+09 1st Qu.:1.044e+09 1st Qu.:5.641e+08
## Median :1.517e+10 Median :2.747e+09 Median :1.702e+09
## Mean :5.571e+10 Mean :6.726e+09 Mean :4.700e+09
## 3rd Qu.:3.600e+10 3rd Qu.:6.162e+09 3rd Qu.:4.381e+09
## Max. :2.572e+12 Max. :1.397e+11 Max. :9.028e+10
##
## Total.Equity Total.Liabilities Total.Liabilities...Equity
## Min. :-1.324e+10 Min. :2.577e+06 Min. :2.705e+06
## 1st Qu.: 2.201e+09 1st Qu.:3.843e+09 1st Qu.:6.553e+09
## Median : 4.983e+09 Median :9.141e+09 Median :1.517e+10
## Mean : 1.189e+10 Mean :4.380e+10 Mean :5.569e+10
## 3rd Qu.: 1.081e+10 3rd Qu.:2.390e+10 3rd Qu.:3.600e+10
## Max. : 2.562e+11 Max. :2.341e+12 Max. :2.572e+12
##
## Total.Revenue Treasury.Stock For.Year Earnings.Per.Share
## Min. :1.514e+06 Min. :-2.297e+11 Min. :1215 Min. :-61.200
## 1st Qu.:3.714e+09 1st Qu.:-3.041e+09 1st Qu.:2013 1st Qu.: 1.590
## Median :8.023e+09 Median :-3.068e+08 Median :2014 Median : 2.810
## Mean :2.029e+10 Mean :-3.952e+09 Mean :2013 Mean : 3.354
## 3rd Qu.:1.749e+10 3rd Qu.: 0.000e+00 3rd Qu.:2015 3rd Qu.: 4.590
## Max. :4.857e+11 Max. : 0.000e+00 Max. :2016 Max. : 50.090
## NA's :173 NA's :219
## Estimated.Shares.Outstanding
## Min. :-1.514e+09
## 1st Qu.: 1.493e+08
## Median : 2.929e+08
## Mean : 6.024e+08
## 3rd Qu.: 5.492e+08
## Max. : 1.611e+10
## NA's :219
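
The summary above shows missing values in several columns (for example 299 in `Cash.Ratio` and 219 in `Earnings.Per.Share`) and a minimum `For.Year` of 1215, which looks like a data-entry error. A minimal sketch of how such issues could be listed is given below. It runs on a made-up data frame, and the 2000 to 2020 window used to flag suspicious years is an assumption chosen for illustration.

```r
# Toy stand-in for exchange_data (values are invented for illustration only)
toy <- data.frame(
  For.Year           = c(2013, 2014, 1215, NA),
  Cash.Ratio         = c(41, NA, 17, 90),
  Earnings.Per.Share = c(2.81, NA, NA, 4.59)
)

# Number of missing values per column, largest first
na_counts <- sort(colSums(is.na(toy)), decreasing = TRUE)
print(na_counts[na_counts > 0])

# Rows whose fiscal year falls outside a plausible window (window is an assumption)
suspect_years <- which(!is.na(toy$For.Year) &
                         (toy$For.Year < 2000 | toy$For.Year > 2020))
print(toy[suspect_years, ])
```

The same two checks could be applied to `exchange_data` directly before any modelling.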
library(ggplot2)
library(dplyr)
library(broom)  # tidy()
library(knitr)  # kable()
# Filter out extreme values for a cleaner plot
clean_df <- exchange_data %>%
  filter(
    `Total.Revenue` < 1e11,
    `Net.Income` < 5e10,
    !is.na(`Total.Revenue`),
    !is.na(`Net.Income`)
  )
# Scatter plot with a regression line
ggplot(clean_df, aes(x = `Total.Revenue`, y = `Net.Income`)) +
  geom_point(alpha = 0.5, color = "steelblue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(
    title = "Total Revenue vs Net Income",
    x = "Total Revenue",
    y = "Net Income"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
## Scatterplot: Total Revenue vs Net Income

There is a clear positive linear association: companies with higher revenue generally also report higher net income. The red regression line shows this relationship. Even so, the points scatter considerably around the line, which indicates that revenue alone is not enough to predict net income.
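
One way to put a number on "positive but noisy" is the Pearson correlation, whose square equals the R² of a simple linear regression with an intercept. The sketch below illustrates this on simulated data; the slope and noise level are hypothetical and are not estimates from the dataset.

```r
# Simulated revenue/income pairs (hypothetical slope and noise, illustration only)
set.seed(1)
revenue <- runif(200, 1e9, 5e10)
income  <- 0.065 * revenue + rnorm(200, sd = 2e9)

# Pearson correlation and the matching simple regression
r   <- cor(revenue, income, use = "complete.obs")
fit <- lm(income ~ revenue)

# r^2 and the regression R-squared agree for a one-predictor model
c(pearson_r = r, r_squared = r^2, lm_r_squared = summary(fit)$r.squared)
```

On the real data the same call would be `cor(clean_df$Total.Revenue, clean_df$Net.Income)`.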
# Add a category based on revenue quartiles
clean_df <- clean_df %>%
  filter(!is.na(`Earnings.Per.Share`)) %>%
  mutate(Revenue_Quartile = ntile(`Total.Revenue`, 4)) %>%
  mutate(Revenue_Quartile = factor(Revenue_Quartile,
                                   levels = 1:4,
                                   labels = c("Low", "Medium", "High", "Very High")))
# Boxplot of EPS by revenue category
ggplot(clean_df, aes(x = Revenue_Quartile, y = `Earnings.Per.Share`, fill = Revenue_Quartile)) +
  geom_boxplot() +
  labs(
    title = "EPS by Revenue Category",
    x = "Revenue Level",
    y = "Earnings Per Share"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
## Boxplot: Earnings Per Share by Revenue Category

Companies were split into four revenue categories (quartiles). Earnings Per Share is more volatile at low revenue levels, whereas at higher revenue levels it is more stable and takes larger values. Negative EPS values appear in the lowest revenue levels, a sign of losses.
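
The visual impression from a boxplot can be backed by a per-group table of the median, the interquartile range and the share of negative values. The following is a minimal base-R sketch on simulated data. It forms the groups with `cut()` on quantile breaks, which is close to but not identical to `dplyr::ntile()`, and the labels `Q1` to `Q4` are placeholders.

```r
# Simulated stand-in data (values are invented for illustration only)
set.seed(42)
toy <- data.frame(
  Total.Revenue      = runif(400, 1e8, 1e11),
  Earnings.Per.Share = rnorm(400, mean = 3, sd = 4)
)

# Quartile groups from quantile breaks
breaks <- quantile(toy$Total.Revenue, probs = seq(0, 1, 0.25))
toy$Revenue_Quartile <- cut(toy$Total.Revenue, breaks = breaks,
                            include.lowest = TRUE,
                            labels = c("Q1", "Q2", "Q3", "Q4"))

# Median, spread and share of negative EPS within each group
eps_by_q <- do.call(rbind, lapply(
  split(toy$Earnings.Per.Share, toy$Revenue_Quartile),
  function(x) data.frame(median = median(x),
                         iqr = IQR(x),
                         share_negative = mean(x < 0))
))
print(eps_by_q)
```

A larger `iqr` or `share_negative` in the lowest group would support the reading of the boxplot given above.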
# Filter and plot a histogram
clean_df <- exchange_data %>%
  filter(`Profit.Margin` > -200, `Profit.Margin` < 200, !is.na(`Profit.Margin`))
ggplot(clean_df, aes(x = `Profit.Margin`)) +
  geom_histogram(bins = 40, fill = "darkgreen", color = "white") +
  geom_density(aes(y = after_stat(count)), color = "black", linewidth = 1) +
  labs(
    title = "Distribution of Profit Margin",
    x = "Profit Margin (%)",
    y = "Number of Observations"
  ) +
  theme_minimal()
## Histogram: Distribution of Profit Margin

The distribution of the net profit margin is right-skewed (mean 13.96 versus median 10 in the summary above). Most companies have a Profit Margin between 0% and 30%. The column has a minimum of 0 in the summary, so loss-making years do not show up as negative margins here, even though Net Income is negative for some records. The shape of the histogram suggests that margins are concentrated near the centre of the distribution, with a number of outliers in the right tail.
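
The asymmetry can be checked numerically by comparing mean and median, computing the share of values in the 0 to 30 range, and estimating a moment-based skewness. The sketch below uses a simulated right-skewed sample; the gamma parameters are arbitrary and only serve to produce a shape of that kind.

```r
# Simulated right-skewed margins (arbitrary gamma parameters, illustration only)
set.seed(7)
margin <- rgamma(1000, shape = 2, scale = 7)

stats <- c(
  mean          = mean(margin),
  median        = median(margin),
  share_0_to_30 = mean(margin >= 0 & margin <= 30),
  # Simple moment-based skewness estimate; positive means a longer right tail
  skewness      = mean((margin - mean(margin))^3) / sd(margin)^3
)
round(stats, 3)
```

Replacing `margin` with `clean_df$Profit.Margin` would give the corresponding figures for the real data.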
# Model 1: Net Income ~ Total Revenue
m1 <- lm(`Net.Income` ~ `Total.Revenue`, data = exchange_data)
summary(m1)
##
## Call:
## lm(formula = Net.Income ~ Total.Revenue, data = exchange_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.433e+10 -4.740e+08 -1.977e+08 2.272e+08 3.782e+10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.872e+08 7.878e+07 4.916 9.67e-07 ***
## Total.Revenue 6.497e-02 1.724e-03 37.684 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.979e+09 on 1779 degrees of freedom
## Multiple R-squared: 0.4439, Adjusted R-squared: 0.4436
## F-statistic: 1420 on 1 and 1779 DF, p-value: < 2.2e-16
tidy(m1)
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 3.87e+8 7.88e+7 4.92 9.67e- 7
## 2 Total.Revenue 6.50e-2 1.72e-3 37.7 5.85e-229
SSE1 <- sum(m1$residuals^2)
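
The quantity stored in `SSE1` is the residual sum of squares. For a linear model it equals `deviance(model)`, and together with the total sum of squares it gives R² as 1 - SSE/SST. A small sketch on simulated data (the variables `x` and `y` are hypothetical) makes the relationship explicit:

```r
# Simulated data for a one-predictor linear model (illustration only)
set.seed(3)
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)

sse <- sum(residuals(fit)^2)    # residual sum of squares
sst <- sum((y - mean(y))^2)     # total sum of squares around the mean
r2  <- 1 - sse / sst            # R-squared from its definition

# Both comparisons should report TRUE
all.equal(sse, deviance(fit))
all.equal(r2, summary(fit)$r.squared)
```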
m2 <- lm(`Net.Income` ~ `Total.Revenue` + `Earnings.Per.Share`, data = exchange_data)
summary(m2)
##
## Call:
## lm(formula = Net.Income ~ Total.Revenue + Earnings.Per.Share,
## data = exchange_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.620e+10 -5.224e+08 -6.619e+07 2.630e+08 3.699e+10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.056e+08 9.515e+07 -3.212 0.00134 **
## Total.Revenue 6.257e-02 1.753e-03 35.691 < 2e-16 ***
## Earnings.Per.Share 2.247e+08 1.584e+07 14.182 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.912e+09 on 1559 degrees of freedom
## (219 observations deleted due to missingness)
## Multiple R-squared: 0.5131, Adjusted R-squared: 0.5125
## F-statistic: 821.5 on 2 and 1559 DF, p-value: < 2.2e-16
tidy(m2)
## # A tibble: 3 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -3.06e+8 9.52e+7 -3.21 1.34e- 3
## 2 Total.Revenue 6.26e-2 1.75e-3 35.7 1.96e-204
## 3 Earnings.Per.Share 2.25e+8 1.58e+7 14.2 4.98e- 43
m3 <- lm(`Net.Income` ~ `Total.Revenue` + `Earnings.Per.Share` + `Profit.Margin`, data = exchange_data)
summary(m3)
##
## Call:
## lm(formula = Net.Income ~ Total.Revenue + Earnings.Per.Share +
## Profit.Margin, data = exchange_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.633e+10 -4.769e+08 -7.240e+06 3.077e+08 3.651e+10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.817e+08 1.249e+08 -5.457 5.62e-08 ***
## Total.Revenue 6.330e-02 1.749e-03 36.191 < 2e-16 ***
## Earnings.Per.Share 2.461e+08 1.642e+07 14.993 < 2e-16 ***
## Profit.Margin 2.121e+07 4.606e+06 4.606 4.45e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.894e+09 on 1558 degrees of freedom
## (219 observations deleted due to missingness)
## Multiple R-squared: 0.5197, Adjusted R-squared: 0.5187
## F-statistic: 561.9 on 3 and 1558 DF, p-value: < 2.2e-16
tidy(m3)
## # A tibble: 4 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -6.82e+8 1.25e+8 -5.46 5.62e- 8
## 2 Total.Revenue 6.33e-2 1.75e-3 36.2 1.13e-208
## 3 Earnings.Per.Share 2.46e+8 1.64e+7 15.0 1.43e- 47
## 4 Profit.Margin 2.12e+7 4.61e+6 4.61 4.45e- 6
m4 <- lm(`Net.Income` ~ `Total.Revenue` + `Earnings.Per.Share` + `Profit.Margin` + `Total.Assets`, data = exchange_data)
summary(m4)
##
## Call:
## lm(formula = Net.Income ~ Total.Revenue + Earnings.Per.Share +
## Profit.Margin + Total.Assets, data = exchange_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.457e+10 -3.677e+08 7.071e+07 3.769e+08 3.706e+10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.723e+08 1.157e+08 -6.678 3.36e-11 ***
## Total.Revenue 5.486e-02 1.699e-03 32.296 < 2e-16 ***
## Earnings.Per.Share 2.461e+08 1.518e+07 16.210 < 2e-16 ***
## Profit.Margin 1.741e+07 4.266e+06 4.080 4.72e-05 ***
## Total.Assets 5.532e-03 3.398e-04 16.279 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.676e+09 on 1557 degrees of freedom
## (219 observations deleted due to missingness)
## Multiple R-squared: 0.5895, Adjusted R-squared: 0.5885
## F-statistic: 559 on 4 and 1557 DF, p-value: < 2.2e-16
tidy(m4)
## # A tibble: 5 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -7.72e+8 1.16e+8 -6.68 3.36e- 11
## 2 Total.Revenue 5.49e-2 1.70e-3 32.3 1.37e-175
## 3 Earnings.Per.Share 2.46e+8 1.52e+7 16.2 9.95e- 55
## 4 Profit.Margin 1.74e+7 4.27e+6 4.08 4.72e- 5
## 5 Total.Assets 5.53e-3 3.40e-4 16.3 3.83e- 55
m5 <- lm(`Net.Income` ~ `Total.Revenue` + `Earnings.Per.Share` + `Profit.Margin` + `Total.Assets` + `Total.Liabilities`, data = exchange_data)
summary(m5)
##
## Call:
## lm(formula = Net.Income ~ Total.Revenue + Earnings.Per.Share +
## Profit.Margin + Total.Assets + Total.Liabilities, data = exchange_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.698e+10 -4.332e+08 7.572e+07 4.536e+08 3.207e+10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.940e+08 9.778e+07 -8.121 9.35e-16 ***
## Total.Revenue 2.666e-02 1.827e-03 14.591 < 2e-16 ***
## Earnings.Per.Share 2.237e+08 1.286e+07 17.392 < 2e-16 ***
## Profit.Margin 2.524e+06 3.655e+06 0.690 0.49
## Total.Assets 1.252e-01 4.805e-03 26.056 < 2e-16 ***
## Total.Liabilities -1.321e-01 5.297e-03 -24.949 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.262e+09 on 1556 degrees of freedom
## (219 observations deleted due to missingness)
## Multiple R-squared: 0.7068, Adjusted R-squared: 0.7059
## F-statistic: 750.2 on 5 and 1556 DF, p-value: < 2.2e-16
tidy(m5)
## # A tibble: 6 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -7.94e+8 9.78e+7 -8.12 9.35e- 16
## 2 Total.Revenue 2.67e-2 1.83e-3 14.6 2.72e- 45
## 3 Earnings.Per.Share 2.24e+8 1.29e+7 17.4 4.74e- 62
## 4 Profit.Margin 2.52e+6 3.66e+6 0.690 4.90e- 1
## 5 Total.Assets 1.25e-1 4.80e-3 26.1 1.67e-124
## 6 Total.Liabilities -1.32e-1 5.30e-3 -24.9 7.57e-116
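# Full model: `.` uses every other column as a predictor, including X, Ticker.Symbol and Period.Ending
mfull <- lm(`Net.Income` ~ ., data = exchange_data)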
# Compute R² and SSE
get_r2_sse <- function(model) {
  c(R_squared = round(summary(model)$r.squared, 4),
    SSE = round(deviance(model), 2))
}
# Summary table
model_summary <- data.frame(
  Variables = c(
    "Total Revenue",
    "+ Earnings Per Share",
    "+ Profit Margin",
    "+ Total Assets",
    "+ Total Liabilities",
    "All other columns (full model)"
  ),
  t(sapply(list(m1, m2, m3, m4, m5, mfull), get_r2_sse))
)
# Display the table
kable(model_summary, caption = "Comparison of Regression Models for Net Income")
| Variables | R_squared | SSE |
|---|---|---|
| Total Revenue | 0.4439 | 1.578554e+22 |
| + Earnings Per Share | 0.5131 | 1.322323e+22 |
| + Profit Margin | 0.5197 | 1.304561e+22 |
| + Total Assets | 0.5895 | 1.114823e+22 |
| + Total Liabilities | 0.7068 | 7.962778e+21 |
| All other columns (full model) | 0.9999 | 1.305225e+18 |
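## Conclusions

Earnings Per Share adds clear explanatory power for Net Income: R² rises from 0.4439 to 0.5131 when it joins Total Revenue. Profit Margin contributes very little (0.5131 to 0.5197) and is no longer significant once Total Assets and Total Liabilities are in the model (p = 0.49 in m5). The balance-sheet items account for the largest gains: Total Assets lifts R² to 0.5895 and Total Liabilities to 0.7068, so the improvement does not show diminishing returns over this sequence. Two caveats apply. First, m1 is fitted on all 1781 records, whereas m2 to m5 drop the 219 records with missing EPS, so their R² and SSE values are not strictly comparable with those of m1. Second, the R² of 0.9999 for the full model should not be read as predictive power: `Net.Income ~ .` uses every other column, including columns that closely track the response such as Net.Income.Applicable.to.Common.Shareholders. There is a real risk of overfitting when many variables are added without a clear economic rationale.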
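
A fairer comparison of nested models fits all of them on the same complete-case sample and looks at adjusted R² and an F-test rather than raw R². The sketch below shows one possible way to do this on simulated data. The coefficients, the noise level and the 40 artificially missing EPS values are assumptions made up for the example.

```r
# Simulated stand-in data (coefficients and noise are invented for illustration)
set.seed(11)
n <- 300
toy <- data.frame(
  Total.Revenue      = runif(n, 1e9, 5e10),
  Earnings.Per.Share = rnorm(n, 3, 2),
  Total.Assets       = runif(n, 5e9, 1e11)
)
toy$Net.Income <- 0.05 * toy$Total.Revenue + 2e8 * toy$Earnings.Per.Share +
  0.01 * toy$Total.Assets + rnorm(n, sd = 1e9)
toy$Earnings.Per.Share[sample(n, 40)] <- NA   # mimic missing EPS values

# Keep only rows that are complete for every variable used by any model
vars <- c("Net.Income", "Total.Revenue", "Earnings.Per.Share", "Total.Assets")
cc <- toy[complete.cases(toy[, vars]), vars]

# Nested models fitted on the identical sample
f1 <- lm(Net.Income ~ Total.Revenue, data = cc)
f2 <- update(f1, . ~ . + Earnings.Per.Share)
f3 <- update(f2, . ~ . + Total.Assets)

# Adjusted R-squared penalises extra predictors; anova() tests each addition
sapply(list(f1 = f1, f2 = f2, f3 = f3), function(m) summary(m)$adj.r.squared)
anova(f1, f2, f3)
```

Applied to `exchange_data`, the same pattern would put m1 to m5 on a common sample before their R² and SSE values are compared.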