\[Faculty\ of\ Natural\ Sciences\\ June\ 2024\]
\[RStudio\]
What is RStudio?
R is a programming language specialised in statistical computing and data visualisation, widely adopted in data mining, bioinformatics and data analysis.
\(RStudio\) is an integrated development environment for the \(R\) language.
The platform is available in two editions: \(RStudio\) \(Desktop\), which is a regular desktop application, and \(RStudio\) \(Server\), which runs on a remote server and is accessed through a web browser. The \(RStudio\) \(IDE\) is a product of \(Posit\) \(PBC\) (formerly \(RStudio\) \(PBC\), and before that \(RStudio\) \(Inc.\)).
The R language, which RStudio builds on, includes:
• an effective facility for handling and storing data;
• a suite of operators for calculations on arrays, in particular matrices;
• a large, coherent and integrated collection of intermediate tools for data analysis;
• graphical facilities for analysing and displaying data on screen or on hard copy;
• a well-developed, simple and effective programming language that includes conditionals, user-defined recursive functions and input/output facilities;
\[Objectives:\]
In this project we aim to:
~ Examine the dataset analytically, statistically and graphically in the \(R\) language;
~ Be able to read and interpret the code used and explain the purpose of each piece;
~ Analyse every plot we build;
~ Identify and use the various functions and libraries learned in the laboratory sessions and beyond;
\[Student\ Spending\ Dataset\]
The chosen dataset is titled \(Student\) \(Spending\); it lists the income and expenses of 1000 randomly sampled students. The core question of the study is how much a student earns, how much they spend and how they spend their monthly income, depending on age, field of study and gender. Particular attention is also paid to how the money is transacted, which shows how a student prefers to pay and to be paid in everyday life.
CREDITS: the \(Student\) \(Spending\) \(Dataset\) was downloaded from \(kaggle.com/datasets\), in compliance with the rules and terms under which the \(kaggle.com\) website operates.
Let us get to know the \(Student\) \(Spending\) dataset:
The dataset consists of the variables and values shown below:
First we import the dataset. The file is a CSV, so base R's \(read.csv()\) is enough to read it; the \(readxl\) library loaded below is only needed for Excel workbooks. The \(setwd()\) function takes as its parameter the path of the folder that contains the file and makes it the working directory, and \(getwd()\) confirms the change:
library(readxl)
setwd("C:/Users/Perdorues/Downloads")
Warning: The working directory was changed to C:/Users/Perdorues/Downloads inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
getwd()
[1] "C:/Users/Perdorues/Downloads"
data = read.csv("student_spending.csv")
data
\(Student\) \(Spending\) contains 17 different variables, each with 1000 observations;
The dataset has 4 \(qualitative\) \(variables\) and 13 \(quantitative\) \(variables\) in total;
\(Qualitative\) \(variables:\)
Gender;
Year of study;
Major (field of study);
Payment method;
\(Quantitative\) \(variables:\)
Age
Monthly income
Financial aid
Tuition
Housing
Food
Transportation
School supplies
Entertainment
Personal care
Technology
Health and wellness
Miscellaneous
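The split into qualitative and quantitative variables can also be checked in R itself. The sketch below is only an illustration: the data frame \(toy\) and its values are made up, and only the column names follow the dataset.

```r
# Hypothetical stand-in for the real dataset: three rows, four columns
toy <- data.frame(
  mosha = c(19, 22, 25),
  gjinia = c("Female", "Male", "Non-binary"),
  te_ardhurat_mujore = c(958, 1006, 734),
  metoda_e_pageses = c("Cash", "Credit/Debit Card", "Mobile Payment App"),
  stringsAsFactors = FALSE
)

# TRUE for quantitative (numeric) columns, FALSE for qualitative ones
eshte_sasiore <- sapply(toy, is.numeric)

sasiore <- names(toy)[eshte_sasiore]
cilesore <- names(toy)[!eshte_sasiore]

cat("Quantitative:", sasiore, "\n")
cat("Qualitative:", cilesore, "\n")
```

Applied to the full dataset, the same two lines should report 13 and 4 columns, provided the file carries no additional row-index column.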
A \(histogram\) is a visual representation of the distribution of quantitative data. It gives an approximate picture of where the values are concentrated and, with one bin per value, the exact frequency of each value.
Below we build a histogram for our quantitative variable \(mosha\) (age). The histogram is drawn with base R's \(hist()\) function. Two libraries are also loaded in this chunk: \(ggplot2\), which is used for building plots, and \(scales\), which is used for customising them, although \(hist()\) itself does not need either.
A histogram is the usual way to display a continuous quantitative variable graphically.
For this purpose the range of possible values of the variable is divided into a number of intervals, or classes.
The histogram shows the number of students in each age group from 18 to 25. From the heights of the bars we see that age 25 accounts for the largest share (the mode) and age 19 for the smallest.
library(ggplot2)
library(scales)
# Base R histogram of age: one bin per year, shaded with lines at 60 degrees.
# col sets the colour of the shading lines; xaxt = "n" suppresses the default x axis.
g1 <- hist(data$mosha, breaks = seq(17.5, 25.5, by = 1), density = 40, angle = 60,
           xlab = "Grupmoshat e studenteve", ylab = "Efektivat", main = "Histogrami i Moshes",
           las = 1, col = "purple4", ylim = c(0, 200), xlim = c(18, 25),
           cex.main = 1.4, cex.lab = 1.2, xaxt = "n")
# Redraw the x axis with one tick per age
axis(1, at = 18:25, labels = 18:25)
# Print each bar's count just above it, skipping empty bins
text(g1$mids, g1$counts, labels = ifelse(g1$counts != 0, g1$counts, ""), pos = 3, cex = 0.8)
The histogram is drawn with base R graphics; the \(ggplot2\) and \(scales\) libraries loaded above are not required by \(hist()\).
The \(hist()\) function selects the type of plot we want to build, in this case a histogram.
\(data\)$\(mosha\): selects the data to be shown in the histogram.
\(breaks=seq(17.5, 25.5, by=1)\) sets the bin boundaries of the histogram, which run from 17.5 to 25.5 in steps of 1, so that each age gets its own bin.
\(density=40\): the density of the shading lines inside the bars of the histogram;
\(angle=60\): the angle of the shading lines inside the bars;
\(xlab\), \(ylab\) and \(main\): set the labels of the x and y axes and the title of the histogram;
\(las=1\) draws the axis tick labels horizontally;
Colours are set with the \(col\) argument; \(fill\) is not a base graphics parameter, and R ignores it with a "not a graphical parameter" warning;
\(ylim=c(0,200)\) and \(xlim=c(18, 25)\) set the ranges of the y and x axes;
\(cex.main=1.4\) and \(cex.lab=1.2\) enlarge the text of the main title and of the axis labels;
\(xaxt="n"\) suppresses the default x axis so that it can be redrawn; the parameter must be spelled \(xaxt\), since the misspelling \(axt\) is ignored with a warning;
\(axis(1, at=18:25, labels=18:25)\) adds the tick marks and labels of the x axis from 18 to 25;
\(text(g1\)$\(mids\), \(g1\)$\(counts\), \(labels=ifelse\)(\(g1\)$\(counts\) \(!=0\), \(g1\)$\(counts\), ""), \(pos=3\), \(cex=0.8)\) adds the numeric values above each bar, but only where the count is different from zero.
The parameter \(pos=3\) places the values above the given coordinates, while \(cex=0.8\) sets the size of the text.
Interpreting the histogram is relatively simple. The students are distributed as follows: 124 are 18 years old, 108 are 19, 111 are 20, 118 are 21, 130 are 22, 128 are 23, 136 are 24 and 145 are 25. The histogram also shows which age group is the most frequent: the 25-year-olds, with 145 students, while the least frequent is the 19-year-olds, with 108 students.
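The same reading can be reproduced numerically. The sketch below rebuilds an age vector from the eight counts reported in the interpretation (the vector is reconstructed for illustration, it is not read from the CSV) and extracts the mode and the least frequent age with \(table()\).

```r
# Rebuild the age vector from the reported counts for ages 18 to 25
efektivat <- c(124, 108, 111, 118, 130, 128, 136, 145)
mosha <- rep(18:25, times = efektivat)

tabela <- table(mosha)   # absolute frequency of each age
moda <- as.integer(names(tabela)[which.max(tabela)])
me_e_rralla <- as.integer(names(tabela)[which.min(tabela)])

print(tabela)
cat("Mode:", moda, "| least frequent age:", me_e_rralla, "\n")

# hist() with plot = FALSE returns the same counts without drawing anything
h <- hist(mosha, breaks = seq(17.5, 25.5, by = 1), plot = FALSE)
print(h$counts)
```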
library(ggplot2)
library(gridExtra)
# Inside aes() columns are referenced by bare name, not as data$column
violinplot1 <- ggplot(data, aes(x = "", y = te_ardhurat_mujore)) +
  geom_violin(fill = "dodgerblue2", color = "black") +
  labs(title = "Violin Plot per te Ardhurat Mujore", x = "Te Ardhurat Mujore", y = "Shperndarja")
violinplot2 <- ggplot(data, aes(x = "", y = ndihma_financiare)) +
  geom_violin(fill = "dodgerblue2", color = "black") +
  labs(title = "Violin Plot per Ndihmen Financiare", x = "Ndihma Financiare", y = "Shperndarja")
# Show the two violins side by side
grid.arrange(violinplot1, violinplot2, ncol = 2)
The shape of the violin plot shows the distribution of monthly income and of financial aid. Where the violin is wider, more observations lie around those values; where it is narrower, there are fewer. The widest band marks the most frequent values (the mode of the estimated density), not the mean. Monthly income is most concentrated roughly between 1250 and 1500, where the band is widest, while financial aid is most concentrated in the interval 500-600. The mean monthly income is approximately $1020.65 and the mean financial aid approximately $504; a violin plot does not mark the mean, so these two values come from the data rather than from the shape of the plot.
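A violin is a mirrored kernel density estimate, so its widest point is the mode of that estimate, which in a skewed sample differs from the mean. The sketch below illustrates this with a small made-up sample (not the income data), using base R's \(density()\).

```r
# Hypothetical right-skewed sample (not the real income data)
x <- c(1, 1, 1, 1, 2, 2, 3, 10)

# density() computes a kernel density estimate like the one a violin mirrors
d <- density(x)
pika_me_e_gjere <- d$x[which.max(d$y)]   # where a violin would be widest

cat("Widest point (density mode):", round(pika_me_e_gjere, 2), "\n")
cat("Mean:", mean(x), " Median:", median(x), "\n")
```

In this sample the single large value pulls the mean above the region where most of the observations lie.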
A \(box\) \(plot\) is a statistical chart that gives a visual summary of the distribution of a set of data. It shows the following key information:
1. Median: the median value is represented by the line inside the box;
2. Interquartile range: the box covers the middle 50% of the data; its lower edge is the first quartile Q1 (25% of the data lie below it) and its upper edge is the third quartile Q3 (75% of the data lie below it). The IQR is the difference Q3 - Q1.
3. Minimum and maximum: the lines (whiskers) extending from the box reach the smallest and largest values that lie within 1.5 x IQR of the box; values beyond the whiskers are drawn as individual points (outliers).
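The quantities listed above can be computed directly. This is a minimal sketch on made-up ages (the value 40 is added on purpose to produce an outlier); it uses the common 1.5 x IQR rule for the whisker limits.

```r
# Hypothetical ages: the values 18..25 plus one unusually old student
mosha <- c(18, 19, 20, 21, 22, 23, 24, 25, 40)

kuartilet <- quantile(mosha, probs = c(0.25, 0.5, 0.75))
iqr <- IQR(mosha)   # Q3 - Q1

# Points beyond 1.5 * IQR from the box edges are treated as outliers
kufiri_poshte <- kuartilet[1] - 1.5 * iqr
kufiri_lart <- kuartilet[3] + 1.5 * iqr
outliers <- mosha[mosha < kufiri_poshte | mosha > kufiri_lart]

print(kuartilet)
cat("IQR:", iqr, "| outliers:", outliers, "\n")
```

With the value 40 removed, no point falls outside the limits, which matches a box plot with no isolated points.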
library(ggplot2)
# Age by gender, with one box per year of study.
# The colour and size mappings are left out: geom_boxplot() dropped them anyway.
ggplot(data) + aes(x = mosha, y = gjinia, fill = viti_i_studimeve) +
  geom_boxplot() + scale_fill_hue(direction = 2) +
  labs(x = "Mosha", y = "Gjinia", fill = "Viti i Studimeve") +
  theme_classic()
This chart is a box plot of the students' age by gender and year of study. For the female gender there are four box plots, for the categories Freshman, Junior, Senior and Sophomore. From the box plots we can see that the minimum for all four is 18 and the maximum is 25; these are the two extreme values. The left edge of each box is the first quartile, which is 20 for Sophomore and Junior and 19 for Senior and Freshman. The line in the middle is the \(median\): 21 for Junior and Senior, and 22 for Freshman and Sophomore. The right edge is the third quartile: 24 for Sophomore and 23 for Junior and Senior. Since there are no points outside the whiskers, there are no \(outliers\). The other genders are read in the same way.
library(ggplot2)
ggplot(data, aes(x=gjinia, y=mosha, fill=gjinia)) +
geom_boxplot(outlier.shape = NA)+
geom_jitter(aes(color=gjinia, shape=gjinia), width=0.2, size=1.5) +
scale_color_manual(values=c("black", "gold", "red"))+
scale_fill_manual(values = c("blue", "red", "green"))+
labs(title = "Box Plot Gjinia~Mosha", x="Gjinia", y="Mosha", fill="Gjinia", color="Gjinia", shape="Gjinia")+ theme_classic()
Results and their interpretation:
In the chart above, the boxes show the distribution of age by gender. The median line of the boxes falls at age 22 in all three cases. The jitter points show how the individual observations are spread within each group.
The code:
First we load the library we need: \(ggplot2\), which handles the visualisation of the data.
We pass the \(data\) data frame to \(ggplot()\) and map two variables:
i: \(gjinia\) on the x axis;
ii: \(mosha\) on the y axis;
The box plot is built in broadly the same way as in the previous case, except for the \(geom\_jitter()\) function, which draws every observation as a point with a small random offset so that overlapping points within each box become visible.
Setting the colours:
\(scale\_color\_manual()\) sets the colour of the jitter points in the box plot. The values are assigned in the order of the gender levels: black for females, gold for males and red for non-binary students;
\(scale\_fill\_manual()\) sets the fill colour of the boxes: blue for females, red for males and green for non-binary students;
\(labs()\) adds the title, the labels of the \(x\) and \(y\) axes and the legend title.
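Which colour ends up on which gender depends on the order of the levels. The sketch below, on a made-up gender vector, shows that order and how a named vector makes the pairing explicit; a named vector of this form can be passed as \(values\) to \(scale\_fill\_manual()\) or \(scale\_color\_manual()\).

```r
# Hypothetical gender column; factor levels are sorted alphabetically
gjinia <- factor(c("Male", "Non-binary", "Female", "Male"))
print(levels(gjinia))

# Unnamed colour values are matched to the levels in that order
ngjyrat <- c("blue", "red", "green")
perputhja <- setNames(ngjyrat, levels(gjinia))
print(perputhja)

# Naming the vector states the intended pairing directly
ngjyrat_me_emra <- c(Female = "blue", Male = "red", `Non-binary` = "green")
```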
library(ggplot2)
ggplot(data, aes(x=ushqimi))+
geom_density(aes(fill="Ushqimi"), alpha=0.5)+
geom_density(aes(x=strehimi, fill="Strehimi"), alpha=0.5)+
geom_density(aes(x=transporti, fill="Transporti"), alpha=0.5)+
geom_density(aes(x= mjetet_shkollore, fill="Mjetet Shkollore"), alpha=0.5)+
geom_density(aes(x=argetim, fill="Argetimi"), alpha=0.5) +
labs(x="Vlera", y="Densiteti", title="Grafiku Densitet per 5 variabla numerike", fill="Legjenda:")+
scale_fill_manual(values=c("dodgerblue2", "red", "purple3", "green2", "yellow2"))+
theme_classic()
Result:
Here we show the density plot for 5 numeric variables (entertainment, school supplies, housing, transportation, food).
The y axis shows the density of each variable over the range of values on the x axis. Density reflects how often values occur within a given interval: the area under a curve over an interval is the proportion of the data that fall in that interval. A higher curve (higher density) means that more observations have values around that point on the x axis. In our plot the \(argetim\) category has the highest density, since its curve is the tallest, while \(strehimi\) has the lowest. The height of each curve is scaled so that the total area under it equals 1.
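The statement that the area under a density curve equals 1 can be checked numerically. A minimal sketch, assuming made-up spending values and the trapezoidal rule for the integral:

```r
# Hypothetical spending values (not taken from the dataset)
vlerat <- seq(50, 150, length.out = 200)

d <- density(vlerat)   # kernel density estimate on a regular grid

# Trapezoidal rule: sum of (grid width * average height) over the grid
siperfaqja <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
cat("Area under the curve:", round(siperfaqja, 3), "\n")
```

The result is very close to 1; the small gap comes from the curve being evaluated on a finite grid.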
In a data science project it is very important to draw as many useful results and conclusions as possible from the data. One of the stages of data analysis is visualisation, which we are currently covering.
A feature of particular interest is the relationship between two variables. If both variables are qualitative (categorical), the relationship between them can be shown with what is called a \(Mosaic\) \(Plot\). It is little known in some disciplines, but in R it is common and very useful in certain cases.
The mosaic plot is based on relative frequencies (proportions). To obtain them we need tables, namely \(contingency\) \(tables\).
Above, in the chapter \(Contingency\) \(Tables\), we built these tables for our categorical variables, and with the \(prop.table\) function we converted them into tables of proportions.
It is precisely these tables that we will put to use.
Below, a mosaic plot is built for the fourth contingency table, \(diplomimi\)~\(metoda\_e\_pageses\):
library(ggplot2)
ggplot(data) + aes(x=diplomimi, y=metoda_e_pageses, fill=gjinia, colour=viti_i_studimeve, group = metoda_e_pageses)+
  labs(x="Diplomimi", y="Metoda e Pageses", title = "Grafiku Mozaik i Metodes se Preferuar te Pageses", fill="Legjenda:")+
  geom_tile()+
  scale_fill_manual(values=c(Female="gold", Male="red",`Non-binary`="cyan" ))+
  # colour is mapped to viti_i_studimeve, so the names must be its levels;
  # the four outline colours below are an illustrative choice
  scale_color_manual(values = c(Freshman="black", Junior="grey40", Senior="blue", Sophomore="darkgreen"))+
  theme_minimal()
Analysing the code:
The \(data\) variable holds our data;
The \(aes()\) function maps the variables to the different aesthetics of the plot;
\(x = diplomimi\): assigns the variable \(diplomimi\) to the x axis;
\(y = metoda\_e\_pageses\): assigns the variable \(metoda\_e\_pageses\) to the y axis;
\(fill = gjinia\): determines the fill colour of the tiles;
\(colour = viti\_i\_studimeve\): assigns the variable \(viti\_i\_studimeve\) to the outline colour of the tiles;
\(group = metoda\_e\_pageses\): groups the tiles by the variable \(metoda\_e\_pageses\);
Labelling: the \(labs()\) function sets the titles of the x and y axes, the plot title and the legend title.
\(geom\_tile()\): draws the tiles, one for each combination of the variables \(diplomimi\) and \(metoda\_e\_pageses\);
Colours: the functions \(scale\_fill\_manual()\) and \(scale\_color\_manual()\) colour each tile according to the mappings defined above. The names given in \(values\) must match the levels of the mapped variable; otherwise ggplot2 warns that no shared levels were found.
The plot gives a visual overview of the relationship among the variables \(diplomimi\), \(metoda\_e\_pageses\), \(gjinia\) and \(viti\_i\_studimeve\). The position and colouring of the tiles identify the combinations of these qualitative variables. Because \(geom\_tile()\) draws tiles of equal size, the tile area does not encode the relative frequencies; those are read from the contingency table.
Results:
Payment method: the \(y\) axis shows the three payment methods: \(Cash\), \(Mobile\) \(Payment\) \(App\) and \(Credit/Debit\) \(Card\).
Gender: each tile in the plot is coloured to represent gender.
Major: the \(x\) axis groups the students by their major.
From the plot we conclude that:
The most popular payment method is \(Cash\), followed by \(Credit/Debit\) \(Card\), with mobile apps used least.
The gender category that appears most often in the plot is \(Non-binary\) (light blue).
Across majors, mobile apps are used less than credit cards.
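For comparison, base R can draw a mosaic plot straight from a contingency table, with tile areas proportional to the cell frequencies. The sketch below uses eight made-up students (the vectors \(diplomimi\) and \(metoda\) are hypothetical, not columns read from the dataset).

```r
# Hypothetical sample: major and preferred payment method of eight students
diplomimi <- c("Biology", "Biology", "Economics", "Economics",
               "Economics", "Psychology", "Psychology", "Biology")
metoda <- c("Cash", "Cash", "Credit/Debit Card", "Cash",
            "Mobile Payment App", "Mobile Payment App", "Cash", "Credit/Debit Card")

tabela <- table(diplomimi, metoda)     # contingency table of counts
probabilitetet <- prop.table(tabela)   # joint relative frequencies

print(round(probabilitetet, 3))

# Tile area is proportional to the frequency of each combination
mosaicplot(tabela, main = "Mosaic plot (hypothetical data)", color = TRUE)
```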
library(plotly)
# type = "histogram" counts the students in each major, split by payment method
plot_ly(data, x = ~diplomimi, color = ~metoda_e_pageses, colors = "Accent", type = "histogram")
The chart above is a bar chart showing the use of the different payment methods across academic majors.
From the chart we see that computer science and engineering students appear largely indifferent, showing little variation in their choice of payment method. A strong contrast appears for economics and biology students, where there are considerable differences in their preferences between electronic and cash payments.
Psychology students also show clear differences in their choice of payment method, with \(Mobile\) \(Payment\) \(App\) the most preferred. This method is also the preferred one among engineering students. For economics and biology students the most preferred method is \(Credit/Debit\) \(Card\).
The code below produces a \(pie\) \(chart\) that visualises total spending across the different expense categories in our numeric data.
library(ggplot2)
library(tidyverse)
# Total spending per category, summed over all students
shpenzimet_totale <- data.frame( kategorite_e_shpenzimit = c("shkollimi","strehimi","ushqimi","transporti","mjetet_shkollore","argetim","kujdesi_personal","teknologji","mireqenia_shendetesore","te_ndryshme"),
shpenzimet = c(sum(data$shkollimi), sum(data$strehimi), sum(data$ushqimi), sum(data$transporti), sum(data$mjetet_shkollore), sum(data$argetim), sum(data$kujdesi_personal), sum(data$teknologji), sum(data$mireqenia_shendetesore), sum(data$te_ndryshme)))
# Compute each category's percentage of the grand total
shuma_totale <- sum(shpenzimet_totale$shpenzimet)
shpenzimet_totale$percentage <- round(shpenzimet_totale$shpenzimet / shuma_totale * 100, 2)
# The chart: one stacked bar drawn in polar coordinates
ggplot(shpenzimet_totale, aes(x="", y=shpenzimet, fill=kategorite_e_shpenzimit)) +
  geom_bar(stat="identity", width = 2) +
  geom_text(aes(label=paste0(percentage, "%")), position=position_stack(vjust=0.6), size = 2, color="black", fontface="bold") +
  coord_polar("y", start=1)+
  labs(title="Shpenzimet Totale sipas Kategorive", x=NULL, y=NULL)+
  scale_fill_manual(values = c("lightblue", "orange", "green", "red","blue","yellow","maroon3", "cyan", "pink2","purple") )+
  theme_void()+
  theme(plot.title = element_text(hjust=0.5, size=15, face="bold"),
        legend.title=element_blank(),
        legend.text = element_text(size=8),
        legend.position = "right")
Interpretation:
The code builds a summary of the total amounts spent in every category (tuition, food, transportation, etc.).
1. \(shpenzimet\_totale\) holds the total amount spent in each category.
2. \(Computing\) \(the\) \(percentages\): the code computes each category's share of the overall total, that is, the size of its slice in the pie chart. The shares are rounded to 2 decimal places.
3. Drawing the pie chart: the \(ggplot()\) function is used to create the chart.
The \(aes()\) function maps the spending categories to the \(fill\) aesthetic and the category totals to the y aesthetic.
The \(geom\_bar()\) function draws a single stacked bar whose segments become the slices of the pie; the \(width\) parameter sets the width of the bar. The chart is two-dimensional, since ggplot2 does not draw 3D pies.
\(geom\_text()\) adds the percentage values to the corresponding slices. The \(position\) parameter places these values inside the slices.
The \(coord\_polar()\) function transforms the rectangular chart into a circular one. \(scale\_fill\_manual()\) sets a specific colour for each slice of the pie chart.
The \(theme\_void()\) function removes the axes, grid and background, while the \(theme()\) call that follows styles the title and the legend, which guides the reader through the data shown in the pie chart.
This kind of visualisation is important for understanding how spending is distributed and for identifying the most significant cost categories in the data we have.
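The percentage step can be tried in isolation. A minimal sketch with made-up category totals (the four amounts are hypothetical, not the sums from the dataset):

```r
# Hypothetical category totals (not the sums from the dataset)
shpenzimet <- c(shkollimi = 5000, strehimi = 700, ushqimi = 250, transporti = 50)

# Share of each category in the grand total, rounded to 2 decimals
perqindjet <- round(shpenzimet / sum(shpenzimet) * 100, 2)

# Largest share first, as it would be read off the pie chart
print(sort(perqindjet, decreasing = TRUE))
cat("Sum of shares:", sum(perqindjet), "\n")
```

Because each share is rounded separately, the shares may in general add up to slightly more or less than 100.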
Correlation is any statistical relationship between two random variables. It usually refers to the degree to which a pair of variables are linearly related.
\[Mathematical\ formula:\\ \rho(X,Y) = \frac{S_{xy}}{S_x S_y}\]
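Reading S_xy as the sample covariance and S_x, S_y as the sample standard deviations, the formula can be evaluated by hand and compared with R's \(cor()\). The two vectors below are made up for illustration.

```r
# Hypothetical paired observations
x <- c(120, 95, 140, 110, 130, 105)
y <- c(900, 1100, 1000, 850, 1200, 950)

s_xy <- cov(x, y)   # sample covariance
s_x <- sd(x)        # sample standard deviations
s_y <- sd(y)

rho <- s_xy / (s_x * s_y)
cat("By hand:", rho, " cor():", cor(x, y), "\n")
```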
Below we build the correlation matrix, which will guide the choice of variables for the linear regression line.
In the heat map, dark blue marks the lowest coefficients in the matrix (here, weak relationships close to 0), while red marks the highest (strong relationships).
correlation_matrix <- cor(numeric_df)
library(pheatmap)
pheatmap(correlation_matrix,
color = colorRampPalette(c("darkblue", "white", "red3"))(200),
fontsize = 10,
fontsize_row = 8,
fontsize_col = 8,
main = "Correlation Heatmap i Variablave Numerike",
display_numbers = TRUE,
number_color = "white" )
To reason about the dependence between the numeric variables we look at the plot and at the values of the correlation coefficients.
From the matrix we see that the variables are not strongly related to one another, as shown by the blue colouring of the cells of the matrix. We will pick a few variables that we think have some slight dependence on one another (the pairs with the lightest colour).
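Instead of judging the cells by colour, the pairs can be ranked by the absolute value of their coefficient. The sketch below is one possible way to do this on a made-up data frame (\(df\) and its columns \(v1\) to \(v4\) are hypothetical stand-ins for the numeric columns).

```r
# Hypothetical numeric data frame standing in for the numeric columns
set.seed(1)
df <- data.frame(v1 = rnorm(50), v2 = rnorm(50), v3 = rnorm(50))
df$v4 <- df$v1 + rnorm(50, sd = 0.1)   # v4 is built to correlate strongly with v1

m <- cor(df)

# Keep each pair once by looking only above the diagonal
idx <- which(upper.tri(m), arr.ind = TRUE)
ciftet <- data.frame(
  var1 = rownames(m)[idx[, "row"]],
  var2 = colnames(m)[idx[, "col"]],
  r = m[idx]
)

# Strongest relationships first
ciftet <- ciftet[order(abs(ciftet$r), decreasing = TRUE), ]
print(ciftet)
```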
library(ggplot2)
library(ggpubr)
ggplot(data, aes(x = argetim, y = te_ardhurat_mujore)) +
geom_point(color = "#FF0000",size=0.9) +
labs(x = "Shpenzimet per argetim", y = "Te ardhurat mujore", title = "Shpenzimet per argetim krahasuar me te ardhurat mujore") +
geom_smooth(method = "lm", color = "black", formula = y ~ x) +
stat_cor(label.x = 100, label.y = 1400, size = 5) +
stat_regline_equation(label.x = 100, label.y = 1300, size = 5)
library(ggplot2)
library(ggpubr)
ggplot(data, aes(x = transporti, y = te_ardhurat_mujore)) +
geom_point(color = "black",size=0.9) +
labs(x = "Shpenzimet per transport", y = "Te ardhurat mujore", title = "Shpenzimet per transport krahasuar me te ardhurat mujore") +
geom_smooth(method = "lm", color = "black", formula = y ~ x) +
stat_cor(label.x = 100, label.y = 1400, size = 5) +
stat_regline_equation(label.x = 100, label.y = 1300, size = 5)
library(ggplot2)
library(ggpubr)
ggplot(data, aes(x = shkollimi, y = te_ardhurat_mujore)) +
geom_point(color = "blue", size = 1) +
labs(x = "Shkollimi", y = "Te ardhurat mujore", title = "Shpenzimet per shkollim krahasuar me te ardhurat mujore") +
geom_smooth(method = "lm", color = "black", formula = y ~ x) +
stat_cor(label.x = 4000, label.y = 1400, size = 5) +
stat_regline_equation(label.x = 4000, label.y = 1300, size = 5)
library(ggplot2)
library(ggpubr)
ggplot(data, aes(x = kujdesi_personal, y = te_ardhurat_mujore)) +
geom_point(color = "purple3", size = 1) +
labs(x = "Shpenzimet per kujdes personal", y = "Te ardhurat mujore", title = "Shpenzimet per kujdes personal krahasuar me te ardhurat mujore") +
geom_smooth(method = "lm", color = "black", formula = y ~ x) +
stat_cor(label.x = 15, label.y = max(data$te_ardhurat_mujore) * 0.95, size = 5) +
stat_regline_equation(label.x = 15, label.y = max(data$te_ardhurat_mujore) * 0.90, size = 5)
From the plots built above, together with their regression lines, we see that the relationship between monthly income and spending on personal care, transportation, tuition and entertainment is only very weakly linear. We now examine one of these plots in detail.
\[The\ Correlation\ Coefficient\]
The correlation coefficient helps us understand whether or not there is a linear relationship in the data.
Its properties:
The correlation coefficient takes values from -1 to 1.
If the value of the coefficient is close to 0 (between -0.2 and 0.2), it indicates that there is no appreciable linear dependence between the variables.
We can use two functions from the \(stats\) package to obtain the value of the correlation:
1.\(cor.test\)
2.\(cor\);
cor(data$transporti, data$te_ardhurat_mujore)
[1] 0.04615186
koeficentet <- cor.test(data$transporti,data$te_ardhurat_mujore)
koeficentet$estimate
cor
0.04615186
We see that the value of the correlation coefficient is 0.04615186, which is very close to 0.
So there is no linear relationship between the variables.
Looking at the other plots and their regression lines, there is likewise no visible dependence.
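Besides the estimate, \(cor.test\) also returns a p-value and a confidence interval, which help to judge whether a small coefficient is distinguishable from 0. A minimal sketch on made-up data with only a weak linear relationship:

```r
# Hypothetical data with only a weak linear relationship
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(5, 3, 6, 4, 5, 6, 3, 5, 4, 6)

testi <- cor.test(x, y)   # Pearson correlation by default

print(testi$estimate)   # the coefficient, same value as cor(x, y)
print(testi$p.value)    # p-value for H0: the true correlation is 0
print(testi$conf.int)   # 95% confidence interval for the coefficient
```

A confidence interval that contains 0 is consistent with no linear relationship.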
\[Y\ =\ \beta_0\ +\ \beta_1 X\ +\ \varepsilon\\ where\ \beta_0\ and\ \beta_1\ are\ the\ parameters\ of\ the\ line\ and\ \varepsilon\ is\ the\ error\]
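For the simple linear model Y = β0 + β1 X + error, the least squares estimates have a closed form: the slope is S_xy / S_x^2 and the intercept is mean(Y) - slope x mean(X). The sketch below checks this against \(lm()\) on made-up values (the vectors \(x\) and \(y\) are hypothetical).

```r
# Hypothetical income (x) and transport spending (y)
x <- c(600, 800, 1000, 1200, 1400)
y <- c(110, 125, 118, 130, 127)

beta1 <- cov(x, y) / var(x)          # slope
beta0 <- mean(y) - beta1 * mean(x)   # intercept

modeli <- lm(y ~ x)
print(c(beta0 = beta0, beta1 = beta1))
print(coef(modeli))
```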
We use the \(lm()\) function to build a linear model.
set.seed(123)
zgjedhja<-sample(c(TRUE,FALSE),nrow(data),replace=T,prob=c(0.6,0.4))
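The logical vector \(zgjedhja\) marks each row as training (TRUE, with probability 0.6) or testing (FALSE, with probability 0.4), so the data are split roughly 60% training and 40% testing. We now split the data into train and test: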
train<-data[zgjedhja,]
test<-data[!zgjedhja,]
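Drawing TRUE/FALSE with probabilities 0.6 and 0.4 gives a split that is only approximately 60/40. If exact sizes are wanted, one alternative is to sample row numbers without replacement. This is a sketch of that alternative on a made-up data frame (\(df\) is hypothetical), not the method used in this project.

```r
set.seed(123)
# Hypothetical data frame with 1000 rows
df <- data.frame(id = 1:1000, vlera = rnorm(1000))

n_train <- round(0.6 * nrow(df))
indekset <- sample(nrow(df), size = n_train)   # row numbers, without replacement

train <- df[indekset, ]
test <- df[-indekset, ]

cat("train:", nrow(train), " test:", nrow(test), "\n")
```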
The model is built on the train (training) data:
model1<-lm(transporti~te_ardhurat_mujore,data= train)
model1
Call:
lm(formula = transporti ~ te_ardhurat_mujore, data = train)
Coefficients:
(Intercept) te_ardhurat_mujore
1.181e+02 5.277e-03
\[Goodness\ of\ fit\] To assess the goodness of fit of a model we look at:
1.\(RSE->residual\) \(standard\) \(error\): the typical distance by which the observed values deviate from the values fitted by the model. The smaller this value, the better the model, because the values the model gives lie close to the true values.
2.\(R^2\)-> tells us what proportion of the variance of the response variable is explained by the model.
3.\(p-value\)-> if this value is smaller than 0.05, we say that there is a statistically significant relationship between the variables.
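The three quantities can be computed by hand from a fitted model, which makes their definitions concrete. A minimal sketch on made-up data (the vectors \(x\) and \(y\) are hypothetical):

```r
# Hypothetical data
x <- c(600, 800, 1000, 1200, 1400, 1600)
y <- c(110, 125, 118, 130, 127, 140)

modeli <- lm(y ~ x)
mbetjet <- residuals(modeli)

# RSE: square root of the residual sum of squares over the degrees of freedom
rse <- sqrt(sum(mbetjet^2) / df.residual(modeli))

# R^2: 1 - (residual sum of squares / total sum of squares)
r2 <- 1 - sum(mbetjet^2) / sum((y - mean(y))^2)

# p-value of the slope, taken from the coefficient table
p_vlera <- summary(modeli)$coefficients["x", "Pr(>|t|)"]

cat("RSE:", rse, " R^2:", r2, " p:", p_vlera, "\n")
```

The hand-computed values agree with \(sigma(modeli)\) and with the R-squared reported by \(summary(modeli)\).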
We run a summary of the model to see the results:
summary(model1)
Call:
lm(formula = transporti ~ te_ardhurat_mujore, data = train)
Residuals:
Min 1Q Median 3Q Max
-74.835 -37.737 -0.988 39.736 78.186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.181e+02 6.601e+00 17.896 <2e-16 ***
te_ardhurat_mujore 5.277e-03 6.120e-03 0.862 0.389
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 43.93 on 605 degrees of freedom
Multiple R-squared: 0.001228, Adjusted R-squared: -0.0004232
F-statistic: 0.7436 on 1 and 605 DF, p-value: 0.3888
We see that:
Residual standard error: 43.93. This means that actual transport spending deviates from the linear model by about 43.93 on average. We can express this as a percentage of the mean:
mesatarja_RSE<-sigma(model1)/mean(train$transporti)
sprintf("Gabimi ne perqindje i modelit:%s%%",round(mesatarja_RSE*100,2))
[1] "Gabimi ne perqindje i modelit:35.54%"
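The percentage error is 35.54%. This means that our model does not fit the actual data well.
With R^2 = 0.001228 the model explains only 0.1228% of the variance in transport spending, practically none. The p-value of the slope, 0.389, is well above 0.05, so the relationship is not statistically significant.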
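The held-out test rows can be used to check the model on data it has not seen. The sketch below shows the idea on made-up data (the data frame \(df\) and its two columns are hypothetical), using the root mean squared error as the measure.

```r
set.seed(123)
# Hypothetical data: transport spending unrelated to income by construction
df <- data.frame(te_ardhurat = runif(200, 500, 1500),
                 transporti = runif(200, 50, 200))

ne_train <- sample(c(TRUE, FALSE), nrow(df), replace = TRUE, prob = c(0.6, 0.4))
train <- df[ne_train, ]
test <- df[!ne_train, ]

modeli <- lm(transporti ~ te_ardhurat, data = train)

# newdata (not data) is the argument that makes predict() use other rows
parashikimet <- predict(modeli, newdata = test)
rmse_test <- sqrt(mean((test$transporti - parashikimet)^2))

cat("Test RMSE:", round(rmse_test, 2), "\n")
```

A test error close to the residual standard error of the training fit suggests that the model behaves the same way on unseen rows.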
We check the residuals of the model:
A residual is the distance between the actual value and the value given by the model we have built.
vlerat_e_modelit <- predict(model1, newdata = train)
vlerat_e_modelit
1 3 6 7 9 10 12 14 15 17 18 19 27 28
123.1910 122.0089 120.8954 125.2808 125.5341 125.6449 123.7715 124.3415 126.0302 125.6238 125.8085 125.9827 121.6553 122.3361
29 30 35 36 38 39 40 41 42 43 44 45 46 47
122.9113 120.9376 125.3283 125.1436 125.0433 123.2280 121.3018 121.3281 122.3783 124.8586 120.8215 124.9167 123.0380 124.2729
48 49 51 52 54 55 56 57 60 62 63 64 66 70
124.4417 123.6026 125.4022 125.7927 124.9642 123.1013 124.1040 122.6422 121.8031 123.8560 122.0036 123.7346 123.8982 121.1012
74 75 76 77 79 80 81 83 85 86 90 91 93 95
124.3415 125.1753 125.7452 123.3229 122.8744 124.2781 125.9669 123.9668 123.0433 125.2386 122.3572 124.4628 124.9695 125.4391
96 98 99 100 101 102 103 105 109 110 112 113 116 117
123.5974 122.1831 123.0591 122.3730 125.1700 124.0195 122.9905 122.7899 123.2280 122.2728 125.2544 122.1461 124.9220 124.1251
119 120 122 123 124 125 127 128 129 135 140 141 142 143
124.8111 124.3362 125.9827 125.1383 123.9140 125.8402 122.5472 124.5314 125.9616 121.7028 120.9112 123.1541 125.0011 123.9562
144 146 147 148 149 152 153 154 155 156 157 158 159 160
123.8507 125.2808 122.7636 121.5603 122.8691 121.5498 121.5973 124.1040 125.0961 122.9746 122.7372 122.8902 120.9640 120.8057
161 162 164 165 166 168 169 170 172 177 178 180 182 184
124.4734 124.4101 124.3678 122.4100 122.6475 123.7082 121.4073 124.5051 123.5868 124.9220 123.9615 124.6264 125.5816 121.3440
185 186 187 188 191 192 196 197 199 200 201 204 205 207
121.1223 125.0486 125.2808 123.5182 125.7769 123.2438 125.9880 122.8480 122.3783 121.0221 120.8954 120.7951 124.4839 122.6580
208 209 210 211 212 213 215 217 218 221 225 226 227 228
122.4205 120.9851 122.8269 123.3704 122.1461 124.4628 123.3810 125.7558 124.4839 123.8718 122.2200 125.3389 122.4100 122.6527
232 233 234 235 236 237 239 241 243 245 247 251 252 253
122.8533 122.7477 122.6211 122.7741 122.4153 122.4047 122.6527 124.4945 124.8270 124.4998 124.7320 122.2306 122.3572 123.8243
254 255 257 258 259 265 266 267 268 269 270 272 273 274
125.5447 123.9246 122.1197 123.8718 123.0485 124.8692 122.7794 125.6502 124.8692 121.8136 125.4127 124.0776 124.0776 121.3967
278 279 282 283 284 285 286 287 288 289 291 292 293 298
125.1964 124.0512 125.4655 124.0565 124.0090 122.7741 125.9932 122.7213 125.4233 125.1278 121.7978 121.3809 125.4919 121.6553
299 302 306 307 308 309 310 311 312 314 315 318 319 322
125.8297 122.0247 120.8321 124.2623 124.0037 123.1910 124.3415 121.9720 125.9616 125.2439 121.9614 122.7213 121.0009 121.6659
323 325 326 328 329 332 335 336 341 342 343 344 345 346
125.1858 125.2017 121.3492 125.0433 125.8983 122.9219 121.6131 125.9880 121.2068 125.8824 123.5235 122.1461 123.2491 123.3599
348 349 351 353 354 358 359 361 362 364 365 367 368 369
122.3836 124.3573 122.5841 121.1962 123.2807 123.4971 123.2491 123.3177 123.5235 125.7558 121.5445 125.3125 124.0618 123.5182
370 371 372 374 375 378 379 381 383 385 387 388 389 390
122.0195 123.0485 125.7505 121.0959 124.5314 121.8400 125.6977 120.9640 121.8348 122.5789 124.1198 123.3704 125.5658 125.3547
392 395 396 397 398 399 402 404 405 406 408 409 411 413
124.5948 123.2702 122.0195 121.1487 122.5314 121.8981 124.8270 123.4390 124.4945 125.2914 121.8611 123.9509 125.8244 125.0433
414 415 416 418 419 420 421 423 424 427 428 429 432 433
125.9563 125.8824 122.7213 125.7505 122.1778 125.5394 125.3758 125.0117 121.1540 123.0063 121.4126 121.6184 125.6661 120.9165
435 437 438 439 440 441 442 444 448 449 451 452 453 454
124.5684 125.9616 121.5445 122.4522 124.8270 121.1065 124.7478 122.6052 122.0036 121.7450 125.2386 125.8613 124.7742 123.0169
455 459 460 462 463 464 465 466 467 469 471 473 476 478
123.0274 122.9377 125.1172 123.4338 125.6027 123.1382 123.4654 122.7424 122.9958 121.6817 126.0091 123.2068 125.1594 125.2703
479 481 483 486 487 489 492 493 495 497 498 501 502 503
125.7188 125.2122 124.4417 122.6791 121.1329 122.4680 120.7846 122.6738 120.9851 120.7899 121.6870 123.7399 125.5183 125.5341
504 505 506 507 508 510 511 512 514 516 517 519 521 522
124.3837 122.0564 125.4391 123.3968 121.3967 123.1963 123.1488 124.9220 124.0776 122.3730 121.6395 123.4390 125.6713 122.9166
523 524 525 528 530 533 535 537 539 540 542 544 547 550
123.3652 124.7109 124.8534 122.9694 123.1013 124.9695 122.5102 124.8797 120.9060 123.4443 121.0326 124.4523 123.9879 120.7899
551 552 553 555 556 558 559 560 561 564 567 569 571 573
124.9589 122.9008 124.1990 123.7082 124.7056 124.6053 123.2280 125.1964 125.5975 125.4022 124.9378 123.6238 122.8585 123.3177
574 577 578 579 580 584 586 587 590 591 592 594 595 597
125.5288 123.1277 122.5366 121.8031 125.9405 121.9720 125.4814 126.0038 125.6027 124.1304 123.2596 120.8954 120.9482 122.0353
598 600 601 603 604 605 607 609 610 612 613 615 620 624
123.9351 125.0222 124.8006 123.2807 125.8349 125.0064 122.4416 123.6976 125.3758 124.7742 125.5183 121.0907 120.9957 123.5235
625 626 627 629 630 633 635 636 640 641 644 645 648 649
124.6212 121.6184 125.2228 121.0643 120.9904 124.6581 121.0801 123.0169 123.0380 124.7056 123.4127 121.2595 124.9853 124.1568
653 654 658 659 660 663 665 666 668 670 671 672 673 674
125.3072 124.2676 125.1383 123.6501 121.4020 123.1013 123.8982 125.7294 125.5922 121.0854 124.1726 124.9747 123.1910 122.2253
675 676 678 679 680 686 687 688 689 690 691 693 694 695
125.9616 125.0170 125.7347 125.8402 125.4127 123.8612 121.6448 125.6397 125.3441 121.9034 124.7795 125.6291 121.3229 124.7320
696 697 698 699 700 702 703 705 707 708 709 710 712 715
121.1276 121.0168 124.3045 122.5366 124.0301 125.6977 122.8269 121.5392 122.0933 121.5867 124.1568 122.4153 124.7584 122.4364
716 717 718 720 721 730 731 735 736 739 740 742 743 744
123.5921 123.5288 125.2122 124.5631 125.1647 125.9458 120.8901 121.3704 122.5894 123.3916 123.9721 125.6238 121.3545 125.7822
745 746 748 751 752 753 754 755 756 757 759 760 761 762
124.8375 122.1989 125.0117 122.3783 124.8270 126.0091 125.5341 121.6870 124.3309 121.4865 123.4021 122.7055 125.5658 121.9984
763 764 765 766 767 768 775 776 777 778 781 783 786 787
123.9773 124.2834 125.0803 125.8983 120.9640 123.6501 124.7689 123.7768 122.1303 122.3255 120.8479 121.4759 125.6977 122.4839
788 789 792 793 794 796 797 798 799 800 801 802 803 804
123.2385 124.0776 123.4602 125.9352 121.0221 125.1331 122.1514 125.2280 122.8585 123.6026 121.6395 125.5394 123.9087 122.2992
805 807 808 811 813 816 817 818 819 820 822 826 827 828
121.2331 125.4286 125.0011 122.7055 125.3758 121.5076 125.5077 123.0696 122.7899 125.5130 125.6819 120.9798 123.5393 125.5500
829 831 833 836 839 840 844 845 846 848 849 850 851 855
121.7820 121.0643 124.6106 121.0696 123.0538 121.7398 122.5894 125.3441 123.0063 124.0987 122.6105 123.5288 124.4101 124.7689
856 858 859 861 862 864 866 867 868 869 870 871 872 874
125.0644 121.6659 121.9772 123.2016 124.9325 125.5922 123.8560 124.8375 122.5947 125.6449 121.7398 125.0433 122.8216 123.2280
877 881 884 887 888 889 890 891 894 897 898 899 902 904
120.7793 123.6079 125.7980 125.1700 123.2966 124.1568 125.5183 124.4681 123.9087 120.8426 124.3309 124.5948 122.8533 125.9563
906 910 911 912 914 915 917 918 919 920 922 923 927 928
125.9246 125.7716 123.5710 123.5921 123.8665 125.1753 125.9616 122.8216 124.4892 123.6607 121.3704 124.8903 125.4761 125.2333
929 932 934 936 937 940 942 943 944 945 946 947 948 949
125.2280 123.6132 125.2333 124.3415 122.5472 124.1937 122.2411 124.1515 121.0748 123.1382 124.7267 123.8982 121.9192 122.6580
950 951 952 954 955 958 959 960 961 962 964 966 967 968
123.5235 123.7082 122.1356 121.5287 121.6606 124.0882 124.3942 120.9218 122.2200 124.6581 124.3045 120.8479 123.4918 122.0564
970 972 974 975 978 979 980 981 987 988 989 990 991 992
124.1620 125.8613 125.1331 124.5684 125.4444 121.0537 123.4813 123.5129 125.4233 123.8454 124.2570 123.4707 125.5869 125.4761
993 994 995 998 1000
124.9589 125.4180 122.1672 123.1857 120.9904
mbetjet_e_modelit <- resid(model1)
mbetjet_e_modelit
1 3 6 7 9 10 12 14 15 17
-0.1910156 14.9910944 1.1045998 -24.2808171 -43.5341264 -8.6449492 76.2284840 -74.3414619 -63.0301904 74.3761599
18 19 27 28 29 30 35 36 38 39
-15.8085448 -56.9826949 63.3446720 56.6639033 8.0886801 58.0623816 40.6716874 -32.1436079 22.9566603 -35.2279565
40 41 42 43 44 45 46 47 48 49
56.6982495 67.6718631 24.6216851 -69.8586350 71.1784817 57.0833150 72.9620255 -38.2728573 -67.4417301 -37.6026431
51 52 54 55 56 57 60 62 63 64
11.5978055 -58.7927130 -27.9641805 73.8986981 17.8960156 22.3578212 -17.8030918 -47.8559524 41.9963717 -65.7345751
66 70 74 75 76 77 79 80 81 83
19.1018294 -16.1012140 -27.3414619 60.8247284 40.2547825 4.6770525 6.1256210 0.7218654 -51.9668631 40.0332248
85 86 90 91 93 95 96 98 99 100
-69.0432518 26.7614011 -25.3572058 -42.4628392 19.0305422 48.5608646 -66.5973659 77.8169443 12.9409164 -68.3730377
101 102 103 105 109 110 112 113 116 117
-74.1699943 11.9804520 20.0095210 -19.7899425 -67.2279565 -12.2727694 3.7455692 -38.1461148 69.0780377 43.8749065
119 120 122 123 124 125 127 128 129 135
-2.8111395 45.6638154 20.0173051 53.8616693 43.0859975 50.1597915 37.4528122 -71.5314438 12.0384142 -46.7028235
140 141 142 143 144 146 147 148 149 152
62.0887680 -55.1540746 -74.0011215 25.0437793 -43.8506752 -50.2808171 -26.7635561 39.4396629 2.1308983 -60.5497825
153 154 155 156 157 158 159 160 161 162
66.4027220 52.8960156 -52.0961125 61.0253528 43.2628302 -38.8902108 -38.9640048 54.1943135 -68.4733938 -45.4100665
164 165 166 168 169 170 172 177 178 180
12.6321517 -71.4099786 23.3525439 70.2918113 75.5927040 18.4949425 -59.5868113 4.0780377 61.0385020 -25.6264348
182 184 185 186 187 188 191 192 196 197
-48.5816219 64.6560313 -57.1223231 -65.0486170 -65.2808171 32.4817933 46.2231189 -67.2437883 67.0120278 69.1520074
199 200 201 204 205 207 208 209 210 211
44.6216851 -27.0220548 -55.8954002 -51.7951319 32.5160516 3.3419894 66.5794668 26.0148861 -9.8268835 -39.3704430
212 213 215 217 218 221 225 226 227 228
-28.1461148 -1.4628392 -69.3809975 63.2442280 -16.4839484 62.1282157 -49.2199966 -73.3388672 57.5900214 -10.6527333
232 233 234 235 236 237 239 241 243 245
4.1467301 -11.7477243 -20.6210697 66.2258893 -16.4152559 53.5952987 0.3472667 65.5054971 -71.8269713 -58.4997802
247 251 252 253 254 255 257 258 259 265
-32.7319804 -27.2305512 -62.3572058 51.1757112 -21.5446810 1.0754430 3.8802716 -37.8717843 63.9514709 -27.8691896
266 267 268 269 270 272 273 274 278 279
-22.7793880 19.3497735 19.1308104 78.1863537 17.5872509 -61.0775980 10.9224020 23.6032585 13.8036193 -23.0512117
282 283 284 285 286 287 288 289 291 292
-34.4655218 -20.0564889 53.9910065 -17.7741107 61.0067505 4.2786621 -60.4233036 74.8722239 -29.7978145 -12.3809097
293 298 299 302 306 307 308 309 310 311
-41.4919082 1.3446720 14.1703461 37.9752626 31.1679271 30.7376973 -31.0037162 -45.1910156 -31.3414619 -45.9719646
312 314 315 318 319 322 323 325 326 328
-22.9615858 8.7561238 -60.9614101 -10.7213379 31.9990543 54.3341174 5.8141738 -12.2016580 56.6507540 -10.0433397
329 332 335 336 341 342 343 344 345 346
-12.8982585 47.0781256 -44.6131098 -5.9879722 66.7932405 31.1175733 64.4765160 -34.1461148 -64.2490656 -34.3598884
348 349 351 353 354 358 359 361 362 364
-21.3835922 -6.3572937 73.4158713 48.8037950 53.7192707 -69.4970976 58.7509344 -5.3176702 -37.5234840 -55.7557720
365 367 368 369 370 371 372 374 375 378
-47.5445052 27.6875192 -2.0617662 14.4817933 57.9805399 22.9514709 43.2495052 -33.0959367 -59.5314438 28.1599673
379 381 383 385 387 388 389 390 392 395
-74.6977220 -16.9640048 -56.8347554 33.4211485 -28.1198163 -9.3704430 -47.5657901 -66.3546990 -69.5947712 59.7298253
396 397 398 399 402 404 405 406 408 409
-47.0194601 -65.1487095 -32.5313560 -1.8980828 34.1730287 24.5609524 17.5054971 44.7086283 61.1388582 -1.9509434
411 413 414 415 416 418 419 420 421 423
13.1756234 26.9566603 18.0436914 -51.8824267 38.2786621 26.2495052 51.8222216 53.4605963 58.6241919 53.9883240
424 427 428 429 432 433 435 437 438 439
-49.1539868 -6.0063109 -12.4125733 -33.6183871 35.3339417 13.0834907 1.4316152 33.0384142 -44.5445052 37.5478032
440 441 442 444 448 449 451 452 453 454
-5.8269713 59.8935087 53.2521878 42.3947622 -51.0036283 18.2549583 -0.2385989 -64.8613176 70.2258014 -43.0168654
455 459 460 462 463 464 465 466 467 469
22.9725800 41.0622937 -65.1172216 55.5662297 -11.6027310 -12.1382428 46.5345660 -2.7424470 -44.9957563 -57.6817144
471 473 476 478 479 481 483 486 487 489
-50.0090813 -67.2068474 -0.1594398 70.7297374 35.2811689 57.7877875 58.5582699 67.3208803 -29.1328776 10.5319713
492 493 495 497 498 501 502 503 504 505
-66.7845774 48.3261576 14.0148861 56.2101453 -26.6869917 -6.7398523 56.4817054 8.4658736 43.6163199 -71.0564011
506 507 508 510 511 512 514 516 517 519
32.5608646 6.6031706 -3.3967415 -56.1962928 -26.1487973 -64.9219623 -24.0775980 50.6269623 -51.6394962 -67.4390476
521 522 523 524 525 528 530 533 535 537
58.3286644 3.0834028 -25.3651657 -17.7108713 72.1466423 34.0306301 -10.1013019 12.0305422 -34.5102469 -43.8797441
539 540 542 544 547 550 551 552 553 555
-35.9059547 -49.4443248 -55.0326094 -33.4522847 -0.9878843 0.2101453 24.0410967 -45.9007653 46.8010246 33.2918113
556 558 559 560 561 564 567 569 571 573
11.2944060 67.3946743 67.7720435 -46.1963807 13.4025463 7.5978055 -65.9377942 -42.6237523 -28.8585471 2.6823298
574 577 578 579 580 584 586 587 590 591
23.4711509 55.8723118 -20.5366333 -44.8030918 -32.9404767 60.0280354 -24.4813537 17.9961960 30.3972690 -3.1303708
592 594 595 597 598 600 601 603 604 605
10.7403798 67.1045998 45.0518270 -11.0352920 -25.9351116 -5.0222306 -7.8005850 -30.2807293 -74.8349312 -37.0063987
607 609 610 612 613 615 620 624 625 626
-19.4416423 46.3023659 -49.3758081 -3.7741986 43.4817054 0.9093406 -53.9956685 -66.5234840 -39.6211576 -62.6183871
627 629 630 633 635 636 640 641 644 645
-50.2227671 -52.0642730 -41.9903912 20.3419015 -32.0801049 -17.0168654 -50.0379745 3.2944060 -19.4126612 -28.2595323
648 649 653 654 658 659 660 663 665 666
43.0147104 59.8432428 67.6927965 -7.2675800 27.8616693 -52.6501386 -32.4020188 11.8986981 -30.8981706 -39.7293857
668 670 671 672 673 674 675 676 678 679
-54.5921765 -18.0853822 5.8274110 -55.9747351 32.8089844 59.7747261 42.0384142 -23.0169533 -50.7346629 -63.8402085
680 686 687 688 689 690 691 693 694 695
-6.4127491 58.1387703 -71.6447735 22.3603280 1.6558555 -55.9033600 -30.7794759 64.3708826 -55.3228596 -26.7319804
696 697 698 699 700 702 703 705 707 708
-25.1276004 -21.0167776 -41.3045209 -34.5366333 -37.0301026 -48.6977220 -40.8268835 -52.5392280 68.9066580 -65.5867234
709 710 712 715 716 717 718 720 721 730
-41.1567572 -2.4152559 -56.7583667 42.5636350 -13.5920886 43.4712387 -3.2122125 6.4368925 73.8352829 -25.9457540
731 735 736 739 740 742 743 744 745 746
-40.8901229 -64.3703551 66.4105940 -2.3915521 42.0279475 25.3761599 42.6454767 -68.7821584 7.1624741 -67.1988875
748 751 752 753 754 755 756 757 759 760
57.9883240 50.6216851 62.1730287 59.9909187 12.4658736 -18.6869917 66.6690927 74.5135448 -52.4021066 -22.7055061
761 762 763 764 765 766 767 768 775 776
42.4342099 32.0016490 7.0226702 71.7165882 59.9197194 54.1017415 7.0359952 40.3498614 22.2310787 -40.7767933
777 778 781 783 786 787 788 789 792 793
-53.1302829 -49.3255422 -28.8479047 6.5240994 -2.6977220 4.5161395 -16.2385111 34.9224020 -6.4601567 -48.9351995
794 796 797 798 799 800 801 802 803 804
-63.0220548 -34.1330534 -69.1513921 41.7719556 34.1414529 10.3973569 -46.6394962 3.4605963 -33.9087252 5.7008442
805 807 808 811 813 816 817 818 819 820
49.7668541 54.5714191 -10.0011215 58.2944939 18.6241919 -1.5075643 -2.5077400 -1.0696382 30.2100575 55.4869827
822 826 827 828 829 831 833 836 839 840
-36.6818902 -13.9798366 -11.5393158 -73.5499583 -63.7819827 73.9357270 12.3893970 8.9304497 -39.0538064 33.2602355
844 845 846 848 849 850 851 855 856 858
-51.5894060 37.6558555 -13.0063109 35.9012928 44.3894849 -50.5287613 -72.4100665 -17.7689213 64.9355512 -43.6658826
859 861 862 864 866 867 868 869 870 871
56.0227581 4.7984299 -59.9325169 68.4078235 -34.8559524 35.1624741 -71.5946833 31.3550508 64.2602355 71.9566603
872 874 877 881 884 887 888 889 890 891
-1.8216062 -35.2279565 21.2206999 52.3920796 -60.7979903 57.8300057 22.7034389 -3.1567572 -32.5182946 -73.4681165
894 897 898 899 902 904 906 910 911 912
48.0912748 40.1573726 38.6690927 -23.5947712 21.1467301 -0.9563086 -52.9246449 -12.7716039 -24.5709795 -7.5920886
914 915 917 918 919 920 922 923 927 928
-70.8665070 3.8247284 -35.9615858 59.1783938 -4.4892256 73.3393068 -22.3703551 -25.8902987 -17.4760764 6.7666784
929 932 934 936 937 940 942 943 944 945
16.7719556 9.3868023 -24.2333216 -31.3414619 26.4528122 34.8063019 48.7588942 58.8485201 36.9251724 57.8617572
946 947 948 949 950 951 952 954 955 958
57.2732969 34.1018294 72.0808081 -27.6580106 37.4765160 55.2918113 -46.1355602 60.4713266 -33.6606053 -59.0881526
959 960 961 962 964 966 967 968 970 972
-65.3942347 15.0782134 -47.2199966 -9.6580985 34.6954791 -36.8479047 23.5081797 -17.0564011 41.8379655 -38.8613176
974 975 978 979 980 981 987 988 989 990
-38.1330534 27.4316152 -6.4444127 13.9462815 -61.4812658 -58.5129294 16.5766964 7.1546021 57.7429745 63.5292888
991 992 993 994 995 998 1000
-6.5868992 -70.4760764 67.0410967 61.5819737 -61.1672239 15.8142617 70.0096088
We can add new columns to our training data that hold the values predicted by the model and the corresponding residuals.
train$vlerat_e_modelit<-predict(model1)
train$mbetjet_e_modelit<-residuals(model1)
View(train)
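A quick sanity check for the two new columns: in a least-squares fit, every observed value equals its fitted value plus its residual. The sketch below illustrates this on R's built-in `cars` dataset, which is an assumption made to keep the example self-contained; the names `demo` and `fit_demo` are hypothetical and not part of the project.

```r
# Illustration on the built-in `cars` data (not the Student Spending dataset)
demo <- cars
fit_demo <- lm(dist ~ speed, data = demo)

# Same pattern as in the report: store fitted values and residuals as columns
demo$fitted_vals <- predict(fit_demo)
demo$resid_vals  <- residuals(fit_demo)

# observed = fitted + residual, up to floating-point error
check <- isTRUE(all.equal(demo$dist,
                          unname(demo$fitted_vals + demo$resid_vals)))
print(check)

# With an intercept in the model, the residuals sum to (numerically) zero
print(sum(demo$resid_vals))
```

The same check should apply to `train` once the two columns have been added, assuming `model1` was fitted on `train`.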
# A constant colour belongs outside aes(); inside aes() it is treated as a data value
grafik1 <- ggplot(train, aes(transporti, te_ardhurat_mujore)) + geom_point() + geom_point(aes(y = vlerat_e_modelit), color = "red2")
grafik1
We can draw the segments that represent the errors (residuals):
library(ggplot2)
grafik2 <- ggplot(train,aes(transporti ,te_ardhurat_mujore))+geom_point() + labs(x="Transporti", y= "Te Ardhurat Mujore") + geom_point(aes(y=vlerat_e_modelit),color="blue")+geom_segment(aes(xend= transporti, yend=vlerat_e_modelit),color="red")
grafik2
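The red segments are the residuals, and least squares picks the line that makes the sum of their squared lengths as small as possible. The following sketch shows this on the built-in `cars` dataset (an assumption for self-containment; `rss` is a hypothetical helper): moving the slope or the intercept away from the fitted values increases the residual sum of squares.

```r
# Illustration on the built-in `cars` data (not the Student Spending dataset)
fit_demo <- lm(dist ~ speed, data = cars)

# Residual sum of squares (RSS) for an arbitrary line y = a + b * x
rss <- function(a, b) sum((cars$dist - (a + b * cars$speed))^2)

b_hat   <- coef(fit_demo)
rss_ols <- rss(b_hat[1], b_hat[2])

# Nudging the slope or the intercept away from the least-squares estimates
# makes the RSS larger
rss_slope_shift     <- rss(b_hat[1], b_hat[2] + 0.1)
rss_intercept_shift <- rss(b_hat[1] + 1, b_hat[2])

print(c(ols = rss_ols,
        slope_shift = rss_slope_shift,
        intercept_shift = rss_intercept_shift))
```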
library(correlation)
Warning: package ‘correlation’ was built under R version 4.3.3
plot(data$ushqimi, data$mjetet_shkollore)
cor(data$ushqimi,data$mjetet_shkollore)
[1] 0.07548524
model1<-lm(ushqimi ~ mjetet_shkollore, data=data)
model1
Call:
lm(formula = ushqimi ~ mjetet_shkollore, data = data)
Coefficients:
(Intercept) mjetet_shkollore
236.80006 0.09065
library(ggplot2)
library(ggpubr)  # provides stat_cor() and stat_regline_equation()
ggplot(data) + aes(x = ushqimi, y = mjetet_shkollore) +
geom_point(size = 1L, colour = "#000000") +
geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "blue") +
stat_cor(label.x = 2, label.y = 4) +
stat_regline_equation(label.x = 1, label.y = 2, size = 3) +
labs(x = "Ushqimi", y = "Mjetet Shkollore",
title = "Shpenzimet per ushqim vs Shpenzimet per Mjete Shkollore",
subtitle = "A ndikojne shpenzimet ne ushqyerje ne ato per mjete shkollore?") +
theme_classic()
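Let us analyze one of the results of the regressions built above:
RESULT: From the plot above, titled \(Shpenzimet\) \(per\) \(ushqim\) \(vs\) \(Shpenzimet\) \(per\) \(Mjete\) \(Shkollore\), the equation of the fitted regression line is
\[Estimate\ = \ 160\ +\ 0.063X\] where \(X\) is spending on food and the estimate is spending on school supplies. For each additional unit spent on food, predicted spending on school supplies rises by 0.063 units on average. Note that the plot regresses \(mjetet\) \(shkollore\) on \(ushqimi\), the reverse direction of \(model1\), which has \(ushqimi\) as the response. This is why the slope here differs from the coefficient 0.09065 printed for \(model1\).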
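Regressing y on x and regressing x on y give two different slopes, and both are tied to the correlation coefficient. The sketch below demonstrates the relationship on the built-in `cars` dataset (an assumption for self-containment, not the project data). The report's own numbers follow the same pattern: 0.09065 × 0.063 ≈ 0.0057, which is about the square of the correlation 0.07548524 printed earlier.

```r
# Illustration on the built-in `cars` data (not the Student Spending dataset)
x <- cars$speed
y <- cars$dist

slope_y_on_x <- unname(coef(lm(y ~ x))[2])  # y regressed on x
slope_x_on_y <- unname(coef(lm(x ~ y))[2])  # x regressed on y
r <- cor(x, y)

# Each slope is the correlation rescaled by a ratio of standard deviations
print(c(slope_y_on_x, r * sd(y) / sd(x)))
print(c(slope_x_on_y, r * sd(x) / sd(y)))

# The product of the two slopes equals r^2
print(c(slope_y_on_x * slope_x_on_y, r^2))
```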
We can also obtain the confidence intervals of the coefficients with the function \(confint()\):
confint(model1, level = 0.95)
2.5 % 97.5 %
(Intercept) 222.73043149 250.8696790
mjetet_shkollore 0.01626667 0.1650317
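As the output shows, the 95% confidence interval for the intercept is [222.73 ; 250.87] and the one for the slope of \(mjetet\) \(shkollore\) is [0.016 ; 0.165]. The slope interval does not contain 0, which agrees with a slope that is significant at the 5% level.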
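For a linear model, `confint()` returns the estimate plus or minus a critical t value times the standard error. A minimal sketch of that calculation, using the built-in `cars` dataset as a stand-in (an assumption; `manual_ci` is a hypothetical name):

```r
# Illustration on the built-in `cars` data (not the Student Spending dataset)
fit_demo <- lm(dist ~ speed, data = cars)

est <- coef(summary(fit_demo))[, "Estimate"]
se  <- coef(summary(fit_demo))[, "Std. Error"]

# Critical t value for a 95% interval with the residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit_demo))

manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(manual_ci)
print(confint(fit_demo, level = 0.95))  # same numbers, different column labels
```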
The function \(predict()\) estimates the value of the dependent variable for given values of the independent variable.
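As a small illustration of predicting for new values, the sketch below passes a data frame of hypothetical predictor values through `newdata`. It uses the built-in `cars` dataset rather than the project data (an assumption), and `new_obs` is a made-up name with made-up values.

```r
# Illustration on the built-in `cars` data (not the Student Spending dataset)
fit_demo <- lm(dist ~ speed, data = cars)

# New predictor values; the column name must match the one in the formula
new_obs <- data.frame(speed = c(10, 15, 20))  # hypothetical values

pred <- predict(fit_demo, newdata = new_obs)
print(pred)

# Each prediction is simply intercept + slope * new value
manual <- coef(fit_demo)[1] + coef(fit_demo)[2] * new_obs$speed
print(manual)
```

For `model1`, the data frame passed as `newdata` would need a column named `mjetet_shkollore`.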
summary(model1)
Call:
lm(formula = ushqimi ~ mjetet_shkollore, data = data)
Residuals:
Min 1Q Median 3Q Max
-160.913 -77.172 3.852 74.670 155.764
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 236.80006 7.16981 33.027 <2e-16 ***
mjetet_shkollore 0.09065 0.03790 2.391 0.017 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 86.74 on 998 degrees of freedom
Multiple R-squared: 0.005698, Adjusted R-squared: 0.004702
F-statistic: 5.719 on 1 and 998 DF, p-value: 0.01696
Interpretation of the model results:
1. \(Interpretation\) \(of\) \(the\) \(residuals\): the residuals are the differences between the observed values of the dependent variable \(ushqimi\) and the values predicted by the model. The wide range of the residuals suggests that the model does not fit the data closely. The minimum residual is -160.913, the first quartile is -77.172, the median is 3.852, the third quartile is 74.670, and the maximum residual is 155.764.
2. \(Intercept\): 236.80006 is the predicted value of the dependent variable \(ushqimi\) when the independent variable \(mjetet\) \(shkollore\) equals 0. The coefficient of the independent variable is 0.09065: for each 1-unit increase in \(mjetet\) \(shkollore\), \(ushqimi\) is predicted to increase by 0.09065 units.
3. \(P-value\): the p-value of the independent variable is 0.017 (the overall F-test reports 0.01696), which is below the 0.05 significance level. The relationship between the independent and the dependent variable is therefore statistically significant at the 5% level. This indicates an association, not necessarily a causal effect.
4. \(R^2\) is 0.005698, meaning that the independent variable explains about 0.57% of the variance of the dependent variable.
In summary, the results suggest that the independent variable \(mjetet\) \(shkollore\) has a statistically significant but very weak relationship with the dependent variable \(ushqimi\).
Note that the three stars in the output belong to the intercept. The slope of \(mjetet\) \(shkollore\) carries a single star: it is significant at the 0.05 level but not at the 0.01 level.
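The quantities interpreted above can also be pulled out of the summary object programmatically. The sketch below does so on the built-in `cars` dataset (an assumption for self-containment) and checks two identities that hold in a regression with a single predictor: the slope's p-value equals the F-test p-value, and R-squared equals the squared correlation.

```r
# Illustration on the built-in `cars` data (not the Student Spending dataset)
fit_demo <- lm(dist ~ speed, data = cars)
s <- summary(fit_demo)

coef_table <- coef(s)  # estimates, std. errors, t values, p-values
slope_p    <- coef_table["speed", "Pr(>|t|)"]
r_squared  <- s$r.squared

# Overall F-test p-value, recomputed from the stored F statistic
f   <- s$fstatistic
f_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# In simple regression: slope p-value == F-test p-value, R^2 == cor^2
print(c(slope_p = slope_p, f_p = f_p))
print(c(r_squared = r_squared, cor_squared = cor(cars$speed, cars$dist)^2))
```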
If we call \(plot()\) on the model we created, we get 4 plots:
1. Residuals vs Fitted;
2. Normal Q-Q;
3. Scale-Location (fitted values vs the square root of the absolute standardized residuals);
4. Residuals vs Leverage (leverage vs standardized residuals);
model <- lm(ushqimi~mjetet_shkollore, data = data)
plot(model)
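By default the four diagnostic plots are drawn one after another. A possible way to see them side by side, or to request just one of them, is sketched below on the built-in `cars` dataset (an assumption; `fit_demo` is a hypothetical name).

```r
# Illustration on the built-in `cars` data (not the Student Spending dataset)
fit_demo <- lm(dist ~ speed, data = cars)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels
plot(fit_demo)                   # all four diagnostic plots at once
par(old_par)                     # restore the previous layout

# A single panel can be requested with `which`; 1 = Residuals vs Fitted
plot(fit_demo, which = 1)
```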
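\[Residuals\ vs\ Fitted\ Plot\]
The first plot shows that there is no clear linear dependence in our data. Strictly speaking, this plot checks linearity and whether the residuals scatter evenly around zero. Non-linearity by itself does not imply heteroskedasticity, which is better judged from the Scale-Location plot.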
\[Q-Q\ Plot\]
Another way to examine the model is the Normal Q-Q plot. This plot is used to judge whether a set of values, here the standardized residuals of the model, is normally distributed. If it is, all the points should lie on a straight diagonal line. In our case the values do not match the normal distribution very well, because a considerable share of the points does not lie on the line. The smallest values are not low enough to reach the line and the largest values are not high enough to reach it. This shows that the tails of the distribution are lighter than those of the normal distribution.
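To make the light-tail pattern concrete, the sketch below draws a Q-Q plot of standardized residuals and, for comparison, a Q-Q plot of uniform random numbers, which have lighter tails than the normal distribution. It uses the built-in `cars` dataset and simulated values (both assumptions, not the project data); the Shapiro-Wilk test is added only as one possible formal complement to the visual check.

```r
# Illustration on the built-in `cars` data (not the Student Spending dataset)
fit_demo <- lm(dist ~ speed, data = cars)
std_res  <- rstandard(fit_demo)  # standardized residuals

qqnorm(std_res, main = "Normal Q-Q of standardized residuals")
qqline(std_res, col = "red")

# Light tails for comparison: uniform draws flatten at both ends of the
# Q-Q plot (above the line on the left, below it on the right)
set.seed(1)
u <- runif(1000, min = -1, max = 1)
qqnorm(u, main = "Uniform sample: lighter tails than normal")
qqline(u, col = "red")

# A formal complement to the visual check
print(shapiro.test(std_res))
```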
\[Scale\ Location\ Plot\]
The Scale-Location plot is similar to the first plot, except that the vertical axis shows the square root of the absolute standardized residuals. It again relates the residuals to the fitted values and is used to check homoskedasticity. In our case the red line is almost horizontal, so there is no visible relationship between the fitted values and the spread of the standardized residuals.
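The Scale-Location plot can be rebuilt by hand, which makes it clear what the vertical axis contains. A minimal sketch on the built-in `cars` dataset (an assumption; `scale_loc` is a hypothetical name), with a lowess curve standing in for the red trend line:

```r
# Illustration on the built-in `cars` data (not the Student Spending dataset)
fit_demo <- lm(dist ~ speed, data = cars)

fitted_vals <- fitted(fit_demo)
scale_loc   <- sqrt(abs(rstandard(fit_demo)))  # vertical axis of the plot

plot(fitted_vals, scale_loc,
     xlab = "Fitted values",
     ylab = "sqrt(|standardized residuals|)",
     main = "Scale-Location (manual)")
# Smoothed trend, comparable to the red line of plot(fit_demo, which = 3)
lines(lowess(fitted_vals, scale_loc), col = "red")
```

A roughly flat trend line suggests constant residual spread; a rising or falling one suggests that the spread changes with the fitted values.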
\[Residuals\ vs.\ Leverage\ Plot\]
The Residuals vs. Leverage plot lets us see whether there are observations that strongly influence the regression model we built. In other words, we check whether there are observations whose removal would noticeably change our model.
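Influence can also be inspected numerically with leverage values and Cook's distance. The sketch below flags observations on the built-in `cars` dataset (an assumption) using the cutoff 4/n, which is only one common rule of thumb, and then refits the model without them to see how much the coefficients move.

```r
# Illustration on the built-in `cars` data (not the Student Spending dataset)
fit_demo <- lm(dist ~ speed, data = cars)

lev   <- hatvalues(fit_demo)       # leverage of each observation
cooks <- cooks.distance(fit_demo)  # influence of each observation on the fit

# Rule-of-thumb cutoff (an assumption; other cutoffs are also in use)
cutoff <- 4 / nrow(cars)
influential <- which(cooks > cutoff)
print(influential)

# Refit without the flagged rows and compare the coefficients
fit_trim <- if (length(influential) > 0) {
  lm(dist ~ speed, data = cars[-influential, ])
} else {
  fit_demo
}
print(rbind(all_rows = coef(fit_demo), trimmed = coef(fit_trim)))
```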
y_i_perafruar <- fitted.values(model1)
View(y_i_perafruar)
y_i_parashikuar <- predict(model1, newdata = data, interval='confidence')
head(y_i_parashikuar, 500)
fit lwr upr
1 253.8421 248.3698 259.3144
2 259.6437 251.7707 267.5166
3 245.7743 237.9812 253.5674
4 257.0148 250.5456 263.4840
5 254.3860 248.8161 259.9559
6 248.6751 242.3845 254.9657
7 256.1083 250.0201 262.1965
8 259.5530 251.7341 267.3719
9 250.8507 245.2707 256.4307
10 247.9499 241.3318 254.5680
11 253.3889 247.9711 258.8066
12 257.4681 250.7854 264.1507
13 248.2219 241.7310 254.7127
14 261.6379 252.5020 270.7738
15 245.1398 236.9623 253.3173
16 252.6637 247.2807 258.0466
17 249.5816 243.6417 255.5215
18 253.8421 248.3698 259.3144
19 241.4232 230.7592 252.0871
20 256.3803 250.1847 262.5759
21 242.8736 233.2183 252.5289
22 262.3631 252.7400 271.9862
23 263.4509 253.0759 273.8259
24 249.7629 243.8843 255.6414
25 243.6894 234.5822 252.7966
26 255.9270 249.9069 261.9472
27 262.9070 252.9109 272.9032
28 260.3688 252.0517 268.6860
29 249.5816 243.6417 255.5215
30 245.4117 237.4008 253.4226
31 249.2190 243.1473 255.2907
32 248.9470 242.7690 255.1251
33 250.1255 244.3601 255.8909
34 258.9185 251.4686 266.3683
35 245.7743 237.9812 253.5674
36 255.2925 249.4868 261.0982
37 262.3631 252.7400 271.9862
38 253.6608 248.2133 259.1083
39 259.3717 251.6601 267.0833
40 242.8736 233.2183 252.5289
41 250.4881 244.8224 256.1537
42 250.6694 245.0484 256.2904
43 259.3717 251.6601 267.0833
44 260.7314 252.1853 269.2776
45 257.6494 250.8775 264.4213
46 253.3889 247.9711 258.8066
47 253.5702 248.1336 259.0067
48 244.7772 236.3734 253.1810
49 246.3182 238.8415 253.7950
50 254.7486 249.0949 260.4023
51 253.7515 248.2921 259.2108
52 248.4032 241.9943 254.8120
53 243.5987 234.4314 252.7660
54 243.2361 233.8264 252.6459
55 250.0348 244.2424 255.8273
56 258.2839 251.1840 265.3838
57 248.8564 242.6415 255.0713
58 263.9042 253.2095 274.5988
59 256.8335 250.4456 263.2214
60 254.9299 249.2290 260.6308
61 250.3068 244.5930 256.0206
62 249.5816 243.6417 255.5215
63 258.6465 251.3491 265.9439
64 243.7800 234.7327 252.8274
65 254.0234 248.5224 259.5244
66 260.0969 251.9486 268.2452
67 249.4003 243.3960 255.4046
68 262.8164 252.8828 272.7499
69 252.3011 246.9109 257.6913
70 246.6808 239.4074 253.9542
71 258.5559 251.3085 265.8032
72 257.4681 250.7854 264.1507
73 245.0491 236.8155 253.2828
74 246.4089 238.9836 253.8342
75 258.1933 251.1417 265.2449
76 246.3182 238.8415 253.7950
77 260.2782 252.0176 268.5388
78 242.3297 232.3008 252.3585
79 250.2161 244.4769 255.9553
80 254.8392 249.1624 260.5161
81 241.7858 231.3776 252.1940
82 254.2954 248.7441 259.8466
83 246.6808 239.4074 253.9542
84 243.5987 234.4314 252.7660
85 252.5730 247.1898 257.9563
86 250.9413 245.3804 256.5022
87 253.1169 247.7199 258.5139
88 247.7686 241.0629 254.4744
89 260.7314 252.1853 269.2776
90 241.4232 230.7592 252.0871
91 258.3746 251.2260 265.5232
92 261.0940 252.3148 269.8733
93 242.6016 232.7603 252.4429
94 242.5110 232.6073 252.4146
95 244.5052 235.9289 253.0816
96 245.4117 237.4008 253.4226
97 244.1426 235.3327 252.9526
98 247.2247 240.2436 254.2058
99 246.4089 238.9836 253.8342
100 250.0348 244.2424 255.8273
101 256.6522 250.3432 262.9613
102 247.3154 240.3814 254.2493
103 243.7800 234.7327 252.8274
104 259.5530 251.7341 267.3719
105 244.1426 235.3327 252.9526
106 256.7429 250.3947 263.0910
107 255.0205 249.2947 260.7464
108 261.8192 252.5627 271.0758
109 243.2361 233.8264 252.6459
110 249.4909 243.5192 255.4627
111 247.1341 240.1054 254.1627
112 244.7772 236.3734 253.1810
113 242.5110 232.6073 252.4146
114 261.9099 252.5928 271.2270
115 250.6694 245.0484 256.2904
116 253.2982 247.8884 258.7080
117 245.5930 237.6917 253.4944
118 255.6551 249.7315 261.5787
119 242.5110 232.6073 252.4146
120 242.5110 232.6073 252.4146
121 260.0969 251.9486 268.2452
122 253.3889 247.9711 258.8066
123 253.7515 248.2921 259.2108
124 258.8278 251.4292 266.2264
125 251.1226 245.5972 256.6481
126 248.1312 241.5985 254.6639
127 254.2954 248.7441 259.8466
128 247.6780 240.9276 254.4283
129 261.7286 252.5325 270.9247
130 246.1369 238.5562 253.7177
131 248.2219 241.7310 254.7127
132 246.7715 239.5479 253.9951
133 261.4566 252.4405 270.4728
134 262.5444 252.7977 272.2912
135 257.3774 250.7386 264.0163
136 249.4003 243.3960 255.4046
137 251.7572 246.3255 257.1889
138 253.3889 247.9711 258.8066
139 243.8707 234.8831 252.8583
140 254.6579 249.0266 260.2893
141 245.4117 237.4008 253.4226
142 262.0005 252.6226 271.3785
143 248.8564 242.6415 255.0713
144 258.1026 251.0988 265.1064
145 247.4060 240.5187 254.2933
146 248.2219 241.7310 254.7127
147 241.5138 230.9140 252.1136
148 254.9299 249.2290 260.6308
149 263.0883 252.9665 273.2101
150 257.1961 250.6432 263.7490
151 256.1083 250.0201 262.1965
152 256.1990 250.0757 262.3223
153 245.4117 237.4008 253.4226
154 247.4967 240.6555 254.3378
155 257.5587 250.8317 264.2857
156 261.9099 252.5928 271.2270
157 250.6694 245.0484 256.2904
158 263.7229 253.1565 274.2893
159 246.8621 239.6879 254.0363
160 244.0520 235.1830 252.9209
161 254.8392 249.1624 260.5161
162 242.7829 233.0658 252.5000
163 241.6045 231.0687 252.1403
164 244.8678 236.5210 253.2146
165 245.6837 237.8366 253.5307
166 252.5730 247.1898 257.9563
167 246.9528 239.8275 254.0780
168 253.7515 248.2921 259.2108
169 262.6351 252.8262 272.4439
170 258.6465 251.3491 265.9439
171 262.0005 252.6226 271.3785
172 260.2782 252.0176 268.5388
173 261.0034 252.2828 269.7240
174 263.9948 253.2358 274.7538
175 244.5959 236.0773 253.1145
176 249.0377 242.8958 255.1796
177 241.8764 231.5318 252.2210
178 244.9585 236.6684 253.2486
179 246.6808 239.4074 253.9542
180 252.1198 246.7198 257.5197
181 242.9642 233.3706 252.5578
182 256.6522 250.3432 262.9613
183 260.7314 252.1853 269.2776
184 251.2133 245.7042 256.7224
185 262.9977 252.9388 273.0566
186 243.3268 233.9779 252.6756
187 254.8392 249.1624 260.5161
188 258.5559 251.3085 265.8032
189 250.0348 244.2424 255.8273
190 254.5673 248.9573 260.1773
191 246.1369 238.5562 253.7177
192 248.4938 242.1250 254.8626
193 249.1283 243.0219 255.2348
194 257.5587 250.8317 264.2857
195 255.0205 249.2947 260.7464
196 251.1226 245.5972 256.6481
197 256.9242 250.4959 263.3524
198 253.5702 248.1336 259.0067
199 250.4881 244.8224 256.1537
200 250.1255 244.3601 255.8909
201 255.6551 249.7315 261.5787
202 254.5673 248.9573 260.1773
203 242.4203 232.4541 252.3865
204 254.8392 249.1624 260.5161
205 263.2696 253.0215 273.5177
206 262.8164 252.8828 272.7499
207 251.3039 245.8102 256.7977
208 263.6322 253.1298 274.1347
209 247.0434 239.9667 254.1202
210 247.0434 239.9667 254.1202
211 261.8192 252.5627 271.0758
212 244.5959 236.0773 253.1145
213 243.1455 233.6746 252.6164
214 242.0577 231.8398 252.2756
215 243.9613 235.0332 252.8895
216 243.2361 233.8264 252.6459
217 248.1312 241.5985 254.6639
218 262.9070 252.9109 272.9032
219 251.2133 245.7042 256.7224
220 262.5444 252.7977 272.2912
221 247.3154 240.3814 254.2493
222 242.3297 232.3008 252.3585
223 244.8678 236.5210 253.2146
224 257.5587 250.8317 264.2857
225 261.3660 252.4094 270.3226
226 247.7686 241.0629 254.4744
227 246.9528 239.8275 254.0780
228 254.5673 248.9573 260.1773
229 249.0377 242.8958 255.1796
230 242.0577 231.8398 252.2756
231 257.2868 250.6912 263.8824
232 243.0549 233.5227 252.5870
233 258.2839 251.1840 265.3838
234 256.0177 249.9639 262.0715
235 256.9242 250.4959 263.3524
236 243.5081 234.2805 252.7357
237 256.4709 250.2382 262.7037
238 248.2219 241.7310 254.7127
239 254.7486 249.0949 260.4023
240 254.7486 249.0949 260.4023
241 243.9613 235.0332 252.8895
242 250.5787 244.9358 256.2216
243 246.0463 238.4130 253.6796
244 260.9127 252.2505 269.5749
245 247.6780 240.9276 254.4283
246 245.4117 237.4008 253.4226
247 248.8564 242.6415 255.0713
248 252.9356 247.5473 258.3239
249 258.8278 251.4292 266.2264
250 247.0434 239.9667 254.1202
251 256.6522 250.3432 262.9613
252 248.1312 241.5985 254.6639
253 242.6016 232.7603 252.4429
254 247.8593 241.1976 254.5209
255 261.0034 252.2828 269.7240
256 258.8278 251.4292 266.2264
257 257.7400 250.9228 264.5573
258 252.8450 247.4595 258.2305
259 245.5024 237.5464 253.4583
260 260.0063 251.9136 268.0989
261 255.6551 249.7315 261.5787
262 262.8164 252.8828 272.7499
263 254.6579 249.0266 260.2893
264 261.0034 252.2828 269.7240
265 261.3660 252.4094 270.3226
266 263.2696 253.0215 273.5177
267 250.9413 245.3804 256.5022
268 263.8135 253.1831 274.4440
269 244.8678 236.5210 253.2146
270 255.9270 249.9069 261.9472
271 242.0577 231.8398 252.2756
272 241.6045 231.0687 252.1403
273 244.4146 235.7802 253.0490
274 255.1112 249.3596 260.8628
275 245.1398 236.9623 253.3173
276 245.2304 237.1087 253.3521
277 254.1141 248.5973 259.6308
278 252.6637 247.2807 258.0466
279 252.2104 246.8158 257.6050
280 244.7772 236.3734 253.1810
281 260.9127 252.2505 269.5749
282 258.0120 251.0555 264.9684
283 248.9470 242.7690 255.1251
284 243.9613 235.0332 252.8895
285 262.1818 252.6817 271.6820
286 247.8593 241.1976 254.5209
287 256.1990 250.0757 262.3223
288 245.5930 237.6917 253.4944
289 257.6494 250.8775 264.4213
290 253.6608 248.2133 259.1083
291 256.2896 250.1305 262.4487
292 242.5110 232.6073 252.4146
293 249.5816 243.6417 255.5215
294 251.8478 246.4256 257.2701
295 252.3011 246.9109 257.6913
296 243.4174 234.1293 252.7056
297 248.8564 242.6415 255.0713
298 255.2925 249.4868 261.0982
299 260.8221 252.2180 269.4261
300 252.4824 247.0978 257.8669
301 250.5787 244.9358 256.2216
302 246.3182 238.8415 253.7950
303 256.1083 250.0201 262.1965
304 256.1990 250.0757 262.3223
305 247.5873 240.7918 254.3828
306 247.4060 240.5187 254.2933
307 256.1083 250.0201 262.1965
308 262.9070 252.9109 272.9032
309 245.5024 237.5464 253.4583
310 246.5902 239.2665 253.9138
311 245.6837 237.8366 253.5307
312 262.0005 252.6226 271.3785
313 250.1255 244.3601 255.8909
314 256.9242 250.4959 263.3524
315 254.2047 248.6711 259.7383
316 259.0998 251.5463 266.6532
317 247.1341 240.1054 254.1627
318 243.2361 233.8264 252.6459
319 244.5959 236.0773 253.1145
320 252.4824 247.0978 257.8669
321 241.6045 231.0687 252.1403
322 256.5616 250.2910 262.8322
323 258.2839 251.1840 265.3838
324 256.7429 250.3947 263.0910
325 243.1455 233.6746 252.6164
326 261.4566 252.4405 270.4728
327 258.2839 251.1840 265.3838
328 260.6408 252.1523 269.1293
329 243.6894 234.5822 252.7966
330 254.7486 249.0949 260.4023
331 247.4060 240.5187 254.2933
332 242.5110 232.6073 252.4146
333 247.1341 240.1054 254.1627
[ reached getOption("max.print") -- omitted 167 rows ]
This code gives us the predicted values. A new window opens and shows all of the predicted values in a table.
Let us plot our predicted values:
plot(y_i_parashikuar)
In the plot above, the points follow a curve that resembles a logarithmic function. As the values on the ‘fit’ axis increase, the values on the ‘lwr’ axis do not increase linearly, so the vertical values are not directly proportional to those on the horizontal axis. The table above shows why: the gap between ‘fit’ and ‘lwr’ is smallest for fitted values in the middle of the range (around 252) and grows towards both ends, which bends the curve.
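The bend can be reproduced on simulated data. The sketch below assumes that the predicted values come from predict() on a simple linear model with a confidence interval, which is what the column names fit, lwr and upr suggest; the variables x, y, model and pred are illustrative and are not the project's own objects.

```r
# Simulated stand-in data (assumption): NOT the Student Spending dataset.
set.seed(1)
x <- runif(200, min = 16, max = 25)
y <- 200 + 2.5 * x + rnorm(200, sd = 40)

model <- lm(y ~ x)

# With interval = "confidence", predict() returns a matrix
# with the columns fit, lwr and upr.
pred <- predict(model, interval = "confidence")

# Width of the interval for every observation.
width <- pred[, "upr"] - pred[, "lwr"]

# The width grows with the distance of x from mean(x),
# so lwr falls further below fit at both ends of the range.
range(width)

# Plotting lwr against fit gives a bent curve rather than a straight line.
plot(pred[, "fit"], pred[, "lwr"], xlab = "fit", ylab = "lwr")
```

Calling plot() on the whole matrix, as in the text, also draws the first column (fit) against the second (lwr).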
The \(GGally\) library offers a function that works much like \(ggplot\): it is called \(ggpairs\). As the word pairs suggests, it draws a plot for every pair of variables and arranges the plots in a grid.
Below we build a pairs plot for 9 variables of our dataset.
library(GGally)
Warning: package ‘GGally’ was built under R version 4.3.3
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
ggpairs(data[, 1:9], ggplot2::aes(color = "red"))
plot: [1, 1] [>--------------------------------------------------------------------------------------------------] 1% est: 0s
plot: [1, 2] [=>-------------------------------------------------------------------------------------------------] 2% est: 9s
plot: [1, 3] [===>-----------------------------------------------------------------------------------------------] 4% est:15s
plot: [1, 4] [====>----------------------------------------------------------------------------------------------] 5% est:17s
plot: [1, 5] [=====>---------------------------------------------------------------------------------------------] 6% est:16s
plot: [1, 6] [======>--------------------------------------------------------------------------------------------] 7% est:15s
plot: [1, 7] [========>------------------------------------------------------------------------------------------] 9% est:15s
plot: [1, 8] [=========>-----------------------------------------------------------------------------------------] 10% est:14s
plot: [1, 9] [==========>----------------------------------------------------------------------------------------] 11% est:14s
plot: [2, 1] [===========>---------------------------------------------------------------------------------------] 12% est:14s
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot: [2, 2] [============>--------------------------------------------------------------------------------------] 14% est:16s
[ progress lines for panels [2, 3] through [9, 8] omitted; they follow the same pattern, with further `stat_bin()` messages ]
plot: [9, 9] [===================================================================================================]100% est: 0s
The plot is made up of several kinds of panels: density plots, box plots, histograms, and correlation coefficients, which show the strength of the linear relationship between pairs of quantitative variables. In this way a single figure gives us a summary of all the visualization carried out so far.
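The correlation panels apply only to pairs of numeric columns. As a minimal base R sketch on invented data (the data frame toy and its column names are hypothetical and are not the project's dataset), the same kind of coefficients can be computed directly:

```r
# Invented toy data (assumption): two numeric columns and one categorical column.
toy <- data.frame(
  age            = c(19, 21, 22, 20, 24, 23),
  monthly_income = c(900, 1100, 1250, 950, 1400, 1300),
  gender         = factor(c("Female", "Male", "Male", "Female", "Male", "Female"))
)

# Keep only the numeric columns; a correlation coefficient is not
# defined for a categorical column such as gender.
numeric_cols <- sapply(toy, is.numeric)

# Pearson correlation matrix of the numeric columns.
cor_matrix <- cor(toy[, numeric_cols])
cor_matrix
```

For pairs that involve a categorical column, the pairs plot shows box plots, histograms or bar charts instead of a coefficient.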
What do the percentage values show?
\(plot: [2, 1] [===========>---------------]\) 12% is a progress indicator printed while the code runs. It shows that the panel in row 2, column 1 of the plot matrix is currently being drawn. This panel is the 10th of the 9 × 9 = 81 panels, so about 12% of the matrix has been processed.
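The percentages in the log can be checked with a little arithmetic. The helper below is a hypothetical function written for this illustration; its formula is inferred from the printed log lines and is not taken from the GGally source code.

```r
# ggpairs() on 9 columns draws a 9 x 9 grid of panels, filled row by row.
n_vars <- 9

# Progress shown while panel [row, col] is being drawn: the running
# number of the panel divided by the total number of panels.
progress_percent <- function(row, col, n = n_vars) {
  panel_number <- (row - 1) * n + col
  round(100 * panel_number / n^2)
}

progress_percent(2, 1)  # panel 10 of 81 -> 12
progress_percent(9, 9)  # panel 81 of 81 -> 100
```

The same calculation gives 4 for panel [1, 3] and 51 for panel [5, 5], in agreement with the log.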
The message \(stat\_bin()\) \(using\) \(bins = 30\) means that \(stat\_bin\) is drawing the histograms with its default of 30 bins. It is a suggestion to the user to choose a more suitable value, for example through \(binwidth\), to improve how the histograms look; the function does not pick a better value on its own.
The remaining lines of the output are read in the same way.
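One way to act on the message is to set the number of bins explicitly. The sketch below assumes that the messages come from the numeric-by-categorical histogram panels, which GGally calls "facethist"; the data frame toy and its columns are invented for the illustration, and the value of 20 bins is an arbitrary choice.

```r
# Toy data (assumption): two numeric columns and one categorical column,
# standing in for columns of the Student Spending dataset.
set.seed(3)
toy <- data.frame(
  income  = round(runif(60, 500, 1500)),
  tuition = round(runif(60, 3000, 6000)),
  gender  = factor(sample(c("Female", "Male"), 60, replace = TRUE))
)

# Guarded so that the sketch is skipped cleanly where GGally is not installed.
if (requireNamespace("GGally", quietly = TRUE)) {
  library(GGally)
  # wrap() fixes extra arguments for one panel type; here the "facethist"
  # panels of the lower triangle get 20 bins instead of the default 30.
  pm <- ggpairs(toy, lower = list(combo = wrap("facethist", bins = 20)))
}
```

Printing pm draws the grid; with the number of bins given explicitly, the histogram panels should no longer ask for a better value.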
As we said in the introduction to the project, we chose the \(Student\) \(Spending\) \(Dataset\) in order to study and analyze a phenomenon (if it can be called that) which characterizes all of us as students in our daily student life. Our aim was to analyze the spending of a group of students with widely known statistical methods, helped by the R language and its development environment \(RStudio\).
Since this is our first real project, we have tried throughout to be accurate and clear in structuring, analyzing and interpreting our data. We have tried to give due importance and attention to all the statistical elements, known quantities and concepts.
The conclusion of this project covers what we set as our objective: a structured summary of all the results obtained from every statistical calculation carried out on the dataset.
The average age of the selected students is 21 years, and the dominant gender is male.
The average monthly income of a student is $1020.65.
From the results we concluded that students spent the most on tuition, followed by housing and food.
By gender, male students spent more in the categories tuition, housing, food and technology, while female students spent more in the remaining categories: transport, school supplies, entertainment, personal care, health and wellness, and miscellaneous.
By year of study, \(Junior\) students spent the most on tuition, followed by \(Sophomore\) and \(Freshman\), with \(Senior\) students spending the least.
On housing, \(Senior\) students spent the most and \(Sophomore\) students the least.
On food and personal care, \(Seniors\) spend the most and \(Junior\) students the least.
On transport, school supplies, entertainment and technology, \(Freshman\) students spend the most and \(Junior\) students the least.
By major, psychology students spend the most on tuition and engineering students the least.
On housing, computer science students spend the most and economics students the least.
On food, economics students spend the most and computer science students the least.
On transport, psychology students spend the most and engineering students the least.
On school supplies, entertainment, technology, and health and wellness, biology students spend the most.
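Comparisons of this kind rest on group means. As a minimal sketch, assuming a data frame with one categorical column and several spending columns (the data frame toy, its column names and its values are invented and are not results from the dataset), the means per group and the top group per category can be obtained as follows:

```r
# Invented toy data (assumption): NOT values from the Student Spending dataset.
toy <- data.frame(
  gender  = c("Female", "Male", "Female", "Male", "Female", "Male"),
  tuition = c(4000, 5000, 4500, 5500, 3500, 6000),
  food    = c(300, 200, 250, 150, 350, 100)
)

# Mean spending per category for every gender.
mean_by_gender <- aggregate(cbind(tuition, food) ~ gender, data = toy, FUN = mean)
mean_by_gender

# For every spending category, the group with the highest mean.
top_spender <- sapply(
  mean_by_gender[, -1],
  function(m) as.character(mean_by_gender$gender[which.max(m)])
)
top_spender
```

Replacing gender with a column for the year of study or the major gives the other two comparisons.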
As for the dependence between the variables of the dataset, the correlation matrix and the chi-squared tests showed no significant association, so we concluded that the variables are independent of one another.
Because of this independence, the predictive models also turned out to have very low explanatory power.
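The link between independence and weak models can be illustrated on simulated data. In the sketch below every column is generated independently of the others, so the names gender, payment, age and income are hypothetical stand-ins and the numbers are not results from the dataset.

```r
# Simulated data (assumption): every column is drawn independently.
set.seed(42)
n <- 500
gender  <- sample(c("Female", "Male"), n, replace = TRUE)
payment <- sample(c("Cash", "Card", "Mobile"), n, replace = TRUE)
age     <- sample(18:25, n, replace = TRUE)
income  <- rnorm(n, mean = 1000, sd = 300)

# Chi-squared test of independence for two categorical variables.
# A large p-value means there is no evidence against independence.
chi_test <- chisq.test(table(gender, payment))
chi_test$p.value

# R-squared of a linear model: the share of the variance of income
# explained by age. For unrelated variables it stays close to 0.
r_squared <- summary(lm(income ~ age))$r.squared
r_squared
```

When a predictor carries no information about the response, the fitted line is nearly flat and the R-squared stays near zero, which is the pattern described above.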
While analyzing the data in our independent work, we often ran into situations where we needed explanation and guidance. For this reason we referred to the statistics textbook, as well as to pages such as \(google.com\) and platforms such as \(wikipedia.com\), \(geekforgeeks.com\), \(youtube.com\), etc. Listed below are some of the sites from which we drew information:
Credits to: \(Hyrje\) \(ne\) \(Statistiken\) \(e\) \(Zbatuar\) \(3\) by \(Prof.\) \(Dr.\) \(Llukan\) \(Puka\);
Link to dataset: https://www.kaggle.com/datasets/sumanthnimmagadda/student-spending-dataset/data
https://posit.co/download/rstudio-desktop/
https://search.yahoo.com/search?fr=mcafee&type=E210US91215G0&p=imi+fshn
Chi-squared Test: https://search.yahoo.com/search?fr2=p%3ads%2cv%3aomn%2cm%3asa%2cbrws%3achrome%2cpos%3a1&fr=mcafee&type=E210US91215G0&p=chi-square+test
Mosaic Plot: https://search.yahoo.com/search?fr=mcafee&type=E210US91215G0&p=mosaic+plot
Word Cloud in R: https://www.marsja.se/how-to-create-a-word-cloud-in-r/#google_vignette
Visualization using \(ggplot2\): https://www.dataquest.io/blog/data-visualization-in-r-with-ggplot2-a-beginner-tutorial/
Residuals vs Leverage Plot: https://images.search.yahoo.com/search/images;_ylt=AwrEq4z_UlRmRAQAoeJXNyoA;_ylu=Y29sbwNiZjEEcG9zAzEEdnRpZAMEc2VjA3BpdnM-?p=residuals+vs+leverage+plot+in+r&fr2=piv-web&type=E210US91215G0&fr=mcafee
How to interpret a Scale-Location Plot: https://www.statology.org/scale-location-plot/
Predicted y values: https://search.yahoo.com/search?fr=mcafee&type=E210US91215G0&p=predicted+value+of+y+in+plots
R installation & more: https://youtu.be/FIrsOBy5k58?si=fDkTBu06nhf8UA0A
https://youtu.be/2_tW7e4e_dM?si=-sRLFJpb7hPEGoTf
https://youtu.be/YrEe2TLr3MI?si=fLCYGjaiW6VH6ZRg
https://youtu.be/48b4BzxHHH8?si=LOR1OgqnS-fAiAiL
https://youtu.be/Uo1C7Iligw0?si=5Eg3vqXA1aisYarv
https://youtu.be/ePD96i0YHII?si=zHZdVeTEwm6rMjzw
https://youtu.be/svgEpRzhG7M?si=n1IEDUfyOIgy9DT3
Link to esquisser: esquisse::esquisser(data, viewer = "browser")
library(wordcloud)
Warning: package ‘wordcloud’ was built under R version 4.3.3
Loading required package: RColorBrewer
library(wordcloud2)
Warning: package ‘wordcloud2’ was built under R version 4.3.3
library(RColorBrewer)
fjalet <- c("Jakup","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup", "Adisa", "Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa",
"Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Data","Data","Data","Data","Data","Data","Data","Data","Data","Data","Data","Data","Data", "Science","Science","Science","Science","Science","Science","Science","Science","Engineer","Engineer","Engineer","Engineer","Engineer","Engineer","Engineer", "FSHN","FSHN","FSHN","FSHN","FSHN","FSHN","FSHN","FSHN","FSHN","FSHN","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI", "2024","2024","2024","2024","2024","2024","2024","2024","2024","2024","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup","Jakup", "Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Adisa","Data","Data","Data","Data","Data","Data","Data","Data","Data","Data","Data","Data","Data", "Science","Science","Science","Science","Science","Science","Science","Science","Engineer","Engineer","Engineer","Engineer","Engineer","Engineer","Engineer", "FSHN","FSHN","FSHN","FSHN","FSHN","FSHN","FSHN","FSHN","FSHN","FSHN","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI","IMI", "2024","2024","2024","2024","2024","2024","2024","2024","2024","2024")
# Count how often each word occurs; wordcloud2() expects one row per word with its frequency
frekuencat <- table(fjalet)
data <- data.frame(word = names(frekuencat), freq = as.vector(frekuencat))
wordcloud2(data = data, size = 0.1, shape='random', color='random-dark')