A dissertation submitted in partial fulfillment of the requirements for the degree of M.Sc. Data Science (School of Engineering & Informatics)
Author: Rabin Thapa
Affiliation: University of Wolverhampton
Published: 02/18/2025 04:55:32 AM +0545
1 ABSTRACT
2 ACKNOWLEDGEMENT
3 INTRODUCTION
This report, created using Quarto in RStudio (Bauer and Landesvatter 2023), provides a visual analysis of a snapshot of data from the 2021 household census in England. Using ggplot2, we explore key demographic trends, focusing on age, income, marital status and ethnicity. The main objective is to process the data so that patterns and linear relationships between the variables can be identified through clear visualizations (Hoffmann 2021). The report offers insights that could inform future policies and improve understanding of the correlations between the variables.
4 Literature Review
5 Methodology
6 Data Pre-processing
After installing and loading the necessary packages in RStudio, data analysis begins with understanding the variables. As Kandel et al. note, this is followed by cleaning and organizing the raw data to make it ready for analysis and visualization (Kandel et al. 2012).
6.1 Data Exploration:
To start the data exploration, we load the tidyverse library and read the data from the specified file path using the read_csv() function.
reticulate::py_install("jupyter")
Using virtual environment "C:/Users/Dell/OneDrive/Documents/.virtualenvs/r-reticulate" ...
Cleaning the data includes dealing with missing values, converting categorical data into factors and renaming columns for clarity. ID and Person_ID are identifier variables with minimal feature importance, so they are removed from the data. The data was also filtered to remove irrelevant or unusual entries.
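The report works in R, but the same cleaning steps can be sketched in Python with pandas for illustration. This is a minimal sketch under stated assumptions: only the column names ID, Person_ID, Age, INC, Mar_Stat and Eth come from the text, while the values, the rule for "unusual" entries (negative income) and the rename target are invented for the example.

```python
import pandas as pd

# Hypothetical rows standing in for the census snapshot; all values are invented.
raw = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Person_ID": [1, 1, 2, 1, 3],
    "Age": [34, 61, None, 45, 29],
    "INC": [28000, 19000, 31000, -5, 24000],
    "Mar_Stat": ["married", "widowed", "single", "married", "single"],
    "Eth": ["White", "Asian", "Black", "White", "White"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop identifier columns, remove incomplete or implausible rows, and type the categoricals."""
    out = df.drop(columns=["ID", "Person_ID"])   # identifiers carry no predictive signal
    out = out.dropna(subset=["Age", "INC"])      # assumed policy: drop rows with missing values
    out = out[out["INC"] >= 0]                   # assumed rule for "unusual" entries
    out = out.rename(columns={"INC": "Income"})  # illustrative rename only
    for col in ["Mar_Stat", "Eth"]:
        out[col] = out[col].astype("category")   # pandas analogue of an R factor
    return out.reset_index(drop=True)

cleaned = clean(raw)
```

In this toy example the row with a missing age and the row with a negative income are dropped, leaving three rows.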
Of the three categorical variables, Mar_Stat and Highest Ed are converted to nominal numeric variables, while Eth is left unchanged, as illustrated below. This feature transformation is later used in the regression analysis to understand the trends between the variables (Zeileis and Hothorn 2002).
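One way such a conversion could look in Python is sketched here with pandas category codes. The category labels are invented for illustration, and the helper name encode_as_codes is hypothetical; only the column names Mar_Stat, Highest Ed and Eth come from the text.

```python
import pandas as pd

# Invented example values; only the column names come from the text.
df = pd.DataFrame({
    "Mar_Stat": ["single", "married", "widowed", "married"],
    "Highest Ed": ["Secondary", "Degree", "Primary", "Secondary"],
    "Eth": ["White", "Asian", "Black", "White"],
})

def encode_as_codes(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Replace each listed column with integer codes, one code per distinct category."""
    out = frame.copy()
    for col in columns:
        # Categories are sorted alphabetically by default, so codes are labels, not ranks.
        out[col] = out[col].astype("category").cat.codes
    return out

# Eth is deliberately left out of the list, so it stays as text.
encoded = encode_as_codes(df, ["Mar_Stat", "Highest Ed"])
```

Because the codes are nominal, a regression that treats them as ordinary numbers implicitly assumes an ordering and equal spacing between categories. Indicator (dummy) variables are a common alternative when that assumption is not wanted.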
7 Relation between the variables
Grouping the selected variables plays an important role in identifying the strength of the relationships in demographic data (Yusuf, Martins, and Swanson 2014). Age is therefore split into two groups, up to 50 (Age <= 50) and above 50 (Age > 50). The following algorithm is applied to check the correlation coefficient of each group with average income (INC).
import numpy as np

def create_sequences(data, time_steps=30):
    # Build sliding windows of length time_steps and the value that follows each window
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:i + time_steps])
        y.append(data[i + time_steps][0])  # Predicting FTSE index (Close Price)
    return np.array(X), np.array(y)

time_steps = 30
X, y = create_sequences(df_scaled, time_steps)
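Separately from the sequence-building code, the age split described in this section can be checked with Pearson's correlation coefficient. The following is a minimal sketch, not the report's own procedure: the function name correlation_by_age_group is hypothetical and the data are synthetic, constructed so that income rises up to age 50 and falls afterwards.

```python
import numpy as np

def correlation_by_age_group(age, income, cutoff=50):
    """Return Pearson's r between age and income for age <= cutoff and for age > cutoff."""
    age = np.asarray(age, dtype=float)
    income = np.asarray(income, dtype=float)
    younger = age <= cutoff  # boolean mask for the first group
    r_younger = np.corrcoef(age[younger], income[younger])[0, 1]
    r_older = np.corrcoef(age[~younger], income[~younger])[0, 1]
    return r_younger, r_older

# Synthetic data (an assumption for illustration only).
age = np.array([20, 30, 40, 50, 55, 60, 70, 80])
income = np.array([18000, 24000, 30000, 36000, 33000, 30000, 24000, 18000])
r_up_to_50, r_over_50 = correlation_by_age_group(age, income)
```

Since the text refers to average income, a real analysis would presumably average INC per age value before computing the coefficients; that step is omitted here because the synthetic data already has one income value per age.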
8 Model Development:
These two opposite linear relationships can be visualized with a scatter plot and its best-fitting regression line, using the ggplot2 library, as illustrated in Fig 4.1.
future_steps = 60
X_future = X_test[-1:].copy()
future_predictions = []
for _ in range(future_steps):
    pred = model.predict(X_future, verbose=0)
    # Inject noise to increase non-linearity
    noise = np.random.normal(0, 0.02, size=pred.shape)
    pred += noise
    future_predictions.append(pred[0][0])
    X_future = np.roll(X_future, -1, axis=1)
    X_future[0, -1, 0] = pred[0][0]
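The window update at the end of that loop can be examined in isolation. The sketch below assumes the input window has shape (1, time_steps, n_features), which is what the indexing X_future[0, -1, 0] implies; the helper name advance_window and the toy values are hypothetical.

```python
import numpy as np

def advance_window(window, new_value):
    """Shift a (1, time_steps, n_features) window one step and store new_value in feature 0 of the last step."""
    shifted = np.roll(window, -1, axis=1)  # returns a copy; the oldest step wraps around to the end
    shifted[0, -1, 0] = new_value          # overwrite only feature 0 of the wrapped row
    return shifted

# Toy window: 3 time steps, 2 features (values invented for illustration).
window = np.array([[[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]])
next_window = advance_window(window, 4.0)
```

With more than one feature, the last row keeps the remaining features of the oldest step (10.0 in this example), so a multi-feature model would receive a stale value there unless those features are updated as well.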
9 Data Visualization:
These two opposite linear relationships can be visualized with a scatter plot and its best-fitting regression line, using the ggplot2 library, as illustrated in Fig 4.1.
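A best-fitting regression line of this kind is an ordinary least-squares fit. As a hedged Python sketch (the report itself plots with ggplot2, and the exact plotting call is not shown in the text), the slope and intercept for each age group could be computed with np.polyfit. The helper name fit_line and the data are assumptions made for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares straight line y = slope * x + intercept."""
    slope, intercept = np.polyfit(np.asarray(x, dtype=float),
                                  np.asarray(y, dtype=float), deg=1)
    return slope, intercept

# Synthetic data (an assumption): income rises to age 50, then falls.
age = np.array([20, 30, 40, 50, 55, 60, 70, 80])
income = np.array([18000, 24000, 30000, 36000, 33000, 30000, 24000, 18000])

up_to_50 = age <= 50
slope_young, intercept_young = fit_line(age[up_to_50], income[up_to_50])
slope_old, intercept_old = fit_line(age[~up_to_50], income[~up_to_50])
```

A positive slope for the first group and a negative slope for the second correspond to the two opposite linear relationships described in the text.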
This analysis shows clear patterns between age, marital status, education and income among British citizens, but it has limitations. The data lacks detail on regional, industry and socio-economic factors that could affect income differences (Howe et al. 2012). Furthermore, the simplified categories for ethnicity and marital status may overlook complex social influences on income. Future research would benefit from including more socio-economic factors and regional detail. Policies supporting the education of elderly people and relationship stability could help improve financial well-being across demographics.
13 CONCLUSION AND FUTURE WORKS
Up to the age of 50, income shows a strong positive link with age, but after 50, income tends to fall. This suggests that elderly people in the UK are at risk of low income. Marriage and stable relationships appear to support financial success, with married individuals generally earning more. There is also a clear income gap, with White individuals earning more than other ethnic groups, although women tend to earn more than men across all groups. These findings point to areas where future government policies could focus, such as supporting elderly education and launching social programmes to promote financial stability and equality across age, gender and ethnicity.
Howe, L. D., B. Galobardes, A. Matijasevich, D. Gordon, D. Johnston, O. Onwujekwe, R. Patel, E. A. Webb, D. A. Lawlor, and J. R. Hargreaves. 2012. “Measuring Socio-Economic Position for Epidemiological Studies in Low- and Middle-Income Countries: A Methods of Measurement in Epidemiology Paper.” International Journal of Epidemiology 41 (3): 871–86. https://doi.org/10.1093/ije/dys037.
Kandel, Sean, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. 2012. “Enterprise Data Analysis and Visualization: An Interview Study.” IEEE Transactions on Visualization and Computer Graphics 18 (12): 2917–26. https://doi.org/10.1109/tvcg.2012.219.
@online{thapa2025,
author = {Thapa, Rabin},
title = {Integrative {Application} of {Neural} {Networks} for
{Predicting} {Global} {Stock} {Market} {Trends:} {A} {Data}
{Science} {Investigation} {Using} {Historical} {Data}},
date = {2025-02-18},
url = {https://www.researchgate.net/profile/Rabin-Thapa-8},
langid = {en}
}
For attribution, please cite this work as:
Thapa, Rabin. 2025. “Integrative Application of Neural Networks
for Predicting Global Stock Market Trends: A Data Science Investigation
Using Historical Data.” February 18, 2025. https://www.researchgate.net/profile/Rabin-Thapa-8.