May 13, 2019

Hello,

I am Hung, a geophysicist with a strong interest in applying data science and machine learning to geoscience problems.

I worked in the oil and gas industry as a seismic attribute analyst for 3 years before starting a graduate program in 2012. In 2017, during my internship, I met a genius mathematician named Anton. After reviewing my neural network for predicting reservoir properties, he suggested that I take some advanced courses on machine learning. I started my first machine learning course on Coursera (Stanford University) in October 2017. I loved it, and upon completion I decided to take two more comprehensive specializations: Deep Learning (deeplearning.ai) and Data Science (Johns Hopkins University). I studied for them while writing up my PhD thesis, and in the summer of 2018 I was proud to complete both specializations.

At the end of that year, I graduated from the University of Alberta (Canada) and accepted an offer to work at a geoscience service company in the UK. I took the first 3 months of 2019 to travel and started working in April. Here, I again started looking for more data science courses to study on my own in the evenings. Currently, I’m finishing the IBM Data Science Professional Certificate (IBM) and plan to take Applied Data Science with Python (University of Michigan) afterward.

On this page, I would like to introduce some of my projects on data science and applied machine learning. Currently, I do not have them on GitHub, but I will put them there at my earliest convenience. Thanks for your visit today; it is actually my birthday :)

Update, March 2020:

Cheers,

Hung

City livability clustering

In this project, I measure how similar living conditions are across the 358 most popular cities in the world. Similarity is based on 6 main living indexes: crime, health, culture, nature, education, and infrastructure. Three main city groups emerge, associated with living, working, and resting conditions.

https://github.com/nhohung/CityLivabilityClustering
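
For readers curious about the general idea, here is a minimal sketch of grouping cities by their index scores. It assumes plain k-means, which may not be the method used in the repository above. The city names and scores are made up for illustration, and the first three rows are used as starting centroids to keep the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical scores in [0, 1], ordered as:
# crime, health, culture, nature, education, infrastructure
cities = {
    "City A": [0.20, 0.90, 0.80, 0.30, 0.90, 0.90],
    "City B": [0.80, 0.40, 0.30, 0.20, 0.40, 0.50],
    "City C": [0.10, 0.70, 0.20, 0.90, 0.50, 0.30],
    "City D": [0.25, 0.85, 0.75, 0.35, 0.85, 0.95],
    "City E": [0.75, 0.45, 0.35, 0.25, 0.35, 0.45],
    "City F": [0.15, 0.65, 0.25, 0.95, 0.55, 0.25],
}

labels, centroids = kmeans(list(cities.values()), k=3)
for name, label in zip(cities, labels):
    print(name, "-> group", label)
```

With real data, the indexes would usually be rescaled to a common range first, so that no single index dominates the distance.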

Weather forecast

This is a small app showing the weather forecast for the 2 cities I care about most: Hanoi (Vietnam) and Crawley (UK). Hanoi is where my family lives, and Crawley is where I live. The app collects, organizes, and presents the JSON files returned from API calls to a preferred weather provider. It is designed for phone screens, with a minimalistic but informative style. Try it out here:

https://nhohung.shinyapps.io/Weather1/
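
The "organizing" step can be illustrated with a small sketch. This is not the app's code: the payload shape, field names, and values below are hypothetical and do not correspond to any particular weather provider. It only shows the idea of flattening a nested JSON response into lines short enough for a phone screen.

```python
import json

# Hypothetical payload: field names and values are invented for illustration.
raw = '''
{"city": "Crawley",
 "forecast": [
   {"date": "2019-05-13", "temp_min_c": 7, "temp_max_c": 18, "summary": "Sunny"},
   {"date": "2019-05-14", "temp_min_c": 8, "temp_max_c": 20, "summary": "Cloudy"}
 ]}
'''

def to_rows(payload):
    """Flatten the nested forecast into one display-ready line per day."""
    data = json.loads(payload)
    return [
        f"{data['city']} {day['date']}: {day['summary']}, "
        f"{day['temp_min_c']} to {day['temp_max_c']} C"
        for day in data["forecast"]
    ]

rows = to_rows(raw)
for row in rows:
    print(row)
```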

The text predictor

This is the final Data Science project, on a topic I knew nothing about beforehand: building a next-word predictor that suggests the next word once the user has typed a few words. You are probably familiar with this feature, as it is built into the virtual keyboard on the phone you use every day. Implementing it, however, is quite challenging. I researched natural language processing: how to do it and how to optimize it. Please give it a try here:

https://nhohung.shinyapps.io/TextPrediction/

I also recommend having a look at what happens behind the scenes: the processing part (http://rpubs.com/nhohung/NLP_processing) and the prediction model (http://rpubs.com/nhohung/NLP_prediction).
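
To give a feel for the problem, here is a minimal sketch of one simple approach: counting which word most often follows the last word typed (a bigram model). This is an assumption chosen for illustration and is not necessarily the model described in the links above. The tiny corpus is made up.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def predict_next(followers, phrase, top=3):
    """Suggest up to `top` words that most often followed the last word typed."""
    tokens = phrase.lower().split()
    if not tokens or tokens[-1] not in followers:
        return []  # nothing typed, or the last word was never seen in training
    return [word for word, _ in followers[tokens[-1]].most_common(top)]

# Made-up training text; a real predictor needs a far larger corpus.
corpus = "i love data science . i love machine learning . i study data science"
model = train_bigrams(corpus)

print(predict_next(model, "I"))
print(predict_next(model, "machine"))
```

A practical predictor would look at more than one previous word and fall back to shorter contexts when a longer one was never seen, which is where most of the difficulty lies.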

Body movement predictor

In this project, I used a machine learning method to predict the type of movement a volunteer is performing, based on data collected from sensors attached to his body. When I read about how the data was acquired and where the sensors were positioned, I thought it would be an easy task, because intuitively some movements could be separated from others simply by the differences in sensor placement. However, a quick exploratory analysis killed all my hope, and I had to use a more sophisticated classification to solve it. See how I tackled the problem here:

http://rpubs.com/nhohung/HumanActivitiesNew

Analyzing the impact of natural disasters on humans

This project is a typical example of what data science involves: data cleaning. I was given a real-life dataset listing the cost and damage of different types of natural catastrophe to human lives and the economy, and was asked to summarize which type causes the most harm to people. The task sounds simple enough, except that the inputs are heavily contaminated by errors: typos, duplication, and poor categorization. I had to reprocess the whole dataset before sorting it out. Find the details of how I did that here:

http://rpubs.com/nhohung/Health_Weather
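
The three kinds of error mentioned above (typos, duplication, and poor categorization) can be sketched in a few lines. This is only an illustration, not the analysis from the link: the records, the event ids, and the lookup table are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records: (event id, event label, injuries). Values are invented.
records = [
    (1, "TORNADO", 10),
    (2, "Tornado ", 5),    # inconsistent case and trailing space
    (3, "TORNDAO", 2),     # typo
    (4, "FLOOD", 4),
    (5, "FLOODING", 3),    # same category under a different label
    (4, "FLOOD", 4),       # duplicate of event 4
    (6, "HEAT", 6),
]

# Hypothetical lookup mapping known misspellings and variants to one category.
CANONICAL = {"TORNDAO": "TORNADO", "FLOODING": "FLOOD"}

def clean(rows):
    """Drop repeated event ids, normalize labels, and merge known variants."""
    seen_ids = set()
    cleaned = []
    for event_id, label, injuries in rows:
        if event_id in seen_ids:
            continue  # skip duplicated events
        seen_ids.add(event_id)
        label = label.strip().upper()
        cleaned.append((CANONICAL.get(label, label), injuries))
    return cleaned

def total_by_type(rows):
    """Sum injuries per event type, largest total first."""
    totals = defaultdict(int)
    for label, injuries in rows:
        totals[label] += injuries
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = total_by_type(clean(records))
print(ranking)
```

Without the cleaning step, the same toy data would split tornado injuries across three labels and count one flood twice, which is exactly how a messy dataset distorts the final ranking.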