Objectives & Motivation:
The objective is to find convenient places for students to stay, based on data collected in Singapore.
Any international student coming to Singapore needs reliable information about convenient places to rent. To provide this, we analyzed student movement data collected by NUS-ISS students and performed exploratory spatial data analysis on it to find patterns and insights. We assumed that the basic amenities a student looks for are economical accommodation, cycling paths, libraries, proximity to an MRT station, parks, and hawker centers.
|Dataset description|Data Source URL|
|Data points of all NUS-ISS students|IVLE (Apps used: Openpaths)|
|Park Connector Loop|https://data.gov.sg/dataset/park-connector-loop|
OpenPaths data containing the students' personal location information was collected for the month of April, and then cleaned. Because the data was stored on multiple mobile devices, the date and time formats were inconsistent, so all date and time values were converted to a single standard format. To support further analysis, separate columns were created for date and time. Outliers were treated using R, and data points outside Singapore were ignored. The Name and MailID columns were edited in Excel to fill in missing details.
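The cleaning steps above can be sketched in Python. This is only an illustration, not the R and Excel workflow we actually used: the list of timestamp layouts, the row keys (`timestamp`, `lat`, `lon`), and the rough bounding box for Singapore are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical timestamp layouts; the real device formats are not listed in this post.
KNOWN_FORMATS = ["%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%m-%d-%Y %I:%M:%S %p"]

# Rough bounding box around Singapore (an assumption, not the exact filter we used).
LAT_MIN, LAT_MAX = 1.15, 1.48
LON_MIN, LON_MAX = 103.59, 104.10


def parse_timestamp(raw):
    """Try each known layout in turn; return None if none of them matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return None


def clean_points(rows):
    """Standardise timestamps, split date and time, drop points outside the box."""
    cleaned = []
    for row in rows:
        ts = parse_timestamp(row["timestamp"])
        if ts is None:
            continue  # unparseable timestamp
        lat, lon = float(row["lat"]), float(row["lon"])
        if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
            continue  # point lies outside Singapore
        cleaned.append({
            "lat": lat,
            "lon": lon,
            "date": ts.date().isoformat(),  # separate date column
            "time": ts.time().isoformat(),  # separate time column
        })
    return cleaned
```

Ambiguous layouts such as day-first versus month-first dates cannot be told apart automatically, so the order of `KNOWN_FORMATS` would need to be chosen per device.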
Exploratory Analysis of Students Data (with derived fields):
From the above exploration, it is evident that most students stay at the university, travel to their residences, and use the MRT most of the time. Since students would prefer to stay near the university and near an MRT station, and may use cycling paths, our aim is to find and suggest better zones for student life. To do so, we need to add these layers to the student data and carry out further analysis.
To gain more insight into suitable dwellings, we separately analyzed several datasets: HDB dwelling population data, hawker center locations, MRT stations, and cycling paths.
Geographically Weighted Regression:
To find insights from the student data using the HDB and population data layers, we performed geographically weighted regression (GWR) using the variables shown below, on the assumption that data points with late-night timestamps represent a student's home. A spatial join of the hawker center, dwelling, and OpenPaths student data layers produced the input for the final GWR model.
The count_ variable is the number of student data points per polygon, Count_1 is the number of hawker centers per polygon, SHAPE_Area is the polygon area from the dwelling layer, and HDB is the total count of HDB per person.
The GWR results show that the model performs with moderate accuracy, with an adjusted R² of 0.31.
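The assumption that late-night points mark a student's home can be made concrete with a small sketch. The night window (23:00 to 05:00), the tuple layout, and the rounding precision are assumptions chosen for illustration, since the post does not state exact values.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed late-night window: 23:00 up to (but not including) 05:00.
NIGHT_START, NIGHT_END = 23, 5


def is_late_night(ts):
    """True when the timestamp falls inside the assumed night window."""
    return ts.hour >= NIGHT_START or ts.hour < NIGHT_END


def estimate_homes(points, precision=3):
    """points: iterable of (student_id, datetime, lat, lon) tuples.

    Returns the most frequent late-night location per student,
    rounded to `precision` decimal places (roughly 100 m at 3 places).
    """
    cells = defaultdict(Counter)
    for student, ts, lat, lon in points:
        if is_late_night(ts):
            cells[student][(round(lat, precision), round(lon, precision))] += 1
    return {s: c.most_common(1)[0][0] for s, c in cells.items()}
```

Students with no late-night points are simply absent from the result, so they would not contribute a home location to the regression input.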
We ran the Spatial Autocorrelation (Moran's I) tool on the regression residuals to check whether the model residuals are spatially random. Statistically significant clustering of high and/or low residuals (model under- and overprediction) indicates that our GWR model is not accurate enough for prediction.
The map below shows the local R² of the model that estimates HDB population from the student data. The dark red regions are where the model predicts with higher accuracy, and the blue regions are where its predictions are poor.
Because the spatial autocorrelation test (Moran's I) shows that the model residuals are not random, the model cannot be used for prediction. It was built only to explore how the student data relates to another data layer.
Thus, to find convenient places for students, we ranked zones using the student data points and factors such as the availability of MRT stations, cycling paths, and hawker centers.
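To show what the Moran's I statistic measures, here is a minimal sketch of the global index computed on a list of residuals. It is not the GIS tool's implementation: it omits the significance test (z-score and p-value), and the binary neighbour matrix passed as `weights` is an assumption for the example.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` with a square spatial weights matrix.

    I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i ** 2),
    where z_i are deviations from the mean and S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in z)
    return (n / s0) * (cross / denom)


# Four zones in a row; each zone neighbours the one next to it (assumed layout).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, -1.0, -1.0], chain)    # similar residuals adjacent
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # opposite residuals adjacent
```

Values well above the expected value under randomness, -1/(n-1), point to clustered residuals, which is the situation that makes a regression model unreliable for prediction.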
Ranking zones based on Hawker Centres:
To get the density of hawker centers in each zone, we calculated a new field, which was used for ranking.
The Reclassify tool was used to rank the regions by hawker center density, so that places are ranked as economical zones for food.
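A rough sketch of this density-and-reclassify step follows. It assumes density means hawker centers per unit of zone area and that classes are equal-interval; the number of classes, the zone names, and the input layout are hypothetical, since the post does not give the exact settings.

```python
def density_ranks(zones, classes=3):
    """zones: dict of zone name -> (hawker_count, area).

    Returns zone -> rank, where rank 1 is the highest-density class.
    Classes are equal-interval bins between the lowest and highest density.
    """
    density = {name: count / area for name, (count, area) in zones.items()}
    lo, hi = min(density.values()), max(density.values())
    width = (hi - lo) / classes
    ranks = {}
    for name, d in density.items():
        if width == 0:
            cls = classes - 1  # all zones share one density value
        else:
            cls = min(int((d - lo) / width), classes - 1)
        ranks[name] = classes - cls  # densest class gets rank 1
    return ranks


# Illustrative zones only, not values from our dataset.
example = density_ranks({
    "A": (6, 2.0),  # density 3.0
    "B": (1, 4.0),  # density 0.25
    "C": (0, 1.0),  # density 0.0
    "D": (3, 2.0),  # density 1.5
})
```

With three equal-interval classes over the range 0 to 3, zone A lands in the top class, D in the middle, and B and C in the bottom.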
Ranking zones based on proximity to cycling paths:
Areas are ranked by their proximity to cycling paths: the data is converted to raster data and then ranked.
Ranking zones based on MRT by reclassify:
Easy access to public transport is a major consideration when choosing a place to stay, so we used proximity to MRT as a ranking factor. MRT location data is converted to raster data to rank areas by their proximity to MRT stations.
The final rank for each zone in Singapore was calculated as the average of the three other ranks (MRT, hawker center, and cycling path). Highly ranked areas are the most favorable for staying. This final ranking can help a student choose a place to stay based on individual priorities.
Using the final ranking, we can recommend convenient places to stay to a new student coming to study at NUS, taking MRT access, cycling paths, and food places into account.
As expected, the better-ranked zones cluster near NUS itself, but the ranking also suggests other places.
As future work, we could add a configurable element that replaces the hawker center layer with other layers, such as libraries, parks, HDB rental prices, or bus stops, to form an ideal tool for incoming students.
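Proximity ranking of this kind can be approximated outside a GIS by measuring the distance from a location to its nearest station and mapping that distance to a rank. The sketch below uses the haversine formula instead of the raster workflow described above; the distance bands and the station coordinates in the usage note are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0

# Assumed distance bands as (upper bound in metres, rank); rank 1 is best.
BANDS = [(500.0, 1), (1000.0, 2), (2000.0, 3)]
WORST_RANK = 4


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def proximity_rank(lat, lon, stations):
    """Rank a location by the distance to its nearest station."""
    nearest = min(haversine_m(lat, lon, s_lat, s_lon) for s_lat, s_lon in stations)
    for limit, rank in BANDS:
        if nearest <= limit:
            return rank
    return WORST_RANK
```

For example, `proximity_rank(1.2994, 103.7846, [(1.2934, 103.7846)])` measures a gap of roughly 670 m and falls into the second band. The same idea carries over to cycling paths if the distance is measured to the nearest path segment instead of to a station point.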
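The averaging step is simple enough to sketch directly. The zone names and rank values below are made up for illustration, and lower numbers are assumed to mean better ranks.

```python
def final_ranking(mrt, hawker, cycling):
    """Average the three per-zone ranks and sort zones from best to worst.

    Each argument maps zone name -> rank (lower is better). Only zones
    present in all three inputs are scored.
    """
    zones = mrt.keys() & hawker.keys() & cycling.keys()
    scores = {z: (mrt[z] + hawker[z] + cycling[z]) / 3 for z in zones}
    # Sort by average rank, breaking ties by zone name for a stable order.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


# Illustrative ranks only, not values from our analysis.
result = final_ranking(
    mrt={"A": 1, "B": 3, "C": 2},
    hawker={"A": 2, "B": 3, "C": 1},
    cycling={"A": 1, "B": 2, "C": 3},
)
```

A student who cares more about one factor could replace the plain average with a weighted one. That would be an extension of the method described here, not part of it.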
For a detailed analysis, please read the entire report: Spatial-Temporal Analytics with Students Data
Explored and submitted by:
ARUN KUMAR BALASUBRAMANIAN (A0163264H)
DEVI VIJAYAKUMAR (A0163403R)
RAGHU ADITYA (A0163260N)
SHARVINA PAWASKAR (A0163302W)
LI MEIYAO (A0163379U)