- Model 1: We use classmates’ tracking data to find hot spots and recommend a bus route that ISS staff and students can use for shopping, sightseeing, relaxing in parks, etc.
- Model 2: We use the tracking data to find hot spots of relaxation activity and then identify locations worth investing in for new entertainment and relaxation facilities.
|Dataset Name|Dataset Source Links|
|---|---|
|Track data of all classmates (Openpath, Moves, etc.)|IVLE|
|Singapore Tourist Attractions, Supermarkets, Hawker Centres, Secondhand Collection Points, Parks, Museums, Libraries, Historical Places, Gyms, Bus Stops, Subzone Population|https://data.gov.sg/|
- Language: R
- Tool: RStudio
- Load and check data
- Data Cleaning & Feature engineering
Based on the location and time fields, we use gsub() to extract the day and hour of each row, and we round the coordinates down to two decimal places.
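# User IDs are redacted in this listing; "email@example.com" is a placeholder
data$time = gsub(".*\\s","",data$date)
# Convert the hour to a number so later comparisons are numeric, not text
data$hour = as.integer(gsub(":.*","",data$time))
# One user's export apparently uses a different date layout, so its day is parsed separately
data_elephant = data[data$ID == "email@example.com",]
data_others = data[data$ID != "email@example.com",]
data_others$day = gsub("(/.*)|(-.*)","",data_others$date)
data_elephant$day = gsub("(^[0-9]/)?|(/.*$)?","",data_elephant$date)
data = rbind(data_elephant,data_others)
# The pattern is empty in the source; kept as-is
data$day = gsub("","",data$day)
# Truncate coordinates to two decimal places
data$latR = floor(data$lat * 100) / 100
data$lonR = floor(data$lon * 100) / 100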
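To make the string handling above concrete, here is a small sketch on made-up rows. The `sample` data frame and its timestamp format are assumptions for illustration only; the real Openpath and Moves exports may use different layouts.

```r
# Hypothetical sample rows; the real date format may differ.
sample <- data.frame(
  date = c("15/03/2017 08:45:10", "16/03/2017 23:05:00"),
  lat  = c(1.29273, 1.30418),
  lon  = c(103.77651, 103.83190),
  stringsAsFactors = FALSE
)

# Drop everything up to the last whitespace to keep the time part
sample$time <- gsub(".*\\s", "", sample$date)
# Drop everything from the first colon to keep the hour, then make it numeric
sample$hour <- as.integer(gsub(":.*", "", sample$time))
# Drop everything from the first "/" or "-" to keep the leading day field
sample$day <- gsub("(/.*)|(-.*)", "", sample$date)
# Truncate coordinates to two decimals (0.01 degree is roughly 1 km)
sample$latR <- floor(sample$lat * 100) / 100
sample$lonR <- floor(sample$lon * 100) / 100

print(sample[, c("time", "hour", "day", "latR", "lonR")])
```

Truncating to two decimals groups nearby points into the same grid cell, which makes later averaging and comparison less sensitive to GPS jitter.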
We create a new column “des” that labels each row as Home or Outside. We assume that any data collected between 0:00 AM and 5:00 AM was recorded at the user’s home, so we extract those rows and take the mean of their longitude and latitude as each user’s home location. Openpath and Moves inevitably record positions with some error, so we allow a tolerance of 0.007 degrees when assigning the des value to the remaining rows.
# Rows recorded between 0:00 and 5:00 are assumed to be at home
data$des = "Outside"
data$des[(as.integer(data$hour) <= 5) & (as.integer(data$hour) >= 0)] = "Home"
data_home = data[data$des == "Home",]
data_home$home_lon = NA
data_home$home_lat = NA
data$home_lon = NA
data$home_lat = NA
##find home_lat & home_lon (shown for one user; the ID is a redacted placeholder)
data_home[data_home$ID == "email@example.com",]$home_lon = mean(data_home[data_home$ID == "email@example.com",]$lonR)
data_home[data_home$ID == "email@example.com",]$home_lat = mean(data_home[data_home$ID == "email@example.com",]$latR)
data_home = subset(data_home, !duplicated(ID))
data[data$ID == "email@example.com",]$home_lon = data_home[data_home$ID == "email@example.com",]$home_lon
data[data$ID == "email@example.com",]$home_lat = data_home[data_home$ID == "email@example.com",]$home_lat
#assign des value for other rows (within the 0.007 degree tolerance)
near_home = (abs(data$latR - data$home_lat) <= 0.007) & (abs(data$lonR - data$home_lon) <= 0.007)
data$des[which(near_home)] = "Home"
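The same home-detection idea can be sketched for all users at once instead of one ID at a time. This is only an illustration under assumptions: the `tracks` data frame, the user IDs, and the coordinates are made up, and `aggregate()` plus `merge()` stand in for the per-user assignments used in the project.

```r
# Hypothetical tracking rows for two users; IDs and coordinates are made up.
tracks <- data.frame(
  ID   = c("userA", "userA", "userA", "userB", "userB", "userB"),
  hour = c(2, 4, 14, 1, 3, 20),
  latR = c(1.30, 1.30, 1.35, 1.40, 1.40, 1.40),
  lonR = c(103.80, 103.80, 103.90, 103.70, 103.70, 103.70),
  stringsAsFactors = FALSE
)

# Night-time rows (0:00 to 5:00) are assumed to be recorded at home
night <- tracks[tracks$hour >= 0 & tracks$hour <= 5, ]

# The mean night-time position per user is taken as the home location
homes <- aggregate(cbind(home_lat = latR, home_lon = lonR) ~ ID,
                   data = night, FUN = mean)

# Attach each user's home coordinates to every one of their rows
tracks <- merge(tracks, homes, by = "ID")

# Label a row "Home" when it lies within the tolerance of the home position
tol <- 0.007
near_home <- abs(tracks$latR - tracks$home_lat) <= tol &
  abs(tracks$lonR - tracks$home_lon) <= tol
tracks$des <- ifelse(near_home, "Home", "Outside")

print(tracks)
```

Near the equator, 0.007 degrees is roughly 0.8 km, so the tolerance absorbs both GPS error and the two-decimal truncation of the coordinates.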
We build two models, one for each objective.
- Model 1:
The picture below shows the processing steps of the first model.
First, add the data for all places as layers and merge them into one layer. Then run Hot Spot Analysis to find the hot spots (the red area in the map below).
Following the model, we get this hot spot map. Red areas are hot spots and yellow areas are cold spots. Using the hotspot layer to clip the “relax in SG” layer, we get a focus-area layer containing the relaxation places. Because there is no school bus service around ISS, reaching a distant school bus stop is time-consuming, so our group plans a bus route that makes nearby relaxation places easy to reach. We create clusters of places and plan a route through them. The purple circles mark the clusters we analyzed, which include parks, supermarkets, museums, etc. The purple line on the map shows the proposed bus route.
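The clustering step was done in ArcGIS and the write-up does not name the method used. As a rough stand-in, the sketch below groups made-up place coordinates with base R's `kmeans()`; the choice of k-means, the number of clusters, and all coordinates are assumptions.

```r
set.seed(42)

# Hypothetical coordinates of relaxation places inside the hot spot area
places <- data.frame(
  lat = c(1.290, 1.292, 1.291, 1.310, 1.312, 1.311),
  lon = c(103.770, 103.772, 103.771, 103.800, 103.802, 103.801)
)

# k-means is an assumption; the source does not name the clustering method
fit <- kmeans(places[, c("lat", "lon")], centers = 2, nstart = 10)
places$cluster <- fit$cluster

# Cluster centres could serve as candidate stops along a bus route
stops <- as.data.frame(fit$centers)
print(stops)
```

Ordering the resulting stops into a route is a separate problem that this sketch does not attempt.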
- Model 2:
The picture below represents the general idea of the second model.
The inputs for the suitable areas are bus stop density, tracking data heat, and subzone population. We apply different ArcGIS analyses to each original dataset, convert the results to raster, and reclassify the value of each area on a scale from 1 to 5. The darker the color, the higher the value.
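The reclassification to a 1 to 5 scale can be illustrated in R with `cut()`. The values below are made up, and equal-width classes are an assumption; the write-up does not say which classification scheme was chosen in ArcGIS.

```r
# Hypothetical bus stop counts per raster cell
bus_density <- c(0, 3, 7, 12, 18, 25)

# Split the observed range into 5 equal-width classes (an assumption)
breaks <- seq(min(bus_density), max(bus_density), length.out = 6)

# Label the classes 1..5; include.lowest keeps the minimum value in class 1
score <- as.integer(cut(bus_density, breaks = breaks,
                        labels = 1:5, include.lowest = TRUE))

print(data.frame(bus_density, score))
```

Putting every input on the same 1 to 5 scale is what allows layers with different units to be combined afterwards.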
We weight the inputs at 30% for bus stops, 40% for tracking data, and 30% for subzone population to form the suitable-area layer. Darker red areas have more efficient transport, a larger population, and more visitors.
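The weighting amounts to a cell-by-cell weighted sum of the three reclassified layers. Here is a minimal sketch using small matrices in place of rasters; the scores are made up, and only the 30/40/30 weights come from the text.

```r
# Hypothetical 2 x 2 rasters already reclassified to scores 1..5
bus   <- matrix(c(5, 3, 1, 4), nrow = 2)
track <- matrix(c(4, 5, 2, 1), nrow = 2)
pop   <- matrix(c(5, 2, 3, 3), nrow = 2)

# Weights from the text: bus stops 30%, tracking data 40%, population 30%
w_bus   <- 0.3
w_track <- 0.4
w_pop   <- 0.3

# Cell-by-cell weighted sum; the result stays on the 1..5 scale
suitable <- w_bus * bus + w_track * track + w_pop * pop
print(suitable)
```

Because the weights sum to 1, a cell that scores 5 on every input also scores 5 overall, which keeps the later threshold easy to interpret.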
The second part of the model defines the restriction areas. We use all the relaxation places identified earlier (parks, libraries, tourist attractions, entertainment venues, etc.) as restriction factors. We assume that every existing place already serves the area within 1.5 km, so we set 1.5 km as the buffer radius and assign the value 0 to the buffered area to mark it as restricted.
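The buffer rule can be approximated outside ArcGIS by measuring the distance from each candidate cell to the existing places. In this sketch the `haversine_km` helper, the place coordinates, and the cell centres are all hypothetical; only the 1.5 km radius and the value 0 for restricted cells come from the text.

```r
# Great-circle distance in km between two points (haversine formula)
haversine_km <- function(lat1, lon1, lat2, lon2) {
  r <- 6371  # mean Earth radius in km
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# Hypothetical existing relaxation places and candidate cell centres
places <- data.frame(lat = c(1.300, 1.350), lon = c(103.800, 103.850))
cells  <- data.frame(lat = c(1.305, 1.400, 1.352),
                     lon = c(103.805, 103.900, 103.851))

# A cell is restricted (0) when any existing place lies within 1.5 km, else 1
cells$restrict <- sapply(seq_len(nrow(cells)), function(i) {
  d <- haversine_km(cells$lat[i], cells$lon[i], places$lat, places$lon)
  if (any(d <= 1.5)) 0 else 1
})

print(cells)
```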
We use the Times function to multiply the suitable-area and restriction layers, select the areas whose value is larger than 4, and finally narrow the result down to two locations for building an entertainment facility: the northeast part of Woodlands and Changi Airport.
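Multiplying the two layers zeroes out every restricted cell, so only unrestricted cells can pass the threshold. A minimal sketch with made-up values for four cells (only the threshold of 4 comes from the text):

```r
# Hypothetical suitability scores and restriction mask for four cells
suitable <- c(4.6, 3.5, 4.2, 2.5)
restrict <- c(1, 1, 0, 1)  # 0 = within 1.5 km of an existing place

# Element-wise product: restricted cells drop to 0
final <- suitable * restrict

# Keep only the cells whose combined value is larger than 4
candidates <- which(final > 4)
print(candidates)
```

In this example the third cell has a high suitability score but is removed because an existing place already serves it.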
The main limitation is data accuracy. We believe our models can be applied more widely, and the results would be more precise if the rough and partial source data from the various domains were improved. With better data sources, the outcome would improve and could address more business objectives.
Our published map is available at: http://arcg.is/2oCj5NO
NUS-ISS Course – M.Tech. Full Time
(National University of Singapore – Institute of Systems Science)