Analytics And Intelligent Systems

NUS ISS AIS Practice Group

Promoting Healthy Lifestyle in Singapore — May 21, 2017


In line with HPB’s action plan to promote a healthy lifestyle and active ageing, building a nation of healthy people and mitigating the issues of an ageing population, our team has used geospatial-temporal intelligence to provide information and analysis that support this initiative and make it easier for us to take ownership of our own health.

We build a complete view of the health-related amenities around commonly visited places, followed by analyses of people's daily behavior, step counts, standard deviational ellipses (directional distribution) and hotspots to gain further insights.

Average working hours in Singapore:


  • Based on statistics provided by the Ministry of Manpower*, the average working hours for a full-time employee in Singapore are about 48 hours a week (highlighted in red), exceeding the normal working week of 40 hours.
  • Although average working hours have been improving over the years, people in Singapore still spend a substantial amount of time at work.
  • This makes it challenging for working individuals to maintain a healthy lifestyle and work-life balance.

*http://stats.mom.gov.sg/Pages/Hours-Worked-Summary-Table.aspx

Awareness of Health related Amenities:

  • To make it easier for working individuals in Singapore to maintain a healthy lifestyle amidst their busy work schedules, there is a need to raise awareness of health-related amenities along their travel patterns.
  • In particular, on their journeys between workplace and home, it would be useful for them to know the complete list of nearby, easily accessible health-promoting amenities, which can help them overcome time constraints and other obstacles to achieving a healthy lifestyle.

Data Collection & Preparation


To kick-start our initiative, we collected location data for our team using both the Openpath and Moves phone applications. We then merged our dataset with the datasets available from other teams. As most of the other teams’ data came from Openpath, we decided to focus on Openpath data. On the merged Openpath dataset, we performed the following actions:

Data Cleansing

  • For records with missing names, we tried to identify the user from the type of phone and populated these records with dummy names.
  • For records with email IDs in the name field, we replaced the names with dummy names.
  • Some location points were outside Singapore (e.g. Hong Kong, Batam). As the scope of our analysis is Singapore, we removed these records.
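As a rough illustration of the step that drops points outside Singapore, the sketch below keeps only records inside a latitude/longitude bounding box. The box limits and the `lat`/`lon` field names are assumptions for illustration. A box is coarse (it can still admit parts of southern Johor or northern Batam), so a test against a proper Singapore boundary polygon would be more precise.

```python
# Assumed bounding box for Singapore (approximate; a coarse filter only).
SG_LAT_MIN, SG_LAT_MAX = 1.15, 1.48
SG_LON_MIN, SG_LON_MAX = 103.60, 104.10


def in_singapore(lat, lon):
    """Return True if (lat, lon) lies inside the assumed Singapore bounding box."""
    return SG_LAT_MIN <= lat <= SG_LAT_MAX and SG_LON_MIN <= lon <= SG_LON_MAX


def drop_points_outside_singapore(records):
    """Keep only location records whose coordinates fall inside the bounding box."""
    return [r for r in records if in_singapore(r["lat"], r["lon"])]


# Hypothetical records: one point near NUS, one in Hong Kong, one in Batam.
records = [
    {"name": "user_1", "lat": 1.2966, "lon": 103.7764},
    {"name": "user_1", "lat": 22.3193, "lon": 114.1694},
    {"name": "user_2", "lat": 1.1301, "lon": 104.0529},
]
kept = drop_points_outside_singapore(records)
```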

Data Transformation

  • As the timestamps were in UTC, we converted them to Singapore time (UTC+8) to facilitate the analysis.

Feature Creation

  • We created date and day-of-week fields (e.g. Monday, Tuesday) from the converted timestamp.
  • We created a set of GIS shapefiles (e.g. gyms, healthier eateries, parks, sports fields) to support our healthy lifestyle analysis.
  • We also needed data from Moves for our analysis of how to increase the number of walking steps achieved. As the scope of our steps analysis is working professionals, we leveraged the data collected from Moves (e.g. number of steps) and attempted to associate the step counts with location coordinates and time. As the required data was spread across separate Moves tables (e.g. summary, activities, places and storyline) and no single table contained all the required attributes, we used SQL to extract the data from the different tables and handle records with null location coordinates.
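The transformation and feature-creation steps above could look roughly like the following. This assumes timestamps arrive as naive `YYYY-MM-DD HH:MM:SS` strings in UTC; the exact Openpath export format is an assumption. Singapore does not observe daylight saving time, so a fixed UTC+8 offset is sufficient.

```python
from datetime import datetime, timedelta, timezone

# Singapore Standard Time is a fixed UTC+8 offset (no daylight saving).
SGT = timezone(timedelta(hours=8), name="SGT")


def add_time_features(utc_string):
    """Convert an assumed UTC timestamp string to SGT and derive date/day-of-week fields."""
    utc_time = datetime.strptime(utc_string, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    local_time = utc_time.astimezone(SGT)
    return {
        "local_time": local_time,
        "date": local_time.date().isoformat(),
        "day_of_week": local_time.strftime("%A"),  # e.g. "Monday" in the default locale
    }


# 20:30 UTC on Sunday 2 April 2017 is 04:30 SGT on Monday 3 April 2017.
features = add_time_features("2017-04-02 20:30:00")
```

Note that the conversion can move a point to a different calendar day, which is why the day-of-week field is derived after the conversion rather than from the raw UTC value.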

ETL (Extract, Transform, and Load):


During data preparation with the Moves app data, we extracted the activities, places and storylines from the full summary. As the data is split across multiple tables and the join is not one-to-one, an ETL process is used to merge the data and load it back into the ArcGIS platform.
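The Moves export format is not reproduced here. The sketch below assumes a simplified, hypothetical schema with `places` and `activities` tables and shows one way such a many-to-one join could be done in SQL (run through Python's built-in `sqlite3`): each activity is attached to the place segment during which it started, and rows whose place has null coordinates are dropped. The table and column names are illustrative, not the actual Moves schema.

```python
import sqlite3

# Hypothetical, simplified tables standing in for the Moves export.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE places (place_id INTEGER, start_time TEXT, end_time TEXT, lat REAL, lon REAL);
CREATE TABLE activities (activity_id INTEGER, start_time TEXT, end_time TEXT,
                         activity TEXT, steps INTEGER);
INSERT INTO places VALUES
  (1, '2017-04-03T08:00', '2017-04-03T12:00', 1.2922, 103.7765),
  (2, '2017-04-03T12:00', '2017-04-03T18:30', NULL, NULL);
INSERT INTO activities VALUES
  (10, '2017-04-03T09:15', '2017-04-03T09:30', 'walking', 1200),
  (11, '2017-04-03T13:00', '2017-04-03T13:20', 'walking', 900),
  (12, '2017-04-03T19:00', '2017-04-03T19:20', 'walking', 1500);
""")

# Attach each activity to the place segment in which it started (many activities can
# map to one place), then drop rows whose place has no coordinates.
# ISO-8601 strings compare correctly as text, so plain >= / < works for the time window.
rows = conn.execute("""
SELECT a.activity_id, a.start_time, a.steps, p.lat, p.lon
FROM activities AS a
JOIN places AS p
  ON a.start_time >= p.start_time AND a.start_time < p.end_time
WHERE p.lat IS NOT NULL AND p.lon IS NOT NULL
ORDER BY a.start_time
""").fetchall()
```

Here activity 11 is dropped because its place has null coordinates, and activity 12 because it falls outside every place segment. Whether to drop, interpolate or keep such rows is a design choice.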

Location by Day of Week

The locations are segregated by day of the week (e.g. Monday, Tuesday) to analyze travel patterns according to the day of the week.


Monday to Thursday: Similar patterns can be observed.


Friday: The points are more dispersed.


Saturday and Sunday: A concentration of points at NUS is visible.

Mean Centre and Directional Distribution


  • For each day of the week, mean center and directional distribution analyses are applied to the location data to investigate how the locations are dispersed over Singapore.
  • We can see a spread in the northeast/southwest direction on weekdays and a greater concentration over the southwest region at weekends.
  • A potential application of the directional trend is knowing which areas to focus our healthy lifestyle efforts on for each day of the week, and for weekdays and weekends respectively. For example, we could organize exercise events at East Coast Park as a big team on weekdays.
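For readers who want the idea behind these two tools, here is a minimal NumPy sketch. The mean center is the average x and y of the points. The ellipse is approximated here from the eigen-decomposition of the points' covariance matrix, a common way to get a one-standard-deviation ellipse. It is not ArcGIS's exact Directional Distribution formula, which uses its own correction factors and reports rotation clockwise from north. Coordinates are assumed to be projected (e.g. metres), not raw latitude/longitude.

```python
import numpy as np


def mean_center(xy):
    """Mean center: the average x and y of all points."""
    return xy.mean(axis=0)


def directional_distribution(xy):
    """Approximate a one-standard-deviation ellipse from the covariance of the points.

    Returns (center, angle_deg, sd_major, sd_minor); angle_deg is the direction of the
    major axis, measured counterclockwise from the positive x-axis (east), in [0, 180).
    """
    center = mean_center(xy)
    cov = np.cov(xy - center, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major_axis = eigvecs[:, 1]
    angle_deg = np.degrees(np.arctan2(major_axis[1], major_axis[0])) % 180.0
    sd_minor, sd_major = np.sqrt(np.clip(eigvals, 0.0, None))
    return center, angle_deg, sd_major, sd_minor


# Hypothetical points spread mainly along a northeast/southwest line.
xy = np.array([[0, 0], [1, 1], [2, 2], [3, 3], [1, 2], [2, 1]], dtype=float)
center, angle_deg, sd_major, sd_minor = directional_distribution(xy)
```

An angle near 45 degrees with a much longer major axis than minor axis is the kind of northeast/southwest spread described above.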

Easy Access to Health Promoting Facilities

Across Singapore, there are many amenities which could encourage people living in Singapore to live healthily. The amenities considered here are:

  1. Parks
  2. Gyms
  3. Healthy eateries
  4. Sports facilities


Buffering Travelling Pattern


  • A 500 m buffer around the travel paths of the class is generated as a starting point for analyzing where travel patterns intersect with available amenities/facilities in the subsequent step.
  • This buffer represents how far an individual might deviate from his/her usual travel pattern.
  • 500 m was chosen because it is a comfortable walking distance to the next desired location.

Intersection With Amenities/Facilities


  • The 500 m buffer is intersected with the available facilities.
  • Facilities outside the buffer zone are not considered and are removed from the map.
  • The remaining facilities are the ones people can access easily based on their travel patterns.
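The buffer-and-intersect workflow can be expressed without a GIS package. A point lies inside the buffer of a path exactly when its distance to the nearest path segment is at most the buffer radius. The minimal sketch below assumes projected coordinates in metres (e.g. SVY21). The path, facility names and coordinates are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to the segment's ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def facilities_in_buffer(path, facilities, radius_m=500.0):
    """Names of facilities lying inside a radius_m buffer around the travel path.

    A point lies in the buffer of a polyline exactly when its distance to the
    nearest segment is at most radius_m.
    """
    segments = list(zip(path, path[1:])) or [(path[0], path[0])]
    return [
        name
        for name, point in facilities.items()
        if min(point_segment_distance(point, a, b) for a, b in segments) <= radius_m
    ]


# Hypothetical path and facilities in projected metres.
path = [(0.0, 0.0), (2000.0, 0.0)]
facilities = {"park": (1000.0, 300.0), "gym": (1000.0, 800.0), "eatery": (2400.0, 0.0)}
accessible = facilities_in_buffer(path, facilities)
```

The eatery beyond the end of the path is kept because the buffer has rounded ends, matching how GIS tools buffer a line by default.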

Individual Travel Patterns – Person A


  • The previous analysis made use of location data from many people. The same analysis can also be applied to a single individual to explore the facilities which he/she could potentially use.
  • For example, the travel patterns of person A are plotted on the map and the same workflow is applied. The results show the health-promoting amenities/facilities which person A can access with ease.

Individual Travel Pattern – Person B (by day of week)


As another example, this is the typical travel pattern of person B on Thursdays. The map shows the easily accessible facilities available to him on that day. This visualization can be extended to any day of the week.

Step Analysis

  • Step challenges usually consider only the total number of steps within a single day, without any location-based analysis of the steps. Our analysis attempts to analyze an individual's steps by location, in comparison with other individuals.
  • The visualization of the number of steps (shown by the size of each circle) for our team after work (i.e. after 6:30 pm) over a period of two weeks shows that our teammates TC and Yubo take more walking steps than the rest of our team.
  • Following up with them, we realized that Yubo tends to walk 15 minutes from Buona Vista MRT to his home even though there is a bus service to his home. As for TC, he tends to walk from the customer’s office to his own office after work. For the rest of our team, the number of steps was relatively small because of a tendency to travel by public transport or car.
  • Hence, to achieve a healthy number of steps amidst our busy schedules, we recommend learning from these team members and adjusting our travel patterns to include more walking, for example by not driving to work on certain days of the week.
  • It is important to note that walking has numerous benefits. It strengthens our heart, lowers disease risk, helps us lose weight, helps prevent dementia, tones our legs, bums and tums, gives us more energy and makes us happy. Hence, a slight tweak in our travel patterns and lifestyle can lead to a big change in our health.
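The after-work comparison boils down to filtering by time of day and summing per person. A minimal sketch, assuming each record is a `(person, local ISO timestamp, steps)` tuple; the names and step counts are made up and are not the team's data.

```python
from collections import defaultdict
from datetime import datetime, time

AFTER_WORK = time(18, 30)  # "after work" cut-off used in the analysis (6:30 pm)


def after_work_steps(records):
    """Sum steps per person for records timestamped at or after 6:30 pm local time."""
    totals = defaultdict(int)
    for person, timestamp, steps in records:
        if datetime.fromisoformat(timestamp).time() >= AFTER_WORK:
            totals[person] += steps
    return dict(totals)


# Made-up records: (person, local ISO timestamp, steps in that interval).
records = [
    ("person_a", "2017-04-03T19:05", 2400),
    ("person_a", "2017-04-03T12:10", 800),  # lunchtime, excluded
    ("person_b", "2017-04-03T19:40", 1800),
    ("person_c", "2017-04-03T18:45", 300),
]
totals = after_work_steps(records)
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
```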

Step Analysis – Getis-Ord Gi* (Hotspot)

  • We applied the Getis-Ord Gi* analysis to the step feature class for all location points, pinpointing the hot spots and cold spots in terms of the number of steps recorded.
  • A cold spot is a location where low step counts cluster, i.e. individuals record fewer steps there. A hot spot is a location where high step counts cluster, i.e. individuals take more steps there.
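To make the hot spot statistic concrete, here is a minimal NumPy sketch of the Getis-Ord Gi* z-score with fixed distance-band weights (binary weights, with each point counted as its own neighbor). The band distance, coordinates and step counts are assumptions for illustration. ArcGIS's Hot Spot Analysis tool supports other spatial weight conceptualizations and adds p-values and multiple-testing options, so its numbers will not match this sketch exactly. Large positive z-scores mark hot spots and large negative ones mark cold spots.

```python
import numpy as np


def getis_ord_gi_star(coords, values, distance_band):
    """Gi* z-score for every point, using binary fixed-distance-band weights.

    Gi* = (sum_j w_ij x_j - mean * sum_j w_ij)
          / (S * sqrt((n * sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1)))
    where w_ii = 1 (the point is included in its own neighborhood).
    """
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    n = len(x)
    # Pairwise Euclidean distances in projected units (e.g. metres).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = (dist <= distance_band).astype(float)
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Hypothetical step counts: a cluster of high values and a cluster of low values.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
steps = [100, 90, 95, 5, 10, 8]
z_scores = getis_ord_gi_star(coords, steps, distance_band=1.5)
```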


For example, we can see some cold spots around the Marina Bay Financial Centre, while there are hot spots around the Tanjong Pagar area.


  • These two sets of hot spots and cold spots are derived from different individuals.
  • Relating this to the hotspot analysis earlier, we can see that the individual based at Marina Bay Financial Centre is rather stationary compared with the other individual, who is based around the Tanjong Pagar area.
  • In this situation, reminders can be given to encourage the individual to move more rather than stay stationary for long periods of time. According to health experts, long periods of sitting day in and day out can seriously impact our health and shorten our lives.

Limitations and Possible Improvements

  • Lack of dataset features/attributes
    • Age, gender, etc. are not available in the current phase, but it would be helpful to include these attributes in future studies.
  • Limited data
    • The sample can be considered small due to time constraints. A larger sample would benefit future studies.
  • Data were taken from only two sources, Openpath and Moves.
  • Ratings for health facilities (e.g. accessibility, price, health grade) could be added as data attributes for better analysis in future studies.

Submitted by team:

  • Lee Tai Ngiap
  • Lwi Tiong Chai
  • Gello Mark Vito
  • Zheng Kaiyuan
  • Huang Yubo
Spatial – Temporal Analytics of Students Desirability Level for Living in Singapore — May 7, 2017


THE THOUGHT PROCESS

MOTIVATION: “WHY THIS”

Singapore is a highly developed city in the Southeast Asian region and is home to many excellent universities. Hence, it attracts thousands of students from around the world every year. These students come from diverse socio-economic backgrounds and hence have varied preferences when choosing accommodation.

We were very interested in understanding the role played by amenities and services in students’ selection of accommodation. With this study, we would like to model the areas that our target group (ISS EBAC 04 associates) should find most suitable for their needs.

THE OBJECTIVE

Our objective is to derive the most desirable places to live in Singapore for NUS students based on a certain set of amenities.

The variables were selected with the general psyche of students in mind. We arrived at the following key service variables that can influence students’ decisions:

➢ House Rental Prices

➢ Distance to:

  • MRT and LRT Stations
  • Community Clubs
  • Park Connectors
  • Healthy Dining Places

 DATA COLLECTION AND CLEANSING

The starting point was the cumulative data collected by our classmates. A total of 7,747 data points were collected by all the students together. We segregated the data points to derive the following:

Home Location: points recorded between 12 AM and 8 AM. We found a total of 233 data points, which we took as a good approximate representation of the entire class.

Lunch Location: points recorded between 1 PM and 3 PM. 731 points were found for this category.
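The home and lunch segregation is a time-of-day filter. A minimal sketch, assuming local (SGT) ISO timestamps and treating each window as half-open (start inclusive, end exclusive); the boundary handling and the sample points are assumptions.

```python
from datetime import datetime, time

# Windows from the text; treating them as half-open [start, end) is an assumption.
HOME_WINDOW = (time(0, 0), time(8, 0))     # 12 AM to 8 AM
LUNCH_WINDOW = (time(13, 0), time(15, 0))  # 1 PM to 3 PM


def in_window(timestamp, window):
    """True if the local time of day of an ISO timestamp falls in [start, end)."""
    start, end = window
    return start <= datetime.fromisoformat(timestamp).time() < end


# Hypothetical points with local (SGT) timestamps.
points = [
    {"id": 1, "time": "2017-04-03T02:15"},
    {"id": 2, "time": "2017-04-03T08:00"},
    {"id": 3, "time": "2017-04-03T13:30"},
    {"id": 4, "time": "2017-04-03T15:00"},
]
home_points = [p for p in points if in_window(p["time"], HOME_WINDOW)]
lunch_points = [p for p in points if in_window(p["time"], LUNCH_WINDOW)]
```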

The external data of Amenities/Facilities that we used to overlay on this base data were:

  • Community Clubs
  • Park Connectors
  • Healthy Dining Options
  • MRT and LRT
  • House Rental Prices
  • Singapore District

Data Sources

✓  https://data.gov.sg

✓  ArcGIS online

THE IMPLEMENTATION

  THE PROCESS


 THE STORY

 House Rents

House rent data was collected, and we joined the attribute table of the house rent layer with that of the district layer to give district-wise rentals. We estimated approximate rents for locations where data was not available. Note, however, that this was done only to complete the map and does not represent the true rents at those locations.

The most expensive areas are among the least desirable places to rent a home.


 Separation into Zones

We partitioned the Singapore map into 3 “zones” based on rental prices, so that we could easily manage 3 mutually exclusive zones according to affordability. The 3 sub-zones were:

  • $1500 – $1850 – Low Priced Zone
  • $1900 – $2200 – Medium Priced Zone
  • $2200 – $2700 – High Priced Zone
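Assigning each district to a zone is a simple threshold lookup. The ranges above meet at $2,200 and leave a gap between $1,850 and $1,900, so the sketch below has to pick boundaries. It assumes inclusive upper bounds, which puts $2,200 and the $1,850 to $1,900 gap in the medium zone and anything below $1,500 in the low zone. The district names and rents are hypothetical.

```python
# Zone upper bounds taken from the list above. Assumptions: each bound is inclusive,
# so $2,200 falls in the medium zone, rents in the $1,850-$1,900 gap fall in the
# medium zone, and rents below $1,500 fall in the low zone.
ZONE_UPPER_BOUNDS = [
    (1850, "Low Priced Zone"),
    (2200, "Medium Priced Zone"),
    (2700, "High Priced Zone"),
]


def rent_zone(monthly_rent):
    """Return the affordability zone for a district's average monthly rent."""
    for upper_bound, zone in ZONE_UPPER_BOUNDS:
        if monthly_rent <= upper_bound:
            return zone
    raise ValueError(f"rent {monthly_rent} is above the highest zone bound")


# Hypothetical district rents (not the data used in the study).
district_rents = {"District X": 1600, "District Y": 2050, "District Z": 2500}
zones = {district: rent_zone(rent) for district, rent in district_rents.items()}
```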

Partitioning was done using the Lasso Tool, and we rasterised the 3 zones to be able to demarcate them individually. We could then intersect any of these zones with the buffered polygons of the shortlisted amenities/facilities. The zones are displayed in the map below:


 MRT concentration

The MRT concentration heat map gives us insight into the Government’s prioritisation of residential and business areas. We can clearly see that the services are concentrated in the “Raffles Place”, “Punggol” and “Choa Chu Kang” areas. “Raffles Place” is the financial hub of the island, and hence a highly unlikely area for residential properties.

 MRT Vs Home

We created a polygon buffer with a 1.5 km radius around the exact coordinates of each MRT station. Overlaying this with the “Home Locations of Students” gave us a good insight: MRT locations are definitely one of the major deciding factors, as most students were found to reside within the 1.5 km buffer of an MRT station. However, on closer inspection, some densely packed home locations near “Pasir Panjang Road” were found to be far from the MRT locations.
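The same check can be done directly on latitude/longitude with the haversine great-circle distance: a home lies inside a 1.5 km station buffer if its distance to the nearest station is at most 1,500 m. A minimal sketch; the station and home coordinates are hypothetical, not real MRT or student locations.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def share_within_radius(homes, stations, radius_m=1500.0):
    """Fraction of home points within radius_m of their nearest station."""
    within = sum(
        1 for home in homes
        if min(haversine_m(*home, *station) for station in stations) <= radius_m
    )
    return within / len(homes)


# Hypothetical coordinates, not real station or student locations.
stations = [(1.3000, 103.8000)]
homes = [(1.3050, 103.8000), (1.3300, 103.8000)]  # about 556 m and 3.3 km away
share = share_within_radius(homes, stations)
```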


The map below clearly indicates that we need to find other decisive services or amenities, in addition to MRT services, to better understand why students reside in Pasir Panjang.


Healthy Eating Places

To stay healthy, it is essential to eat healthily! So we wanted to gain insight into the eating habits of NUS students and how they relate to their choice of accommodation.

The following map shows the “Lunch locations” layer overlaid on a polygon buffer with a 1 km radius around the exact coordinates of each healthy eatery. The lunch locations cover both college days and non-college days. We can infer that most of the students are within a 1 km radius of healthy dining options.


Community Clubs

Community Clubs (CCs) are recreational centers offering various activities related to hobbies, fitness and sports, as well as short courses on different topics. For CCs too, we created a polygon buffer with a 1 km radius, which we took as the ideal reach for a person in the neighborhood.

The following map overlays the home locations of students on the Community Club buffer. Most students were found to reside within 1 km of the nearest community club.


Park Connectors

People prefer park connectors near their homes, so we rated student residences within about 1 km of a park connector as more desirable. The following map covers most of the students’ home locations within 1 km of park connectors.


Low Price Desirability Map

To derive the following map, we took all the layers above and considered intersections of the rental zones with any two, or all, of the facilities/amenities in the vicinity.

Hence, we took multiple intersections of all the polygon buffers derived for the amenities/facilities (2.2.4 – 2.2.7) in various combinations.

This helped us geographically isolate areas in the map which had all or some of the facilities contained within them.

Most Desirable Area (in Dark Green) is the intersection of:

  1. MRT +
  2. Healthy Dining Places +
  3. Community Clubs +
  4. Park Connectors +
  5. Polygon for Low Price Zone (from derivation 2.2.2)

The other color representations are as follows:

Light Green: intersection of Low Rent Zone + MRT + Healthy Dining Places

Yellow: intersection of Low Rent Zone + Community Clubs + Healthy Dining Places
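Within a given rent zone, the color coding above amounts to a rule on which amenity buffers a location falls inside. A minimal sketch, where each argument says whether the location lies within the corresponding buffer. The precedence when a location qualifies for both Light Green and Yellow is an assumption; the maps do not specify it.

```python
def desirability_colour(near_mrt, near_dining, near_cc, near_park):
    """Map a location's amenity coverage (within a given rent zone) to a map color.

    Assumption: when a location qualifies for both Light Green and Yellow,
    Light Green takes precedence; the original maps do not specify this.
    """
    if near_mrt and near_dining and near_cc and near_park:
        return "Dark Green"
    if near_mrt and near_dining:
        return "Light Green"
    if near_cc and near_dining:
        return "Yellow"
    return None  # not highlighted


example = desirability_colour(near_mrt=True, near_dining=True, near_cc=False, near_park=True)
```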

Surprisingly, the most desirable areas, with all amenities and the lowest rents, had only a few students residing in them.


 Medium Price Desirability Map

We repeated the above intersections for the medium rental zone. This time we found several students residing in medium-rent locations with all amenities or at least 2 amenities. The colors representing the different intersections are the same as above (with the low rent zone replaced by the medium rent zone).


High Price Desirability Map

We now applied intersections with “High Rent Zone”. Now, we see only a few students’ homes in the most desirable but highly expensive areas.


Overall Desirability Map

We overlaid the maps in 2.2.9 to 2.2.11 on each other. As the zones are disjoint, this gives the union of the three maps.

From this map, we can clearly see that we have captured most students within the most desirable areas (dark green), which have all amenities within around 1 km (ignoring rent). This is the advantage of color-coding locations by the number of amenities they intersect rather than by price.

This map shows that:

  • Most students reside in dark green areas, i.e. locations which have all amenities.
  • Some students reside in light green regions, indicating that they have facilities such as MRT stations and healthy dining places close to their homes.
  • Only a few students live in brown-colored locations, where community clubs and healthy dining options are within about 1 km of their homes. However, these homes were spread across all (high, medium and low) rental price zones.


However, it was interesting to see that the area near Pasir Panjang was highly populated despite not being among the most desirable, because, according to our model, all the amenities are far from this location. This brings us to the final addition to our analysis.


Accommodation Proximity to NUS

We added a polygon representing NUS, which acts as a geo-fence (in brown) for the extent of NUS. We then made another buffered polygon extending 1 km (in blue) out from this geo-fenced area. This area is extremely desirable for our target group.

The students residing in this area have given up most other amenities in favour of proximity to the university. This explains the question raised above of why some students stay at Pasir Panjang even though the amenities are far off.

THE INSIGHTS

The entire analysis of the geospatial data gives us the following insights into the student choice of accommodation:

  1. Most students prefer being close to the university at the cost of fewer amenities.
  2. Most students reside in medium-rent locations. Again, this is a proximity-based decision: locations closer to the university are more desirable irrespective of rent.
  3. Very few students live in low-rent locations; these places are farther from NUS.
  4. The few students living in high-rent locations do so mainly because these locations are within a kilometer of NUS.
  5. Most students’ residences are close to at least some of the amenities: MRT stations, community clubs, healthy dining facilities and park connectors.
  6. However, those staying closest to the NUS campus must travel more than a kilometer to access all the amenities and in general pay high rent. They are still able to use amenities such as community clubs and healthy dining facilities.
  7. The priorities we found, in decreasing order of preference, are:
    1. Proximity to NUS
    2. Proximity to MRT
    3. Proximity to Healthy Eateries
    4. Others

     

To read our complete report please refer to the link below:

1) Report: Spatial_Temporal_Analytics_of_Students_Desirability_Level

Team:

1. Abhilasha Kumari

2. Ashok Eapen

3. Pranav Agarwal

4. Rohit Pattnaik

5. Snehal Singupalli

 

Traffic Accident Event Processing Network using Node-RED — November 14, 2016


This is a short description of implementing event processing networks (EPNs) using IBM Node-RED. Although Node-RED was designed for wiring together Internet of Things (IoT) devices, its programming model closely follows that of an event processing network, making it an excellent open source tool for prototyping an EPN. To install Node-RED, follow the installation guide on its website. The Node-RED programming guide website is also a good resource.

We shall assume that we already have an EPN in mind to implement. In this tutorial, we will model the following EPN designed to detect traffic accidents.

EPN to detect traffic accidents

The basic flow of events is as follows: the on-board unit (OBU) of a car detects a possible collision and sends out a “Possible Crash Event – OBU” event. The event is then enriched with more information from the “Vehicle Registration Database”, which is a global state element. When this event reaches the event processing agent (EPA) “Compose Accident Info”, the agent opens a spatial and temporal context window and waits for image detection confirmation from the traffic camera. When a crash image from the same location and time frame is detected, the EPA logs the accident in a database. The affected area is then calculated based on domain expertise or a knowledge base, and an alert is sent to the relevant dashboard and on-board units.

We implemented the network in Node-RED using WebSockets as event producers and channels, function nodes as processing agents, and PostgreSQL database connection nodes as global states.

Node-RED representation of the EPN

Due to time constraints, we substituted a toggle switch for the image detection nodes. When building the “Compose Accident Events” node, we found the “context” concept of Node-RED to be similar to the “stateful agent” concept in EPNs. Storing incoming message values in the context of a node allows the node to remember them when the next message arrives.

We also found that Node-RED can connect to IBM Watson using node extensions; the IBM Watson image recognition API could serve as a future enhancement.
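The stateful behavior of the "Compose Accident Info" agent can be sketched outside Node-RED. Node-RED function nodes are written in JavaScript; the Python class below only illustrates the same idea: keep pending OBU crash events as state, and emit an accident record when a camera event arrives close enough in time and space. The 120-second window, the 200 m radius and the event field names are assumptions, not values from the project.

```python
import math


class ComposeAccidentInfo:
    """Stateful agent: pairs OBU crash events with camera confirmations nearby in space and time."""

    def __init__(self, window_s=120.0, radius_m=200.0):
        self.window_s = window_s  # temporal context window (assumed value)
        self.radius_m = radius_m  # spatial context window (assumed value)
        self.pending = []         # unconfirmed OBU events: the agent's state

    @staticmethod
    def _distance_m(a, b):
        """Equirectangular approximation of distance; adequate over short ranges."""
        lat1, lon1 = map(math.radians, a)
        lat2, lon2 = map(math.radians, b)
        x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
        return 6_371_000 * math.hypot(x, lat2 - lat1)

    def on_obu_event(self, event):
        """Remember a possible crash reported by an on-board unit."""
        self.pending.append(event)

    def on_camera_event(self, event):
        """Return an accident record if the camera event confirms a pending OBU event."""
        # Expire OBU events that are older than the temporal window.
        self.pending = [e for e in self.pending if event["t"] - e["t"] <= self.window_s]
        for obu in self.pending:
            close_in_time = abs(event["t"] - obu["t"]) <= self.window_s
            close_in_space = self._distance_m(obu["pos"], event["pos"]) <= self.radius_m
            if close_in_time and close_in_space:
                self.pending.remove(obu)
                return {"vehicle": obu["vehicle"], "pos": obu["pos"], "confirmed_at": event["t"]}
        return None


agent = ComposeAccidentInfo()
agent.on_obu_event({"vehicle": "VEHICLE-1", "pos": (1.3000, 103.8000), "t": 0.0})
accident = agent.on_camera_event({"pos": (1.3005, 103.8003), "t": 30.0})
```

In Node-RED, the `pending` list would correspond to values kept in the function node's context between messages.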

Note: This post is based on the work done for KE5208 Sense Making and Insight Discovery CA project completed in November 2016.

Team members: Randy Phoa (A0135933A), Chan Chia Hui (A0135940H), Zay Yar Lin (A0090806E)

Finding Interconnected Road Segments From LTA Speed Band Data — November 13, 2016


In the recent Sense Making and Insights Discovery project, project teams were tasked with analyzing data from publicly available datasets such as the LTA DataMall and coming up with EPN (Event Processing Network) designs to reduce the occurrence of traffic jams.

One area our team wanted to explore was how we could creatively exploit the speed band data to find interconnected roads and segments, which could then be used in other areas of analysis or computation, for example in a routing algorithm or in finding correlations between neighboring road segments.

The speed band data is updated at five-minute intervals. The response contains attributes such as the road name, road category, speed band information, and start and end locations in latitude and longitude. Every road is made up of one or more segments, and long roads like expressways may be broken up into hundreds of segments. There are more than 50,000 segments in the whole of Singapore. We imported one slice of the data into a PostgreSQL database with the PostGIS extension enabled, creating a line string geometry field from the start and end latitudes and longitudes.

To easily visualize the data in our database, we can export a shapefile from any table in PostgreSQL that has a geometry column, using the bundled PostGIS Shapefile Import/Export Manager tool. We can then import it into ArcGIS or MapShaper (http://mapshaper.org/).


In our first attempt, we realized that there are some gaps between roads. To make the intersection more robust, we created a three-meter buffer around each road segment line string. The effect can be seen in the zoomed-in figure below.


Intersecting segments can now be found easily by using the PostGIS ST_Intersects function in a SQL query on the table (http://postgis.net/docs/ST_Intersects.html).

Having found the intersections, we proceeded to generate graphs at both the road and road segment levels. We used the Python library NetworkX (https://networkx.github.io/) to create the graphs and Graphviz (http://www.graphviz.org/) for visualization.
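PostGIS does the heavy lifting in the project. As a hedged illustration of the same logic in plain Python: two three-meter buffers overlap exactly when the underlying segments are within six meters of each other, so the road-level adjacency can be built from pairwise segment distances. The road names and coordinates below are hypothetical, coordinates are assumed to be projected metres, and the pairwise loop is O(n²); at 50,000 segments a spatial index such as the one PostGIS uses is needed.

```python
import math
from collections import defaultdict
from itertools import combinations


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def _orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segment_distance(s1, s2):
    """Shortest distance between two 2-D segments (0 if they properly cross)."""
    (a, b), (c, d) = s1, s2
    if (_orientation(c, d, a) * _orientation(c, d, b) < 0
            and _orientation(a, b, c) * _orientation(a, b, d) < 0):
        return 0.0
    # Otherwise the minimum is attained at one of the four endpoints.
    return min(_point_segment_distance(a, c, d), _point_segment_distance(b, c, d),
               _point_segment_distance(c, a, b), _point_segment_distance(d, a, b))


def build_road_graph(segments, buffer_m=3.0):
    """Adjacency sets between road names whose buffered segments intersect.

    Two buffer_m buffers overlap exactly when the segments are within 2 * buffer_m.
    """
    graph = defaultdict(set)
    for (road1, seg1), (road2, seg2) in combinations(segments, 2):
        if road1 != road2 and segment_distance(seg1, seg2) <= 2 * buffer_m:
            graph[road1].add(road2)
            graph[road2].add(road1)
    return dict(graph)


# Hypothetical segments: B crosses A, C starts 4 m after A ends, D is far away.
segments = [
    ("ROAD A", ((0, 0), (100, 0))),
    ("ROAD B", ((50, -50), (50, 50))),
    ("ROAD C", ((104, 0), (200, 0))),
    ("ROAD D", ((0, 100), (100, 100))),
]
road_graph = build_road_graph(segments)
```

The resulting adjacency can be loaded into a NetworkX graph with `G.add_edges_from((u, v) for u, nbrs in road_graph.items() for v in nbrs)`.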

The road graph below was created with Graphviz, using the sfdp (scalable force-directed placement) layout engine to position the nodes. It contains all the roads in Singapore, about 4,000 nodes. The red nodes are expressways and the green ones are major arterial roads.


With the graph object, we can now easily perform queries, for example to find the roads connected to Heng Mui Keng Terrace (the road outside ISS):

neighbors = list(G.neighbors('HENG MUI KENG TERRACE'))

We can also get the segment level view.


After all the pre-processing, we are now ready to make use of our graph objects. In our EPN, we proposed using the speed band information to help route vehicles in the event of a traffic jam or accident. Unlike the road graph, the road segment graph is both directed and weighted. As we can easily find the length of each road segment, and we also have the speed band at any given time from the speed band data, we can derive the estimated time to travel across each segment at any given time. Using this as the weight, we can then find the shortest path (in terms of travel time) between any two nodes in the graph using a shortest path algorithm such as Dijkstra’s algorithm.
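A minimal sketch of the routing idea: edge weights are travel times (segment length divided by speed), and Dijkstra's algorithm finds the fastest path. How a speed band maps to a single speed value (for example the midpoint of the band's range) is an assumption left to the caller, and the graph below is hypothetical.

```python
import heapq
import math


def travel_time_s(length_m, speed_kmh):
    """Seconds needed to traverse length_m at speed_kmh."""
    return length_m / (speed_kmh * 1000.0 / 3600.0)


def fastest_path(edges, source, target):
    """Dijkstra's algorithm on a directed graph weighted by travel time.

    edges maps node -> list of (next_node, length_m, speed_kmh).
    Returns (path, total_seconds), or (None, math.inf) if target is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        elapsed, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == target:
            break
        for nxt, length_m, speed_kmh in edges.get(node, []):
            candidate = elapsed + travel_time_s(length_m, speed_kmh)
            if candidate < best.get(nxt, math.inf):
                best[nxt] = candidate
                previous[nxt] = node
                heapq.heappush(heap, (candidate, nxt))
    if target not in best:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


# Hypothetical segment graph; speeds would come from the current speed bands.
edges = {
    "A": [("B", 1000, 60), ("C", 500, 10)],
    "B": [("D", 1000, 60)],
    "C": [("D", 500, 60)],
}
path, seconds = fastest_path(edges, "A", "D")

# A jam on A->B (speed drops to 5 km/h) reroutes traffic through C.
jammed = dict(edges, A=[("B", 1000, 5), ("C", 500, 10)])
reroute, reroute_seconds = fastest_path(jammed, "A", "D")
```

Because the weights are recomputed from the latest speed bands, the same function gives a different route as conditions change, which is the behavior the EPN relies on.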

Another analysis is to find correlations between neighboring road segments in the graph to derive insights. As mentioned earlier, there are more than 50,000 road segments, and it might be infeasible to perform correlation analysis on all of them, especially since we are looking across time. For example, collecting data across five days at five-minute intervals results in 1,440 data points for a single segment (12 readings per hour × 24 hours × 5 days). It would probably make more sense, both statistically and computationally, to zoom into clusters of segments of interest and look at correlations there, which can be done by getting the n-degree neighbors of a segment.
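Getting the n-degree neighbors of a segment is a breadth-first search that stops expanding after n hops. A minimal sketch on a hypothetical adjacency dictionary. With a NetworkX graph, `nx.single_source_shortest_path_length(G, source, cutoff=n)` returns the same hop counts (with the source itself at 0).

```python
from collections import deque


def neighbors_within(graph, start, n):
    """Breadth-first search returning {segment: hops} for segments 1..n hops from start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == n:
            continue  # do not expand beyond n hops
        for nbr in graph.get(node, ()):
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    del hops[start]
    return hops


# Hypothetical chain of road segments: s1 - s2 - s3 - s4.
graph = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2", "s4"], "s4": ["s3"]}
cluster = neighbors_within(graph, "s1", 2)
```

The segments in `cluster` are then a manageable set on which to compute pairwise correlations of their speed band time series.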

This is a short excerpt from our project that highlights some of the work done. Thanks for reading.

By Team 5 (Chan Chia Hui, Randy Phoa and Zay Yar Lin)