Global Soil Degradation data for the GIS Analysis. The GLASOD database contains information on soil degradation within map units, as reported by numerous soil experts around the world through a questionnaire. It includes the type, degree, extent, cause and rate of soil degradation. From these data, GRID produced digital and hardcopy maps and made area calculations.

The dataset, a shapefile, was obtained from the United Nations Environment Programme (UNEP) Environmental Data Explorer. The Explorer hosts a wide range of UNEP data, including Night-time Lights, Pollutant Emissions, Commercial Shipping Activity, Protected Areas and Administrative Boundaries.

We used ArcGIS 10.4 for the analysis and R for the modelling.


The following sections illustrate the stages of the analysis performed on the soil degradation data for India. The results of that analysis inform the feature engineering. The final set of features is then used to build several statistical models. An ensemble of models is also built to improve prediction accuracy.

The accuracy of the models is reported as the percentage of correct predictions.


The original GLASOD shapefile had no coordinate system reference when it was added to ArcGIS. As the shapefile is a vector dataset consisting of polygons, we geo-referenced it by assigning the GCS_WGS_1984 coordinate system, found under Geographic Coordinate Systems.
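A shapefile carries its coordinate system in a .prj sidecar file, so a missing reference usually means that file is absent. The sketch below is only an illustration of what assigning GCS_WGS_1984 amounts to, not the ArcGIS workflow we used. It assumes the coordinates are already stored in decimal degrees on WGS 1984; the helper name write_prj and the file name glasod.shp are hypothetical.

```python
from pathlib import Path

# Esri-style well-known text (WKT) for the GCS_WGS_1984 geographic coordinate system.
GCS_WGS_1984_WKT = (
    'GEOGCS["GCS_WGS_1984",'
    'DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],'
    'PRIMEM["Greenwich",0.0],'
    'UNIT["Degree",0.0174532925199433]]'
)


def write_prj(shapefile_path):
    """Write a .prj sidecar next to the shapefile and return its path.

    This only labels the data with a coordinate system; it does not
    reproject or move any coordinates.
    """
    prj_path = Path(shapefile_path).with_suffix(".prj")
    prj_path.write_text(GCS_WGS_1984_WKT, encoding="ascii")
    return prj_path


# Hypothetical usage: write_prj("glasod.shp") would create "glasod.prj".
```

Labelling is appropriate only when the stored coordinates really are in the stated system; if they are not, the data would need to be reprojected instead.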

Once the geo-referencing was done, we could view the global shapefile dataset in ArcGIS as below:

Outcome after Geo referencing

The shapefile contains polygons, and each polygon carries attributes describing the soil degradation in the area it covers.

Geo Processing

Because this is a global soil degradation dataset, we focused on a single country, India, to study it in more detail. To do so, we used the Clip and Merge geoprocessing tools.

To start with clipping, we first identified the Geo ID (Glas_Geo_ID) of every polygon in India and selected those polygons using SQL queries on the attribute table.
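The selection step can be sketched outside ArcGIS as follows. This is a minimal illustration, assuming the attribute table is available as a list of dictionaries; the ID values are made up for the example and are not the actual Glas_Geo_ID values for India, and both helper names are hypothetical.

```python
def build_where_clause(field, ids):
    """Build an SQL-style IN clause that selects rows whose field is in ids."""
    id_list = ", ".join(str(i) for i in sorted(set(ids)))
    return '"{}" IN ({})'.format(field, id_list)


def select_by_ids(records, field, ids):
    """Return the attribute rows whose field value is one of the given IDs."""
    wanted = set(ids)
    return [row for row in records if row[field] in wanted]


# Made-up attribute rows and IDs, for illustration only.
attribute_table = [
    {"Glas_Geo_ID": 101, "TYP1": "Wt"},
    {"Glas_Geo_ID": 102, "TYP1": "Cn"},
    {"Glas_Geo_ID": 250, "TYP1": "Wt"},
]
india_ids = [101, 102]

where = build_where_clause("Glas_Geo_ID", india_ids)
india_rows = select_by_ids(attribute_table, "Glas_Geo_ID", india_ids)
```

The generated clause has the same shape as a query typed into the Select By Attributes dialog, and the selected rows are the ones that would be exported as the clip layer.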

To explore other geoprocessing options, we also tried tools such as Merge, Buffer and Dissolve, but their outputs were not useful for this study.

To study individual features in India by soil degradation type, we split the data by attributes. For this operation, we installed an add-in called XTools Pro.

This tool can perform multiple operations on vector datasets, especially overlay operations. We used it to run Split by Attributes on the India data according to TYP1 and TYP2.
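Conceptually, splitting by attributes groups the features by the distinct combinations of the chosen fields, giving one output layer per combination. The sketch below illustrates this under the assumption that features are plain dictionaries; it is not how XTools Pro is implemented, and the sample values are made up.

```python
from collections import defaultdict


def split_by_attributes(records, keys=("TYP1", "TYP2")):
    """Group records by the values of the given fields.

    Returns a dict mapping each distinct (TYP1, TYP2) combination to the
    list of records that share it, i.e. one group per output layer.
    """
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)
    return dict(groups)


# Made-up features, for illustration only.
features = [
    {"Glas_Geo_ID": 101, "TYP1": "Wt", "TYP2": "Cn"},
    {"Glas_Geo_ID": 102, "TYP1": "Wt", "TYP2": "Cn"},
    {"Glas_Geo_ID": 103, "TYP1": "Cs", "TYP2": ""},
]
layers = split_by_attributes(features)
```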


This operation made it possible to study the features at a finer level.

For example, in the split by attribute 3, the soil type that is prevalent across most of India, running from north to south along the east, is considered the most fertile and is therefore not degraded according to the soil data.

In contrast, the prevalent soil type in the west is degraded mostly due to forest degradation (fg), which appears as loss of topsoil (Wt). The same can be confirmed from the expert’s perspective at the below link:

Hot Spot Analysis:

ArcMap’s Hot Spot Analysis visualizes, by default, the areas with a high likelihood of being impacted. Hot spots are displayed for areas at the 99% confidence level. Similarly, cold spots are displayed for areas at the 90% confidence level.
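The ArcGIS Hot Spot Analysis tool is based on the Getis-Ord Gi* statistic, which yields a z-score per feature. The sketch below is a minimal pure-Python illustration of that statistic, not a reproduction of the tool. It assumes binary weights within a fixed distance band (each feature counts as its own neighbour), point locations such as polygon centroids, and made-up sample values. The function names are hypothetical. The cut-offs 1.65, 1.96 and 2.58 are the usual z-score thresholds for the 90%, 95% and 99% confidence levels.

```python
import math


def gi_star(points, values, distance_band):
    """Return the Getis-Ord Gi* z-score for each feature."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation; max() guards against tiny negative rounding.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    z_scores = []
    for xi, yi in points:
        # Binary weights: 1 inside the distance band (self included), else 0.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= distance_band else 0.0
             for xj, yj in points]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        weighted_sum = sum(wj * vj for wj, vj in zip(w, values))
        denom = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # A zero denominator means no variation to test; report 0.
        z_scores.append((weighted_sum - mean * sum_w) / denom if denom else 0.0)
    return z_scores


def confidence_bin(z):
    """Map a z-score to ('hot' | 'cold' | 'not significant', confidence %)."""
    for threshold, level in ((2.58, 99), (1.96, 95), (1.65, 90)):
        if abs(z) >= threshold:
            return ("hot" if z > 0 else "cold", level)
    return ("not significant", 0)


# Made-up example: a cluster of high values and a cluster of low values.
centroids = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
severity = [10, 10, 10, 0, 0, 0]
z = gi_star(centroids, severity, distance_band=1.5)
labels = [confidence_bin(score) for score in z]
```

Features surrounded by similarly high values receive large positive z-scores (hot spots), and features surrounded by low values receive large negative z-scores (cold spots).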

Hot Spot Analysis – Severity – 2

Inference for Sev-2

Sev-2 means that this analysis applies to type 2.

In the above figure we used a confidence level of 99%. The red areas mark the hot spots, where soil degradation is highly likely.

Recommendations for Sev-2

Soil degradation causes the soil to lose its nutrients. Based on this predictive analysis, we recommend controlling the degradation through the following measures:

  • Afforestation of the area before it gets impacted.
  • Plant grass and shrubs in the area.
  • Use mulch matting to hold shrub vegetation in place.
  • Improve drainage.


As part of the modelling, we performed multi-class classification on the rate of soil degradation using the worldwide degradation data.

Since the target variable is categorical with 4 levels (Rate: 0, 1, 2, 3), we performed multi-class classification using Recursive Partitioning (RPART), Conditional Inference Trees (CTREE) and Adaptive Boosting (AdaBoost, an ensemble of trees).
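To make the recursive partitioning idea concrete, the sketch below shows a single split step using Gini impurity, which is the default splitting criterion for classification trees in rpart. It is a simplified Python illustration on one numeric feature with made-up data, not the R code we ran. CTREE selects splits through statistical tests instead and is not represented here. The function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(feature, labels):
    """Return (threshold, weighted_gini) for the best split 'x < threshold'.

    Returns (None, parent_gini) when no candidate split lowers the impurity.
    """
    n = len(labels)
    best = (None, gini(labels))
    distinct = sorted(set(feature))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(feature, labels) if x < threshold]
        right = [y for x, y in zip(feature, labels) if x >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up feature values and degradation rate classes, for illustration only.
extent = [1, 2, 3, 4]
rate = [0, 0, 3, 3]
threshold, impurity = best_split(extent, rate)
```

A tree is grown by applying this step recursively to the left and right subsets until a stopping rule is met.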

Classification Models

Three types of models were used to predict the target class.

The accuracy of the models was as follows:

Model Accuracy

The best-performing model achieved around 80% accuracy. None of the models mispredicted the slow degradation (rate 0) soil polygons. On the other hand, there were a few mispredictions for the rapid degradation data. The incorrectly predicted polygons were identified and checked on the map, and they did not follow any pattern.
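Overall accuracy and per-class mispredictions of this kind can both be read from a confusion matrix. The sketch below is a minimal Python illustration, assuming the four rate classes 0 to 3; the label vectors are made up and are not our model outputs, and the function names are hypothetical.

```python
def confusion_matrix(actual, predicted, classes=(0, 1, 2, 3)):
    """Return a matrix where rows are actual classes and columns are predictions."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of all observations that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """Share of each actual class predicted correctly (None if the class is absent)."""
    return [row[i] / sum(row) if sum(row) else None
            for i, row in enumerate(matrix)]


# Made-up labels, for illustration only.
actual_rate = [0, 0, 1, 2, 3, 3]
predicted_rate = [0, 0, 1, 2, 3, 2]
cm = confusion_matrix(actual_rate, predicted_rate)
overall = accuracy(cm)
recall = per_class_recall(cm)
```

A recall of 1.0 for a class means no polygon of that class was mispredicted, while off-diagonal counts show which classes were confused with each other.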

Output of RPART
Variable Importance



  • Geo-referencing to the WGS 1984 geographic coordinate system (GCS_WGS_1984).
  • Geoprocessing options such as Clip and Merge were tried in order to study a single country, India.
  • To study the attributes separately, Split by Attributes from XTools Pro was used.
  • To identify where soil degradation in India is significant, hot and cold spot analysis was carried out.
  • Finally, to generalize the rate of soil degradation around the world, classification modelling was done.

Created by:

G V Kamaraju A0148570B
Priyak Banyopadhyay A0148379M
Anil Kumar Kondaveeti A0148461A
Mahendra Prakash Subramanian A0148562Y