Geographic Information System (GIS)
Mohammad Karimi; Parastoo Pilehforooshha; Ali Safari
Abstract
Extended Abstract

Introduction: The exploration and preparation of a potential map of mineral reserves requires the use of various methods and techniques, based on the geological and mining knowledge of the investigated area and on predictive models of mineral potential (Bonham-Carter, 1994; Carranza et al., 2008a). According to previous investigations, the common map-integration models used in the discovery of mineral reserves at the initial exploration stage include the index overlay model, fuzzy operators, weighted indices and intelligent methods such as random forests and artificial neural networks. Determining the weights and scores that express the relative importance of the effective factors is the primary requirement in combining the maps and preparing the mineral potential map (Agterberg, 1992; Brown et al., 2000). The purpose of this research is to prepare a potential map of copper deposits in the Dehj-Bazman region using two methods, random forest and support vector machine. In addition, the knowledge-based methods of index overlay and fuzzy logic were used to compare their results with the porphyry copper potential maps obtained from the random forest and support vector machine methods.

Materials & Methods: The area studied in this research is a part of the magmatic belt of the Kerman region, known as the Dehj-Sardouye belt. The information layers controlling mineralization in the Dehj-Bazman area include rock units, structures, alterations, geochemistry, geophysics and copper deposits. In practical applications of machine learning algorithms, mineral potential mapping is essentially a binary classification problem, such that each undiscovered area is classified as prospective or non-prospective according to some combination of mapping criteria (Zuo, 2011). The final results are a set of predictive maps that show target areas with high ore-formation potential. Before training the random forest model, the input data set and the target variable have to be prepared. The target variables for the random forest model and the support vector machine were defined as deposit points (value 1) and non-deposit points (value 0). A genetic algorithm was then used to tune the model parameters. The predictive performance of the random forest model and the support vector machine can be described by the confusion matrix, whose four components are defined as: (1) a deposit sample correctly classified as a deposit (TP); (2) a deposit sample incorrectly classified as a non-deposit (FN); (3) a non-deposit sample correctly classified as a non-deposit (TN); and (4) a non-deposit sample wrongly classified as a deposit (FP) (Liu et al., 2005; Tien Bui et al., 2016). From these components, the evaluation measures are computed as:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (8)
Sensitivity = TP / (TP + FN) (9)
Specificity = TN / (TN + FP) (10)
Positive predictive value = TP / (TP + FP) (11)
Negative predictive value = TN / (TN + FN) (12)

After training and evaluating different models, the best model was obtained by adjusting the parameters and was used to integrate the factor maps in order to predict areas with a high potential for porphyry copper deposits. In addition, the knowledge-based methods of fuzzy logic and index overlay were used to combine the factor maps for comparison with the results of the intelligent methods.
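As a minimal sketch of the binary (deposit / non-deposit) classification and confusion-matrix evaluation described above, assuming a hypothetical feature matrix X (one row per cell, one column per evidence layer) and label vector y; the synthetic data, hyperparameters and use of scikit-learn are illustrative only and do not reproduce the study's models:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.random((500, 5))                                 # placeholder evidence-layer values per cell
y = (X[:, 0] + 0.3 * rng.random(500) > 0.7).astype(int)  # placeholder deposit (1) / non-deposit (0) labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

models = {"random forest": RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0),
          "support vector machine": SVC(kernel="rbf", C=1.0, gamma="scale")}

for name, model in models.items():
    model.fit(X_train, y_train)
    tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
    accuracy    = (tp + tn) / (tp + tn + fp + fn)   # Eq. (8)
    sensitivity = tp / (tp + fn)                    # Eq. (9)
    specificity = tn / (tn + fp)                    # Eq. (10)
    ppv         = tp / (tp + fp)                    # Eq. (11), positive predictive value
    npv         = tn / (tn + fn)                    # Eq. (12), negative predictive value
    print(name, round(accuracy, 3), round(sensitivity, 3), round(specificity, 3),
          round(ppv, 3), round(npv, 3))
```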
Results & Discussion: At this stage, the required information layers were collected and prepared in the GIS environment, and the factor maps were then produced. Accuracy, sensitivity, specificity, positive predictive value, negative predictive value, the kappa index and the OOB error were used to evaluate the performance of the random forest model and the support vector machine. In addition, the importance of the predictor variables in the random forest model was evaluated through the mean decrease in accuracy and the mean decrease in node impurity, or Gini impurity index (Breiman, 2001). According to the results, the most important predictor in the random forest model is the geochemical map, while the structures factor has the least impact on the prediction of the mineral potential map with the final random forest model. In the porphyry copper potential maps obtained from the random forest and support vector machine methods, the target areas cover 14% of the studied area and contain 92% and 87% of the known deposits, respectively. Finally, the efficiency of the machine learning methods and the knowledge-based methods was compared. To produce the porphyry copper potential map with the knowledge-based methods, expert judgment was used to assign weights to each criterion map: weights of 0.3, 0.25, 0.25, 0.1 and 0.1 were assigned to the alteration, geochemistry, geology, geophysics and structures factor maps, respectively. In the potential maps obtained from the index overlay and fuzzy logic (fuzzy sum) methods, the areas predicted as prospective for copper cover 16 and 17 percent of the studied area, respectively, and contain 83 and 79 percent of the existing mines.

Conclusion: This research was conducted with the aim of evaluating and comparing the effectiveness of the random forest method, the support vector machine method and knowledge-based methods for preparing the porphyry copper potential map of the Dehj-Bazman region of Kerman province. Based on the results, the random forest model performs well in porphyry copper potential mapping with geochemical, geophysical, geological, alteration and structures data sets. In addition, the random forest algorithm can estimate the importance of the factor maps. The results of this research show that the geochemical factor map is the most important and the structures factor map the least important in the data-driven random forest model. This estimate of importance is consistent with geological knowledge about porphyry copper mineralization in the Dehj-Bazman region. According to the obtained results, the performance of the random forest model is better than that of the support vector machine model, and the performance of the support vector machine model is in turn better than that of the knowledge-based methods.
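A hedged sketch of the knowledge-based combination reported above: the expert weights of 0.3, 0.25, 0.25, 0.1 and 0.1 are applied in a weighted overlay, and a fuzzy (algebraic) sum is shown as the fuzzy-logic alternative; the factor maps are placeholders assumed to be already rescaled to 0-1 scores on a common grid, so the arrays and grid size are illustrative only:

```python
import numpy as np

weights = {"alteration": 0.30, "geochemistry": 0.25, "geology": 0.25,
           "geophysics": 0.10, "structures": 0.10}

shape = (200, 200)                                            # placeholder grid size
rng = np.random.default_rng(0)
factor_maps = {name: rng.random(shape) for name in weights}   # placeholder 0-1 factor scores

# Weighted index overlay: a simple weighted sum of the rescaled factor maps.
potential_overlay = sum(w * factor_maps[name] for name, w in weights.items())

# Fuzzy (algebraic) sum of the same layers, as one fuzzy-logic alternative.
potential_fuzzy_sum = 1.0 - np.prod([1.0 - factor_maps[name] for name in weights], axis=0)
```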
Remote Sensing (RS)
Nastaran Nazariani; Asghar Fallah
Abstract
Extended Abstract

Introduction: Estimating forest stand characteristics is essential for collecting the information needed for sustainable forest management (Ahmadi et al., 2020). Field data collection requires considerable time and money; therefore, complementary methods with lower cost and acceptable accuracy, building on achievements in various scientific fields, are always sought (Sivanpillai et al., 2006). Sentinel-2 is a new-generation optical Earth-monitoring satellite developed by the European Space Agency, with new spectral capabilities, wide coverage and good spatial and temporal resolution, designed for data continuity and enhancement of the Landsat and SPOT missions (Wang et al., 2017). When the population is not very large, the simple random, stratified and systematic sampling methods all lead to more or less similar results. When the population grows, however, these methods face problems such as preparing a sampling frame, the high cost of surveying widely dispersed sample units, and preparing a sampling plan for units far from each other (Zubair, 2007). The cluster method is one of the recommended methods for large areas, in which, instead of one sample plot, several sample plots are measured in one part of the study area (Yim et al., 2015). Related studies include those of Kleinn (1994), Ismaili et al. (1396), Behera et al. (2021), Sibanda et al. (2021), Praticò et al. (2021), Nazariani et al. (1400) and Dabija et al. (2021). Although studies estimating quantitative forest characteristics from remotely sensed data and nonparametric algorithms have been carried out extensively in the Zagros forests, the effect of the main and artificial bands of Sentinel-2 images on estimating canopy cover and density (number per hectare) in the forests of the Orfi Olad Ghobad watershed of Koohdasht, with the aim of selecting the optimal cluster design to save the time and money needed for forest inventory, has not been reported; this study therefore investigates that issue.

Materials and methods: A part of the Zagros forests located 35 km north of Koohdasht city, named the Olad Ghobad watershed, was selected for the present study. Sampling points were determined in a regular-random manner using a grid with dimensions of 600 × 500 meters. At each sampling point, 16 different cluster sampling designs with four circular and square subplots were designed and implemented. The radius of the circular subplots was 15 meters, the dimension of the square subplots was 37 meters and the distance between the subplots was 60 meters. In each subplot, the number of trees and the two large and small canopy diameters of each tree were measured to derive the number-per-hectare and canopy cover characteristics.
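A minimal sketch of how such a regular-random sampling grid can be laid out; the 600 × 500 m spacing, the four subplots, the 60 m subplot spacing and the 15 m radius follow the design described above, while the study-area extent and the cross-shaped subplot arrangement are assumptions (the study tested 16 different cluster layouts):

```python
import numpy as np

rng = np.random.default_rng(42)
dx, dy = 600.0, 500.0                                # grid spacing in meters (as in the text)
xmin, ymin, xmax, ymax = 0.0, 0.0, 6000.0, 5000.0    # placeholder study-area extent

# A random origin inside the first grid cell gives the "regular-random" layout.
x0, y0 = xmin + rng.uniform(0, dx), ymin + rng.uniform(0, dy)
cluster_centres = [(x, y) for x in np.arange(x0, xmax, dx) for y in np.arange(y0, ymax, dy)]

# Four subplots per cluster, 60 m apart (circular subplots of 15 m radius);
# this cross-shaped arrangement is only one of many possible cluster designs.
subplot_offsets = [(0, 60), (60, 0), (0, -60), (-60, 0)]
clusters = [[(x + ox, y + oy) for ox, oy in subplot_offsets] for x, y in cluster_centres]
```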
In this study, Sentinel-2 images of August 6, 2021 (equivalent to the summer of 1400) at the L1C correction level were used; this level is geometrically corrected to a ground reference and its reflectance values are given at the top of the atmosphere. Four bands of this sensor with a spatial resolution of 10 meters were used: band 2 (blue), band 3 (green), band 4 (red) and band 8 (near-infrared). In general, Sentinel-2 image preprocessing consists of radiometric and geometric correction. Image processing also includes operations such as band ratioing, texture analysis, band combination and the creation of vegetation indices (Naghavi, 2014). In addition to the main bands, artificial bands were created by applying appropriate processing and were used in the modeling process. Spectral values corresponding to the ground plots were extracted from the main and artificial bands and used as independent variables in the models. To evaluate and fit the models, 25% of the data were randomly selected (Lu et al., 2004) and set aside as an evaluation data set. The validity of the statistical models was evaluated using the coefficient of determination, root mean squared error, bias and relative RMSE (RMSE%). ArcGIS software was used to overlay the sample plots on the image, ENVI software for image processing and STATISTICA software for modeling.

Results: In the regression method, validation showed that the number-per-hectare characteristic with cluster design 16 and the canopy cover characteristic with cluster design 15 had the highest accuracy, with coefficients of determination of 0.66 and 0.59, respectively. The nearest-neighbor algorithm with the four distance criteria of Euclidean, squared Euclidean, Manhattan and Chebyshev gave the best results with the Euclidean distance and cluster design 16 for the number per hectare (R2 = 0.59, RMSE = 5.70%) and with the Euclidean distance and cluster design three for the canopy characteristic (R2 = 0.62, RMSE = 12.30%). The accuracy and efficiency of the support vector machine algorithm are influenced by the type of kernel used. The results of the different kernels with the different cluster sampling designs in the support vector machine method showed that the linear kernel with cluster sampling design 13 (coefficient of determination 0.72) gave the best results for the number per hectare, and the linear kernel with cluster sampling design seven (coefficient of determination 0.65) for the canopy characteristic. Evaluation of the artificial neural network model showed that the MLP algorithm is more suitable than the RBF algorithm for estimating the studied characteristics, with higher accuracy and lower relative RMSE. Among the 16 designs used with the MLP algorithm, cluster design six gave the most suitable results for the number per hectare (coefficient of determination 0.86) and cluster design 10 for the canopy characteristic (coefficient of determination 0.76). Based on the highest coefficient of determination and the lowest relative RMSE, the most appropriate model among the four studied algorithms was selected; for both characteristics the artificial neural network model (with MLP architectures 80-20-1 and 80-11-1, respectively) gave the optimal results, with coefficients of determination of 0.86 and 0.76.
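To make the model comparison above concrete, the following hedged sketch fits k-nearest-neighbour regressors with the Euclidean, Manhattan and Chebyshev metrics, support vector regressors with linear and RBF kernels, and an MLP, scoring each with R2 and relative RMSE on a 25% hold-out; the data are synthetic placeholders rather than the study's plot data, and the hyperparameters are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(1)
X = rng.random((120, 7))                          # placeholder spectral predictors per plot
y = 50 + 40 * X[:, 3] + 5 * rng.normal(size=120)  # placeholder stand attribute (e.g. N/ha)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

models = {
    "kNN (Euclidean)": KNeighborsRegressor(n_neighbors=5, metric="euclidean"),
    "kNN (Manhattan)": KNeighborsRegressor(n_neighbors=5, metric="manhattan"),
    "kNN (Chebyshev)": KNeighborsRegressor(n_neighbors=5, metric="chebyshev"),
    "SVR (linear)": make_pipeline(StandardScaler(), SVR(kernel="linear", C=10.0)),
    "SVR (RBF)": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "MLP": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=1)),
}

for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse_pct = 100 * np.sqrt(mean_squared_error(y_te, pred)) / y_te.mean()
    print(f"{name:16s} R2 = {r2_score(y_te, pred):5.2f}   RMSE% = {rmse_pct:5.1f}")
```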
Discussion and conclusion: The modeling results with the four studied algorithms for the canopy characteristic showed that the artificial neural network algorithm with cluster sampling design 10 and a coefficient of determination of 0.76 was the most suitable method. The results are consistent with the study of Yim et al. (2015) and show the advantage of using cluster sampling, nonparametric modeling with artificial neural networks and Sentinel-2 images for characterizing forest ecosystem structure. Yim et al. (2015) noted that in natural environments the shape and size of sub-plots, and their correlation with habitat conditions, should be sensitive to forest structure. According to Sivanpillai et al. (2006), in poorer stands more absorption and scattering occur because of the larger number of gaps in the canopy. Dabija et al. (2021) compared support vector machine and random forest algorithms for canopy mapping using Sentinel-2 and Landsat 8 satellite imagery for land cover classification in three different regions: Catalonia, Poland and Romania. Their results showed that Sentinel-2 images were 8-10% more accurate than Landsat 8 data in land cover classification, and that the radial basis function support vector machine was 6-7% more accurate than random forest. Nazariani et al. (1400) also reported the random forest algorithm as the most suitable model for estimating the canopy characteristic, which is not consistent with the results of the present study; the difference can be attributed to the type of algorithm used and the accuracy achieved.
Saeed Ojaghi; Safa Khazai
Abstract
Extended Abstract
Land use/land cover (LULC) change detection is one of the most important applications in the remote sensing field, providing insights that inform management, policy, and science. In the recent decade, the development of remote sensing systems and the accessibility of high-spatial-resolution images have been accompanied by improvements in digital image processing. The advantages of high-spatial-resolution remote sensing imagery further support opportunities to apply change detection with object-based image analysis, i.e., object-based change detection (OBCD).
Compared with pixel-based techniques, OBCD provides a more effective way, especially for high-spatial-resolution imagery, to incorporate spatial, spectral, textural and geometric features for identifying LULC change. The OBCD approach is classified into four categories: (i) image-object, (ii) class-object, (iii) multi-temporal object, and (iv) hybrid change detection. Different algorithms and features can be employed in the image classification process for OBCD; therefore, the choice of algorithm and the optimization of features are major challenges in OBCD. This paper introduces an object-based change detection method based on a machine learning algorithm, which can overcome the limitations of traditional change detection methods and find the changed objects of interest. In this paper the multi-temporal object approach is utilized, and high-spatial-resolution GeoEye-1 and QuickBird satellite images acquired in 2002 and 2015, covering a region of Qeshm Island, are used to detect meaningful detailed changes in the study area. As an essential preprocessing step for change detection, multi-temporal image registration with sub-pixel accuracy is applied. Radiometric correction is also performed using a histogram matching algorithm in ENVI software. In the next step, a number of texture features, such as mean, variance, entropy, homogeneity and moment, are extracted from the two images. To reduce the input feature space, the PCA algorithm is employed, and the two images together with the PCA output are used as input features for segmentation. Segmentation is the first step in OBCD. It divides the image into a large number of small image objects by grouping pixels. The segmentation algorithm is a region-merging technique: it begins by considering each pixel as a separate object, and adjacent pairs of image objects are then merged to form bigger segments. The merging decision is based on a local homogeneity criterion describing the similarity between adjacent image objects. Correct image segmentation is a prerequisite for successful image classification and, at the same time, requires explicit knowledge representation. Furthermore, optimal segmentation results depend not only on the choice of segmentation algorithm or procedure, but are also often influenced by the choice of user-defined parameter combinations required as inputs by many segmentation programs. The segmentation was carried out using the multiresolution segmentation algorithm, which involves knowledge-free extraction of image objects. Multiresolution segmentation begins with single-pixel objects and employs a region-growing algorithm to merge pixels into larger objects; pixels are merged based on whether they meet user-defined homogeneity criteria. Each multiresolution segmentation task must be parameterized by the user and involves the setting of three parameters: scale, color versus shape, and compactness versus smoothness. In this paper the segmentation was performed at four different levels using eCognition software and, finally, the level with the best output, with a scale of 100, was selected to produce the change map. The scale values were determined through an iterative method. The color/shape weights were set to 0.6/0.4 and compactness/smoothness to 0.5/0.5 for the selected level.
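A hedged sketch of the texture-feature and PCA step mentioned above: grey-level co-occurrence (GLCM) texture measures are computed for a band and a stacked feature space is then compressed with PCA before segmentation. The placeholder band, the GLCM distance and angle settings, and the use of scikit-image and scikit-learn are assumptions for illustration only:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
band = (rng.random((256, 256)) * 255).astype(np.uint8)    # placeholder image band

# Grey-level co-occurrence matrix and a few of the texture measures named above.
glcm = graycomatrix(band, distances=[1], angles=[0], levels=256, symmetric=True, normed=True)
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]
contrast = graycoprops(glcm, "contrast")[0, 0]
asm = graycoprops(glcm, "ASM")[0, 0]                      # angular second moment
entropy = float(-np.sum(glcm * np.log2(glcm + 1e-12)))

# In practice texture is computed per pixel in a moving window, giving one layer
# per measure; here a placeholder stack of band + texture layers is reduced with PCA.
features = rng.random((256 * 256, 10))                    # flattened multi-layer feature space
reduced = PCA(n_components=3).fit_transform(features)     # compact input for segmentation
```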
The color and shape weights are interconnected: if color has a high value, meaning a high influence on segmentation, shape must have a low value with less influence; if both parameters are equal, each has roughly the same influence on the segmentation outcome. In addition, texture, spatial and geometric features are extracted from the segmented image. The Feature Space Optimization (FSO) tool available in eCognition software was used to calculate the optimum feature combination based on class samples in four classes: "barren to road", "barren to building", "barren to vegetation" and "barren with no change". It evaluates the Euclidean distance in feature space between the samples of all classes and selects the feature combination resulting in the best class separation distance. In this study, the performance of the proposed RF-based OBCD method is compared with conventional methods such as the support vector machine (SVM) and KNN. The commonly used accuracy assessment elements include overall accuracy, producer's accuracy, user's accuracy and the Kappa coefficient. The overall accuracy of the change map produced by the RF method was 86.57%, with a Kappa statistic of 0.79, whereas the overall accuracy and Kappa coefficient of the SVM and KNN methods were 83.76% and 0.75, and 75% and 0.63, respectively. The experimental results show that the overall accuracy and kappa coefficient obtained from the proposed RF-based OBCD method improved by 3% and 18%, and by 2% and 10%, respectively, compared with the SVM and KNN methods. The results indicate that the object-based change detection method performs more accurately and reliably in high-density regions when high-spatial-resolution images are used. Also, the selection of the classification algorithm has a considerable effect on the resulting change map.
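The following sketch illustrates the idea behind such feature-space optimization: candidate feature subsets are ranked by the between-class Euclidean separation of their samples and the best-separating combination is retained. The class names follow the four classes above, but the synthetic samples, the use of class centroids and the exhaustive search are simplifications for illustration, not eCognition's exact FSO procedure:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
classes = ["barren to road", "barren to building", "barren to vegetation", "barren with no change"]
# Placeholder sample objects: 20 samples per class, 6 candidate object features.
samples = {c: rng.random((20, 6)) + i for i, c in enumerate(classes)}

def min_class_separation(feature_idx):
    """Smallest between-class centroid distance using only the chosen features."""
    centroids = [samples[c][:, feature_idx].mean(axis=0) for c in classes]
    return min(np.linalg.norm(a - b) for a, b in combinations(centroids, 2))

# Exhaustively score every non-empty feature combination and keep the best one.
candidates = [list(idx) for k in range(1, 7) for idx in combinations(range(6), k)]
best = max(candidates, key=min_class_separation)
print("selected feature combination:", best)
```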