Scientific- Research Quarterly of Geographical Data (SEPEHR)

Geographic Information System (GIS)

Comparing the efficiency of machine learning classifiers in extracting the physical development area of Hamedan city using Object-Based Images analysis of satellite images

Abolfazl Ghanbari; Mostafa Mousapour; Habil Khorrami hossein hajloo; Hossein Anvari

Articles in Press, Accepted Manuscript, Available Online from 21 February 2024

https://doi.org/10.22131/sepehr.2024.2012503.3024

Abstract

Extended Abstract

Introduction:

Urban space is the most important human-made spatial structure on Earth. The history of urban development traces the path of human development, the evolution of political systems, and technological, technical and industrial progress. The physical development of urban areas is one of the main drivers of global change, with important direct and indirect effects on environmental conditions and biodiversity. Because it transforms natural and semi-natural ecosystems into impermeable surfaces, the physical development of a city often causes irreversible environmental changes. One of the new approaches in urban planning is the use of remote sensing techniques and geographic information systems. The emergence of remote sensing and machine learning techniques offers a new and promising opportunity for accurate and efficient monitoring and analysis of urban issues in support of sustainable development. Satellite image processing can generally be divided into two approaches: pixel-based image analysis and object-based image analysis. Pixel-based analysis operates on each pixel of the image individually and uses only the spectral information available in that pixel. Object-based analysis, by contrast, operates on homogeneous groups of pixels and takes their spatial characteristics into account. One of the basic problems in urban remote sensing is the heterogeneity of the urban physical environment.
The urban environment usually includes built structures such as buildings and transportation networks, several types of vegetation such as agricultural areas and gardens, as well as barren areas and water bodies. In the pixel-based approach, this heterogeneity of the urban biophysical environment causes spectral mixing and spectral similarity between classes during classification, so that a pixel whose surroundings differ from it produces salt-and-pepper noise. Given these problems with the pixel-based approach, the aim of this research is to compare the accuracy of machine learning algorithms based on object-based processing of satellite images in extracting the physical development area of Hamedan city from a Sentinel-2 satellite image.

Materials & Methods:

The remote sensing data used in this research is a multispectral Sentinel-2 satellite image of the city of Hamedan with a spatial resolution of 10 meters, acquired on 23 August 2023 and comprising bands 2 (blue), 3 (green), 4 (red) and 8 (near infrared). The image was downloaded from the website of the European Space Agency and pre-processed in ENVI software. Segmentation was then performed in eCognition software, using an appropriate scale, shape factor and compactness factor, to produce image objects. After the image had been segmented into image objects, it was classified with object-based machine learning classifiers, namely the Bayes, k-nearest neighbor, support vector machine, decision tree and random trees algorithms, and maps of the urban physical development area were produced.
After segmentation and the production of image objects, three classes were defined: built-up urban land, vegetation and barren land. Some of the objects created in the segmentation stage were selected as training samples and others as ground truth samples.

Results & Discussion

After the satellite image was downloaded from the website of the European Space Agency, radiometric correction was applied so that the gray levels of the image would correspond to actual surface reflectance: the gray levels were converted to radiance and then, through atmospheric correction, to surface reflectance. The Radiometric Calibration tool was used for radiometric correction and the FLAASH model for atmospheric correction, both in ENVI software. eCognition software was used to classify the satellite image with object-based machine learning algorithms. The pre-processed image of the study area, saved in TIFF format, was loaded into this software and saved as a project. To produce image objects, segmentation was run with different scales, shape factors and compactness factors in order to find the most appropriate segmentation. The multiresolution segmentation method was used in this step. The most appropriate segmentation used a scale of 100, a shape factor of 0.6 and a compactness factor of 0.4. At scales above 100, image objects were not formed correctly, and several distinct features were merged into one segment; at scales below 100, a single feature was in some cases split across several segments.
To classify the generated image objects, each machine learning algorithm was defined separately, trained, and then used for classification. In this step, classification was based on the nearest neighbor method, with the mean and standard deviation of each image band selected as features. After the maps of the city's physical development area had been produced by the object-based machine learning classifiers, the classification accuracy of each algorithm was calculated. Overall accuracy and the kappa coefficient were computed for each algorithm in eCognition software using the selected ground truth control points.

Conclusion:

The results show that a map of Hamedan's urban physical development can be produced with acceptable accuracy using machine learning algorithms based on object-based processing of satellite images. Among the algorithms used in this research, k-nearest neighbor was the most accurate, with an overall accuracy of 97% and a kappa coefficient of 0.96.
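Overall accuracy and the kappa coefficient, the two measures reported above, are both derived from a confusion matrix of ground truth classes against predicted classes. The following Python sketch shows the standard calculation. The function name and the counts in the example matrix are hypothetical and are not taken from the article, which computed these measures inside eCognition.

```python
def overall_accuracy_and_kappa(matrix):
    """Compute overall accuracy and Cohen's kappa.

    matrix[i][j] is the number of ground truth samples of class i
    that the classifier assigned to class j.
    """
    n = sum(sum(row) for row in matrix)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    k = len(matrix)

    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts for three classes (built-up, vegetation, barren).
# These numbers are invented for illustration only.
example = [
    [48, 1, 1],
    [2, 47, 1],
    [0, 2, 48],
]
accuracy, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance given the class totals.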

Geographic Information System (GIS)

Extracting Place Functionality from User-Generated Textual Contents Using Machine Learning Methods

Mina Karimi; Mohammad Saedi Mesgari

Volume 31, Issue 124 , March 2023, , Pages 7-19

https://doi.org/10.22131/sepehr.2023.552846.2867

Abstract

Extended Abstract

1. Introduction

In GIScience, spatial information has usually been represented in terms of space. However, human reasoning, behavior, and perception are mainly based on place, not space. Places are usually ambiguous and context-dependent and are tied to the human experience of the world. Place functionality, as a context in place descriptions, is one of the main distinguishing features of a place. Today, with the growing use of social networks, volunteered geographic information (VGI) and crowdsourced information have grown significantly. However, information obtained from social networks, e.g. check-ins, often does not give a complete and clear view of the concept of place, and it does not include spatial information relating phenomena, land uses, and points of interest (POI). This ultimately limits its usefulness for working with the concept of place. GIS should therefore be able to detect place functionality that is not stated simply and clearly in the stored data.

2. Materials and Methods

To address these issues, this paper aims to extract place functionality by analyzing user-generated textual content. To this end, places and users' reviews of those places were first collected from the TripAdvisor website through web crawling. The advantage of these data over other place-based data is their independence from formal descriptions of place. The data were collected in October 2020, and only English reviews were considered. New York City (NYC) was selected as the case study area. First, for each place type, we extracted all corresponding places.
Then, for each place, we extracted up to 1000 top reviews. To prepare the data, places without geographic coordinates, places outside the study area, duplicates, and places of unknown type were removed. TripAdvisor has five place categories: Attraction, Food Serving Place, Hotel, Shop, and Vacation Rental. Different natural language processing (NLP) methods were then used to preprocess the reviews. First, each review was converted to lower case and tokenized, then punctuation and stop words were removed. Afterward, all tokens were stemmed and lemmatized. In the next step, suitable features had to be selected for knowledge discovery. We used a bag-of-words (BoW) feature selection method in which feature values are weighted by TF-IDF scores for each user's review. Finally, in a supervised setting, a logistic regression classifier was trained on these values and the place functionalities to predict place functionality on the test dataset.

3. Results and Discussion

We randomly assigned 75% of the data set to training and 25% to testing. The results were evaluated with common machine learning evaluation measures computed from the confusion matrix. The evaluation shows that the overall accuracy of the proposed method is about 96%. For Food Serving Places, the algorithm predicted the correct class in 98% of cases, and about 0.8% of them were classified as Attractions. For Hotels, the accuracy is 97%; however, about 1.8% of Hotels were incorrectly categorized as Food Serving Places. Attractions were correctly predicted in 93% of cases, and about 3.8% of them were mistaken for Food Serving Places. For Shops, the accuracy is about 74%. One reason is that fewer reviews are available for this type of functionality, although this issue was partially resolved by weighting the samples.
A second reason is that people often visit shopping malls for entertainment and not just for shopping, which led to about 15% of Shops being classified as Attractions. In addition, about 11% of Shops were classified as Food Serving Places. One of the most important reasons for this is that buying food is itself a kind of purchase, and some shopping malls also contain places that serve food and drink. Because Vacation Rentals had fewer reviews than the other functionalities, they have the lowest accuracy (about 65%). In 25% of cases, Vacation Rentals were classified as Hotels. This result is not surprising, as Vacation Rentals and Hotels are very similar in function and are both used to accommodate travelers and tourists. A further 4.8% and 4.6% of them were classified as Attractions and Food Serving Places, respectively. The highest precision and F1-score were achieved for Food Serving Places, while Vacation Rentals show the lowest precision and F1-score because their functionality is similar to that of Hotels; even so, their results are reliable and satisfactory.

4. Conclusion

In this study, we extracted place functionality by analyzing user-generated textual content shared on the TripAdvisor website. To this end, different NLP methods were used to prepare and preprocess the data. The bag-of-words representation constructed for each user's review was then fed to a logistic regression classifier, and the place functionality of the test data was predicted. In future work, the efficiency of other feature selection methods and other classifiers in extracting place functionality can be evaluated and compared. In addition, place functionality should be extracted in more detail, so that different types of attractions can be distinguished.
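The preprocessing and TF-IDF weighting steps described under Materials and Methods can be illustrated with a short, dependency-free Python sketch. It is only an approximation under stated assumptions: the stop-word list, the sample reviews and the function names are hypothetical, stemming and lemmatization are omitted, and the weighting uses one common TF-IDF variant (term frequency times the natural log of N/df), since the abstract does not say which variant the authors used.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the abstract does not
# state which list was used).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "of", "to", "in", "for", "at", "it"}


def tokenize(review):
    """Lower-case a review, drop punctuation and digits, remove stop words."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(reviews):
    """Return one {term: weight} dictionary per review."""
    docs = [Counter(tokenize(r)) for r in reviews]
    n_docs = len(docs)
    # Document frequency: in how many reviews each term occurs.
    doc_freq = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in doc.items()
        })
    return vectors


# Invented example reviews, not taken from the TripAdvisor data set.
reviews = [
    "The pizza was great and the pasta was great",
    "Great room and a quiet hotel",
    "The museum exhibits were amazing",
]
vectors = tfidf(reviews)
for review, vector in zip(reviews, vectors):
    top_term = max(vector, key=vector.get)
    print(f"{review!r}: highest-weighted term = {top_term}")
```

In a full pipeline, vectors like these would be the input features of the logistic regression classifier. Terms that occur in many reviews, such as "great" here, receive a lower weight than terms that are specific to one review.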

Keywords = Machine Learning
