Geographic Data
Zahra Moradi; Mohammad Sadi Mesgari
Abstract
Extended Abstract
Introduction: Housing has profound effects on the social, political, and economic dimensions of a country, so accurate and reliable price estimation facilitates policy-making in this field. Hundreds of structural, spatial, and socio-economic factors may affect property prices in different situations, and an efficient pricing method should take these factors into account. Because of the complex nature of the real estate market, previous research has applied common deep learning algorithms such as DNN, RNN, and CNN, but these algorithms are not well suited to tabular data. In addition, the deep learning models used in property pricing are fully deterministic and do not take data uncertainty into account.
Materials & Method: This article takes the tabular structure of real estate data into account when applying deep learning methods. The recent TabNet deep architecture is used for this purpose; it performs feature selection during the learning process in an interpretable way. Using existing combination techniques, fuzzy logic is also combined with the deep learning algorithms, both to learn complex problems faster and more accurately and to overcome the deterministic nature of these models, which ignores the inherent uncertainty of the data. A geographic information system (GIS) is also used to visualize the spatial pattern of property attributes and the relationship between these attributes and price, and spatial variables are included in the valuation model.
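The abstract does not say how fuzzy logic is attached to the networks. One common option, shown here purely as an assumption, is to fuzzify each numeric input with triangular membership functions before it reaches the network. The set names and breakpoints below are hypothetical and are not taken from the study.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets for floor area in square metres (illustrative only).
AREA_SETS = {
    "small": (0.0, 50.0, 100.0),
    "medium": (50.0, 100.0, 150.0),
    "large": (100.0, 150.0, 200.0),
}


def fuzzify(x, sets):
    """Map a crisp value to its membership degree in every fuzzy set."""
    return {name: triangular(x, *abc) for name, abc in sets.items()}


# An 80 m2 flat belongs partly to "small" and partly to "medium".
degrees = fuzzify(80.0, AREA_SETS)
print(degrees)
```

The membership degrees can then be supplied to a model in place of, or alongside, the crisp value, so that a sample near a boundary is not forced into a single category.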
To evaluate the proposed methods, real estate data from District 5 of Tehran were used.
Results & Discussion: The feature-importance ranking produced by the TabNet algorithm for Tehran residential properties indicates a significant impact of spatial factors: after area, the two spatial features latitude and longitude rank second and third, respectively. Latitude and longitude act as proxies for the neighborhood, the type and prestige of different places in the city, and the social class of its streets and neighborhoods, all of which clearly influence price. The TabNet, DNN, CNN, RNN, LSTM, and Autoencoder algorithms, as well as the XGBoost machine learning algorithm, were applied to the Tehran data set and compared using RMSE, MA, and evaluation criteria; according to the criterion, using TabNet achieved a 5% improvement in accuracy. Finally, the RMSE of the hybrid FuzzyTabNet algorithm on the Tehran data decreased by 4.65% compared with the basic TabNet algorithm, and the fuzzy Autoencoder network improved by 5.52% compared with the standard Autoencoder network.
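Percentage changes in RMSE such as those reported above are typically computed as a relative reduction against the baseline model. The sketch below shows that arithmetic under this assumption; the prices and predictions are made-up numbers, not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def relative_reduction_pct(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Made-up prices and predictions, for illustration only.
prices = [100.0, 200.0, 300.0]
baseline_pred = [110.0, 190.0, 320.0]
hybrid_pred = [108.0, 192.0, 316.0]

baseline_rmse = rmse(prices, baseline_pred)
hybrid_rmse = rmse(prices, hybrid_pred)
reduction = relative_reduction_pct(baseline_rmse, hybrid_rmse)
print(f"RMSE reduced by {reduction:.2f}%")
```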
Geographic Information System (GIS)
Mina Karimi; Mohammad Saedi Mesgari
Abstract
Extended Abstract
1. Introduction
In GIScience, spatial information has usually been represented in terms of space. However, human reasoning, behavior, and perception are mainly based on place, not space. Places are usually ambiguous and context-dependent and are related to the human experience of the world. Place functionality, as a context in place descriptions, is one of the main distinguishing features of a place. Today, with the growing use of social networks, volunteered geographic information (VGI) and crowdsourced information have grown significantly. However, information obtained from social networks, e.g. check-ins, often gives neither a complete nor a clear view of the concept of place, and it does not include spatial information relating phenomena, land uses, and points of interest (POI). This ultimately limits the ability of such data to support working with the concept of place. A GIS should therefore be able to detect place functionality that is not stated simply and clearly in the stored data.
2. Materials and Methods
To address these issues, this paper aims to extract place functionality by analyzing user-generated textual content. First, places and users' reviews of those places on the TripAdvisor website are collected through web crawling. The advantage of these data over other place-based data is their independence from formal descriptions of place. The data were collected in October 2020, and only English reviews are considered. New York City (NYC) is selected as the case study area. For each place type, all corresponding places were extracted; then, for each place, a maximum of 1000 top reviews were extracted. To prepare the data, places without geographic coordinates, places outside the study area, duplicates, and places whose type is unknown are removed. TripAdvisor has five place categories: Attraction, Food Serving Place, Hotel, Shop, and Vacation Rental.
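The four cleaning rules stated for the place data can be sketched as a single filter. The record layout, the bounding box used as the study area, and the definition of a duplicate (same name and coordinates) are all assumptions made for this illustration; the abstract does not specify them.

```python
# Approximate bounding box for New York City (an assumption, not from the study):
# (south, west, north, east) in decimal degrees.
NYC_BBOX = (40.4774, -74.2591, 40.9176, -73.7004)
KNOWN_TYPES = {"Attraction", "Food Serving Place", "Hotel", "Shop", "Vacation Rental"}


def clean_places(places, bbox=NYC_BBOX):
    """Drop places with no coordinates, outside the box, of unknown type, or duplicated."""
    south, west, north, east = bbox
    seen = set()
    kept = []
    for place in places:
        lat, lon = place.get("lat"), place.get("lon")
        if lat is None or lon is None:
            continue  # no geographic coordinates
        if not (south <= lat <= north and west <= lon <= east):
            continue  # outside the study area
        if place.get("type") not in KNOWN_TYPES:
            continue  # unknown place type
        key = (place.get("name"), lat, lon)
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        kept.append(place)
    return kept


# Hypothetical records, for illustration only.
raw = [
    {"name": "Cafe A", "type": "Food Serving Place", "lat": 40.75, "lon": -73.99},
    {"name": "Cafe A", "type": "Food Serving Place", "lat": 40.75, "lon": -73.99},
    {"name": "Hotel B", "type": "Hotel", "lat": None, "lon": None},
    {"name": "Shop C", "type": "Shop", "lat": 42.36, "lon": -71.06},
    {"name": "Place D", "type": None, "lat": 40.70, "lon": -74.00},
    {"name": "Museum E", "type": "Attraction", "lat": 40.78, "lon": -73.96},
]
cleaned = clean_places(raw)
print([p["name"] for p in cleaned])
```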
Then, different natural language processing (NLP) methods are used to preprocess the reviews. First, each review is converted to lower case and tokenized, and punctuation and stop words are removed. Afterward, all tokens are stemmed and lemmatized. In the next step, suitable features are selected for knowledge discovery. We use a bag-of-words (BoW) representation whose feature values are weighted by the TF-IDF score for each user's review. Finally, in a supervised setting, a logistic regression classifier is trained on these values and the corresponding place functionalities to predict place functionality on the test dataset.
3. Results and Discussion
We randomly assigned 75% of the data set to training and 25% to testing. The results are evaluated with common machine learning evaluation measures computed from the confusion matrix. The overall accuracy of the proposed method is about 96%. For Food Serving Places, 98% of cases are predicted correctly, and about 0.8% are classified as Attractions. For Hotels, the accuracy is 97%, with about 1.8% incorrectly categorized as Food Serving Places. Attractions are predicted correctly in 93% of cases, and about 3.8% of them are mistaken for Food Serving Places. For Shops, the accuracy is about 74%. One reason is that fewer reviews are available for this type of functionality, although this issue has been partially resolved by weighting the samples. A second reason is that people often visit shopping malls for entertainment and not just for shopping, which has led to about 15% of Shops being classified as Attractions. About 11% of Shops are classified as Food Serving Places. One of the most important reasons for this is that buying food in these places is itself a kind of purchase.
In addition, some shopping malls contain places that serve food and drink. Because Vacation Rentals have fewer reviews than the other functionalities, they have the lowest accuracy (about 65%). In 25% of cases, Vacation Rentals are classified as Hotels. This result is not surprising, as Vacation Rentals and Hotels are very similar in function and are both used to accommodate travelers and tourists. A further 4.8% and 4.6% of them are classified as Attractions and Food Serving Places, respectively. The highest precision and F1-score are achieved for Food Serving Places, while Vacation Rentals show the lowest precision and F1-score because their functionality is similar to that of Hotels; nevertheless, their results are also reliable and satisfactory.
4. Conclusion
In this study, we extracted place functionality by analyzing user-generated textual content shared on the TripAdvisor website. Different NLP methods were used to prepare and preprocess the data. The bag-of-words representation constructed for each user's review was then used to train a logistic regression classifier, and the place functionality of the test data was predicted. In future work, the efficiency of other feature selection methods and other classifiers in extracting place functionality can be evaluated and compared. In addition, place functionality should be extracted in more detail, so that different types of attractions can be distinguished.
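The preprocessing and TF-IDF weighting steps of the method can be sketched with the standard library alone. This is a simplified illustration, not the authors' code: the stop-word list is a tiny placeholder, stemming and lemmatization are omitted, the weighting is one common TF-IDF variant (term frequency times the natural log of N over document frequency), and the reviews are invented.

```python
import math
import re
from collections import Counter

# Tiny placeholder stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "of", "to", "in", "we"}


def tokenize(text):
    """Lower-case, keep alphabetic tokens only (drops punctuation), remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document, with tf = count/len and idf = ln(N/df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append(
            {term: (c / total) * math.log(n_docs / doc_freq[term]) for term, c in counts.items()}
        )
    return vectors


# Invented reviews, for illustration only.
reviews = [
    "The pizza was great and the pasta was great",
    "We loved the hotel room",
    "Great hotel, great pizza nearby",
]
vectors = tfidf(reviews)
print(vectors[0])
```

In the first review, "pasta" receives a higher weight than "great" even though "great" occurs twice, because "great" also appears in another review. Vectors of this kind would then be passed, together with the place-type labels, to a classifier such as logistic regression.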
Mehrdad Kaveh; Mohammad Saadi Mesgari
Abstract
Extended Abstract
Introduction
Selecting proper sites for health centers and hospitals and allocating the population to them is an important issue in urban planning, and one that has become more complicated as the population grows. The location and allocation of hospitals are planned to ensure the availability of proper and comprehensive health services and to reduce establishment costs. Improper planning of health centers has created multiple problems for big cities in developing countries in recent years. In the present study, the Genetic Algorithm (GA), a Hybrid Particle Swarm Optimization algorithm (HPSO), a Geospatial Information System (GIS), and the Analytic Hierarchy Process (AHP) are used to select proper hospital sites and to allocate demand locations to these centers in District 2 of Tehran.
Materials & Methods
The main goal of this research is to compare and evaluate the performance of the GA and the HPSO in determining the optimal locations of hospitals and allocating the population blocks to them. To limit the search space, the analytical capabilities of GIS and AHP are used to select candidate sites that satisfy the initial conditions and criteria. The locations of these candidate sites are the input to the optimization stage, so the accuracy of the entire process depends strongly on their selection.
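In AHP, criterion weights are derived from a matrix of pairwise importance judgments. The sketch below uses the row geometric-mean method, a common approximation of the principal-eigenvector weights. The three criteria and the judgment values are hypothetical; the abstract does not list the criteria actually used.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights from a reciprocal pairwise comparison matrix."""
    size = len(pairwise)
    geometric_means = [math.prod(row) ** (1.0 / size) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Hypothetical criteria and judgments (illustrative only).
criteria = ["population density", "road access", "distance from existing hospitals"]
pairwise = [
    [1.0, 3.0, 5.0],       # first criterion compared with each criterion
    [1 / 3, 1.0, 3.0],     # second criterion
    [1 / 5, 1 / 3, 1.0],   # third criterion
]

weights = ahp_weights(pairwise)
for name, weight in zip(criteria, weights):
    print(f"{name}: {weight:.3f}")
```

A candidate site's suitability can then be scored as the weighted sum of its normalized criterion values, and sites above a chosen threshold kept as candidates.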
Then, the two optimization algorithms were applied to choose six optimum sites from the candidate locations and to allocate the population to them by minimizing the overall distance between the centers and their allocated blocks. To improve Particle Swarm Optimization, a simple neighborhood search is proposed for better exploitation of the elite particles. Its main purpose is to increase the convergence rate of the algorithm without reducing the random search. Because a neighborhood search must be defined specifically for each problem, and location-allocation problems are spatial, the geographic principle of an appropriate distribution of centers in space (the distance between centers should not be less than a certain amount) is used to define it. In an elite particle, the two centers with the smallest distance between them are selected, and one of them is replaced by a new, randomly selected center. If the change gives a better objective function value, the new solution replaces the one in the elite particle. A simulated data set was used to calibrate the algorithms' parameters. With proper values for those parameters, the algorithms were tested on the real data of the study area.
Results & Discussion
On the real data, the performance of both algorithms depends strongly on the initial population and the allowed number of iterations. In general, fewer iterations with a larger population give better results than more iterations with a smaller population. The results show that the HPSO performs better than the GA. The HPSO also converges faster than the GA, which can be attributed to the particles' motion toward their personal best and the global best experiences.
Furthermore, the proposed neighborhood search causes the HPSO algorithm to converge earlier. To evaluate repeatability, each algorithm was run 40 times on both the simulated and the real data. Both algorithms displayed high levels of repeatability, but the HPSO algorithm is more stable. The GA was more stable on the simulated data than on the real data. For both the simulated and the real data, the HPSO algorithm runs faster than the GA.
Conclusion
Simplicity and repeatability are important factors from the user's point of view. In this research, the HPSO algorithm was not only repeatable and simple but also faster than the GA. Considering these criteria, and for the special case of this research, the HPSO seems more promising than the GA.
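The neighborhood search applied to an elite particle can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each block is allocated to its nearest selected center with no capacity limit, distances are unweighted Euclidean distances, and the coordinates are toy values (three centers instead of six).

```python
import itertools
import math
import random


def total_distance(selected, candidates, blocks):
    """Sum of distances from each block to its nearest selected center (assumed allocation rule)."""
    return sum(min(math.dist(block, candidates[i]) for i in selected) for block in blocks)


def neighborhood_search(selected, candidates, blocks, rng):
    """Replace one of the two closest selected centers with a random unused candidate.

    The changed solution is kept only if it lowers the objective.
    """
    closest_pair = min(
        itertools.combinations(selected, 2),
        key=lambda pair: math.dist(candidates[pair[0]], candidates[pair[1]]),
    )
    unused = [k for k in range(len(candidates)) if k not in selected]
    if not unused:
        return list(selected)
    dropped = rng.choice(closest_pair)
    replacement = rng.choice(unused)
    trial = [replacement if k == dropped else k for k in selected]
    if total_distance(trial, candidates, blocks) < total_distance(selected, candidates, blocks):
        return trial
    return list(selected)


# Toy data: candidate sites, population blocks, and an elite solution of three centers.
candidates = [(0, 0), (1, 0), (10, 0), (10, 10), (0, 10)]
blocks = [(0, 1), (9, 1), (9, 9), (1, 9)]
start = [0, 1, 2]  # candidates 0 and 1 are only one unit apart

improved = neighborhood_search(start, candidates, blocks, random.Random(0))
print(start, total_distance(start, candidates, blocks))
print(improved, total_distance(improved, candidates, blocks))
```

In this toy case the two closest centers are candidates 0 and 1, and replacing either one with an unused site lowers the total distance, so the changed solution is always accepted.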