Extracting Place Functionality from User-Generated Textual Contents Using Machine Learning Methods

Karimi, Mina; Mesgari, Mohammad Saedi

doi:10.22131/sepehr.2023.552846.2867

Article type: Research article

Authors

Mina Karimi ¹

Mohammad Saedi Mesgari ²

¹ PhD Candidate, Department of Geospatial Information Systems Engineering, Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran, Iran.

² Assistant Professor, Department of Geospatial Information Systems Engineering, Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran, Iran.

Abstract

Today, with the ever-growing use of social networks, volunteered geographic information has grown significantly. Among the various kinds of information, user-generated textual content is usually not shared in a defined structure. One of the main characteristics of this kind of information is that it is place-based. The places people talk about are usually ambiguous and context-dependent. Place functionality, that is, the main activities people carry out in a place, serves as a context in place descriptions and is one of the main distinguishing features of a place. The aim of this study is to extract place functionality by analyzing user-generated textual content shared by users. To this end, places and user reviews of those places are first collected from the TripAdvisor website as textual content, and various natural language processing methods are then used to prepare and preprocess the data. Next, for each user review, a bag of words is built using TF-IDF scores as the feature-vector values. Then, in a supervised approach, these values, together with the place functionalities, are given as input to a logistic regression classifier to train the model, which is used to predict place functionality on the test data. Evaluation of the method by computing the confusion matrix shows that the overall accuracy of the proposed method is about 96%, which is a notable figure. The highest precision and F1-score are obtained for food serving places, whereas vacation rentals, owing to their functional similarity to hotels, have the lowest precision and F1-score; even so, their results are also reliable and satisfactory.

Keywords

Place

Place functionality

User-generated contents

Natural language processing

Machine learning

Text

20.1001.1.25883860.1401.31.124.1.9

Subjects

Geographic Information System

Article Title (English)

Extracting Place Functionality from User-Generated Textual Contents Using Machine Learning Methods

Authors (English)

Mina Karimi ¹

Mohammad Saedi Mesgari ²

¹ PhD Candidate, Faculty of Geodesy and Geomatics Eng., K.N.Toosi University of Technology, Tehran, Iran.

² Associate Professor, Faculty of Geodesy and Geomatics Eng., K.N.Toosi University of Technology, Tehran, Iran.

Abstract (English)

Extended Abstract

1. Introduction

In GIScience, spatial information has usually been represented in terms of space. However, human reasoning, behavior, and perception are mainly based on place, not space. Places are usually ambiguous and context-dependent, and they are tied to the human experience of the world. Place functionality, as a context in place descriptions, is one of the main distinguishing features of a place. Today, with the growing use of social networks, volunteered geographic information (VGI) and crowdsourced information have grown significantly. However, information obtained from social networks, e.g. check-ins, often gives neither a complete nor a clear view of the concept of place, and it does not include spatial information relating phenomena, land uses, and points of interest (POIs). This ultimately limits the ability of such data to support work with the concept of place. GIS should therefore be able to detect place functionality, which is not necessarily stated simply and clearly in the stored data.

2. Materials and Methods

To address these issues, this paper aims to extract place functionality by analyzing user-generated textual content. To this end, places and user reviews of those places on the TripAdvisor website are first collected through web crawling. The advantage of these data over other place-based data is their independence from formal descriptions of place. The data were collected in October 2020, and only English reviews are considered. New York City (NYC) is selected as the case study area. First, for each place type, all corresponding places are extracted. Then, for each place, a maximum of 1000 top reviews are extracted. To prepare the data, places without geographic coordinates, places outside the study area, duplicates, and places whose type is unknown are removed. TripAdvisor has five place categories: Attraction, Food Serving Place, Hotel, Shop, and Vacation Rental. Next, different natural language processing (NLP) methods are used to preprocess the reviews. Each review is converted to lower case and tokenized, then punctuation and stop words are removed. Afterward, all tokens are stemmed and lemmatized. In the next step, suitable features are selected for knowledge discovery. We use a bag-of-words (BoW) feature selection method in which feature values are weighted using TF-IDF scores for each user review. Finally, in a supervised approach, these values, together with the place functionalities as labels, are used to train a logistic regression classifier, which then predicts place functionality on the test dataset.
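
The preprocessing, TF-IDF weighting, and classification steps described above can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the smoothed IDF variant, the learning rate, the function names, and the toy reviews are all invented for the example; stemming and lemmatization are omitted; and the classifier is a plain gradient-descent multinomial logistic regression without an intercept or regularization.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper does not list its own).
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "is", "with", "of", "to"}

def preprocess(review):
    """Lower-case, keep alphabetic tokens only (drops punctuation), remove stop words.
    Stemming and lemmatization are omitted to stay dependency-free."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def tfidf_vectors(token_lists):
    """Return (vocabulary, one TF-IDF weighted bag-of-words vector per review)."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    # Smoothed IDF: one common variant, assumed here for illustration.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors

def class_scores(W, x):
    """Linear score of feature vector x for every class (one weight row per class)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_logistic_regression(X, y, n_classes, epochs=200, lr=0.5):
    """Multinomial logistic regression fitted by batch gradient descent
    (no intercept and no regularization, to keep the sketch short)."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for x, label in zip(X, y):
            probs = softmax(class_scores(W, x))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(x):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_features):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

def predict(W, x):
    """Index of the class with the highest score."""
    scores = class_scores(W, x)
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up toy reviews, not data from the study.
CLASSES = ["Food Serving Place", "Hotel", "Attraction"]
reviews = [
    "The pizza was delicious and the waiter was friendly.",
    "Tasty burgers, delicious fries.",
    "Our room was clean and the bed was comfortable.",
    "Comfortable room with a helpful front desk.",
    "The museum exhibits were fascinating.",
    "Fascinating museum, long ticket queue.",
]
labels = [0, 0, 1, 1, 2, 2]

vocab, X = tfidf_vectors([preprocess(r) for r in reviews])
W = train_logistic_regression(X, labels, n_classes=len(CLASSES))
predicted = [predict(W, x) for x in X]
print([CLASSES[k] for k in predicted])
```

In practice, libraries such as NLTK and scikit-learn would likely replace these hand-written helpers. The sketch only makes the data flow explicit: review text, token list, TF-IDF vector, predicted class.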

3. Results and Discussion

We randomly assigned 75% of the dataset to train the model and 25% to test the results. The results are evaluated with common machine learning evaluation measures computed from the confusion matrix. The evaluation shows that the overall accuracy of the proposed method is about 96%, which is a notable result. For Food Serving Places, the algorithm predicted the correct class in 98% of cases, and about 0.8% of them are classified as Attractions. For Hotels, the accuracy is 97%; however, about 1.8% of Hotels are incorrectly categorized as Food Serving Places. Attractions are correctly predicted in 93% of cases, and about 3.8% of them are mistaken for Food Serving Places. For Shops, the accuracy is about 74%, for two reasons. First, the number of reviews for this type of functionality is lower, although this issue has been partially resolved by weighting the samples. Second, in many cases people visit shopping malls for entertainment and not just shopping, which has led to about 15% of Shops being classified as Attractions. Also, about 11% of Shops are classified as Food Serving Places. One of the most important reasons for this is that people buy food in these places, which is itself a kind of purchase. In addition, some shopping malls contain places that serve food and drink. Because Vacation Rentals have fewer reviews than the other functionalities, they have the lowest accuracy (about 65%). In 25% of cases, Vacation Rentals are classified as Hotels. This result is not surprising, as Vacation Rentals and Hotels are very similar in function and both are often used to accommodate travelers and tourists. Also, 4.8% and 4.6% of them are classified as Attractions and Food Serving Places, respectively.
The highest precision and F1-score are achieved for Food Serving Places, while Vacation Rentals show the lowest precision and F1-score because their functionality is similar to that of Hotels. Nevertheless, their results are also reliable and satisfactory.
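
The evaluation described above, a random 75/25 split followed by measures derived from the confusion matrix, can be sketched as follows. The labels are invented toy values, not results from the study, and the function names are illustrative. Per-class recall here is the row-wise share of correct predictions; whether the per-class percentages quoted above were computed this way is an assumption, since the abstract does not say.

```python
import random

def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of the items and split it; returns (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_metrics(matrix, classes):
    """Precision, recall and F1-score for every class."""
    metrics = {}
    for i, c in enumerate(classes):
        tp = matrix[i][i]
        predicted_as_c = sum(row[i] for row in matrix)  # column sum
        actually_c = sum(matrix[i])                     # row sum
        precision = tp / predicted_as_c if predicted_as_c else 0.0
        recall = tp / actually_c if actually_c else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

def overall_accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

# Toy labels (hypothetical): one Vacation Rental is mistaken for a Hotel.
classes = ["Hotel", "Vacation Rental"]
y_true = ["Hotel"] * 4 + ["Vacation Rental"] * 4
y_pred = ["Hotel"] * 4 + ["Vacation Rental"] * 3 + ["Hotel"]

matrix = confusion_matrix(y_true, y_pred, classes)
metrics = per_class_metrics(matrix, classes)
accuracy = overall_accuracy(matrix)
train_ids, test_ids = train_test_split(range(100))

print(matrix, accuracy)
for name, values in metrics.items():
    print(name, values)
```

A single confusion between two functionally similar classes lowers the precision of the class that receives the extra prediction and the recall of the class that loses it, which is the pattern the text describes for Hotels and Vacation Rentals.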

4. Conclusion

In this study, we extracted place functionality by analyzing user-generated textual content shared on the TripAdvisor website. To this end, different NLP methods were used to prepare and preprocess the data. The bag-of-words representation constructed for each user review was then fed to a logistic regression classifier, and the place functionality of the test data was predicted. In future work, the efficiency of other feature selection methods and other classifiers in extracting place functionality can be evaluated and compared. In addition, place functionality should be extracted in more detail, so that different types of attractions can be distinguished.

Keywords (English)

Place

Place Functionality

User Generated Contents (UGCs)

Natural Language Processing (NLP)

Machine Learning

Text
