Investigating the use of clustering to reduce the execution time of raster aggregation queries inside a spatial database. Case study: precipitation rasters

Sadidi, Javad; Sahebi Vayghan, Saiedeh; Rezaiyan, Hani

doi:10.22131/sepehr.2017.28889


Article type: Research article

Authors

Javad Sadidi ¹

Saiedeh Sahebi Vayghan ²

Hani Rezaiyan ³

¹ Assistant Professor, Department of Remote Sensing and GIS, Faculty of Geographical Sciences, Kharazmi University, Tehran, Iran (corresponding author).

² MSc in Remote Sensing and GIS, Kharazmi University, Tehran, Iran.

³ Assistant Professor, Department of Remote Sensing and GIS, Faculty of Geographical Sciences, Kharazmi University, Tehran, Iran.


Abstract

In recent years, advances in data collection and data management technologies have given rise to very large databases. By their nature, many analytical queries need to aggregate and summarize large portions of the data under analysis. The central problem in the database field is efficient query processing, especially in real-time systems, which require immediate answers so that users do not wait long for a response. Approximate Query Processing (AQP) is an alternative approach for environments where computing an exact answer is time-consuming: by returning an estimated answer, it reduces response time by eliminating or reducing the number of accesses to the base data. In-database processing improves the performance of computer networks and helps in designing queries that return relatively fast and accurate results. In this study, the aggregation operation (Sum) was performed on precipitation rasters in a PostgreSQL database in two ways: the conventional method and the proposed optimized method. The results show that the Sum function with clustering runs 27.2 times faster than without clustering, and the mean numerical difference between the pixels produced by the optimized and the conventional Sum is 0.028. The mean execution times of the conventional and optimized Sum queries are 211 and 7.754 seconds, respectively, which indicates the efficiency of the proposed method. The findings of this study, namely a significant reduction in the response time of in-database analyses of raster data, can be applied to real-time web services such as meteorology and traffic that require immediate analysis and instantaneous answers.

Keywords

Aggregation optimization

Approximate query processing

In-Database processing

Raster analysis

Sum

DOR: 20.1001.1.25883860.1396.26.103.1.2

Title (English)

Investigation on using clustering to reduce In-Database Sum query execution time for spatial rasters: A case study for precipitation rasters

Authors (English)

Javad Sadidi ¹

Saiedeh Sahebi Vayghan ²

Hani Rezaiyan ³

¹ Assistant Professor, Department of Remote Sensing and GIS, Faculty of Geographical Sciences, Kharazmi University, Tehran, Iran

² MSc in Remote Sensing and GIS, Kharazmi University, Tehran, Iran

³ Assistant Professor, Department of Remote Sensing and GIS, Faculty of Geographical Sciences, Kharazmi University, Tehran, Iran

Abstract (English)

Extended Abstract
1. Introduction
In recent years, advances in data collection and management technology have led to the creation of very large databases. Unlike other data types such as numbers and strings, raster data are complex and have special characteristics, so they are classified as "big data". Because of the nature of spatial analysis queries, large portions of the data under analysis often need to be aggregated or summarized. The main issue in the database field is efficient query processing, so that users do not spend a long time waiting for answers. Traditional query processing returns exact answers, but these answers take longer than real-time systems can afford. In some cases, especially in real-time services, query running time matters much more than accuracy.
AQP (Approximate Query Processing) is an alternative query processing method for time-consuming environments that enables the system to return fast approximate answers. One of the most significant applications of AQP is query optimization. AQP can play a valuable role in speeding up spatial queries over large and complex data. It is also an efficient way to identify the data actually needed and thereby minimize the cost of aggregation queries. Approximation methods have been used in decision support systems since the 1980s, and during the past decade AQP has attracted attention as a way to address several problems in the database field. However, current techniques in various research areas are useful only for relational database systems (Azevedo et al., 2007). The main idea behind in-database processing is to avoid transferring large datasets to separate, external programs. Because all analyses are executed inside the database, in-database processing offers fast execution, scalability and security. Hence, in-database processing improves computer network performance and helps in designing fast-response queries.

2. Methodology
The current research compares the conventional and an optimized Sum aggregation operation with the aim of reducing the running time of spatial queries in a PostgreSQL database. Sixty precipitation rasters were used, one per month over five years. The study area is located in Lorestan province, and data from precipitation gauging stations were used as primary data. The rasters were created from monthly precipitation data for the period 2010-2014 using the Kriging interpolation method and loaded into the PostgreSQL database with the raster2pgsql loader, which stores the raster pixels in their related tables. In the optimized aggregation method, the rasters are first clustered using a similarity function written for this purpose. All functions were written in PL/pgSQL in PostGIS. The optimized Sum is executed in the following steps: creating the similarity function, running it, running the optimized query and, finally, returning the approximate result.
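The paper does not state how its PL/pgSQL similarity function is defined. The following minimal Python sketch only illustrates the clustering idea under an assumption: two rasters are treated as similar when their mean absolute pixel difference is at most a threshold, and rasters are grouped greedily against the first member (seed) of each cluster. The names `mean_abs_diff`, `cluster_rasters` and the `threshold` parameter are hypothetical and not the authors' implementation.

```python
from typing import List

Raster = List[List[float]]  # rows x columns grid of pixel values


def mean_abs_diff(a: Raster, b: Raster) -> float:
    """Hypothetical dissimilarity: mean absolute pixel difference of two same-size rasters."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count


def cluster_rasters(rasters: List[Raster], threshold: float) -> List[List[int]]:
    """Greedy threshold clustering (an assumed rule, not the paper's exact one).

    Each raster is compared with the seed (first member) of every existing
    cluster; it joins the first cluster whose seed is within `threshold`,
    otherwise it starts a new cluster. Returns lists of raster indices.
    """
    clusters: List[List[int]] = []
    for i, r in enumerate(rasters):
        for members in clusters:
            if mean_abs_diff(rasters[members[0]], r) <= threshold:
                members.append(i)
                break
        else:  # no existing cluster was similar enough
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    # Four tiny synthetic 2x2 "monthly" rasters: two dry months, two wet months.
    monthly = [
        [[10.0, 12.0], [11.0, 13.0]],
        [[10.5, 12.0], [11.0, 12.5]],
        [[40.0, 42.0], [39.0, 41.0]],
        [[40.0, 41.0], [40.0, 41.5]],
    ]
    print(cluster_rasters(monthly, threshold=1.0))  # [[0, 1], [2, 3]]
```

The threshold trades speed for accuracy: a larger value yields fewer, less homogeneous clusters.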
Next, one raster is selected from each cluster and multiplied by the number of rasters in that cluster. The resulting raster is passed to the Sum function as the representative of the cluster. For a cluster $k$ containing $n_k$ rasters of $R$ rows and $C$ columns, the number of arithmetic operations is reduced by $(n_k - 1) \times R \times C$, because the pixel-wise additions among the cluster's members are replaced by scaling a single representative. In this way, the number of arithmetic operations is significantly reduced and fast approximate answers are obtained. For accuracy assessment, the error of each method was estimated by calculating the mean relative error, the DI (difference indicator) error and the relative error for each raster, and the results were then analyzed.
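A minimal Python sketch of the representative-based approximation, for illustration only. It assumes the clusters are already known (as lists of raster indices) and takes the first member of each cluster as its representative; the paper does not say which raster is selected. The functions `exact_sum` and `approximate_sum` are hypothetical names. Both count pixel additions, so the saving of $(n_k - 1) \times R \times C$ additions per cluster can be checked; note that scaling each representative costs $R \times C$ multiplications.

```python
from typing import List, Tuple

Raster = List[List[float]]


def exact_sum(rasters: List[Raster]) -> Tuple[Raster, int]:
    """Pixel-wise sum of all rasters; returns (result, number of additions)."""
    result = [row[:] for row in rasters[0]]
    additions = 0
    for r in rasters[1:]:
        for i, row in enumerate(r):
            for j, v in enumerate(row):
                result[i][j] += v
                additions += 1
    return result, additions


def approximate_sum(rasters: List[Raster],
                    clusters: List[List[int]]) -> Tuple[Raster, int, int]:
    """Cluster-representative Sum (sketch).

    For each cluster, the first member (an assumption) is multiplied by the
    cluster size n_k; the scaled representatives are then summed pixel-wise.
    Returns (result, additions, multiplications).
    """
    scaled: List[Raster] = []
    multiplications = 0
    for members in clusters:
        rep = rasters[members[0]]
        n_k = len(members)
        scaled.append([[n_k * v for v in row] for row in rep])
        multiplications += sum(len(row) for row in rep)
    result, additions = exact_sum(scaled)
    return result, additions, multiplications


if __name__ == "__main__":
    rasters = [
        [[10.0, 12.0], [11.0, 13.0]],
        [[10.5, 12.0], [11.0, 12.5]],
        [[40.0, 42.0], [39.0, 41.0]],
        [[40.0, 41.0], [40.0, 41.5]],
    ]
    clusters = [[0, 1], [2, 3]]  # assumed to come from a prior clustering step
    exact, exact_adds = exact_sum(rasters)
    approx, approx_adds, mults = approximate_sum(rasters, clusters)
    print(exact)                     # [[100.5, 107.0], [101.0, 108.0]]
    print(approx)                    # [[100.0, 108.0], [100.0, 108.0]]
    print(exact_adds - approx_adds)  # 8 == (2 - 1) * 2 * 2 + (2 - 1) * 2 * 2
```

With two clusters of two 2x2 rasters, 8 additions are saved at the cost of 8 multiplications; the saving grows with cluster size and raster dimensions.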
The user can then decide whether the resulting accuracy is acceptable for a particular project or whether an exact query has to be executed.
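The accuracy measures and the user's decision can be sketched in Python as follows. This is a hedged illustration: `mean_absolute_difference`, `mean_relative_error`, `choose_query` and the `tolerance` parameter are hypothetical names, and the DI (difference indicator) error is omitted because its definition is not given here. The decision rule assumes the error was measured beforehand, during accuracy assessment, since computing it at query time would require the exact Sum anyway.

```python
from typing import List

Raster = List[List[float]]


def mean_absolute_difference(approx: Raster, exact: Raster) -> float:
    """Mean of |approx - exact| over all pixels."""
    diffs = [abs(a - e)
             for row_a, row_e in zip(approx, exact)
             for a, e in zip(row_a, row_e)]
    return sum(diffs) / len(diffs)


def mean_relative_error(approx: Raster, exact: Raster) -> float:
    """Mean of |approx - exact| / |exact| over pixels where exact != 0."""
    errs = [abs(a - e) / abs(e)
            for row_a, row_e in zip(approx, exact)
            for a, e in zip(row_a, row_e) if e != 0]
    return sum(errs) / len(errs) if errs else 0.0


def choose_query(measured_error: float, tolerance: float) -> str:
    """Hypothetical rule: use the fast approximate Sum only if the error
    measured during accuracy assessment is within the project's tolerance."""
    return "approximate" if measured_error <= tolerance else "exact"


if __name__ == "__main__":
    exact = [[100.5, 107.0], [101.0, 108.0]]
    approx = [[100.0, 108.0], [100.0, 108.0]]
    print(mean_absolute_difference(approx, exact))  # 0.625
    mre = mean_relative_error(approx, exact)
    print(choose_query(mre, tolerance=0.05))        # approximate
```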

3. Results and discussion
In this research, five scenarios were implemented to compare the conventional and optimized Sum functions. The results show that the optimized Sum function is 27.2 times faster than the conventional one, and the average difference in pixel values between the two is 0.028. The mean query running times of the optimized and conventional Sum were 7.754 and 211 seconds, respectively (211 / 7.754 ≈ 27.2), which demonstrates the efficiency of the optimized Sum.
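The reported speed-up follows directly from the two mean running times. This short Python check uses only the figures stated above and also expresses the saving as a percentage of the conventional running time.

```python
# Mean running times reported in the results (seconds).
conventional_s = 211.0
optimized_s = 7.754

speedup = conventional_s / optimized_s                    # how many times faster
reduction_pct = (1 - optimized_s / conventional_s) * 100  # share of time saved

print(f"speedup: {speedup:.1f}x")                        # speedup: 27.2x
print(f"running time reduced by {reduction_pct:.1f}%")   # running time reduced by 96.3%
```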
Note that the accuracy of the optimized method depends on the nature of the rasters used and on how homogeneous or heterogeneous they are.
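The dependence on homogeneity can be illustrated with synthetic data (not the study's rasters and not a result of the paper). In this Python sketch, the hypothetical helper `make_cluster` builds a cluster whose members differ from a base raster by offsets proportional to a `spread` parameter, and `cluster_relative_error` compares n times the first member with the exact pixel-wise sum. With symmetric offsets the exact sum is n times the base, so the relative error is 1.5 x spread / pixel value for n = 4: it is zero for identical rasters and grows linearly as the cluster becomes less homogeneous.

```python
from typing import List

Raster = List[List[float]]


def make_cluster(base: Raster, n: int, spread: float) -> List[Raster]:
    """Synthetic cluster: member i = base + spread * (i - (n - 1) / 2).

    `spread` controls heterogeneity; spread == 0 gives identical rasters.
    """
    offsets = [spread * (i - (n - 1) / 2) for i in range(n)]
    return [[[v + d for v in row] for row in base] for d in offsets]


def cluster_relative_error(members: List[Raster]) -> float:
    """Mean relative error of n * (first member) versus the exact pixel-wise sum."""
    n = len(members)
    errs = []
    for i, row in enumerate(members[0]):
        for j, rep_v in enumerate(row):
            exact = sum(m[i][j] for m in members)
            errs.append(abs(n * rep_v - exact) / abs(exact))
    return sum(errs) / len(errs)


if __name__ == "__main__":
    base = [[20.0, 30.0], [25.0, 35.0]]  # synthetic pixel values
    for spread in (0.0, 1.0, 5.0):
        err = cluster_relative_error(make_cluster(base, n=4, spread=spread))
        print(f"spread={spread}: mean relative error={err:.3f}")
```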
This substantial reduction in in-database spatial query running time can be used to offer real-time web-based services, such as meteorology and traffic, that need real-time analysis and fast responses.

Keywords (English)

Aggregation Optimization

Approximate Query Processing

In-Database processing

Raster Analysis

Sum

References

1. Acharya, S., Gibbons, P. B., Poosala, V., & Ramaswamy, S. (1999). Join synopses for approximate query answering. In ACM SIGMOD Record (Vol. 28, No. 2, pp. 275-286). ACM.
2. Azevedo, L. G., Zimbrão, G., & De Souza, J. M. (2007). Approximate query processing in spatial databases using raster signatures. In Advances in Geoinformatics (pp. 69-86). Springer Berlin Heidelberg.
3. Babcock, B., Chaudhuri, S., & Das, G. (2003). Dynamic sample selection for approximate query processing. In A. Y. Halevy, Z. G. Ives, & A. Doan (Eds.), SIGMOD Conference (pp. 539-550). ACM.
4. Bernardino, J. R., Furtado, P. S., & Madeira, H. C. (2002). Approximate query answering using data warehouse striping. Journal of Intelligent Information Systems, 19(2), 145-167.
5. Brinkhoff, T., Horn, H., Kriegel, H. P., & Schneider, R. (1993). A storage and access architecture for efficient query processing in spatial database systems. In International Symposium on Spatial Databases (pp. 357-376). Springer Berlin Heidelberg.
6. Burden, R., & Faires, D. (1993). Numerical Analysis (5th ed.). Boston, MA: PWS.
7. Chakrabarti, K., Garofalakis, M., Rastogi, R., & Shim, K. (2001). Approximate query processing using wavelets. The VLDB Journal, 10(2-3), 199-223.
8. Garofalakis, M., & Gibbons, P. B. (2001). Approximate query processing: Taming the terabytes. In VLDB '01: Proceedings of the 27th International Conference on Very Large Data Bases (p. 725). San Francisco, CA: Morgan Kaufmann Publishers Inc.
9. Gibbons, P. B., Matias, Y., & Poosala, V. (1997). Aqua project white paper (Technical report). Bell Laboratories.
10. Golub, G., & Van Loan, C. (1993). Matrix Computations (4th ed.). Baltimore, MD: Johns Hopkins University Press.
11. Güting, R. H., & Schneider, M. (1993). Realms: A foundation for spatial data types in database systems. In International Symposium on Spatial Databases (pp. 14-35). Springer Berlin Heidelberg.
12. Ioannidis, Y. E., & Poosala, V. (1995). Balancing histogram optimality and practicality for query result size estimation. In ACM SIGMOD Record (Vol. 24, No. 2, pp. 233-244). ACM.
13. Kang, M. A., Zaamoune, M., Pinet, F., Bimonte, S., & Beaune, P. (2015). Performance optimization of grid aggregation in spatial data warehouses. International Journal of Digital Earth, 8(12), 1-19.
14. Liu, Q. (2009). Approximate query processing. In Encyclopedia of Database Systems (pp. 113-119).
15. Mehanna, Y. S., Mahmuddin, M., & Abdelaziz, H. S. (n.d.). Approximate query processing concepts and techniques (pp. 11-19).
16. Obe, R. O., & Hsu, L. S. (2015). PostGIS in Action. Manning Publications Co.
17. Papadias, D., Kalnis, P., Zhang, J., & Tao, Y. (2001). Efficient OLAP operations in spatial data warehouses. In International Symposium on Spatial and Temporal Databases (pp. 443-459). Springer Berlin Heidelberg.
18. Perera, K. S., Hahmann, M., Lehner, W., Pedersen, T. B., & Thomsen, C. (2015). Modeling large time series for efficient approximate query processing. In International Conference on Database Systems for Advanced Applications (pp. 190-204). Springer International Publishing.
19. Wang, J. (2009). Encyclopedia of Data Warehousing and Mining. Hershey: Information Science Reference.
20. Xie, Q., Chen, F., Zhang, Z., & Lucena, I. (2013). In-database image processing in Oracle Spatial GeoRaster. In ASPRS 2013 Annual Conference, Baltimore, Maryland.

Scientific-Research Quarterly of Geographical Information "Sepehr"

Volume 26, Issue 103 (serial number 103)
Autumn 1396 (2017)
Pages 5-16
