An Approach for Incremental Parallel Mining of Interesting Clustering Patterns in Big Data

Authors

  • Ahmed S. Al-Hegami, Akram A. M. Mustafa, Abdulmajed A. G. Al-Khulidi

Keywords:

data mining, incremental clustering, parallel mining, machine learning, novelty measure

Abstract

Clustering is a significant problem in data mining. To handle the ever-increasing size of data in real-world databases and discover interesting patterns, researchers have proposed incremental and parallel clustering algorithms. Traditional clustering algorithms require the entire dataset to be present before clustering can begin, so running them on large datasets can be computationally expensive and time-consuming. Incremental clustering algorithms, by contrast, can cluster data that are added to a dataset over time, which can be much more efficient: new data points are added to an existing clustering model without reprocessing all the data. This is useful for big data that are continuously growing or changing, because the model can be updated without incurring the computational cost of reprocessing everything. In this paper, an incremental and parallel clustering approach that integrates an interestingness criterion into the discovery process is proposed. The approach efficiently discovers interesting patterns from big data, and the user's prior knowledge of the domain is essential in determining whether patterns are interesting. The approach uses MapReduce to process big data in parallel. Parallel and incremental clustering algorithms that take changing data trends and user attitudes into account are a promising way to make the mining process more effective for decision making.
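The idea of updating a clustering model incrementally while processing data in parallel can be illustrated with a small k-means-style sketch. This is not the authors' approach, which the abstract does not specify in detail. It assumes k-means with a fixed set of seed centroids and keeps per-cluster point counts and coordinate sums as the model, so each increment only scans the new points. The hypothetical functions `map_partition` and `merge` stand in for MapReduce map and reduce tasks and run sequentially here rather than on a Hadoop cluster. The interestingness criterion is not modeled.

```python
from functools import reduce
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical model: cluster index -> (point count, coordinate-wise sum of points)
Stats = Dict[int, Tuple[int, List[float]]]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def map_partition(partition: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """Map step: assign each new point in one partition and emit partial sums."""
    stats: Stats = {}
    for point in partition:
        k = nearest(point, centroids)
        count, total = stats.get(k, (0, [0.0] * len(point)))
        stats[k] = (count + 1, [t + p for t, p in zip(total, point)])
    return stats


def merge(a: Stats, b: Stats) -> Stats:
    """Reduce step: combine two sets of partial sums without mutating either."""
    out = dict(a)
    for k, (count, total) in b.items():
        old_count, old_total = out.get(k, (0, [0.0] * len(total)))
        out[k] = (old_count + count, [x + y for x, y in zip(old_total, total)])
    return out


def incremental_update(
    model: Stats, centroids: Sequence[Point], new_partitions: Sequence[Sequence[Point]]
) -> Tuple[Stats, List[Point]]:
    """Fold only the new increment into the model; old data is never rescanned."""
    partials = [map_partition(part, centroids) for part in new_partitions]
    model = reduce(merge, partials, model)
    new_centroids: List[Point] = []
    for k, c in enumerate(centroids):
        if k in model:
            count, total = model[k]
            new_centroids.append(tuple(t / count for t in total))
        else:
            new_centroids.append(tuple(c))  # no points yet: keep the seed centroid
    return model, new_centroids


seeds = [(0.0, 0.0), (10.0, 10.0)]
first_batch = [[(1.0, 1.0), (0.0, 2.0)], [(9.0, 9.0), (11.0, 11.0)]]
model, cents = incremental_update({}, seeds, first_batch)
print(cents)  # [(0.5, 1.5), (10.0, 10.0)]
```

A later increment such as `[[(2.0, 2.0)]]` is passed to `incremental_update(model, cents, ...)` and touches only that one point. Each point is assigned once, using the centroids current when its increment arrives, so earlier points are never reassigned. This trades some accuracy relative to a full k-means rerun for an update cost proportional to the size of the increment.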

Published

12.06.2024

How to Cite

Ahmed S. Al-Hegami. (2024). An Approach for Incremental Parallel Mining of Interesting Clustering Patterns in Big Data. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 4668–4681. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/7164

Section

Research Article