Discovery of Frequent Itemsets in Distributed Datasets with Reduced Communication Overhead

Authors

  • Houda Essalmi, Anass El Affar

Keywords:

Distributed data mining algorithm, Communication Scheme, Apriori and FP-growth algorithms, Performance Analysis

Abstract

The identification of frequent itemsets is a vital and complex task in data mining. Conventional approaches for mining frequent itemsets in distributed datasets typically involve considerable communication overhead. To address this, this paper introduces an effective strategy aimed at optimizing communication times in extensive datasets. We provide a novel scheme strategy to reduce the frequency of communications and synchronizations needed for computing global frequent itemsets. We propose an algorithm, named Efficient Frequent Itemsets Finding (EFIF), which uncovers frequent itemsets at slave nodes within distributed environments. Our algorithm efficiently produces significant frequent itemsets by employing an effective candidate pruning technique. Two datasets with varying characteristics and complexities were selected to evaluate the efficiency and effectiveness of the EFIF algorithm in generating distributed frequent itemsets. This comprehensive assessment demonstrates the algorithm's performance across different scenarios. We compared the performance of the EFIF algorithm with the Apriori and FP-growth algorithms using a novel scheme strategy. Experimental results indicate that EFIF surpasses both Apriori and FP-growth in terms of communication and computation costs.

Downloads

Download data is not yet available.

References

R. Agrawal, T. Imieliński et A. Swami, “ Mining association rules between sets of items in large databases “, ACM SIGMOD Rec., vol. 22, no. 2, pp. 207–216, 1993, doi: 10.1145/170036.170072.

R. Agrawal, R. Srikant, “ Fast algorithms for mining association rules,” in Proc. 20th int. conf. very large databases, VLDB. Vol. 1215, 1994, pp. 487-499,doi:10.5555/645920.

J. Han, J. Pei et Y. Yin, “ Mining frequent patterns without candidate generation “, ACM SIGMOD Rec., vol. 29, no. 2, pp. 1–12, 2000, doi: 10.1145/335191.335372.

R. Agrawal et J. C. Shafer, “Parallel mining of association rules “, IEEE Trans. Knowl. Data Eng., vol. 8, no. 6, pp. 962–969, 1996, doi :10.1109/69.553164.

J. S. Park, M.-S. Chen et P. S. Yu, “An effective hash-based algorithm for mining association rules “, ACM SIGMOD Rec., vol. 24, no. 2, pp. 175–186, 1995,doi:10.1145/568271.223813.

D. W. Cheung, Jiawei Han, V. T. Ng, A. W. Fu and Yongjian Fu, “A fast distributed algorithm for mining association rules, “ Fourth International Conference on Parallel and Distributed Information Systems, Miami Beach, FL, USA, 1996, pp. 31-42, doi: 10.1109/PDIS.1996.568665.

T. Shintani and M. Kitsuregawa, “Hash based parallel algorithms for mining association rules, “ Fourth International Conference on Parallel and Distributed Information Systems, Miami Beach, FL, USA, 1996, pp. 19-30, doi: 10.1109/PDIS.1996.568664.

E.-H. Han, G. Karypis et V. Kumar, “Scalable parallel data mining for association rules “, ACM SIGMOD Rec., vol. 26, no.2, pp. 277–288, 1997,doi:10.1145/253262.253330.

B. Mudumba et M. F. Kabir, “Mine-first association rule mining: An integration of independent frequent patterns in distributed environments “, Decis. Analytics J., pp. 100434, 2024, doi :10.1016/j.dajour.2024.100434.

K. Samudrala, J. Kolisetty, A. S. Chakravadhanula, B. Preetham and R. Senapati, “Novel Distributed Architecture for Frequent Pattern Mining using Spark Framework, “ 2023 3rd International Conference on Intelligent Technologies (CONIT), Hubli, India, 2023, pp. 1-5, doi: 10.1109/CONIT59222.2023.10205903.

J. Martin-Prin, I. O.Dlala, N.Travers, and S. Jabbour, “A Distributed SAT-Based Framework for Closed Frequent Itemset Mining“, In International Conference on Advanced Data Mining and Applications. Cham: Springer Nature Switzerland, November.2022, pp. 419-433, doi: 10.1007/978-3-031-22137-8_31.

A. Sahoo and R. Senapati, “A Novel Approach for Distributed Frequent Pattern Mining Algorithm using Load-Matrix, “ 2021 International Conference on Intelligent Technologies (CONIT), Hubli, India, 2021, pp. 1-5, doi: 10.1109/CONIT51480.2021.9498411.

F. S. C. Tseng, Y.-H. Kuo et Y.-M. Huang, “Toward boosting distributed association rule mining by data de-clustering “, Inf. Sci., vol. 180, no 22, pp. 4263–4289,2010, doi:10.1016/j.ins.2010.07.020.

A. Vasoya et N. Koli, “Mining of Association Rules on Large Database Using Distributed and Parallel Computing “, Procedia Comput. Sci., vol. 79, pp. 221–230, 2016, doi: 10.1016/j.procs.2016.03.029.

T. Tassa, “Secure Mining of Association Rules in Horizontally Distributed Databases, “ in IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 4, pp. 970-983, April 2014, doi: 10.1109/TKDE.2013.41.

M. Z. Ashrafi, D. Taniar and K. Smith, “ODAM: An optimized distributed association rule mining algorithm, “ in IEEE Distributed Systems Online, vol. 5, no. 3, 2004, doi: 10.1109/MDSO.2004.1285877.

B. Goethals, M. J. Zaki, “FIMI’03: Workshop on frequent itemset mining implementations“, In Third IEEE International Conference on Data Mining Workshop on Frequent Itemset Mining Implementations,2003, pp.1–13, doi: 10.1145/1007730.1007744.

P. Fournier-viger, A. Gomariz, T. Gueniche, A. Soltani, C. Wu et al., “Spmf: a java open-source pattern mining library“, The Journal of Machine Learning Research, vol.15, pp.3389-3393, 2014, doi : 10.1007/978-3-319-46131-1_8.

Downloads

Published

14.08.2024

How to Cite

Houda Essalmi. (2024). Discovery of Frequent Itemsets in Distributed Datasets with Reduced Communication Overhead. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 2572 –. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/6684

Issue

Section

Research Article

Similar Articles

You may also start an advanced similarity search for this article.