Discovery of Frequent Itemsets in Distributed Datasets with Reduced Communication Overhead
Keywords:
Distributed data mining algorithm, Communication Scheme, Apriori and FP-growth algorithms, Performance AnalysisAbstract
The identification of frequent itemsets is a vital and complex task in data mining. Conventional approaches for mining frequent itemsets in distributed datasets typically involve considerable communication overhead. To address this, this paper introduces an effective strategy aimed at optimizing communication times in extensive datasets. We provide a novel scheme strategy to reduce the frequency of communications and synchronizations needed for computing global frequent itemsets. We propose an algorithm, named Efficient Frequent Itemsets Finding (EFIF), which uncovers frequent itemsets at slave nodes within distributed environments. Our algorithm efficiently produces significant frequent itemsets by employing an effective candidate pruning technique. Two datasets with varying characteristics and complexities were selected to evaluate the efficiency and effectiveness of the EFIF algorithm in generating distributed frequent itemsets. This comprehensive assessment demonstrates the algorithm's performance across different scenarios. We compared the performance of the EFIF algorithm with the Apriori and FP-growth algorithms using a novel scheme strategy. Experimental results indicate that EFIF surpasses both Apriori and FP-growth in terms of communication and computation costs.
Downloads
References
R. Agrawal, T. Imieliński et A. Swami, “ Mining association rules between sets of items in large databases “, ACM SIGMOD Rec., vol. 22, no. 2, pp. 207–216, 1993, doi: 10.1145/170036.170072.
R. Agrawal, R. Srikant, “ Fast algorithms for mining association rules,” in Proc. 20th int. conf. very large databases, VLDB. Vol. 1215, 1994, pp. 487-499,doi:10.5555/645920.
J. Han, J. Pei et Y. Yin, “ Mining frequent patterns without candidate generation “, ACM SIGMOD Rec., vol. 29, no. 2, pp. 1–12, 2000, doi: 10.1145/335191.335372.
R. Agrawal et J. C. Shafer, “Parallel mining of association rules “, IEEE Trans. Knowl. Data Eng., vol. 8, no. 6, pp. 962–969, 1996, doi :10.1109/69.553164.
J. S. Park, M.-S. Chen et P. S. Yu, “An effective hash-based algorithm for mining association rules “, ACM SIGMOD Rec., vol. 24, no. 2, pp. 175–186, 1995,doi:10.1145/568271.223813.
D. W. Cheung, Jiawei Han, V. T. Ng, A. W. Fu and Yongjian Fu, “A fast distributed algorithm for mining association rules, “ Fourth International Conference on Parallel and Distributed Information Systems, Miami Beach, FL, USA, 1996, pp. 31-42, doi: 10.1109/PDIS.1996.568665.
T. Shintani and M. Kitsuregawa, “Hash based parallel algorithms for mining association rules, “ Fourth International Conference on Parallel and Distributed Information Systems, Miami Beach, FL, USA, 1996, pp. 19-30, doi: 10.1109/PDIS.1996.568664.
E.-H. Han, G. Karypis et V. Kumar, “Scalable parallel data mining for association rules “, ACM SIGMOD Rec., vol. 26, no.2, pp. 277–288, 1997,doi:10.1145/253262.253330.
B. Mudumba et M. F. Kabir, “Mine-first association rule mining: An integration of independent frequent patterns in distributed environments “, Decis. Analytics J., pp. 100434, 2024, doi :10.1016/j.dajour.2024.100434.
K. Samudrala, J. Kolisetty, A. S. Chakravadhanula, B. Preetham and R. Senapati, “Novel Distributed Architecture for Frequent Pattern Mining using Spark Framework, “ 2023 3rd International Conference on Intelligent Technologies (CONIT), Hubli, India, 2023, pp. 1-5, doi: 10.1109/CONIT59222.2023.10205903.
J. Martin-Prin, I. O.Dlala, N.Travers, and S. Jabbour, “A Distributed SAT-Based Framework for Closed Frequent Itemset Mining“, In International Conference on Advanced Data Mining and Applications. Cham: Springer Nature Switzerland, November.2022, pp. 419-433, doi: 10.1007/978-3-031-22137-8_31.
A. Sahoo and R. Senapati, “A Novel Approach for Distributed Frequent Pattern Mining Algorithm using Load-Matrix, “ 2021 International Conference on Intelligent Technologies (CONIT), Hubli, India, 2021, pp. 1-5, doi: 10.1109/CONIT51480.2021.9498411.
F. S. C. Tseng, Y.-H. Kuo et Y.-M. Huang, “Toward boosting distributed association rule mining by data de-clustering “, Inf. Sci., vol. 180, no 22, pp. 4263–4289,2010, doi:10.1016/j.ins.2010.07.020.
A. Vasoya et N. Koli, “Mining of Association Rules on Large Database Using Distributed and Parallel Computing “, Procedia Comput. Sci., vol. 79, pp. 221–230, 2016, doi: 10.1016/j.procs.2016.03.029.
T. Tassa, “Secure Mining of Association Rules in Horizontally Distributed Databases, “ in IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 4, pp. 970-983, April 2014, doi: 10.1109/TKDE.2013.41.
M. Z. Ashrafi, D. Taniar and K. Smith, “ODAM: An optimized distributed association rule mining algorithm, “ in IEEE Distributed Systems Online, vol. 5, no. 3, 2004, doi: 10.1109/MDSO.2004.1285877.
B. Goethals, M. J. Zaki, “FIMI’03: Workshop on frequent itemset mining implementations“, In Third IEEE International Conference on Data Mining Workshop on Frequent Itemset Mining Implementations,2003, pp.1–13, doi: 10.1145/1007730.1007744.
P. Fournier-viger, A. Gomariz, T. Gueniche, A. Soltani, C. Wu et al., “Spmf: a java open-source pattern mining library“, The Journal of Machine Learning Research, vol.15, pp.3389-3393, 2014, doi : 10.1007/978-3-319-46131-1_8.
Downloads
Published
How to Cite
Issue
Section
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All papers should be submitted electronically. All submitted manuscripts must be original work that is not under submission at another journal or under consideration for publication in another form, such as a monograph or chapter of a book. Authors of submitted papers are obligated not to submit their paper for publication elsewhere until an editorial decision is rendered on their submission. Further, authors of accepted papers are prohibited from publishing the results in other publications that appear before the paper is published in the Journal unless they receive approval for doing so from the Editor-In-Chief.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets the audience to give appropriate credit, provide a link to the license, and indicate if changes were made and if they remix, transform, or build upon the material, they must distribute contributions under the same license as the original.