Fuzzy-Based Event Clustering for Semantic Load Shedding of Real-Time Data Streaming
Keywords:
Semantic Load Shedding, Fuzzy Clustering, K Nearest Neighbor (KNN), Apache Kafka, Real-Time Data Streaming
Abstract
In real-time data stream processing, load shedding is used to manage data overload. This study combines fuzzy-based event clustering with load shedding to optimize the performance of Apache Kafka. It presents a hybrid load-shedding strategy that achieves high recall while retaining the throughput and cost models needed to estimate the value of matched events and decide which events to shed. It also shows that dropping a constant fraction of input events can reduce latency without a loss of recall. Among the approaches compared, state-based methods achieved the highest recall and input-based methods the highest throughput. As the number of time slices increases, the hybrid technique with four or more slices offers the best balance of high recall and acceptable throughput. These findings can inform machine learning algorithms and load-shedding strategies across many applications. Future work will test the method's adaptability in a dynamic environment, characterized by changing workload, data flow, and resources, by using automated algorithms to determine the system's ideal sampling rate.
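The abstract contrasts input-based shedding, which drops a constant fraction of events, with semantic shedding guided by fuzzy clustering. The Python sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation. It assumes that events are numeric feature tuples, that cluster centroids come from an offline fuzzy c-means run, and that a hypothetical set `important` marks the clusters whose events are worth keeping. Each event's utility is its total fuzzy c-means membership in those clusters, and the lowest-utility events are shed first. The function names `fuzzy_memberships`, `input_based_shed`, and `semantic_shed` are illustrative.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

# Assumption: each event is a numeric feature vector.
Event = Tuple[float, ...]


def fuzzy_memberships(x: Event, centroids: Sequence[Event], m: float = 2.0) -> List[float]:
    """Fuzzy c-means membership of event x in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(x, c) for c in centroids]
    # An event lying exactly on a centroid belongs fully to that cluster.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centroids))]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exp for dk in dists) for di in dists]


def input_based_shed(events: Sequence[Event], fraction: float, seed: int = 0) -> List[Event]:
    """Input-based shedding: drop each event independently with probability `fraction`."""
    rng = random.Random(seed)
    return [e for e in events if rng.random() >= fraction]


def semantic_shed(events: Sequence[Event], centroids: Sequence[Event],
                  important: Set[int], fraction: float, m: float = 2.0) -> List[Event]:
    """Drop the `fraction` of events with the lowest membership in the `important` clusters."""
    scores = []
    for e in events:
        u = fuzzy_memberships(e, centroids, m)
        # Hypothetical utility: total membership in the clusters we want to keep.
        scores.append(sum(u[i] for i in important))
    n_drop = int(len(events) * fraction)
    lowest_first = sorted(range(len(events)), key=lambda i: scores[i])
    dropped = set(lowest_first[:n_drop])
    # Surviving events keep their original arrival order.
    return [e for i, e in enumerate(events) if i not in dropped]


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (10.0, 10.0)]  # assumed output of an offline fuzzy c-means run
    events = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
    print(semantic_shed(events, centroids, important={1}, fraction=0.5))
    # -> [(9.0, 9.0), (10.0, 10.0)]
    print(input_based_shed(events, fraction=0.5))
```

Ranking by membership keeps the events closest to the clusters of interest. `input_based_shed` is cheaper because it never inspects event content, so it cannot protect high-value events. This loosely mirrors the recall-versus-throughput trade-off the abstract describes. The fuzzifier `m = 2` is a common default and is also an assumption here.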
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate whether changes were made. Anyone who remixes, transforms, or builds upon the material must distribute their contributions under the same license as the original.