NIOJS: A Novel Intelligent Model Based on Optimal Jumps for Creating Data Sampling from Big Dataset

Authors

  • Mohammed Mohammed Zayed, Fadl Mutaher Ba-Alwi, Hiba ALMarwi, Gheleb H. AL-Gaphari

Keywords:

Big data sampling, Cluster sampling, DBSCAN, NIOJS, Samples, Optimal Jump

Abstract

The pervasiveness of big data has revolutionized the landscape of information technology (IT), offering a wealth of insights and opportunities for various sectors, including healthcare, education, and the Internet of Things (IoT). However, the sheer volume and complexity of big data pose challenges in extracting meaningful knowledge. To address this, we propose a novel model for optimal sample selection, enabling efficient extraction of representative subsets from big data. The proposed model, based on optimal jumps, dynamically adapts the clustering process to enhance the efficiency of data sampling. We employ the Adjusted Rand Index (ARI) to evaluate the similarity between clusters and to guide the selection of new data in each iteration. This model has the potential to significantly enhance the utilization of big data while reducing computational demands. The proposed model can run on big datasets, and the samples it draws are representative of the dataset.
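As an illustration of the similarity measure named in the abstract, the following is a minimal sketch of the standard pair-counting formula for the Adjusted Rand Index between two labelings of the same points. It is not the authors' implementation: the function name `adjusted_rand_index` and the toy label lists are assumptions made for this example, and the abstract does not specify how NIOJS turns the ARI value into its next jump.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard ARI between two clusterings of the same points.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same points")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: the labelings agree trivially

    # Pairs of points placed together in both clusterings
    # (cells of the contingency table).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering on its own.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Same partition under different label names: perfect agreement.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Partitions that split every pair differently: worse than chance.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

An ARI near 1 indicates that two clusterings group the points in nearly the same way, while values near 0 indicate agreement no better than chance. Under that reading, a sampling loop could compare the clustering of the current sample with that of the previous one and stop adding data once the score stabilizes; this is an assumed use, shown only to make the role of the index concrete.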

Published

13.11.2024

How to Cite

Mohammed Mohammed Zayed. (2024). NIOJS: A Novel Intelligent Model Based on Optimal Jumps for Creating Data Sampling from Big Dataset. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 4290–4295. Retrieved from https://www.ijisae.org/index.php/IJISAE/article/view/7047

Section

Research Article