Impact of Different Data Management Frameworks on Common Data Management Tasks in Information System (R Language Perspective)

Anant Prakash Awasthi, Niraj Kumar Singh, Masood H. Siddiqui, Aanchal A. Awasthi

Keywords:

Memory Management in R, Performance in R, Native R, tidyverse, data.table

Abstract

Effective data management is essential for getting the most out of data processing and analysis. It ensures that data is efficiently processed, readily accessible, secure, and well organized. This improves data integrity, reduces redundancy, and speeds up decision-making. In an era where data is a valued asset that drives innovation and strategic decisions, sound data management techniques are indispensable.

Joining and sorting are two essential data management activities for improving data processing. Joining combines datasets on common attributes, which makes thorough analysis easier. Sorting arranges data in a defined order, which improves search and retrieval. Together, these operations improve the accuracy and speed of data processing, simplify workflows, and support sound decision-making. Database management systems depend on joining and sorting to extract significant insights, identify trends, and create value from massive datasets.
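As a minimal illustration of the two operations described above, the following sketch joins and then sorts two small tables using only base R. The tables `customers` and `orders`, their columns, and their values are hypothetical and are not taken from the study.

```r
# Two small hypothetical tables that share the key column `id`.
customers <- data.frame(id = c(3L, 1L, 2L),
                        name = c("Cara", "Ann", "Bob"))
orders <- data.frame(id = c(1L, 1L, 3L),
                     amount = c(20, 35, 15))

# Joining: inner join on the common column `id`.
# Customer 2 has no orders, so it does not appear in the result.
joined <- merge(customers, orders, by = "id")

# Sorting: order the joined rows by `amount`, largest first.
sorted <- joined[order(-joined$amount), ]
rownames(sorted) <- NULL

print(sorted)
```

Here `merge()` performs the join and `order()` supplies the row permutation used for sorting; both are part of base R.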

The performance of native R, tidyverse, and data.table varies when joining data in R. Native R is versatile but can lag on large datasets. Tidyverse, known for its readability, strikes a balance between performance and simplicity. Because of its exceptional speed, data.table is a very effective option for large-scale joins. The choice depends on the size and complexity of the dataset. For maximum performance, particularly on complex and large-scale jobs, data.table is the best option. Native R and tidyverse work well with smaller, more manageable datasets where code readability is crucial. Each approach addresses particular requirements in R data analysis.

Sorting shows a similar pattern. Native R provides a standard method, but it may be less efficient on larger datasets. Tidyverse's user-friendly syntax prioritizes readability, but it may be slower than more efficient options. Once again, data.table runs faster and uses less memory than the alternatives when sorting large amounts of data. The choice depends on the needs of the analysis: data.table for best performance, especially with large datasets and computationally intensive tasks; tidyverse for readability; and native R for simplicity.
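The comparison described above can be sketched as follows. This is an illustrative setup under stated assumptions, not the authors' benchmark: the synthetic tables `left` and `right`, the size `n`, and the use of `system.time()` and `object.size()` as measurement tools are all choices made for this example. The tidyverse and data.table parts run only if the `dplyr` and `data.table` packages are installed.

```r
set.seed(42)
n <- 50000L  # hypothetical table size; increase to stress the differences

# Synthetic tables with a unique integer key (assumption for this sketch).
left  <- data.frame(key = sample.int(n), x = rnorm(n))
right <- data.frame(key = sample.int(n), y = rnorm(n))

# Native R: merge() for the join, order() for the sort.
base_time <- system.time({
  base_out <- merge(left, right, by = "key")
  base_out <- base_out[order(base_out$x), ]
})

# tidyverse: dplyr::inner_join() and dplyr::arrange(), if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  tidy_time <- system.time({
    tidy_out <- dplyr::inner_join(left, right, by = "key")
    tidy_out <- dplyr::arrange(tidy_out, x)
  })
}

# data.table: merge() on data.tables and setorder(), if data.table is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt_time <- system.time({
    dt_left  <- data.table::as.data.table(left)
    dt_right <- data.table::as.data.table(right)
    dt_out   <- merge(dt_left, dt_right, by = "key")
    data.table::setorder(dt_out, x)  # sorts by reference, without copying
  })
}

# Elapsed time and result size for the native R run.
print(base_time[["elapsed"]])
print(object.size(base_out), units = "MB")
```

Timings from a single `system.time()` call are noisy, so any serious comparison would repeat each run several times and vary `n`. The sketch makes no claim about which approach is fastest on a given machine.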

In summary, effective data management is essential for businesses to make full use of their data and reach sound decisions. Optimizing data processing and analysis requires careful consideration of joining, sorting, and tool selection.
