Proposal of Machine Learning Approach for Identification of Instant Messaging Applications in Raw Network Traffic

Abdurrahman Pektaş

doi:10.18201/ijisae.2018642060

Authors

Abdurrahman Pektaş Galatasaray University

Keywords:

Encrypted Traffic Identification, Network flow, Security, Machine Learning, Network Forensics

Abstract

Identification of Internet protocols from either raw network traffic or network flows plays a crucial role in maintaining and improving the security of computer systems. A significant amount of research has been carried out exploiting a variety of identification techniques. Although a certain level of success has been achieved in detecting network protocols in unencrypted traffic, accuracy and performance remain rather poor for encrypted traffic. Following current technological trends, new and existing applications have adopted encryption mechanisms to protect information and privacy. Therefore, classification of encrypted network traffic is essential for ensuring security. Moreover, labelling network protocols/applications is a necessary step in network forensic investigations. In this study, we propose a method to automatically identify instant messaging applications from raw network traffic. To this end, we first extract flow-based static features from network captures and then apply machine learning algorithms. The proposed method is evaluated on a fairly large dataset, which comprises the publicly available NIMS dataset and the network traffic of 9 popular instant messaging applications collected in a controlled environment. Overall, the dataset contains 716,607 network flows belonging to 20 application categories. The proposed method classifies network flows of instant messaging applications into their corresponding application categories with an accuracy above 0.99 and an F1-score of 0.99.
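The two-stage pipeline summarized in the abstract (compute per-flow features that need no payload inspection, then train a classifier and score it by accuracy and F1) can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the author's implementation: the nine features in the hypothetical `flow_features` helper, the two hypothetical labels `app_a` and `app_b`, the synthetic packet traces, and the use of `ExtraTreesClassifier` with macro-averaged F1 are all choices made here for illustration. It does not reproduce the NetMate-derived features or the NIMS data listed in the references.

```python
# Sketch only: feature set, labels, traces and model settings are assumptions.
from statistics import mean, pstdev

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def flow_features(packets):
    """Compute hypothetical static features for one flow.

    `packets` is a non-empty list of (timestamp_seconds, payload_length)
    tuples ordered by time. Only sizes and timings are used, so the
    features stay available when the payload is encrypted.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return [
        len(packets),               # packet count
        sum(sizes),                 # total bytes
        min(sizes), max(sizes),     # packet-size range
        mean(sizes), pstdev(sizes), # packet-size statistics
        mean(gaps), pstdev(gaps),   # inter-arrival statistics
        times[-1] - times[0],       # flow duration
    ]


def synthetic_flow(rng, mean_size, mean_gap, n_packets):
    """Generate a made-up flow: exponential gaps, roughly normal sizes."""
    t = np.cumsum(rng.exponential(mean_gap, n_packets))
    s = rng.normal(mean_size, mean_size * 0.1, n_packets).clip(40, 1500)
    return list(zip(t.tolist(), s.tolist()))


rng = np.random.default_rng(0)
# Hypothetical traffic profiles: (mean packet size, mean inter-arrival gap).
profiles = {"app_a": (120.0, 0.20), "app_b": (900.0, 0.02)}

X, y = [], []
for label, (size, gap) in profiles.items():
    for _ in range(200):
        flow = synthetic_flow(rng, size, gap, int(rng.integers(5, 50)))
        X.append(flow_features(flow))
        y.append(label)
X = np.asarray(X)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

acc = accuracy_score(y_test, pred)
f1 = f1_score(y_test, pred, average="macro")  # macro averaging is an assumption
print(f"accuracy={acc:.3f} macro-F1={f1:.3f}")
```

Because the synthetic profiles differ sharply in packet size, this toy setup is easy to separate, and its scores say nothing about the results reported above. With real captures, the packet lists would come from a flow exporter and the labels from the controlled collection environment.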

Author Biography

Abdurrahman Pektaş, Galatasaray University

Computer Engineering Department

References

A. W. Moore and D. Zuev, “Internet traffic classification using Bayesian analysis techniques,” ACM SIGMETRICS Performance Evaluation Review, vol. 33, pp. 50-60, 2005.

C. V. Wright, F. Monrose, and G. M. Masson, “On inferring application protocol behaviors in encrypted network traffic,” Journal of Machine Learning Research, vol. 7, pp. 2745-2769, 2006.

R. Alshammari and A. N. Zincir-Heywood, “Machine learning based encrypted traffic classification: Identifying SSH and Skype,” CISDA, vol. 9, pp. 289-296, 2009.

R. Alshammari and A. N. Zincir-Heywood, “Can encrypted traffic be identified without port numbers, IP addresses and payload inspection?” Computer Networks, vol. 55, no. 6, pp. 1326-1350, 2011.

Calculating Flow Statistics Using NetMate, 2017. [Online]. Available: https://dan.arndt.ca/nims/calculating-flow-statistics-using-netmate/. Accessed on: Jan. 15, 2017.

D. J. Arndt and A. N. Zincir-Heywood, “A comparison of three machine learning techniques for encrypted network traffic analysis,” in Proc. IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2011, pp. 107-114.

Y. Okada, S. Ata, N. Nakamura, Y. Nakahira, and I. Oka, “Comparisons of machine learning algorithms for application identification of encrypted traffic,” in Proc. Machine Learning and Applications and Workshops (ICMLA), 2011, pp. 358-361.

GitLab repository containing the source code and the dataset of this work, 2017. [Online]. Available: https://gitlab.com/apektas/instant_messaging_app_identification. Accessed on: Feb. 12, 2017.

P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees,” Machine Learning, vol. 63, no. 1, pp. 3-42, 2006.

NIMS1 data set, 2017. [Online]. Available: https://projects.cs.dal.ca/projectx/data/NIMS.arff.zip. Accessed on: Jan. 15, 2017.

H. Yu, F. Huang, and C. Lin, “Dual coordinate descent methods for logistic regression and maximum entropy models,” Machine Learning, vol. 85, no. 1, pp. 41-75, 2011.

M. Schmidt, N. Le Roux, and F. Bach, “Minimizing finite sums with the stochastic average gradient,” Mathematical Programming, pp. 1-30, 2013.

T. Wu, C. Lin, and R. C. Weng, “Probability estimates for multiclass classification by pairwise coupling,” Journal of Machine Learning Research, vol. 5, pp. 975-1005, 2004.

L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825-2830, 2011.

Scikit-learn: machine learning in Python, 2017. [Online]. Available: http://scikit-learn.org/stable/index.html. Accessed on: Mar. 15, 2017.