Capsule Neural Network and Determinantal Point Process (CAPSDPP) based Summarization of Surveillance Videos
Keywords:
Capsule Neural Network; Determinantal point process; Keyframes; Redundancy; Segmentation; Summarization; Shot; Video surveillance
Abstract
The seamless deployment and low cost of surveillance cameras have benefited various agencies, such as schools, colleges, airports, railway stations, and shopping malls. However, the data these cameras generate is enormous, and accessing a specific clip requires users to invest time and energy in watching the entire video. Video summarization aims to produce a brief yet comprehensive depiction of the essential content of a video. This content can be presented as keyframes or a video summary that avoids redundancy and emphasizes important and varied segments. Constraints such as limited computational capacity and restricted bandwidth reduce the resources available at the edge. This work proposes a lightweight model based on the Capsule neural network (CapsNet) to summarize surveillance videos. Capsule neural networks are employed to extract spatiotemporal features that capture both motion and visual information, and the resulting deep CapsNet features are used for shot segmentation. A determinantal point process (DPP) then selects diverse keyframes within the segmented shots. We assessed the effectiveness of the proposed method on benchmark datasets from the Open Video Project (OVP) and YouTube (YT). Our findings show that the proposed approach surpasses existing methodologies.
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.