VRSS: A Touch-to-Vision-Text-Audio Artificial Multi-Modal Sensory System to Demonstrate Neural Network Processes
Keywords:
Classification, Convolutional Neural Network (CNN), DIGIT sensor, multi-modal, touch, vision

Abstract
The human brain is the most complex organ in the body, and simulating its functionality, particularly its multi-modal sensory processing, is an exceedingly challenging task. Biological experiments show that object instances can be identified from tactile signals alone, and this research applies similar concepts to model a multi-modal sensory processing system for tactile inputs. VRSS is a novel touch-to-vision-to-text-to-audio system that simulates the multi-modal sensory behavior of the brain by converting tactile inputs into visual images, which are then converted into text and audio. The main aim of this research is to classify object instances from tactile signals. Tactile inputs are captured and implicitly converted into visual images by the DIGIT sensor simulated in the TACTO simulator, and the object is then classified using Convolutional Neural Networks (CNNs). The classification output is further converted into audio, thereby simulating three modalities: touch, vision, and sound. To construct VRSS, multiple pretrained CNNs were tested under different hyperparameter configurations; the pretrained ConvNeXtTiny model achieved the best accuracy among them, 91%. After further modification, the resulting custom VRSS CNN model reached an accuracy of 95.83%. These results can help expand the applicability of different CNN architectures, deepen the understanding of the human multi-modal sensory system, and support applications in artificial intelligence and robotics, particularly navigation in uncharted environments.
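As a rough illustration of the pipeline described above, the sketch below fine-tunes a pretrained ConvNeXt-Tiny classifier on tactile images rendered by a TACTO-simulated DIGIT sensor and then converts the predicted label into text and speech. This is a minimal sketch, not the authors' released implementation: it assumes the tactile images have already been rendered and saved to class-labelled folders, and the dataset path, epoch count, and the choice of torchvision and pyttsx3 are illustrative assumptions.

# Minimal sketch (assumptions noted above), not the authors' released code.
import torch
import torch.nn as nn
from torchvision import models, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
import pyttsx3  # offline text-to-speech; any TTS backend would do

# Tactile images previously rendered by the TACTO-simulated DIGIT sensor,
# stored in one subfolder per object class (hypothetical path).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = ImageFolder("tacto_digit_images/", transform=preprocess)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Pretrained ConvNeXt-Tiny with its classification head replaced
# to match the number of tactile object classes.
model = models.convnext_tiny(weights=models.ConvNeXt_Tiny_Weights.DEFAULT)
model.classifier[2] = nn.Linear(model.classifier[2].in_features, len(dataset.classes))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
model.train()
for epoch in range(5):  # illustrative epoch count
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# Touch -> vision -> text -> audio: classify one tactile image and speak the label.
model.eval()
image, _ = dataset[0]
with torch.no_grad():
    predicted = dataset.classes[model(image.unsqueeze(0)).argmax(dim=1).item()]
print(f"Predicted object: {predicted}")             # text modality
engine = pyttsx3.init()
engine.say(f"The touched object is a {predicted}")  # audio modality
engine.runAndWait()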
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets readers share and adapt the material provided they give appropriate credit, provide a link to the license, indicate if changes were made, and, if they remix, transform, or build upon the material, distribute their contributions under the same license as the original.