SeeThruFinger: See and Grasp Anything via a Multi-Modal Soft Touch


Fang Wan, Zheng Wang, Wei Zhang, Chaoyang Song: SeeThruFinger: See and Grasp Anything via a Multi-Modal Soft Touch. Forthcoming, 2024 (submitted to IEEE Transactions on Robotics).

Abstract

We present SeeThruFinger, a Vision-Based Tactile Sensing (VBTS) architecture using a markerless See-Thru-Network. It achieves simultaneous visual perception and tactile sensing while providing omni-directional, adaptive grasping for manipulation. Multi-modal perception of intrinsic and extrinsic interactions is critical in building intelligent robots that learn. Instead of adding various sensors for different modalities, a preferred solution is to integrate them into one elegant and coherent design, which is a challenging task. This study leverages in-finger vision to inpaint occluded regions of the external environment, achieving coherent scene reconstruction for visual perception. By tracking real-time segmentation of the Soft Polyhedral Network’s large-scale deformation, we achieve real-time markerless tactile sensing of 6D forces and torques. We demonstrate the capabilities of the SeeThruFinger for reactive grasping without using external cameras or dedicated force-and-torque sensors on the fingertips. Using the inpainted scene and the deformation mask, we further demonstrate the multi-modal performance of the SeeThruFinger architecture, simultaneously achieving various capabilities, including scene inpainting, object detection, depth sensing, scene segmentation, masked deformation tracking, 6D force-and-torque sensing, and contact event detection, all from a single input of the in-finger vision of the See-Thru-Network in a markerless way. All code is available at https://github.com/ancorasir/SeeThruFinger.

BibTeX (Download)

@online{Wan2024SeeThruFinger,
title = {SeeThruFinger: See and Grasp Anything via a Multi-Modal Soft Touch},
author = {Fang Wan and Zheng Wang and Wei Zhang and Chaoyang Song},
doi = {10.48550/arXiv.2312.09822},
year = {2024},
date = {2024-09-20},
abstract = {We present SeeThruFinger, a Vision-Based Tactile Sensing (VBTS) architecture using a markerless See-Thru-Network. It achieves simultaneous visual perception and tactile sensing while providing omni-directional, adaptive grasping for manipulation. Multi-modal perception of intrinsic and extrinsic interactions is critical in building intelligent robots that learn. Instead of adding various sensors for different modalities, a preferred solution is to integrate them into one elegant and coherent design, which is a challenging task. This study leverages in-finger vision to inpaint occluded regions of the external environment, achieving coherent scene reconstruction for visual perception. By tracking real-time segmentation of the Soft Polyhedral Network’s large-scale deformation, we achieve real-time markerless tactile sensing of 6D forces and torques. We demonstrate the capabilities of the SeeThruFinger for reactive grasping without using external cameras or dedicated force-and-torque sensors on the fingertips. Using the inpainted scene and the deformation mask, we further demonstrate the multi-modal performance of the SeeThruFinger architecture, simultaneously achieving various capabilities, including scene inpainting, object detection, depth sensing, scene segmentation, masked deformation tracking, 6D force-and-torque sensing, and contact event detection, all from a single input of the in-finger vision of the See-Thru-Network in a markerless way. All code is available at https://github.com/ancorasir/SeeThruFinger.},
note = {Submitted to IEEE Transactions on Robotics},
keywords = {Corresponding Author, Under Review},
pubstate = {forthcoming},
tppubtype = {online}
}