Semantic-Contact Fields for Category-Level Generalizable Tactile Tool Manipulation

*Corresponding Authors
¹Institute for Infocomm Research, A*STAR  ²Show Lab, National University of Singapore  ³College of Computing and Data Science, Nanyang Technological University
RSS 2026
Semantic-Contact Fields (SCFields) Overview. 1. Multimodal Inputs: The system takes RGB-D observations and tactile readings from GelSight sensors. 2. SCFields Generation: Our unified perception module fuses these inputs into a dense point cloud representation containing both category-level semantics (blue/green heatmap) and extrinsic contact force vectors (green arrows). 3. Policy Execution: A diffusion policy conditioned on the SCFields enables zero-shot generalization to novel tool variants (e.g., peelers of different shapes) in contact-rich tasks by reasoning jointly about functional affordance and contact forces.

Video

Abstract

Generalizing tool manipulation requires both semantic planning and precise physical control. Modern generalist robot policies, such as Vision-Language-Action (VLA) models, often lack the physical grounding required for contact-rich tool manipulation. Conversely, existing contact-aware policies that leverage tactile or haptic sensing are typically instance-specific and fail to generalize across diverse tool geometries. Bridging this gap requires learning representations that are both semantically transferable and physically grounded, yet a fundamental barrier remains: diverse real-world tactile data are prohibitively expensive to collect at scale, while direct zero-shot sim-to-real transfer is challenging due to the complex nonlinear deformation of soft tactile sensors.

To address this, we propose Semantic-Contact Fields (SCFields), a unified 3D representation that fuses visual semantics with dense extrinsic contact estimates, including contact probability and force. SCFields is learned through a two-stage Sim-to-Real Contact Learning Pipeline: we first pre-train on large-scale simulation to learn geometry-aware contact priors, then fine-tune on a small set of real data, pseudo-labeled via geometric heuristics and force optimization, to align the model with real tactile signals. The resulting force-aware representation serves as the dense observation input to a diffusion policy, enabling physical generalization to unseen tool instances. Experiments on scraping, crayon drawing, and peeling demonstrate robust category-level generalization, significantly outperforming vision-only and raw-tactile baselines.
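To make the representation concrete, the sketch below shows one plausible layout for a per-point SCFields observation: N scene points, each carrying a semantic feature vector, a contact probability, and a 3D extrinsic force estimate, concatenated into the dense input for the policy. The class name, channel sizes, and concatenation order are illustrative assumptions, not the paper's exact implementation.

# Minimal sketch of a per-point SCFields observation (assumed layout, Python/NumPy).
from dataclasses import dataclass
import numpy as np

@dataclass
class SCFields:
    points: np.ndarray         # (N, 3) point positions in the workspace frame
    semantics: np.ndarray      # (N, C) category-level semantic features
    contact_prob: np.ndarray   # (N,)   per-point contact probability
    contact_force: np.ndarray  # (N, 3) estimated extrinsic contact force vectors

    def as_policy_obs(self) -> np.ndarray:
        # Concatenate all per-point channels into a dense (N, 3 + C + 1 + 3) array
        # that can condition a point-cloud-based diffusion policy.
        return np.concatenate(
            [self.points,
             self.semantics,
             self.contact_prob[:, None],
             self.contact_force],
            axis=-1,
        )

# Example: 1024 points with an 8-dim semantic embedding per point -> obs of shape (1024, 15).
obs = SCFields(
    points=np.zeros((1024, 3)),
    semantics=np.zeros((1024, 8)),
    contact_prob=np.zeros(1024),
    contact_force=np.zeros((1024, 3)),
).as_policy_obs()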

Method Overview


Left: Contact Field Learning. Stage 1 learns general contact geometry and physics from simulated data; Stage 2 aligns the sensor domain using pseudo-labeled real data. Right: Policy Learning. A diffusion policy is trained conditioned on the combined SCFields to achieve robust tool manipulation.
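As a rough illustration of the two-stage schedule above, the following PyTorch-style sketch pre-trains a contact-field model on simulated contact labels and then fine-tunes it on pseudo-labeled real data. The model, data loaders, loss weighting, and epoch counts are placeholder assumptions, not the authors' released training code.

# Two-stage contact-field training outline (assumed, PyTorch).
import torch

def train_contact_field(model, sim_loader, real_loader,
                        sim_epochs=50, real_epochs=5, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()   # contact probability loss
    mse = torch.nn.MSELoss()             # contact force regression loss

    # Stage 1: pre-train on large-scale simulation with exact contact labels
    # to learn geometry-aware contact priors.
    for _ in range(sim_epochs):
        for points, contact_gt, force_gt in sim_loader:
            prob_logits, force_pred = model(points)
            loss = bce(prob_logits, contact_gt) + mse(force_pred, force_gt)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Stage 2: fine-tune on a small real dataset whose contact labels come from
    # geometric heuristics / force optimization (pseudo-labels), aligning the
    # model with the real tactile sensor domain.
    for _ in range(real_epochs):
        for points, contact_pseudo, force_pseudo in real_loader:
            prob_logits, force_pred = model(points)
            loss = bce(prob_logits, contact_pseudo) + mse(force_pred, force_pseudo)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model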

Contact Field Visualization

Side-by-side panels comparing the RGB front view with the predicted contact field for the scraper, crayon, and peeler.

Real World Demos

Demonstration videos on seen and unseen tool variants for the scraper, crayon, and peeler tasks.

BibTeX

@misc{ma2026semanticcontactfieldscategorylevelgeneralizable,
    title={Semantic-Contact Fields for Category-Level Generalizable Tactile Tool Manipulation},
    author={Kevin Yuchen Ma and Heng Zhang and Weisi Lin and Mike Zheng Shou and Yan Wu},
    year={2026},
    eprint={2602.13833},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2602.13833},
}