Geometric Red-Teaming for Robotic Manipulation

Conference on Robot Learning (CoRL) 2025

Oral Presentation • 5.7% acceptance
¹Robotics Institute, Carnegie Mellon University | ²National Institute of Standards and Technology
( indicates equal advising)
Overview of the Geometric Red-Teaming framework showing object deformation and policy failure.

Abstract

Standard evaluation protocols in robotic manipulation typically assess policy performance over curated, in-distribution test sets, offering limited insight into how systems fail under plausible variation. We introduce Geometric Red-Teaming (GRT), a red-teaming framework that probes robustness through object-centric geometric perturbations, automatically generating CrashShapes: structurally valid, user-constrained mesh deformations that trigger catastrophic failures in pre-trained manipulation policies. The method integrates a Jacobian field-based deformation model with a gradient-free, simulator-in-the-loop optimization strategy. Across insertion, articulation, and grasping tasks, GRT consistently discovers deformations that collapse policy performance, revealing brittle failure modes missed by static benchmarks. By combining task-level policy rollouts with constraint-aware shape exploration, we aim to build a general-purpose framework for structured, object-centric robustness evaluation in robotic manipulation. We additionally show that fine-tuning on individual CrashShapes, a process we refer to as blue-teaming, improves task success by up to 60 percentage points on those shapes while preserving performance on the original object, demonstrating the utility of red-teamed geometries for targeted policy refinement. Finally, we validate both red-teaming and blue-teaming results on a real robotic arm: simulated CrashShapes reduce task success from 90% to as low as 22.5%, and blue-teaming recovers performance to up to 90% on the corresponding real-world geometry, closely matching simulation outcomes.
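The abstract names a Jacobian field-based deformation model without spelling out how vertices are recovered. Jacobian-field deformation methods typically assign a target Jacobian to each mesh element and then recover vertex positions with a linear least-squares (Poisson) solve, with anchors and handles entering as positional constraints. The sketch below shows only that final step on a simplified polyline, where each edge stands in for a mesh face; `poisson_from_jacobians`, its arguments, and the soft-constraint `weight` are hypothetical illustrations, not the authors' implementation, which operates on full meshes.

```python
import numpy as np

# Simplified polyline analog (assumption): each edge stands in for a mesh face,
# and its (d, d) matrix stands in for that face's target Jacobian.


def poisson_from_jacobians(rest, edges, jacobians, constraints, weight=1e3):
    """Least-squares recovery of vertex positions from per-edge target Jacobians.

    rest:        (n, d) rest-pose vertex positions.
    edges:       list of (i, j) vertex index pairs.
    jacobians:   sequence of (d, d) target Jacobians, one per edge.
    constraints: {vertex_index: target_position} for anchors and handles.
    weight:      row scale for constraints (the penalty grows as weight**2).
    """
    rest = np.asarray(rest, dtype=float)
    n, d = rest.shape
    m, k = len(edges), len(constraints)
    A = np.zeros((m + k, n))
    b = np.zeros((m + k, d))
    # Gradient rows: the deformed edge (v_j - v_i) should match J_e @ rest edge.
    for row, ((i, j), J) in enumerate(zip(edges, jacobians)):
        A[row, i], A[row, j] = -1.0, 1.0
        b[row] = np.asarray(J, dtype=float) @ (rest[j] - rest[i])
    # Soft positional rows: anchors stay put, handles move to their targets.
    for row, (idx, target) in enumerate(constraints.items(), start=m):
        A[row, idx] = weight
        b[row] = weight * np.asarray(target, dtype=float)
    # All d coordinates share A, so one lstsq call solves them together.
    V, *_ = np.linalg.lstsq(A, b, rcond=None)
    return V


# Usage: straight 5-vertex chain, anchor at vertex 0, handle at vertex 4 dragged
# from x=4 to x=6 with identity Jacobians; interior vertices spread evenly.
rest = np.array([[float(i), 0.0] for i in range(5)])
edges = [(i, i + 1) for i in range(4)]
identity = np.stack([np.eye(2)] * 4)
V = poisson_from_jacobians(rest, edges, identity, {0: [0.0, 0.0], 4: [6.0, 0.0]})
print(np.round(V, 3))
```

Because the constraints here are soft, anchors and handles land close to, but not exactly on, their targets, with the error shrinking as `weight` grows; a hard-constrained variant would instead remove those vertices from the unknowns.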

System Overview

System overview of GRT. Given a task description and a nominal object (Initialization Parameters), a vision-language model (VLM) selects anchor and handle points (a). Handle displacements are then sampled to define a population of deformation candidates. Each candidate is converted into a perturbed mesh via Jacobian field-based optimization (b) and evaluated in simulation with a frozen policy (c). Deformations that induce failure are used to guide the sampling of the next population.
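The caption describes a gradient-free, population-based loop: sample handle displacements, deform, roll out the frozen policy, and let failures steer the next population. The page does not name the exact optimizer, so the sketch below assumes a cross-entropy-method (CEM) style update as a stand-in. `red_team_search` and `score_fn` are hypothetical names; `score_fn` represents the whole deform-and-simulate step and returns a task success rate, and the per-coordinate `bound` is a simple placeholder for user constraints.

```python
import numpy as np


def red_team_search(score_fn, dim, bound, pop_size=32, elite_frac=0.25,
                    iterations=20, seed=0):
    """CEM-style search (assumed optimizer) for failure-inducing displacements.

    score_fn: hypothetical stand-in for "deform mesh, roll out frozen policy";
              maps a (dim,) displacement vector to a success rate in [0, 1].
    bound:    max absolute displacement per coordinate (placeholder constraint).
    Returns the lowest-success displacement found and its success rate.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)                  # start from the nominal (undeformed) shape
    std = np.full(dim, bound / 2.0)
    n_elite = max(1, int(pop_size * elite_frac))
    best_x, best_score = mean.copy(), score_fn(mean)
    for _ in range(iterations):
        # Sample a population of handle displacements and clip it into bounds.
        pop = np.clip(rng.normal(mean, std, size=(pop_size, dim)), -bound, bound)
        scores = np.array([score_fn(x) for x in pop])
        # Lower success means a stronger attack, so elites are the lowest scores.
        elite_idx = np.argsort(scores)[:n_elite]
        if scores[elite_idx[0]] < best_score:
            best_x, best_score = pop[elite_idx[0]].copy(), float(scores[elite_idx[0]])
        # Refit the sampling distribution to the failure-inducing elites.
        elites = pop[elite_idx]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-3   # small floor keeps some exploration
    return best_x, best_score


# Usage with an invented toy success model: success collapses near one
# hypothetical "weak" displacement and stays high at the nominal shape.
weak_spot = np.array([0.8, -0.5])
toy_success = lambda x: float(1.0 - np.exp(-4.0 * np.sum((x - weak_spot) ** 2)))
best_x, best_score = red_team_search(toy_success, dim=2, bound=1.0)
print(best_x, best_score)
```

CEM is a convenient choice for a sketch because it needs only scalar scores from each rollout, which matches the simulator-in-the-loop, gradient-free setting described above.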

Simulation Demos

Simulation demo videos: nominal objects and their CrashShapes for Task 1 (Grasping), Task 2 (Insertion), and Task 3 (Articulation).

VLM Prompting Strategy

Two-stage VLM prompting strategy for 3D handle-point selection. First, the Geometric Reasoning template aligns a canonical view panel and indexed keypoints with a high-level task description, guiding the VLM to infer which vertices control meaningful mesh deformations. Next, the Task-Critical Ranking template asks the model to Pareto-rank these candidates by plausibility and task relevance, producing a compact set of handle points for targeted, task-aware red-teaming.
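Pareto-ranking by plausibility and task relevance means sorting candidates into non-dominated fronts: a candidate is pushed to a later front only if another remaining candidate is at least as good on both criteria and strictly better on one. The sketch below assumes the VLM's judgments have already been converted into numeric scores per candidate vertex; the vertex ids and scores in the usage lines are invented, and keeping the first front as the handle set is one possible selection rule, not necessarily the one GRT uses.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float]  # (plausibility, task_relevance); higher is better


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every criterion and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(scores: Dict[int, Scores]) -> List[List[int]]:
    """Group candidate vertex ids into successive non-dominated fronts."""
    remaining = dict(scores)
    fronts: List[List[int]] = []
    while remaining:
        # A candidate joins the current front if nothing left dominates it.
        front = sorted(
            vid for vid, s in remaining.items()
            if not any(dominates(t, s) for other, t in remaining.items() if other != vid)
        )
        fronts.append(front)
        for vid in front:
            del remaining[vid]
    return fronts


# Usage with invented scores for five hypothetical candidate vertices.
candidates = {12: (0.9, 0.4), 37: (0.6, 0.8), 58: (0.5, 0.5),
              91: (0.95, 0.3), 104: (0.4, 0.2)}
fronts = pareto_fronts(candidates)
handle_points = fronts[0]  # one possible rule: keep only the first front
print(fronts)  # [[12, 37, 91], [58], [104]]
```

This pairwise version costs O(n²) comparisons per front, which is fine for small candidate sets such as a list of indexed keypoints.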

VLM Prompting Examples

Real World Validation Over Insertion Policy

We tested our geometric red-teaming framework on a physical xArm 6 robot using 3D-printed plugs (Nominal, CS-1, CS-2). The following videos show red-teaming exposing policy failures and blue-teaming restoring policy performance.

1. Baseline: Original Policy with Nominal Plug

The pre-trained policy consistently succeeds with the standard, undeformed object.

2. Red Teaming: Original Policy Fails on CrashShape CS-1

The same policy consistently fails when presented with CrashShape CS-1.

3. Red Teaming: Original Policy Fails on CrashShape CS-2

The policy also consistently fails when presented with CrashShape CS-2.

4. Blue Teaming on CrashShape CS-1

A policy was fine-tuned using CS-1 and the nominal plug. It was then evaluated on both shapes.

Performance on CS-1 (with CS-1 Blue-Teamed Policy)

The blue-teamed policy now consistently succeeds on CS-1.


Performance on Nominal Plug (with CS-1 Blue-Teamed Policy)

Performance on the nominal plug is preserved.

5. Blue Teaming on CrashShape CS-2

A separate policy was fine-tuned using CS-2 and the nominal plug, then evaluated.

Performance on CS-2 (with CS-2 Blue-Teamed Policy)

This blue-teamed policy now consistently succeeds on CS-2.


Performance on Nominal Plug (with CS-2 Blue-Teamed Policy)

Performance on the nominal plug is also preserved with this policy.

Real World Validation Over Contact-GraspNet

We tested our geometric red-teaming framework on a physical Franka Emika Panda robot using 3D-printed pairs of YCB objects (nominal and deformed). The following videos show red-teaming exposing failures in Contact-GraspNet, a generalizable grasping model.

1. Baseline: Contact-GraspNet on Original YCB Mustard Bottle

The most confident grasp from Contact-GraspNet consistently succeeds on the 3D-printed mustard bottle taken directly from the YCB dataset.

2. Red-Teaming: Contact-GraspNet Fails on Deformed YCB Mustard Bottle

The most confident grasp from Contact-GraspNet fails on the 3D-printed deformed mustard bottle obtained through geometric red-teaming.

3. Baseline: Contact-GraspNet on Original YCB Screwdriver

The most confident grasp from Contact-GraspNet consistently succeeds on the 3D-printed screwdriver taken directly from the YCB dataset.

4. Red-Teaming: Contact-GraspNet Fails on Deformed YCB Screwdriver

The most confident grasp from Contact-GraspNet fails on the 3D-printed deformed screwdriver obtained through geometric red-teaming.

Acknowledgement

This material is based upon work supported by NIST under Grant No. 70NANB24H314, and the Office of Naval Research under the MURI Grant No. N00014-24-1-2748. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NIST or the Office of Naval Research.

BibTeX

@misc{goel2025geometricredteamingroboticmanipulation,
      title={Geometric Red-Teaming for Robotic Manipulation}, 
      author={Divyam Goel and Yufei Wang and Tiancheng Wu and Guixiu Qiao and Pavel Piliptchak and David Held and Zackory Erickson},
      year={2025},
      eprint={2509.12379},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2509.12379}, 
}