Conference on Neural Information Processing Systems 2023


Scaling Up Soft Body Manipulation using Vision-Language Driven Differentiable Physics

Figure: (A) A dumpling-making video; (B) The annotator interacts with our GUI tool to create DiffVL tasks; (C) A DiffVL task contains a sequence of 3D scenes along with natural language instructions to guide the solver; (D) DiffVL leverages a large language model to compile instructions into optimization programs consisting of vision elements; (E) The optimization program guides the solver to solve the task in the end.


Combining gradient-based trajectory optimization with differentiable physics simulation is an efficient technique for solving soft-body manipulation problems. Given a well-crafted optimization objective, the solver can quickly converge to a valid trajectory. However, writing appropriate objective functions requires expert knowledge, which makes it difficult to collect a large set of naturalistic problems from non-expert users. We introduce DiffVL, a method that enables non-expert users to communicate soft-body manipulation tasks through a combination of vision and natural language, given in multiple stages, in a form that can be readily leveraged by a differentiable physics solver. We have developed GUI tools that enable non-expert users to specify 100 tasks inspired by real-life soft-body manipulations from online videos, which we will make public. We leverage large language models to translate task descriptions into machine-interpretable optimization objectives. These objectives help differentiable physics solvers solve long-horizon, multistage tasks that are challenging for previous baselines.
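To make the idea of gradient-based trajectory optimization through a differentiable simulator concrete, here is a minimal sketch. It is not DiffVL's solver: it assumes a toy 1D unit point mass in place of a soft-body simulator, a hand-written squared-error objective in place of a compiled optimization program, and a hand-derived reverse pass in place of an autodiff framework. The names `rollout`, `loss_and_grad`, `optimize`, and `DT` are hypothetical and chosen only for illustration.

```python
"""Toy gradient-based trajectory optimization through a differentiable simulator."""

DT = 0.1  # simulation time step (an assumed value for this sketch)


def rollout(controls, dt=DT):
    """Simulate a unit point mass starting at rest; return its final position."""
    x, v = 0.0, 0.0
    for u in controls:
        x = x + dt * v  # position update uses the current velocity
        v = v + dt * u  # velocity update uses the applied control (force)
    return x


def loss_and_grad(controls, target, dt=DT):
    """Return the squared final-position error and its gradient per control."""
    x_final = rollout(controls, dt)
    loss = (x_final - target) ** 2

    # Reverse pass: carry dL/dx and dL/dv backward through the update rules.
    grad_x = 2.0 * (x_final - target)
    grad_v = 0.0
    grads = [0.0] * len(controls)
    for t in reversed(range(len(controls))):
        grads[t] = dt * grad_v         # from v_next = v + dt * u
        grad_v = grad_v + dt * grad_x  # from x_next = x + dt * v
        # grad_x is unchanged: x_next depends on x with coefficient 1
    return loss, grads


def optimize(target, horizon=20, steps=200, lr=1.0):
    """Plain gradient descent on the control sequence."""
    controls = [0.0] * horizon
    for _ in range(steps):
        _, grads = loss_and_grad(controls, target)
        controls = [u - lr * g for u, g in zip(controls, grads)]
    return controls


if __name__ == "__main__":
    controls = optimize(target=1.0)
    print("final position:", rollout(controls))
```

The structure is the point here: a simulator whose steps can be differentiated, an objective evaluated on the resulting trajectory, and gradient updates on the controls. In the setting described above, the objective would be derived from the task description rather than fixed by hand, and the simulator would model deformable bodies.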


Associated Researchers

Zhiao Huang

University of California, San Diego

Feng Chen

Tsinghua University

Chunru Lin

UMass Amherst

Hao Su

University of California, San Diego

Chuang Gan

UMass Amherst


