Generalizable Pose Estimation Using Implicit Scene Representations

Yotto Koga


At Autodesk, we’re focused on enabling and empowering engineers and designers to successfully tackle their product design-and-make challenges. In the physical realm, these products range from small mechanical or electronic components to large infrastructure projects.

One of our active research areas is finding ways to automate the assembly of such products. With our customers in mind, and anticipating the kinds of manufacturing challenges they will face, we want a solution that is easy to use and set up, and that can handle high-mix, low-volume production where the desired assembly changes frequently. We believe that exploring and prototyping adaptive robotic systems, integrated with thoughtful user workflows that start with CAD tools and models, will move us closer to this goal.

A critical building block for this work is developing perception models that understand the shape of the parts that make up the product: for example, a model that can infer the pose of a part in an image captured by a camera. From that pose estimate, we can control a robot and its attached gripper to manipulate the part adaptively.

At the 2023 International Conference on Robotics and Automation (ICRA) in London, we present some recent results on this part pose-estimation problem [1].

Using an extended implicit scene representation network [2] and the technique of inverting neural renderers, we demonstrate how to infer the pose of the part in a query image. Specifically, starting from an initial pose guess (or a set of guesses), we render the part using its implicit scene representation and then iteratively update the pose parameters to minimize the pixel error between the rendered image and the query image. The converged result is the pose estimate of the part.
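To make the render-and-compare loop concrete, here is a minimal, self-contained Python sketch. It is not the method from [1]: a hand-written 2D Gaussian "part" stands in for the implicit scene representation network, central finite differences stand in for backpropagating through a differentiable renderer, and the image size, blob widths, poses, and function names (`render`, `pixel_loss`, `refine_pose`, `estimate_pose`) are all hypothetical. It does follow the structure described above: render the part at a guessed pose, measure the pixel error against the query image, update the pose by gradient descent, and keep the best converged result over a set of initial guesses.

```python
import math

# Toy stand-in for a neural renderer (hypothetical): the "part" is an elongated
# 2D Gaussian blob whose pose is (tx, ty, theta) in pixels and radians.
SIZE = 24            # image is SIZE x SIZE pixels
SIGMA_MAJOR = 3.5    # blob extent along its long axis, in pixels
SIGMA_MINOR = 1.5    # blob extent along its short axis, in pixels


def render(pose):
    """Render the toy part at pose = (tx, ty, theta) as a flat list of pixels."""
    tx, ty, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    pixels = []
    for v in range(SIZE):
        for u in range(SIZE):
            du, dv = u - tx, v - ty
            a = c * du + s * dv      # coordinate along the blob's long axis
            b = -s * du + c * dv     # coordinate along the blob's short axis
            pixels.append(math.exp(-0.5 * ((a / SIGMA_MAJOR) ** 2 + (b / SIGMA_MINOR) ** 2)))
    return pixels


def pixel_loss(pose, query):
    """Sum of squared pixel errors between the rendered guess and the query image."""
    return sum((r - q) ** 2 for r, q in zip(render(pose), query))


def numeric_grad(pose, query, h=1e-4):
    """Central finite differences, standing in for backprop through the renderer."""
    grad = []
    for i in range(len(pose)):
        plus, minus = list(pose), list(pose)
        plus[i] += h
        minus[i] -= h
        grad.append((pixel_loss(plus, query) - pixel_loss(minus, query)) / (2 * h))
    return grad


def refine_pose(initial_pose, query, iters=50):
    """Gradient descent on the pose parameters with Armijo backtracking line search."""
    pose = list(initial_pose)
    loss = pixel_loss(pose, query)
    for _ in range(iters):
        g = numeric_grad(pose, query)
        g_sq = sum(gi * gi for gi in g)
        if g_sq < 1e-12:
            break  # gradient vanished: treat as converged
        step = 1.0
        while step > 1e-10:
            candidate = [p - step * gi for p, gi in zip(pose, g)]
            cand_loss = pixel_loss(candidate, query)
            if cand_loss <= loss - 1e-4 * step * g_sq:  # sufficient decrease
                pose, loss = candidate, cand_loss
                break
            step *= 0.5
        else:
            break  # no step reduced the loss enough: stop refining
    return pose, loss


def estimate_pose(query, guesses):
    """Refine every initial guess and keep the one with the lowest pixel error."""
    results = [refine_pose(g, query) for g in guesses]
    return min(results, key=lambda r: r[1])


if __name__ == "__main__":
    true_pose = (13.0, 11.0, 0.6)
    query = render(true_pose)  # synthetic query image of the toy part
    guesses = [(12.0, 12.0, 0.0), (8.0, 8.0, 0.0), (16.0, 8.0, 0.0),
               (8.0, 16.0, 0.0), (16.0, 16.0, 0.0)]
    pose, loss = estimate_pose(query, guesses)
    print("estimated pose:", [round(p, 2) for p in pose], "loss:", loss)
```

Refining several guesses and keeping the lowest-error result guards against guesses that start too far from the part to converge. Note also that this toy blob looks identical after a half-turn, so its recovered angle is only meaningful modulo π; symmetric parts make pose estimates ambiguous in the same way.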


Using few-shot learning, we also show that the approach generalizes to unseen parts within a category. Please refer to the project website and the paper [1] for more details.

We continue to explore various perception models and building blocks for our adaptive robotic assembly workflow. We are intrigued by the demonstrated learning potential of these implicit models and their associated prompt-based workflows, and we hope to share an update on our progress in a follow-on blog post.

Vaibhav Saxena, a Ph.D. student at Georgia Tech and an intern at Autodesk, is the primary contributor to the research presented at ICRA 2023. Yotto Koga is an architect in the Autodesk Research Robotics Lab.

[1] V. Saxena, K. R. Malekshan, L. Tran, and Y. Koga, “Generalizable Pose Estimation Using Implicit Scene Representations,” in 2023 International Conference on Robotics and Automation (ICRA). IEEE, 2023.
[2] V. Sitzmann, M. Zollhöfer, and G. Wetzstein, “Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations,” Advances in Neural Information Processing Systems, vol. 32, 2019.
