Learning Dense Reward with Temporal Variant Self-Supervision
Abstract
Rewards play an essential role in reinforcement learning. In contrast to rule-based game environments with well-defined reward functions, complex real-world robotic applications, such as contact-rich manipulation, lack explicit and informative descriptions that can be used directly as a reward. Previous work has shown that it is possible to algorithmically extract dense rewards directly from multimodal observations. In this paper, we extend this effort by proposing a more efficient and robust way of sampling and learning. In particular, our sampling approach utilizes temporal variance to simulate the fluctuating state and action distributions of a manipulation task. We then propose a network architecture for self-supervised learning that better incorporates temporal information into latent representations. We tested our approach in two experimental setups, joint assembly and door opening. Preliminary results show that our approach is effective and efficient at learning dense rewards, and that the learned rewards lead to faster convergence than baselines.
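The abstract describes sampling training examples according to temporal variance, so that high-change segments of a trajectory are sampled more often. The sketch below is a generic, hypothetical illustration of that idea, not the paper's actual algorithm: `temporal_variance_weights`, `sample_training_pairs`, the window size, and the offset scheme are all assumptions introduced here for exposition.

```python
import numpy as np

def temporal_variance_weights(trajectory, window=5):
    """Weight each timestep by the variance of observations in a
    sliding temporal window, so high-change (e.g. contact-rich)
    segments are sampled more often. Hypothetical illustration."""
    T = len(trajectory)
    weights = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        weights[t] = trajectory[lo:hi].var(axis=0).mean()
    total = weights.sum()
    # Fall back to uniform sampling if the trajectory is constant.
    return weights / total if total > 0 else np.full(T, 1.0 / T)

def sample_training_pairs(trajectory, n_pairs, window=5, rng=None):
    """Draw (anchor, neighbor) index pairs for self-supervised
    training, biased toward high-variance trajectory segments."""
    rng = np.random.default_rng(rng)
    p = temporal_variance_weights(trajectory, window)
    anchors = rng.choice(len(trajectory), size=n_pairs, p=p)
    # Neighbors come from a small temporal offset around each anchor.
    offsets = rng.integers(1, window + 1, size=n_pairs)
    neighbors = np.clip(anchors + offsets, 0, len(trajectory) - 1)
    return list(zip(anchors.tolist(), neighbors.tolist()))

# Toy 1-D "observation" trajectory: flat ends, high-variance middle.
traj = np.concatenate([np.zeros(20),
                       5.0 * np.sin(np.linspace(0, 10, 20)),
                       np.ones(20)]).reshape(-1, 1)
pairs = sample_training_pairs(traj, n_pairs=8, rng=0)
```

With this weighting, anchors concentrate in the oscillating middle segment, mimicking how a fluctuating state-action distribution would be emphasized during sampling.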