TL;DR
- Start with Gaussian noise. A diffusion model learns to denoise it into a clean, executable action trajectory. This whole denoising process is the policy representation, and re-running it as new observations arrive gives reactive, closed-loop behavior.
Motivation
- Previous behavioral cloning approaches struggled with multi-modality (multiple valid actions for a single observation), covariate shift, and action smoothness/temporal consistency.
Intuition
- We are generating trajectories (an example is a pencil motion during drawing).
- We corrupt the drawing by adding scribbles (noise). Diffusion models let us learn to reverse the noise, one step at a time, until the clean/realistic trajectory is recovered.
- This is what allows us to generate new, plausible actions that match the distribution of expert demonstrations.
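The noising/denoising intuition above can be sketched in a few lines. This is a toy example, not the paper's actual DDPM schedule or network: `forward_noise` corrupts a "pencil motion" trajectory, and `denoise` iteratively pulls it back toward a denoiser's estimate. The oracle denoiser here simply knows the clean trajectory; in Diffusion Policy a trained neural network plays that role.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noise(traj, num_steps, noise_scale=0.1):
    """Corrupt a clean trajectory by adding Gaussian noise, step by step."""
    noisy = traj.copy()
    for _ in range(num_steps):
        noisy += noise_scale * rng.standard_normal(traj.shape)
    return noisy

def denoise(noisy, denoiser, num_steps, step_size=0.5):
    """Iteratively remove noise by stepping toward the denoiser's
    estimate of the clean trajectory (simplified reverse process)."""
    x = noisy.copy()
    for _ in range(num_steps):
        x = x + step_size * (denoiser(x) - x)
    return x

# Toy "pencil motion": a sine curve sampled over time.
t = np.linspace(0, 2 * np.pi, 32)
clean = np.sin(t)

noisy = forward_noise(clean, num_steps=10)
# Hypothetical oracle denoiser that returns the clean trajectory;
# a trained model would only approximate this from expert data.
recovered = denoise(noisy, lambda x: clean, num_steps=10)
print(np.abs(recovered - clean).max())  # small residual error
```

Each denoising step removes only a fraction of the noise, which is why the reverse process runs for many steps rather than jumping straight to the answer.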
See my notes on Diffusion Models.
Idea
- Diffusion Policy is essentially a conditional diffusion model over actions.
- So we are conditioning the trajectory generation on the robot’s current observation/state.
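The conditioning idea can be sketched as follows. This is a simplified, assumed interface (the names `noise_pred_net`, `denoise_step`, and the update rule are illustrative, not the paper's exact DDPM update): sampling starts from pure Gaussian noise over an action sequence, and every denoising step also receives the observation, so the final trajectory depends on the robot's current state.

```python
import numpy as np

rng = np.random.default_rng(1)

def denoise_step(noise_pred_net, noisy_actions, obs, k, alpha=0.9):
    """One simplified reverse-diffusion step: predict the noise given
    the current observation, then partially remove it."""
    eps = noise_pred_net(noisy_actions, obs, k)
    return (noisy_actions - (1 - alpha) * eps) / np.sqrt(alpha)

def stub_net(noisy_actions, obs, k):
    """Hypothetical stand-in for a trained noise-prediction network,
    so the example runs; the real net is trained on expert demos."""
    return noisy_actions - obs.mean()

obs = np.array([0.2, 0.4])              # robot's current observation/state
actions = rng.standard_normal((8, 2))   # start from pure Gaussian noise
for k in reversed(range(5)):            # iterative conditional denoising
    actions = denoise_step(stub_net, actions, obs, k)
print(actions.shape)  # an 8-step, 2-DoF action trajectory
```

The key point is the signature: the network sees `(noisy_actions, obs, k)`, so changing `obs` changes which of the multiple valid expert-like trajectories the denoising converges to.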