TL;DR

  • Start with Gaussian noise. A diffusion model learns to denoise it into a clean, executable action trajectory. This whole denoising process is our policy representation. The result is reactive, closed-loop behavior.

Motivation

  • Previous behavioral cloning methods struggled with multi-modality (multiple valid actions for a single observation), covariate shift, and producing smooth, consistent actions.

Intuition

  • We are generating trajectories (an example is a pencil motion during drawing).
  • We corrupt the drawing by adding scribbles (noise). Diffusion models allow us to learn to reverse the noise, one layer at a time, until the clean/realistic trajectory is recovered.
  • This is what allows us to generate new plausible actions that match the style of expert demonstrations
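
To make the "add scribbles" step concrete, here is a minimal sketch of the forward (noising) process. It assumes the standard DDPM closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with a linear beta schedule. The circle "drawing", the schedule values, and the helper names (make_alpha_bar, add_noise) are all illustrative choices of mine, not something fixed by these notes.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(clean_traj, t, alpha_bar, rng):
    """Corrupt a clean trajectory to noise level t; also return the noise used."""
    noise = rng.standard_normal(clean_traj.shape)
    noisy = np.sqrt(alpha_bar[t]) * clean_traj + np.sqrt(1.0 - alpha_bar[t]) * noise
    return noisy, noise

rng = np.random.default_rng(0)

# Toy "pencil drawing": 16 (x, y) points on a unit circle.
angles = np.linspace(0.0, 2.0 * np.pi, 16)
clean_traj = np.stack([np.cos(angles), np.sin(angles)], axis=1)

alpha_bar = make_alpha_bar()
slightly_noisy, noise_small = add_noise(clean_traj, 5, alpha_bar, rng)   # a few scribbles
very_noisy, _ = add_noise(clean_traj, 999, alpha_bar, rng)               # almost pure noise

# If the exact noise is known, the clean trajectory can be recovered.
# A trained model approximates this by *predicting* the noise.
recovered = (slightly_noisy - np.sqrt(1.0 - alpha_bar[5]) * noise_small) / np.sqrt(alpha_bar[5])
```

During training, the model would see the noisy trajectory and the step t, and be asked to predict the noise. Reversing the corruption then amounts to repeatedly subtracting the predicted noise.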

For background, see my notes on Diffusion Models.

Idea

  • Diffusion Policy is essentially a conditional diffusion model for actions.
  • So we condition trajectory generation on the robot’s current observation/state.
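
The conditioning idea can be sketched as a DDPM-style reverse sampling loop in which every denoising step also receives the observation. This is a toy under loud assumptions: predict_noise is a hand-written stand-in for a trained network (it cheats by knowing a hypothetical expert that moves in a straight line toward a goal stored in the observation), and the schedule, horizon, and all names are illustrative rather than taken from any specific implementation.

```python
import numpy as np

NUM_STEPS = 50   # number of denoising steps (assumed)
HORIZON = 8      # number of future actions generated per call (assumed)

betas = np.linspace(1e-4, 0.02, NUM_STEPS)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def expert_actions(obs):
    """Hypothetical expert: straight-line waypoints from the origin to the goal in obs."""
    fractions = np.linspace(0.0, 1.0, HORIZON)[:, None]
    return fractions * obs[None, :]

def predict_noise(noisy_actions, obs, t):
    """Stand-in for a trained network eps_theta(actions_t, obs, t).

    Here it is an oracle that computes the noise implied by the hypothetical
    expert; a real policy would learn this mapping from demonstrations.
    """
    target = expert_actions(obs)
    return (noisy_actions - np.sqrt(alpha_bar[t]) * target) / np.sqrt(1.0 - alpha_bar[t])

def sample_actions(obs, rng):
    """Denoise Gaussian noise into an action trajectory, conditioned on obs."""
    actions = rng.standard_normal((HORIZON, obs.shape[0]))  # start from pure noise
    for t in reversed(range(NUM_STEPS)):
        eps = predict_noise(actions, obs, t)
        mean = (actions - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        # No fresh noise is added on the final step.
        fresh = rng.standard_normal(actions.shape) if t > 0 else 0.0
        actions = mean + np.sqrt(betas[t]) * fresh
    return actions

rng = np.random.default_rng(0)
obs = np.array([1.0, 2.0])           # toy observation: a 2D goal position
actions = sample_actions(obs, rng)   # trajectory depends on the observation
other = sample_actions(np.array([-3.0, 0.5]), rng)
```

For closed-loop use, sample_actions would be called again with the newest observation after executing some of the generated actions, so the plan keeps reacting to what the robot actually sees.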