TL;DR
- Start with Gaussian noise. A diffusion model learns to denoise it into a clean, executable action trajectory. This whole denoising process is the policy representation, and re-running it as new observations arrive gives reactive, closed-loop behavior.
Motivation
- Previous behavioral cloning approaches struggled with multi-modality (multiple valid actions for a single observation), covariate shift, and action smoothness/temporal consistency.
Intuition
- We are generating trajectories (an example is a pencil motion during drawing).
- We corrupt the drawing by adding scribbles (noise). Diffusion models let us learn to reverse the noise, one step at a time, until the clean/realistic trajectory is recovered.
- This is what allows us to generate new, plausible actions that match the distribution of expert demonstrations.
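The noising/denoising intuition above can be sketched in a few lines. This is a toy example, not the paper's actual DDPM schedule or network: `forward_noise` corrupts a "pencil motion" trajectory, and `denoise` iteratively pulls it back toward a denoiser's estimate. The oracle denoiser here simply knows the clean trajectory; in Diffusion Policy a trained neural network plays that role.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noise(traj, num_steps, noise_scale=0.1):
    """Corrupt a clean trajectory by adding Gaussian noise, step by step."""
    noisy = traj.copy()
    for _ in range(num_steps):
        noisy += noise_scale * rng.standard_normal(traj.shape)
    return noisy

def denoise(noisy, denoiser, num_steps, step_size=0.5):
    """Iteratively remove noise by stepping toward the denoiser's
    estimate of the clean trajectory (simplified reverse process)."""
    x = noisy.copy()
    for _ in range(num_steps):
        x = x + step_size * (denoiser(x) - x)
    return x

# Toy "pencil motion": a sine curve sampled over time.
t = np.linspace(0, 2 * np.pi, 32)
clean = np.sin(t)

noisy = forward_noise(clean, num_steps=10)
# Hypothetical oracle denoiser that returns the clean trajectory;
# a trained model would only approximate this from expert data.
recovered = denoise(noisy, lambda x: clean, num_steps=10)
print(np.abs(recovered - clean).max())  # small residual error
```

Each denoising step removes only a fraction of the noise, which is why the reverse process runs for many steps rather than jumping straight to the answer.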
See my notes on Diffusion Models.
Idea
- Diffusion Policy is essentially a conditional diffusion model over actions.
- So we are conditioning the trajectory generation on the robot’s current observation/state.
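The conditioning idea can be sketched as follows. This is a simplified, assumed interface (the names `noise_pred_net`, `denoise_step`, and the update rule are illustrative, not the paper's exact DDPM update): sampling starts from pure Gaussian noise over an action sequence, and every denoising step also receives the observation, so the final trajectory depends on the robot's current state.

```python
import numpy as np

rng = np.random.default_rng(1)

def denoise_step(noise_pred_net, noisy_actions, obs, k, alpha=0.9):
    """One simplified reverse-diffusion step: predict the noise given
    the current observation, then partially remove it."""
    eps = noise_pred_net(noisy_actions, obs, k)
    return (noisy_actions - (1 - alpha) * eps) / np.sqrt(alpha)

def stub_net(noisy_actions, obs, k):
    """Hypothetical stand-in for a trained noise-prediction network,
    so the example runs; the real net is trained on expert demos."""
    return noisy_actions - obs.mean()

obs = np.array([0.2, 0.4])              # robot's current observation/state
actions = rng.standard_normal((8, 2))   # start from pure Gaussian noise
for k in reversed(range(5)):            # iterative conditional denoising
    actions = denoise_step(stub_net, actions, obs, k)
print(actions.shape)  # an 8-step, 2-DoF action trajectory
```

The key point is the signature: the network sees `(noisy_actions, obs, k)`, so changing `obs` changes which of the multiple valid expert-like trajectories the denoising converges to.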