DeepMind Motor Primitives Patent | NeuralDocket

An August 2022 DeepMind grant describes learning 'motor primitives' with a stabilized policy. How a simulated body learns to walk before it learns to run.

Forget the jargon for a second. Teaching an agent to control a body directly, whether real or simulated, is hard because the search space of muscle commands is enormous. Motor primitives shrink it: instead of learning every twitch from scratch, the agent learns a small vocabulary of movements and then learns to sequence them.

DeepMind's grant US11403513B2 (issued August 2, 2022) formalizes this with a linear-feedback-stabilized policy, a control trick that keeps the learned movement from flying apart. The CPC tags G06N 3/0472 (neural network architectures using probabilistic elements) and G06N 5/00 (knowledge-based methods) reflect that this sits between deep learning and classical control.

“A computer-implemented method of training a student machine learning system comprises receiving data indicating execution of an expert, determining one or more actions performed by the expert during the execution and a corresponding state-action Jacobian, and training the student machine learning system using a linear-feedback-stabilized policy.” (U.S. Patent No. 11,403,513)

The grant actually describes two linked things, and the abstract names both. The first is the training method quoted above: a student system learns from an expert's execution by computing, at each moment, a “state-action Jacobian”, the sensitivity of the expert's action to small changes in state, and then training with a linear-feedback-stabilized policy built on that Jacobian. The second is an architecture: a neural network system “for representing a space of probabilistic motor primitives,” with an encoder that turns sequences of input frames into latent variables and a decoder that turns those latent variables, plus the current state, back into an action. The encoder learns the vocabulary; the decoder speaks it.
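To make the “state-action Jacobian” concrete, here is a minimal sketch that estimates it numerically for a stand-in expert. The `expert_policy` function, its `tanh(W s)` form, and the array sizes are all hypothetical choices made for illustration; the patent does not specify how the Jacobian is computed, and finite differences are just one simple option.

```python
import numpy as np

def expert_policy(state, W):
    """Hypothetical stand-in expert: a smooth map from state to action."""
    return np.tanh(W @ state)

def state_action_jacobian(policy, state, eps=1e-5):
    """Central-difference estimate of d(action)/d(state) at `state`."""
    base = policy(state)
    jac = np.zeros((base.size, state.size))
    for i in range(state.size):
        step = np.zeros_like(state)
        step[i] = eps
        # Column i: how every action component responds to nudging state[i].
        jac[:, i] = (policy(state + step) - policy(state - step)) / (2 * eps)
    return jac

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))   # 3-D state, 2-D action (arbitrary sizes)
s = rng.normal(size=3)

J_numeric = state_action_jacobian(lambda x: expert_policy(x, W), s)
# For tanh(W s) the exact Jacobian is diag(1 - tanh(W s)^2) @ W.
J_exact = (1.0 - np.tanh(W @ s) ** 2)[:, None] * W
```

Each row of the Jacobian belongs to one action component and each column to one state component, so a 2-D action over a 3-D state gives a 2x3 matrix. For this toy expert the numeric estimate can be checked against the closed form `J_exact`.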

Claim 1 is where the recurrent, temporal character shows. It describes encoding behaviours for recall: you obtain a training trajectory of an example behaviour, a sequence of (observation, action) pairs across time steps, and for a particular time step t you generate the action from the observation at t together with one or more future observations at steps t+1 through t+k. The encoder input deliberately peeks ahead. A later claim adds a probabilistic regularizer: the training objective includes a term penalizing the difference between the learned posterior over motor-primitive latents and a prior. That prior is autoregressive: at each step it depends on the prior at the previous step (scaled by a factor with magnitude below one) plus a noise component. That is the machinery that makes the primitives a smooth, sampleable space rather than a lookup table.
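One common way to penalize the gap between a posterior and a prior is a KL divergence between Gaussians, sketched below. This is an assumption for illustration: the description above says only that the objective penalizes a difference between the two distributions. The diagonal-Gaussian form, the noise scale `sqrt(1 - alpha^2)`, and the names `gaussian_kl` and `ar_prior_params` are hypothetical.

```python
import numpy as np

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), summed over dimensions."""
    return np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )

def ar_prior_params(z_prev, alpha):
    """Autoregressive prior p(z_t | z_{t-1}): mean alpha * z_prev plus noise."""
    mu = alpha * z_prev
    # Assumed noise scale; chosen so the latent keeps unit variance over time.
    sigma = np.full_like(z_prev, np.sqrt(1.0 - alpha ** 2))
    return mu, sigma

alpha = 0.95                           # scaling factor with magnitude below one
z_prev = np.array([0.4, -1.0, 0.2])    # previous latent (made-up values)
mu_p, sigma_p = ar_prior_params(z_prev, alpha)

# A posterior that agrees with the prior pays no penalty...
kl_match = gaussian_kl(mu_p, sigma_p, mu_p, sigma_p)
# ...while one that jumps away from the previous latent is penalized.
kl_jump = gaussian_kl(mu_p + 1.0, sigma_p, mu_p, sigma_p)
```

The penalty is zero when the encoder's posterior sits exactly where the prior expects and grows as the posterior strays from the scaled previous latent, which is what pushes consecutive latents to stay close.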

Under the hood, “linear-feedback-stabilized” is the load-bearing phrase. Learned policies can be twitchy and unstable; a feedback term built from the state-action Jacobian continuously corrects small deviations, like a steadying hand on the controls. Stabilizing the primitive means the agent can rely on it as a solid building block rather than relearning balance every time. The autoregressive prior, meanwhile, is what lets the decoder generate coherent sequences of primitives: each step's distribution carries momentum from the last.
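A toy example shows why the feedback term matters. The sketch below assumes a one-dimensional unstable system and a linear expert, both invented for illustration (the gains `A` and `K` and the `step` and `replay` helpers are hypothetical, not from the patent). It replays the expert's recorded actions from a slightly perturbed start, once blindly and once with a correction of the form a = a_ref + J (s - s_ref).

```python
A, K = 1.2, 0.7   # unstable plant gain and expert feedback gain (both assumed)
STEPS = 30

def step(state, action):
    """Toy dynamics: the state grows 20% per step unless the action counters it."""
    return A * state + action

# Record the expert's nominal trajectory.
nominal_states, nominal_actions = [], []
s = 1.0
for _ in range(STEPS):
    a = -K * s                 # expert policy pi(s) = -K * s
    nominal_states.append(s)
    nominal_actions.append(a)
    s = step(s, a)
jacobian = -K                  # d(action)/d(state); constant for this linear expert

def replay(start, use_feedback):
    """Replay the expert's actions from `start`; return the final tracking error."""
    s = start
    for s_ref, a_ref in zip(nominal_states, nominal_actions):
        correction = jacobian * (s - s_ref) if use_feedback else 0.0
        s = step(s, a_ref + correction)
    s_ref_final = step(nominal_states[-1], nominal_actions[-1])
    return abs(s - s_ref_final)

open_loop_error = replay(1.1, use_feedback=False)   # deviation grows by 1.2x per step
feedback_error = replay(1.1, use_feedback=True)     # deviation shrinks by 0.5x per step
```

Without feedback, the initial 0.1 offset is multiplied by 1.2 at every step and the replay drifts far from the expert. With the Jacobian term, the offset is halved at every step and the replay converges back onto the nominal trajectory.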

Why this matters beyond robotics: the primitive-and-recompose pattern is everywhere in modern AI. Language models learn reusable sub-skills and recombine them; agent systems chain learned tools. DeepMind's movement work is a clean, physical instance of a principle (learn modular skills, then compose them) that generalizes far past walking, and the patent shows the principle made rigorous through an encoder-decoder latent space and a stabilizing control law.

The look-ahead structure in Claim 1 deserves a closer reading, because it is what makes the primitives reusable rather than brittle. When the encoder builds its input for time step t, it does not use only the present observation; it folds in observations from steps t+1 through t+k, the near future of the trajectory. Encoding a movement together with where it is going lets the latent variable capture an intention, a short arc of behavior, instead of a single instantaneous pose. That is precisely what you want from a “primitive”: a chunk of motion you can recall and replay as a unit, not a frame you have to stitch by hand.
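As a rough sketch of that data layout, the helper below pairs each action with its look-ahead window of observations. The function name `lookahead_pairs`, the array shapes, and the choice to drop the last k steps (where the window would run past the end of the trajectory) are assumptions for illustration, not details taken from the claims.

```python
import numpy as np

def lookahead_pairs(observations, actions, k):
    """Pair each action a_t with the window (o_t, o_{t+1}, ..., o_{t+k})."""
    T = len(observations)
    windows, targets = [], []
    for t in range(T - k):     # stop where the window would run off the end
        windows.append(observations[t : t + k + 1])
        targets.append(actions[t])
    return np.stack(windows), np.stack(targets)

# Made-up trajectory: 10 time steps, 4-D observations, 2-D actions, look-ahead k = 3.
T, obs_dim, act_dim, k = 10, 4, 2, 3
obs = np.arange(T * obs_dim, dtype=float).reshape(T, obs_dim)
act = np.arange(T * act_dim, dtype=float).reshape(T, act_dim)

windows, targets = lookahead_pairs(obs, act, k)
# windows has shape (T - k, k + 1, obs_dim); targets has shape (T - k, act_dim)
```

Each row of `windows` is what an encoder would see for one time step: the present observation followed by the next k, with the matching action in `targets` as the thing to reproduce.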

The probabilistic prior is the other half of the design, and the autoregressive form is deliberate. By making the prior over motor-primitive latents depend on the previous step's prior, scaled by a factor below one, plus noise, the model imposes temporal smoothness: consecutive primitives are correlated, so sampling from the latent space yields coherent sequences of movement rather than disconnected jerks. Combined with the linear-feedback-stabilized policy built on the state-action Jacobian, which damps deviations as they arise, the result is a learned-yet-stable controller. The patent is, in effect, marrying a probabilistic generative model of behavior to a classical control law, and the two CPC tags on the grant point at that same marriage.
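The smoothing effect is easy to see by sampling. The sketch below draws latents from a first-order autoregressive process, z_t = alpha * z_{t-1} + noise, and measures how strongly consecutive steps are correlated. The noise scale `sqrt(1 - alpha^2)`, the latent size, and the function names are assumptions for illustration; the text above fixes only that the scaling factor is below one.

```python
import numpy as np

def sample_ar1_latents(steps, dim, alpha, rng):
    """Sample z_t = alpha * z_{t-1} + sqrt(1 - alpha^2) * eps_t, eps_t ~ N(0, I)."""
    z = np.zeros((steps, dim))
    z[0] = rng.normal(size=dim)
    noise_scale = np.sqrt(1.0 - alpha ** 2)   # assumed; keeps variance near one
    for t in range(1, steps):
        z[t] = alpha * z[t - 1] + noise_scale * rng.normal(size=dim)
    return z

def lag1_correlation(z):
    """Correlation between each latent value and its value one step later."""
    return np.corrcoef(z[:-1].ravel(), z[1:].ravel())[0, 1]

rng = np.random.default_rng(0)
smooth = sample_ar1_latents(5000, 8, alpha=0.95, rng=rng)   # strongly autoregressive
jerky = sample_ar1_latents(5000, 8, alpha=0.0, rng=rng)     # independent at every step

corr_smooth = lag1_correlation(smooth)   # close to alpha
corr_jerky = lag1_correlation(jerky)     # close to zero
```

With alpha near one, each latent stays close to its predecessor and the sequence drifts gradually; with alpha at zero, every step is an independent draw, which is the “disconnected jerks” case.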

It is also worth noting how cleanly the two halves of the grant compose into a single story. The encoder-decoder system gives you a learned, probabilistic vocabulary of behaviors: the look-ahead encoder builds intention-laden latents, and the autoregressive prior keeps sampled sequences smooth. The training method gives you a way to fill that vocabulary from demonstration: watch an expert, compute the state-action Jacobian at each step, and distill the behavior into the student under a linear-feedback-stabilized policy that will not destabilize when replayed. Vocabulary plus acquisition plus stability is a complete recipe for the kind of reusable, recomposable motor skill the headline gestures at, and it is why the filing sits at the seam between machine learning (G06N 3/0472) and classical knowledge-based control (G06N 5/00) rather than squarely in either.

The careful note: this is a granted patent with claims that set its real scope, and “learns to move” is a research result, not a shipped product. As a marker it's useful: a dated, named DeepMind grant showing that, by 2022, structured movement learning via reusable, probabilistically modeled primitives, stabilized by classical feedback control, was core enough to protect.

How an AI Learns to Move — DeepMind's 2022 Grant on Motor Primitives
