These three things are the same story. The patent application US20210092069A1 (published March 25, 2021) is classified under both G06N 3/04 (neural networks) and a stack of H04L networking classes: traffic management, routing, congestion. That CPC pairing is the whole point: at scale, AI performance is a networking problem.
Connect the dots. When you train a large model across hundreds of accelerators, each step requires the machines to exchange gradients in huge bursts of synchronized traffic. If the network stalls, every expensive chip sits idle, waiting. So accelerating multi-node ML means scheduling and routing that traffic so the compute never starves.
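To put rough numbers on that stall, here is a back-of-envelope sketch. It assumes a ring all-reduce (one common way to exchange gradients, not something the patent specifies), counts only bandwidth and ignores latency, and assumes communication does not overlap with compute. The gradient size, step time, and link speeds are hypothetical values picked for illustration.

```python
def ring_allreduce_seconds(grad_bytes: float, nodes: int, link_bytes_per_s: float) -> float:
    """Bandwidth-only estimate for a ring all-reduce.

    Each node sends (and receives) 2 * (N - 1) / N times the gradient buffer.
    Per-message latency is ignored, so this is a lower bound.
    """
    if nodes < 2:
        return 0.0
    return 2 * (nodes - 1) / nodes * grad_bytes / link_bytes_per_s


def idle_fraction(compute_s: float, comm_s: float) -> float:
    """Share of each step the accelerator waits, assuming no compute/comm overlap."""
    return comm_s / (compute_s + comm_s)


if __name__ == "__main__":
    grad_bytes = 2e9   # hypothetical: 1B parameters with 16-bit gradients
    compute_s = 0.5    # hypothetical compute time per training step
    nodes = 256        # hypothetical cluster size
    for gbit_per_s in (10, 100, 400):
        link_bytes_per_s = gbit_per_s * 1e9 / 8
        comm_s = ring_allreduce_seconds(grad_bytes, nodes, link_bytes_per_s)
        print(f"{gbit_per_s:>3} Gbit/s link: comm {comm_s:.2f} s, "
              f"idle {idle_fraction(compute_s, comm_s):.0%}")
```

Real systems overlap communication with compute and use hierarchical collectives, so treat this as intuition, not a prediction: the point is only that the idle share is set by the link, not by the chip.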
Follow both the money and the IP and the capex story falls out. Hyperscaler buildouts cost what they do not only because of the GPUs but because of the interconnect, the switches, and the topology that lets thousands of chips act like one. A patent that lives in both G06N and H04L sits at the literal intersection of the AI bill and the networking bill.
This is also why "how many GPUs?" is the wrong question on its own. A cluster's effective performance depends on how well the network keeps the chips synchronized. The 2021 filing is an early, concrete acknowledgment that the scale-out bottleneck had moved from raw FLOPs to data movement between nodes.
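A toy comparison makes the "wrong question" point concrete. The sketch below discounts a cluster's GPU count by the share of each step spent waiting on the network. Both clusters and all their timings are hypothetical, and "effective GPUs" is an illustrative metric invented here, not an industry standard.

```python
def scaling_efficiency(compute_s: float, exposed_comm_s: float) -> float:
    """Fraction of each step spent on useful compute."""
    return compute_s / (compute_s + exposed_comm_s)


def effective_gpus(gpu_count: int, compute_s: float, exposed_comm_s: float) -> float:
    """GPU count discounted by the time spent waiting on the network."""
    return gpu_count * scaling_efficiency(compute_s, exposed_comm_s)


# Hypothetical clusters: (label, GPU count, compute s/step, exposed comm s/step)
clusters = [
    ("big cluster, slow fabric", 512, 0.5, 0.75),
    ("small cluster, fast fabric", 256, 0.5, 0.05),
]

for label, gpu_count, compute_s, comm_s in clusters:
    eff = effective_gpus(gpu_count, compute_s, comm_s)
    print(f"{label}: {gpu_count} GPUs -> about {eff:.0f} effective")
```

With these made-up numbers the 256-GPU cluster delivers more useful work than the 512-GPU one, which is the whole argument in two lines of arithmetic.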
House caveat: this is a published application describing a method, and real cluster performance depends on workload and topology specifics the patent doesn't pin down. But as a marker it's clean: by early 2021, making many machines train one model efficiently was understood as a networking discipline worth patenting.