The shape of a neural network and the shape of the chip that runs it are the same story. A dataflow accelerator tries to make the silicon mirror the computation: weights stream one way, activations another, and the multiply-accumulate (MAC) operations happen in a grid built for exactly that traffic.
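To make "a grid built for exactly that traffic" concrete, here is a minimal cycle-level sketch of an output-stationary systolic array, one common way such grids are organized. It is a generic textbook model, not the architecture described in US20210271960A1. The function name `systolic_matmul` and the choice of output-stationary dataflow are assumptions for illustration. Rows of `A` (activations) enter from the left and columns of `B` (weights) enter from the top, each skewed by one cycle per row or column so that matching operands meet at the right processing element (PE).

```python
def systolic_matmul(A, B):
    """Cycle-level sketch of an output-stationary systolic array computing A @ B.

    Hypothetical illustration, not the patented design. Returns (C, cycles).
    PE (i, j) keeps the running sum for C[i][j] in place; A operands move right
    one PE per cycle and B operands move down one PE per cycle.
    """
    M, K = len(A), len(A[0])
    if len(B) != K:
        raise ValueError("inner dimensions of A and B must match")
    N = len(B[0])
    acc = [[0] * N for _ in range(M)]    # stationary partial sums, one per PE
    a_out = [[0] * N for _ in range(M)]  # A operand each PE forwards to its right neighbour
    b_out = [[0] * N for _ in range(M)]  # B operand each PE forwards to the PE below
    # A[i][k] and B[k][j] meet at PE (i, j) on cycle i + j + k, so the last
    # pair arrives on cycle (M - 1) + (N - 1) + (K - 1).
    cycles = M + N + K - 2
    for t in range(cycles):
        next_a = [[0] * N for _ in range(M)]
        next_b = [[0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                if j == 0:
                    # Left edge: row i of A is fed in skewed by i cycles, zero-padded outside the row.
                    k = t - i
                    a = A[i][k] if 0 <= k < K else 0
                else:
                    a = a_out[i][j - 1]
                if i == 0:
                    # Top edge: column j of B is fed in skewed by j cycles, zero-padded outside the column.
                    k = t - j
                    b = B[k][j] if 0 <= k < K else 0
                else:
                    b = b_out[i - 1][j]
                acc[i][j] += a * b  # one MAC per PE per cycle
                next_a[i][j] = a
                next_b[i][j] = b
        a_out, b_out = next_a, next_b  # all PE registers update together at the clock edge
    return acc, cycles


if __name__ == "__main__":
    C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(C, cycles)  # → [[19, 22], [43, 50]] 4
```

Each PE only exchanges data with its immediate neighbours: an operand is fetched once at the edge of the grid and then reused as it ripples across a whole row or column, which is the traffic pattern the paragraph above describes.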
US20210271960A1 (published September 2, 2021), classified under G06N 3/063, the class for neural-network hardware implementation, addresses how to scale performance across such a dataflow architecture. The question it tackles is the one every accelerator team faces: as you add more compute units, how do you keep them fed without data movement becoming the bottleneck?
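One standard way to frame "keeping the units fed" is the roofline model: attainable throughput is the smaller of peak compute and memory bandwidth multiplied by arithmetic intensity (FLOPs performed per byte moved). The sketch below assumes that model. The function name `attainable_tflops` and every number in it are hypothetical, chosen only to show the shape of the curve; none of them come from the filing.

```python
def attainable_tflops(units, tflops_per_unit, bandwidth_tb_s, flops_per_byte):
    """Roofline estimate of sustained throughput in TFLOP/s.

    units:           number of compute units on the chip (hypothetical)
    tflops_per_unit: peak TFLOP/s of one unit (hypothetical)
    bandwidth_tb_s:  memory bandwidth in TB/s (hypothetical)
    flops_per_byte:  arithmetic intensity of the workload
    """
    compute_roof = units * tflops_per_unit         # every unit busy every cycle
    memory_roof = bandwidth_tb_s * flops_per_byte  # TB/s x FLOP/byte = TFLOP/s
    return min(compute_roof, memory_roof)


if __name__ == "__main__":
    # Placeholder chip: 1 TFLOP/s per unit, 2 TB/s of memory bandwidth.
    for units in (32, 64, 128, 256):
        low = attainable_tflops(units, 1.0, 2.0, 50.0)    # little on-chip reuse
        high = attainable_tflops(units, 1.0, 2.0, 200.0)  # 4x more work per byte fetched
        print(f"{units:3d} units: {low:5.1f} TFLOP/s at 50 FLOP/B, {high:5.1f} at 200 FLOP/B")
```

With these placeholder numbers, going from 128 to 256 units buys nothing at 50 FLOP/byte, because the memory ceiling sits at 100 TFLOP/s. Raising arithmetic intensity, which is what better on-chip reuse does, lifts that ceiling so the extra units can actually be used.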
“Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators.” (U.S. Patent Application 2021/0271960 A1)
Connect the dots to the sector. The reason 'data center' lines dominate NVIDIA's and AMD's filings is that the hard problem in AI is no longer just the math; it's moving the numbers. A dataflow design wins when it minimizes how far data travels, because fetching an operand from off-chip DRAM can cost roughly two orders of magnitude more energy than the arithmetic performed on it. Performance scaling on these chips is mostly a memory-and-interconnect story wearing a compute costume.
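A rough energy tally shows why "how far data travels" matters. This sketch counts words moved for an n×n matrix multiply under a simple blocked (tiled) schedule, assuming an on-chip buffer large enough for one tile each of A, B and C. The function name `matmul_energy` is hypothetical, and the per-access energies are hypothetical relative units chosen only to respect the usual ordering (MAC < on-chip SRAM read < off-chip DRAM read). They are not measurements and do not come from the filing.

```python
def matmul_energy(n, tile, e_mac=1.0, e_sram=5.0, e_dram=200.0):
    """Relative energy of an n x n x n matmul under a blocked schedule.

    Assumes a buffer holding one tile x tile block each of A, B and C.
    Energy costs are hypothetical relative units, not measured values.
    Returns (compute_energy, data_movement_energy).
    """
    if n % tile != 0:
        raise ValueError("tile must divide n")
    macs = n ** 3
    # (n/tile)^2 output blocks, each looping over n/tile k-blocks and loading
    # one A block and one B block (2 * tile^2 words) per step:
    # (n/tile)^3 * 2 * tile^2 = 2 * n^3 / tile words, plus n^2 words of C written back.
    dram_words = 2 * n ** 3 // tile + n * n
    # Every MAC still reads its two operands from the on-chip buffer.
    sram_words = 2 * macs
    compute = macs * e_mac
    movement = dram_words * e_dram + sram_words * e_sram
    return compute, movement


if __name__ == "__main__":
    for tile in (1, 8, 64):
        compute, movement = matmul_energy(64, tile)
        print(f"tile={tile:2d}: movement/compute = {movement / compute:.1f}")
```

Under these placeholder costs, data movement outweighs arithmetic at every tile size, but going from tile = 1 (no reuse) to tile = 64 cuts DRAM traffic from 528,384 words to 12,288 and movement energy by roughly 21x while the MAC count stays the same. That reuse is what a dataflow layout tries to buy in hardware.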
Follow the IP and you see the whole industry converging on this insight around 2021: GPU vendors, custom-ASIC teams, and accelerator startups all chasing higher compute utilization through smarter dataflow. The publication is one data point in a dense cluster of hardware-architecture filings from that window.
The caveat: a patent application publication describes a proposed architecture, not a shipping chip, and 'performance scaling' claims live or die on the specific workload. Still, the filing is a clean marker that by late 2021, dataflow scaling, not raw transistor count, was understood to be a primary lever for AI accelerator performance.