@intrbiz I ran the test with two of the 176-core machines, with pgbench on one and the database on the other. In this setup I don't observe the issue. I ran it in both directions, so there are two data series for each pinning strategy. Of course, the pinning is mostly pointless here: it's applied on the pgbench machine, so it can't pin the backends at all. Still, I'm a bit surprised "none" wins by this much.
@intrbiz It's true, however, that the throughput is much lower. Here's a chart with results for local (unix socket) runs from both machines. Those reach almost 5M tps, while the remote TCP runs only get to 1M tps.
The network is pretty good. iperf3 says it can do ~80Gb/s, and per netperf the latency is about 0.07ms (min: 40us, mean: 73us, max: 4441us).
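For a rough sense of what that latency implies, here's a back-of-the-envelope sketch. It assumes the 73us netperf mean is a full request/response round trip and that each pgbench connection has only one transaction in flight at a time. Neither is confirmed by the measurements above, and the helper names are made up for illustration. It ignores server-side processing time, so it gives a lower bound on the number of connections, not a prediction.

```python
# Rough latency-bound estimate (Little's law). Assumptions, not measurements:
# - rtt_us is a full request/response round trip
# - each connection has exactly one transaction in flight at a time
# - server-side processing time is ignored

def max_tps_per_connection(rtt_us: int) -> float:
    """Upper bound on transactions/second for one synchronous connection."""
    return 1_000_000 / rtt_us

def min_connections(target_tps: int, rtt_us: int) -> int:
    """Fewest concurrent connections needed to sustain target_tps."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (target_tps * rtt_us + 999_999) // 1_000_000

if __name__ == "__main__":
    rtt_us = 73  # mean latency reported by netperf, in microseconds
    print(f"per-connection ceiling: {max_tps_per_connection(rtt_us):.0f} tps")
    print(f"connections needed for 1M tps: {min_connections(1_000_000, rtt_us)}")
```

Under those assumptions a single connection tops out below 14k tps, so 1M tps needs at least 73 connections busy at all times, before counting any time spent in the backend.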
@intrbiz Good idea. I did consider that, but then didn't actually try for some reason. Will give it a try tomorrow.