Don't get me wrong: it *is* a flaw when something performs better on a benchmark than in actual usage. Sometimes that's a flaw of the benchmark (they're supposed to simulate actual usage), other times it's a flaw of effectively developing *for* the benchmark - that's not great, and should at least be brought up as a mistake.
But categorising it as "cheating", etc? Nah - they're probably mistakes similar to overtraining an AI, tbh.
In short: we have enough *genuine bad intentions* in the tech industry as it is, without adding more that probably aren't actually nefarious...
Just call out the issues, don't act like it's all intentional, and if it gets fixed? Great! It maybe wasn't intentional.
If you go in with a "this is evil" mentality, everyone is damned if they do, damned if they don't. It's toxic.