Adam's rant on benchmarks

Posted 2019-11-18

A DIP to remove the ~= operator from slices was shot down by the community, and there were forum posts about benchmarks. I write about why I don't really care for benchmarks.

In the community

Community announcements

See more at the announce forum.

Adam's rant

I don't really believe in benchmarks. Plenty of websites publish pages of benchmark numbers, but those numbers rarely have much real-world applicability.

Some benchmarks try hard to be realistic, but they rarely actually are, and their results are very often applied far too generally. For example, how many times have you seen a comment on one of those vibe.d benchmarks that says "D is slow"? That's not what the benchmark actually said, though: all it really said is that this particular implementation, using the vibe.d library, performed more slowly than the competitors on this specific test.

That says very little about D itself. It doesn't even say a lot about vibe.d itself - perhaps this implementation was just not great, or exercised a weak corner of an otherwise fine library. Or maybe the implementation is fine, but the competitors cheated! Well, cheated is kind of a strong word, but they could be tuned specifically for the benchmark case, perhaps harmlessly, perhaps at the expense of the general case.

Benchmarks try to make an apples-to-apples comparison by running everyone's code on one particular piece of hardware. But that hardware may be absolutely nothing like what you actually use, and your code may perform radically differently on the machines you actually run it on.

To draw a conclusion about your use case on your hardware, the online benchmarks are of little help. Instead, you have to profile it yourself. And then, unlike the benchmark, which just says "this took X seconds", the profile actually gives you hints as to why it is slow.
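
To show what I mean, here is a minimal sketch of measuring your own code in D. The hotLoop function is a hypothetical stand-in for whatever you suspect is slow; for the "why", you could go further and compile with dmd -profile, which writes per-function timings to trace.log. Even this much tells you more about your situation than a benchmark website can.

import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

void hotLoop()
{
    // stand-in work; substitute the code you actually suspect
    int sum;
    foreach (i; 0 .. 1_000_000)
        sum += i;
}

void main()
{
    // total wall time across 100 runs, on your hardware
    auto results = benchmark!hotLoop(100);
    writeln("hotLoop x100: ", results[0]);
}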

Benchmarking                              | Profiling
------------------------------------------+---------------------------------------
Not generally applicable                  | Useful to you specifically
Generates complaints                      | Generates actionable data
Done before work; premature optimization  | Done after real experience guides you

Let me expand on that last row: you might argue it is important to know where to look ahead of time so you don't hit a wall after getting invested. And I somewhat agree, but the problem is benchmarks don't really measure the holistic cost and benefit of a system.

If you are using a "slow" language, you might worry it is going to be too slow for you. And that might be fair, but you should consider that the implementation could be improved (probably through profiling!), or there's a good chance you can rewrite bottlenecks as a component in another language, or just set up a memory cache, or something like that to turn it around.
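
To make the cache option concrete, here is a minimal sketch in D, assuming the expensive call is deterministic for a given key. The expensiveLookup function and the associative-array cache are hypothetical placeholders; the point is how little code the workaround can take.

import std.stdio : writeln;

string[string] cache; // in-memory cache, keyed by request

string expensiveLookup(string key)
{
    // stand-in for a slow database hit or computation
    return "value for " ~ key;
}

string cachedLookup(string key)
{
    if (auto hit = key in cache)
        return *hit; // fast path: served from memory
    auto value = expensiveLookup(key);
    cache[key] = value;
    return value;
}

void main()
{
    writeln(cachedLookup("user:42")); // slow path, fills the cache
    writeln(cachedLookup("user:42")); // fast path
}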

The real question is: how difficult is it to set those things up? Will they actually work with your usage patterns? Benchmarks rarely shed any light on these other factors.

Whereas once you start working and find that your development speed is poor, or your site has poor latency, or you are hitting a global GC lock under the high concurrency you actually need, you can now test that specific thing and try to change it or work around it.
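
For example, if the specific suspicion is GC contention, a tiny targeted test beats extrapolating from someone else's numbers. A minimal sketch, with an arbitrary thread count and a placeholder workload:

import core.thread : Thread;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writeln;

void allocate()
{
    // placeholder workload: each allocation goes through the GC
    foreach (i; 0 .. 100_000)
        cast(void) new int[](16);
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    Thread[] threads;
    foreach (t; 0 .. 4) // arbitrary thread count
        threads ~= new Thread(&allocate).start();
    foreach (t; threads)
        t.join();
    writeln("4 threads allocating concurrently: ", sw.peek);
}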

You can learn about those problems ahead of time by reading other people's reports. But you'll almost certainly need more detail about the circumstances than benchmark websites provide. And that's why I don't put a whole lot of stock in them.

(unless my code wins, then benchmarks are totally legit and 100% accurate to everything!!!!!!)