Compiling into ideal machine code

279 views

I came across LLVM, and it seems they are in a constant struggle to optimize IR into binary, and that they are always losing this battle in some way: either by introducing weird bugs, or by finding themselves in a checkmate position where they can’t apply certain optimizations because they would break invariants.
Why is it not possible? Is it just because CPUs are complex, or is there something fundamental going on? They have been trying to optimize C for x86 for quite a while now…


3 Answers

Anonymous 0 Comments

> They have been trying to optimize C for x86 for quite a while now…

Optimize for what? Speed? Memory?

There often is no single optimal solution, because you can optimize for different goals: the fastest code is usually not the code that uses the least memory.
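
To make that concrete, here’s a toy example of my own (not something lifted from a real compiler): two ways to count the set bits in a byte, one written for small code and low memory use, one written for speed at the cost of a lookup table. A compiler choosing between -Os and -O2/-O3 faces the same kind of trade-off.

```c
#include <stdint.h>
#include <stdio.h>

/* Optimized for size/memory: a plain loop, no tables.
   This is the kind of shape you tend to get from -Os. */
static int popcount_loop(uint8_t x) {
    int n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}

/* Optimized for speed: a 16-entry lookup table, two loads and an add,
   no data-dependent loop. It trades extra static data (and cache space)
   for fewer executed instructions: the -O2/-O3 style of trade-off. */
static const uint8_t NIBBLE_POPCOUNT[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

static int popcount_table(uint8_t x) {
    return NIBBLE_POPCOUNT[x & 0x0F] + NIBBLE_POPCOUNT[x >> 4];
}

int main(void) {
    for (int v = 0; v < 256; v++) {
        if (popcount_loop((uint8_t)v) != popcount_table((uint8_t)v)) {
            puts("mismatch");
            return 1;
        }
    }
    puts("both versions agree; they just spend different resources");
    return 0;
}
```

Both give the same answers; they just spend different resources, so neither is “ideal” in the abstract.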

And x86 is just one architecture; compilers target many architectures, because people run different hardware.

Anonymous 0 Comments

Because you can’t define “ideal”. It’s always a compromise between multiple performance and cost aspects, and a well-balanced compromise typically won’t be the best in any single benchmark category; it may even sit near the bottom in each one.

Anonymous 0 Comments

With the techniques conventional compilers use, optimal code is only ever generated by accident. Compilers aren’t really even trying to produce optimal code; what they do is apply a big bag of tricks and hope for the best. Improving a compiler means expanding the bag of tricks, which is not a simple process, because the tricks are complicated and tend to have a lot of subtle preconditions.
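
To show what a “subtle precondition” looks like, here’s a deliberately simplified example of my own (not LLVM’s actual transform): the classic trick of turning a division by 2 into a shift, which is only a clean win when the value can’t be negative.

```c
#include <stdio.h>

/* Toy illustration: "divide by 2 becomes shift right by 1" is a classic
   trick, but it has a precondition on the sign of the value. */

unsigned div2_unsigned(unsigned x) {
    /* For unsigned values the trick is always valid: x / 2 == x >> 1. */
    return x >> 1;
}

int div2_signed_wrong(int x) {
    /* NOT equivalent to x / 2 for negative x:
       C division truncates toward zero (-3 / 2 == -1), while an
       arithmetic right shift rounds toward negative infinity
       (-3 >> 1 == -2 on mainstream compilers; the exact value is
       implementation-defined in C). */
    return x >> 1;
}

int div2_signed_fixed(int x) {
    /* What a compiler actually has to emit at the machine level, where it
       knows the shift is arithmetic: add 1 to negative inputs before
       shifting, so the result still truncates toward zero. The "simple"
       trick grew a correction step. */
    return (x + (x < 0)) >> 1;
}

int main(void) {
    printf("-3 / 2      = %d\n", -3 / 2);                /* -1 */
    printf("wrong shift = %d\n", div2_signed_wrong(-3)); /* -2 */
    printf("fixed shift = %d\n", div2_signed_fixed(-3)); /* -1 */
    return 0;
}
```

Multiply that by thousands of tricks, each with its own fine print, and you get a modern optimizer.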

There is such a thing as a “superoptimizer”, which really does try to find an optimal sequence of instructions, but it has a lot of limitations. It’s too slow to find anything longer than a short sequence of instructions, and even *that* can already be very difficult depending on what sort of instructions you allow and whether you also search for constants. And accurate cost modelling is impossible even in theory, because the cost of a sequence of instructions depends on its context: balancing µops over the available execution ports matters more than minimizing the number of µops, how fast a load from memory is depends on the state of the caches and therefore on previous loads, and so on. Costs are non-local.
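
To give a feel for what an “optimal short sequence” looks like, here’s a well-known bit trick (not taken from any particular superoptimizer’s output, just the kind of non-obvious sequence such a search has to be able to reach): averaging two unsigned integers without the intermediate sum overflowing.

```c
#include <stdio.h>

/* The obvious way to average two unsigned ints; the intermediate sum
   can wrap around, so the result is wrong for large inputs. */
unsigned avg_naive(unsigned a, unsigned b) {
    return (a + b) / 2;
}

/* A short branch-free sequence that never overflows:
   a + b == (a ^ b) + 2*(a & b), so the floor of the average is
   (a & b) + ((a ^ b) >> 1). Three cheap ops, no wide intermediate.
   Sequences like this are exactly what an exhaustive search can reach,
   and also why that search blows up so quickly: it has to consider
   every combination of a handful of instructions. */
unsigned avg_exact(unsigned a, unsigned b) {
    return (a & b) + ((a ^ b) >> 1);
}

int main(void) {
    unsigned a = 4000000000u, b = 4000000000u;
    printf("naive: %u (wrapped)\n", avg_naive(a, b));
    printf("exact: %u\n", avg_exact(a, b));
    return 0;
}
```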

I personally work on superoptimization using SAT solving. I can get good results with it *sometimes*, but I see no direct use for it in a compiler in the near future: this nonsense can take hours or days to solve seemingly simple problems. What looks promising to me is [using a superoptimizer to build a database of optimizations](https://github.com/google/souper), but then we’re back to building an ever-expanding “big bag of tricks”, just with more assurance that the tricks are correct. Nothing about this makes code *optimal*; it just makes it better.
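
Here’s a hypothetical sketch, with names entirely of my own invention (this is not Souper’s or LLVM’s actual code), of what such a “database of optimizations” amounts to inside a compiler: a table of peephole rules, each a precondition check plus a rewrite, applied mechanically.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { const char *op; int64_t k; } Expr;   /* toy IR: "x <op> k" */

typedef struct {
    const char *name;
    int  (*matches)(const Expr *e);   /* the rule's precondition     */
    Expr (*rewrite)(const Expr *e);   /* the cheaper equivalent form */
} Rule;

/* x * k  ->  x << log2(k), valid only when k is a positive power of two */
static int mul_pow2_match(const Expr *e) {
    return e->op[0] == '*' && e->k > 0 && (e->k & (e->k - 1)) == 0;
}
static Expr mul_pow2_rewrite(const Expr *e) {
    int shift = 0;
    for (int64_t c = e->k; c > 1; c >>= 1) shift++;
    return (Expr){ "<<", shift };
}

/* x + 0  ->  x (the no-op result is represented as the "id" operation) */
static int  add_zero_match(const Expr *e)   { return e->op[0] == '+' && e->k == 0; }
static Expr add_zero_rewrite(const Expr *e) { (void)e; return (Expr){ "id", 0 }; }

/* The "database": a superoptimizer could prove and emit these entries,
   but the compiler still just scans the table and applies the first hit. */
static const Rule RULES[] = {
    { "mul-by-power-of-two", mul_pow2_match, mul_pow2_rewrite },
    { "add-zero",            add_zero_match, add_zero_rewrite },
};

static Expr peephole(Expr e) {
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++)
        if (RULES[i].matches(&e))
            return RULES[i].rewrite(&e);
    return e;   /* no rule fired: leave the expression alone */
}

int main(void) {
    Expr e = { "*", 8 };                       /* "x * 8" */
    Expr r = peephole(e);
    printf("x * 8  rewrites to  x %s %lld\n", r.op, (long long)r.k);
    return 0;
}
```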

Even when code *is* optimal, it is only optimal on a specific microarchitecture, for example “optimal on Intel Skylake” or “optimal on AMD Zen 4”. The same code is almost never optimal across a wide range of hardware at the same time; the hardware is just too different for that to happen. (Well, there is a wide range of Skylake-like Intel processors that are close enough that you could target all of them at once, but outside of that: not happening.)