Description
This issue describes a problem, potential solutions, and results collected from a prototype implementation.
It will be followed by a go-review CL.
The current state
Some important Go optimization phases happen at the frontend stage, where we have an annotated AST.
Notable examples:
- Inlining.
- Escape analysis.
- Things like short string comparison optimization.
While the idea of porting everything to the SSA backend sounds exciting:
- It's not trivial.
- It's hard to express some things on the SSA level, as it is very low-level and sometimes the relevant
information is erased during SSA construction.
- The SSA backend has its own set of little problems.
My proposal is to extend the frontend a little bit so that it keeps just a bit more information, letting some useful optimizations
that are implemented in the frontend work better. It won't make a (potential) transition to SSA harder, as the changes are
very focused and affect only a few places.
The problem
The problem I want to address in this document is the loss of the const nature of values used through local variables.
If you use a constant value directly, the frontend recognizes it. If you use it through a never-modified local variable,
it does not apply the optimizations.
I'll describe two things I was investigating, but there can be a few more places where the proposed solution is applicable.
Problematic case 1: escape analysis for make-slice
(See #24577)
// Can be allocated on stack if "xs" doesn't escape.
xs := make([]int, 10)
// Always escapes (non-const size reason).
n := 10
xs := make([]int, n)
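The difference can be observed with the escape analysis diagnostics (the exact wording of the -m output varies between compiler versions):

$ go build -gcflags=-m
make([]int, 10) does not escape
make([]int, n) escapes to heap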
One might argue that this is not a useful piece of code.
True, but the code below does essentially the same thing and is more meaningful:
func allocSlice(length int) []int {
    return make([]int, length)
}

func example() int {
    xs := allocSlice(10)
    return len(xs)
}
allocSlice is inlined. After inlining, the produced code looks like this:
func example() int {
    _length := 10
    _ret := make([]int, _length)
    xs = _ret
    return len(xs)
}
Note that it introduces an additional assignment.
This makes escape analysis too conservative there.
If the const info were preserved, there would be no allocation at all.
Problematic case 2: short string comparison optimizations
Functions like strings.HasPrefix and strings.HasSuffix also suffer from constness info loss during inlining.
// This is how HasPrefix is defined in stdlib.
// HasPrefix tests whether the string s begins with prefix.
func HasPrefix(s, prefix string) bool {
    return len(s) >= len(prefix) && s[0:len(prefix)] == prefix
}
Now let's define two routines:
func isComment1(line string) bool {
    return strings.HasPrefix(line, "#")
}

func isComment2(line string) bool {
    return len(line) >= 1 && line[0] == '#'
}
The first version is more readable and idiomatic. It could be optimized into the equivalent of isComment2 if we
inlined the HasPrefix body manually, even without substituting len("#"), since that is a constant expression anyway;
but with the compiler's inlining it won't be as efficient, because of the intermediate variables.
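For illustration, after the compiler inlines HasPrefix, isComment1 looks roughly like this (variable names are invented):

func isComment1(line string) bool {
    _s := line
    _prefix := "#"
    return len(_s) >= len(_prefix) && _s[0:len(_prefix)] == _prefix
}

Because the prefix is now reached through the _prefix variable, the frontend no longer sees a constant string operand and cannot apply the short string comparison optimization.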
Here is a difference in performance:
name old time/op new time/op delta
HasPrefix/#-8 16.5ns ± 3% 2.8ns ± 8% -82.98% (p=0.000 n=10+10)
The amount of generated code is also different (for AMD64 it's 105 vs 31 bytes).
Other potentially related issues
Effectively constant value definition
We can consider a local variable effectively constant if:
- Its address is never taken.
- It is never assigned to.
The closure value-capturing mechanism uses a somewhat similar definition to figure out whether to capture something
by value or by reference.
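A small illustration (a made-up snippet; only a is effectively constant under this definition):

a := 10  // effectively constant: never reassigned, address never taken
b := 20
b = 30   // not effectively constant: reassigned
c := 40
p := &c  // not effectively constant: address is taken
_, _, _ = a, b, p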
Proposed solution
There are two parts. One is very simple and solves the simplest case; the other solves the inlining-related problem.
- Use Node.Name.Defn for effectively constant nodes: if they're never assigned to, just use the constant
value in the analysis instead of the ONAME node inside the optimizations and escape analysis.
The good news is that this is trivial to do; the first CL includes this step.
The bad news is that the inliner doesn't assign Name.Defn and uses OAS2 nodes, so the inlining problem remains.
- During inlining, assign the Node.Name.Defn field for constant initializers. More below.
// The statements below introduce ONAMEs that have Defn bound to them.
// So we already have the relevant data for the simplest case (1).
x := 10
var y = 10
The first step is solved by introducing getConstValue(n *Node) *Node.
When a node is passed to it, it looks into the n.Name.Defn field and, if that is an OAS whose Right
field is constant, returns that constant.
Otherwise, it returns n itself (it could return nil as well; that's an almost irrelevant detail).
In the places where a constness-dependent optimization is about to happen, instead of checking the node itself,
one queries the const value of the node with getConstValue.
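A minimal sketch of such a helper inside the gc frontend (op and field names are approximate; the checks that the variable is never reassigned and that its address is never taken are omitted for brevity):

// getConstValue returns the constant that n is bound to, if n is an
// effectively constant local variable, and n itself otherwise.
func getConstValue(n *Node) *Node {
    if n.Op != ONAME || n.Name == nil || n.Name.Defn == nil {
        return n
    }
    defn := n.Name.Defn
    if defn.Op == OAS && defn.Right != nil && defn.Right.Op == OLITERAL {
        return defn.Right
    }
    return n
}

A call site that currently does Isconst(n, ...) would check Isconst(getConstValue(n), ...) instead.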
The second step implies that getConstValue can also return the appropriate constant values passed into an inlined function.
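A rough sketch of that step (the real inliner builds OAS2 nodes for the parameter assignments, so the actual change is a bit more involved; the helper name is made up):

// recordConstDefn would be called for each `param := arg` assignment
// generated while inlining a call. When the argument is a literal, the
// assignment is remembered as the definition of the parameter copy, so
// that getConstValue can later find the constant.
func recordConstDefn(as *Node) {
    if as.Op != OAS || as.Left == nil || as.Left.Op != ONAME || as.Left.Name == nil {
        return
    }
    if as.Right != nil && as.Right.Op == OLITERAL {
        as.Left.Name.Defn = as
    }
}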
Note that it won't help to handle cases like this:
n1 := 10
n2 := n1
xs := make([]int, n2)
The proposed mechanism is very simple (primitive, even), but it covers useful cases and is deterministic, which means programmers can rely on it. It also makes functions like strings.HasPrefix
appropriate for hot spots, so you don't have to inline them manually or write a specialization as demonstrated in one of the examples above.
Extra cases
Another example that can be enhanced by getConstValue is OSTR2BYTES (the string-to-[]byte conversion).
package str2bytes

import "testing"

type object struct {
    name string
    data []byte
}

func newObject(name, data string) *object {
    return &object{
        name: name,
        data: []byte(data),
    }
}

var sink interface{}

func BenchmarkNewObject(b *testing.B) {
    var o *object
    for i := 0; i < b.N; i++ {
        o = newObject("foo", "123")
    }
    sink = o
}
name old time/op new time/op delta
NewObject-8 96.6ns ± 2% 80.0ns ± 2% -17.20% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
NewObject-8 56.0B ± 0% 51.0B ± 0% -8.93% (p=0.000 n=10+10)
Any use of Isconst inside the compiler frontend is a potential use case for getConstValue, but one should
not go too far: it can change compile-time semantics (for example, by reporting an out-of-bounds access to an array indexed through an effectively constant variable, which the spec does not permit), and other spots are something the SSA backend can optimize on its own.
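To make the semantics concern concrete, here is a made-up snippet; the spec requires a compile-time error only for constant indexes, so treating i as a constant here must not turn this into a compilation failure:

var a [3]int
i := 5
_ = a[i] // panics at run time; must not become an "index out of range" compile error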
compilebench results
The effect on compilation time is negligible:
name old time/op new time/op delta
Template 353ms ± 1% 354ms ± 1% ~ (p=0.497 n=9+10)
Unicode 143ms ± 2% 142ms ± 1% ~ (p=0.050 n=9+9)
GoTypes 1.40s ± 2% 1.40s ± 1% ~ (p=0.780 n=10+9)
Compiler 6.71s ± 2% 6.69s ± 0% ~ (p=0.370 n=9+8)
SSA 18.0s ± 1% 17.8s ± 1% -1.00% (p=0.004 n=10+10)
Flate 233ms ± 1% 232ms ± 1% ~ (p=0.123 n=10+10)
GoParser 287ms ± 1% 288ms ± 1% ~ (p=0.400 n=9+10)
Reflect 792ms ± 2% 787ms ± 0% ~ (p=0.447 n=10+9)
Tar 324ms ± 0% 326ms ± 1% +0.49% (p=0.021 n=8+9)
XML 462ms ± 2% 459ms ± 1% ~ (p=0.315 n=10+9)
StdCmd 31.8s ± 1% 31.8s ± 1% ~ (p=0.739 n=10+10)
With the changes from the initial CL (short string comparison + isSimpleSliceMake), there is a slight improvement in binary sizes
due to better optimization opportunities:
name old text-bytes new text-bytes delta
HelloSize 753kB ± 0% 753kB ± 0% -0.02% (p=0.000 n=10+10)
CmdGoSize 10.2MB ± 0% 10.2MB ± 0% -0.19% (p=0.000 n=10+10)
name old exe-bytes new exe-bytes delta
HelloSize 1.09MB ± 0% 1.09MB ± 0% ~ (all equal)
CmdGoSize 14.1MB ± 0% 14.1MB ± 0% -0.17% (p=0.000 n=10+10)