Skip to content

cmd/compile: better const-based optimizations handling in compiler frontend #29095

Open
@quasilyte

Description

@quasilyte

This issue describes a problem, potential solutions and results collected by a prototype implementation.

It will be followed by a go-review CL.

The current state

Some important Go optimization phases happen at the frontend stage where we have annotated AST.

Notable examples:

  1. Inlining.
  2. Escape analysis.
  3. Things like short string comparison optimization.

While the idea of porting everything to SSA backend sounds exciting:

  1. It's not trivial.
  2. It's hard to express some things on the SSA level as it is very low-level and sometimes relevant
    information is erased during the SSA construction.
  3. The SSA backend has its own set of little problems.

My proposal is to extend frontend part a little bit so there is just a bit more information so some useful optimizations
that implemented in frontend can work better. It won't make (potential) transition to SSA harder as the changes are
very focused and affect only a few places.

The problem

The problem I want to address in this document is a loss of const nature of values used through local variables.
If you use constant value directly, frontend recognizes it. If you're using it through never modified local variable,
it does not apply optimizations.

I'll describe two things I was investigating, but there can be a few more places where the proposed solution is applicable.

Problematic case 1: escape analysis for make-slice

(See #24577)

// Can be allocated on stack if "xs" doesn't escape.
xs := make([]int, 10)

// Always escapes (non-const size reason).
n := 10
xs := make([]int, n)

One might argue that this is not useful piece of code.
True, but the code below does essentially the same thing, but is more meaningful:

func allocSlice(length int) []int {
  return make([]int, length)
}

func example() int {
  xs := allocSlice(10)
  return len(xs)
}

allocSlice is inlined. After inlining, the produced code looks like this:

func example() int {
  _length := 10
  _ret := make([]int, _length)
  xs = _ret
  return len(xs)
}

Note that it introduces the additional assignment.
It makes escape analysis to be too conservative there.

If const info is preserved, there could be no allocation at all.

Problematic case 2: short string comparison optimizations

Functions like strings.HasPrefix and strings.HasSuffix also suffer from constness info loss during the inlining.

// This is how HasPrefix is defined in stdlib.

// HasPrefix tests whether the string s begins with prefix.
func HasPrefix(s, prefix string) bool {
	return len(s) >= len(prefix) && s[0:len(prefix)] == prefix
}

Now let's define two routines:

func isComment1(line string) bool {
  return strings.HasPrefix(line, "#")
}

func isComment2(line string) bool {
  return len(line) >= 1 && line[0] == '#'
}

The first version is more readable and idiomatic. And it can be optimized to the isComment2 if we
inline HasPrefix body manually, even without substituting len("#") since it's const expression anyway,
but it won't be as efficient due to intermediate variables reason.

Here is a difference in performance:

name           old time/op  new time/op  delta
HasPrefix/#-8  16.5ns ± 3%   2.8ns ± 8%  -82.98%  (p=0.000 n=10+10)

The amount of generated code also different (for AMD64 it's 105 vs 31 bytes).

Other potentially related issues

Effectively constant value definition

We can consider local variable as effectively constant if:

  1. Its address is never taken.
  2. It is never assigned to.

The closure value capturing mechanism uses somewhat similar definition to figure out whether to capture something
by value or by reference.

Proposed solution

There are 2 parts. One is very simple and solves the simplest case, the other solves inlining-related problem.

  1. Use Node.Name.Defn for effectively constant nodes. So, if they're never assigned to,
    just use constant value in analysis instead of ONAME node inside the optimizations and escape analysis.
    The good news is that it's very trivial to do. First CL includes this step.
    The bad news is that inliner doesn't assign Name.Defn and uses OAS2 nodes.
    So inlining problem remains.
  2. During inlining, assign Node.Name.Defn field for constant initializers.
    More below.
// Statements below are introducing ONAMEs that have Defn bound to them.
// So we already have relevant data for simplest, (1) case.
x := 10
var y = 10

The first step is solved by introducing getConstValue(n *Node) *Node.
When node is passed to it, it looks into n.Name.Defn field and if it's OAS and Right field is constant, returns it.
Otherwise, it returns n itself (It could return nil as well, it's almost irrelevant detail).

In the place where constness-dependent optimization is about to happen, instead of checking the nodes itself, one
queries the const value of the node with getConstValue.

The second step implies that getConstValue can also return appropriate constant values passed into inlined function.

Note that it won't help to handle cases like this:

n1 := 10
n2 := n1
xs := make([]int, n2)

The proposed mechanism is very trivial (or primitive, even), but it covers useful cases and is deterministic, which means programmers can rely on it. It also makes functions like strings.HasPrefix appropriate in a hot-spots so you don't have to inline it manually or write specialization as demonstrated in one of the examples above.

Extra cases

Another example that can be enhanced by getConstValue is OSTR2BYTES.

package str2bytes

type object struct {
	name string
	data []byte
}

func newObject(name, data string) *object {
	return &object{
		name: name,
		data: []byte(data),
	}
}

var sink interface{}

func BenchmarkNewObject(b *testing.B) {
	var o *object
	for i := 0; i < b.N; i++ {
		o = newObject("foo", "123")
	}
	sink = o
}
name         old time/op    new time/op    delta
NewObject-8    96.6ns ± 2%    80.0ns ± 2%  -17.20%  (p=0.000 n=10+10)

name         old alloc/op   new alloc/op   delta
NewObject-8     56.0B ± 0%     51.0B ± 0%   -8.93%  (p=0.000 n=10+10)

Any use of Isconst inside compiler frontend can be a potential use case for getConstValue, but one should
not go too far (it can change compile-time semantics, like reporting out-of-bound access to an array through a variable that is effectively constant, but the spec does not permit that to happen and other spots are something that SSA backend can optimize on its own).

compilebench results

Effect on compilation time is negligible:

name       old time/op       new time/op       delta
Template         353ms ± 1%        354ms ± 1%    ~     (p=0.497 n=9+10)
Unicode          143ms ± 2%        142ms ± 1%    ~     (p=0.050 n=9+9)
GoTypes          1.40s ± 2%        1.40s ± 1%    ~     (p=0.780 n=10+9)
Compiler         6.71s ± 2%        6.69s ± 0%    ~     (p=0.370 n=9+8)
SSA              18.0s ± 1%        17.8s ± 1%  -1.00%  (p=0.004 n=10+10)
Flate            233ms ± 1%        232ms ± 1%    ~     (p=0.123 n=10+10)
GoParser         287ms ± 1%        288ms ± 1%    ~     (p=0.400 n=9+10)
Reflect          792ms ± 2%        787ms ± 0%    ~     (p=0.447 n=10+9)
Tar              324ms ± 0%        326ms ± 1%  +0.49%  (p=0.021 n=8+9)
XML              462ms ± 2%        459ms ± 1%    ~     (p=0.315 n=10+9)
StdCmd           31.8s ± 1%        31.8s ± 1%    ~     (p=0.739 n=10+10)

With changes from initial CL (short string cmp + isSimpleSliceMake) there is a slight improvement in binary sizes
due to better optimization opportunities:

name       old text-bytes    new text-bytes    delta
HelloSize        753kB ± 0%        753kB ± 0%  -0.02%  (p=0.000 n=10+10)
CmdGoSize       10.2MB ± 0%       10.2MB ± 0%  -0.19%  (p=0.000 n=10+10)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.09MB ± 0%       1.09MB ± 0%    ~     (all equal)
CmdGoSize       14.1MB ± 0%       14.1MB ± 0%  -0.17%  (p=0.000 n=10+10)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    Triage Backlog

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions