unicode: unicode.Is and bytes.Buffer.WriteRune get confused by negative runes #43254

Closed
@davidben

What version of Go are you using (go version)?

$ go version
go version go1.15.6 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/davidben/Library/Caches/go-build"
GOENV="/Users/davidben/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/davidben/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/davidben/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/davidben/boringssl/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qc/[...]/T/go-build280151090=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

https://play.golang.org/p/9ZkvjGuE1so

What did you expect to see?

unicode.Is and related functions should return false on negative values, as they do for other invalid runes. bytes.Buffer.WriteRune should write a replacement character, as it does for other invalid runes. In particular, a UTF-32 decoder could easily construct a negative rune before checking. (Looks like x/text/encoding/unicode/utf32 does that and then relies on RuneLen noticing.)

What did you see instead?

unicode.Is thinks some negative values are printable, and bytes.Buffer.WriteRune accidentally runs a single-byte fast path.

I wasn't sure at first whether this was a bug, but most functions seem to check for negative values or cast to uint32, so I think these should as well. (If they expect the rune to be a real code point, that should probably be documented.)


    Labels

    FrozenDueToAge
    NeedsFix: The path to resolution is known, but the work has not been done.
