Skip to content

Query endpoint #8

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 3, 2018
Merged

Query endpoint #8

merged 3 commits into from
May 3, 2018

Conversation

carlosms
Copy link
Contributor

@carlosms carlosms commented Apr 25, 2018

Part of #2. Adds a new endpoint, POST /query. It forwards SQL queries to gitbase.

A few notes:

I've used the master branch as the constraint of github.com/go-sql-driver/mysql because the latest release, 1.3, does not support ColumnType.DatabaseTypeName. See go-sql-driver/mysql#667

In the design we talked about adding extra information, such as the elapsed time, to the meta field. In this PR meta only contains the table headers. I think it's ok to add extra pieces of information later as we find the need for them.

I've left a couple of TODO comments with minor things, I think we can take care of that later on. -> Solved with the review suggestions

.

The requests and results look like this:

{
  "query": "SELECT name, hash, is_remote(name) AS is_remote FROM refs",
  "limit": 20
}
{
    "status": 200,
    "data": [
        {
            "hash": "66fd81178abfa342f873df5ab66639cca43f5104",
            "is_remote": false,
            "name": "HEAD"
        },
        {
            "hash": "66fd81178abfa342f873df5ab66639cca43f5104",
            "is_remote": false,
            "name": "refs/heads/master"
        },
        {
            "hash": "66fd81178abfa342f873df5ab66639cca43f5104",
            "is_remote": true,
            "name": "refs/remotes/origin/master"
        }
    ],
    "meta": {
        "headers": [
            "name",
            "hash",
            "is_remote"
        ]
    }
}

A failure:

{
	"query": "SELECT * FROM somewhere",
	"limit": 20
}
{
    "status": 400,
    "errors": [
        {
            "status": 400,
            "title": "Error 1105: unknown error: table not found: somewhere"
        }
    ]
}

For a query with uast, the protobuf response is unmarshalled and the json is returned:

{
	"query": "SELECT hash, content, uast(blobs.content, 'go') FROM blobs WHERE hash='fd30cea52792da5ece9156eea4022bdd87565633'",
	"limit": 20
}
{
    "status": 200,
    "data": [
        {
            "content": "package server\n\nimport (\n\t\"net/http\"\n\n\t\"github.com/src-d/gitbase-playground/server/handler\"\n\n\t\"github.com/go-chi/chi\"\n\t\"github.com/go-chi/chi/middleware\"\n\t\"github.com/pressly/lg\"\n\t\"github.com/rs/cors\"\n\t\"github.com/sirupsen/logrus\"\n)\n\n// Router returns a Handler to serve the backend\nfunc Router(\n\tlogger *logrus.Logger,\n\tstatic *handler.Static,\n\tversion string,\n) http.Handler {\n\n\t// cors options\n\tcorsOptions := cors.Options{\n\t\tAllowedOrigins:   []string{\"*\"},\n\t\tAllowedMethods:   []string{\"GET\", \"POST\", \"PUT\", \"OPTIONS\"},\n\t\tAllowedHeaders:   []string{\"Location\", \"Authorization\", \"Content-Type\"},\n\t\tAllowCredentials: true,\n\t}\n\n\tr := chi.NewRouter()\n\n\tr.Use(middleware.Recoverer)\n\tr.Use(cors.New(corsOptions).Handler)\n\tr.Use(lg.RequestLogger(logger))\n\n\tr.Get(\"/version\", handler.APIHandlerFunc(handler.Version(version)))\n\n\tr.Get(\"/static/*\", static.ServeHTTP)\n\tr.Get(\"/*\", static.ServeHTTP)\n\n\treturn r\n}\n",
            "hash": "fd30cea52792da5ece9156eea4022bdd87565633",
            "uast(blobs.content, \"go\")": [
                {
                    "InternalType": "File",
                    "Properties": {
                        "Package": "1"
                    },
                    "Children": [
                        {
                            "InternalType": "Ident",
                            "Properties": {
                                "internalRole": "Name"
                            },
                            "Token": "server",
                            "StartPosition": {
                                "Offset": 9,
                                "Line": 1,
                                "Col": 10
                            },
                            "Roles": [
                                18,
                                1
                            ]
                        },
                        
                        [...]

                }
            ]
        }
    ],
    "meta": {
        "headers": [
            "hash",
            "content",
            "uast(blobs.content, \"go\")"
        ]
    }
}

@carlosms carlosms self-assigned this Apr 25, 2018
@carlosms carlosms requested review from smacker, dpordomingo and bzz April 25, 2018 15:39
return func(r *http.Request) (*serializer.Response, error) {
var queryRequest queryRequest
body, err := ioutil.ReadAll(r.Body)
defer r.Body.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://golang.org/pkg/net/http/#Request

The Server will close the request body. The ServeHTTP Handler does not need to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query := fmt.Sprintf("%s LIMIT %d", queryRequest.Query, queryRequest.Limit)
rows, err := db.Query(query)
if err != nil {
return nil, serializer.NewHTTPError(http.StatusBadRequest, err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as TODO. Mysql returns MySQLError with 2 fields number&message. Would be nice to map them into our json.
https://github.com/go-sql-driver/mysql/blob/master/errors.go#L58

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done in e6deab2

return nil, err
}

colData := make(map[string]string)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

colData := make(map[string]string, nColumns)


columnNames, _ := rows.Columns()
nColumns := len(columnNames)
columnValsPtr := genericVals(nColumns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does it work for integers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah. I see you changed it in next commit.

func genericVals(colTypes []*sql.ColumnType) []interface{} {
columnValsPtr := make([]interface{}, len(colTypes))

for i, colType := range colTypes {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we cover all types correctly? For example I don't see double or float.
https://github.com/go-sql-driver/mysql/blob/master/fields.go#L16

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I just added the types I encountered them in gitbase replies, but it's better to make it complete since the beginning. Done in e6deab2

colData[columnNames[i]] = sqlVal.String
}
case *[]byte:
// TODO (carlosms) this may not be an array always
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what else can it be? In engine there is always array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that gitbase may at some point add something new that could return a JSON object. For instance, a new function 'first_uast_as_json()'

return nil, err
}

nodes := make([]*uast.Node, len(protobufs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dashboard replaces Role ID by Role name. Most probably playground should be compatible with it:
https://github.com/bblfsh/dashboard/blob/master/server/server.go#L217

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened an issue so we don't forget about it when we start working on the integration with the uast frontend component:
#9


// TODO (carlosms) this only works if the query does not end in ;
// and does not have a limit. It will also fail for queries like
// DESCRIBE TABLE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gitbase doesn't support anything besides SELECT, SHOW, DESCRIBE.
Add limit makes sense only to SELECT queries. Ending ; also very easy to fix:

query = strings.TrimRight(strings.TrimSpace(query), ";")

not perfect but better than nothing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Implemented the todo with your suggestion in e6deab2

Copy link
Contributor

@dpordomingo dpordomingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!
LGTM

I'd modify some things, but nothing that could not be done later after merging.
(the issue that concerns me more, is the huge handler.Query method, too large imho; I'd split and extract DB logic to a new package as we did in CAT)

// TODO (carlosms) this only works if the query does not end in ;
// and does not have a limit. It will also fail for queries like
// DESCRIBE TABLE
query := fmt.Sprintf("%s LIMIT %d", queryRequest.Query, queryRequest.Limit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this to a service that would take care of the query sent by the user; in the future it could do validations, return readable errors...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the functionality is so basic, for now I've moved it to a new function in e6deab2

// TODO (carlosms) this may not be an array always
var protobufs [][]byte
if err := json.Unmarshal(*val.(*[]byte), &protobufs); err != nil {
return nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return a 500 if one UAST cannot be read? I'd return the rows, and warning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually found a problem with this. gitbase can also return a JSON column with an array of strings, for example in commits.parents

With the changes in e6deab2 if the unmarshal fails the data will be returned as a string.

return nil, err
}

return serializer.NewQueryResponse(tableData, columnNames), nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't be useful to send the type too? That way it won't be needed to check it in the frontend too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good suggestion. Done in e6deab2

if err == nil {
colData[columnNames[i]] = nodes
} else {
var strings []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why strings? Maybe []interface{} is better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it thinking in the commits.parents column, which is an array of strings. But it actually doesn't even have to be an array, using interface{} will work for any JSON array or object we receive. Changed in 28fc331.

func addLimit(query string, limit int) string {
upper := strings.ToUpper(query)
if strings.Contains(upper, "DESCRIBE") || strings.Contains(upper, "SHOW") {
return query
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please consider to rewrite it like this:

func addLimit(query string, limit int) string {
  query = strings.TrimRight(strings.TrimSpace(query), ";")
  if strings.HasPrefix(strings.ToUpper(query), "SELECT") {
    return fmt.Sprintf("%s LIMIT %d", query, limit)
  }
  return query

it covers more cases and imo easier to read

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, changed in 28fc331

Copy link
Contributor

@smacker smacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it looks good!

The only thing I would add tests.

@carlosms
Copy link
Contributor Author

@bzz, a comment for your consideration when adding your review.
We discussed via chat and we agree that since we are still at this very early dev stage we can do tests in a future PR.

@bzz
Copy link
Contributor

bzz commented Apr 27, 2018

@carlosms great job!

  1. Squash
    As it seems to be almost ready, do you think at this point it would be reasonable to ask to squash commits before a final pass?
    Asking, as +37,039 −11 in 126 files looks a bit harsh and all previous feedback seems to be already 👍 so may be 2 commits would do: 1 for the code added for this feature and 1 for noisy deps update?

  2. curl
    One more thing, on Query endpoint #8 (comment) - you did a great job in PR description with examples of intput/output of the query, but may I ask you to add exact curl commands to reproduce those? It becomes especially valuable, as we do not have tests yet.
    I would assume something like

    curl -X POST http://localhost:8080/query \
      -H "Content-Type: application/json" \
      -d '{"query":"...", "limit":"..."}'
    

    would do the trick, but imagine how valuable it will be to copy it from somewhere, if one needs to prepare for a sudden demo?

In my experience, it's valuable to have PRs as self-contained as possible - including code, at leqast some simple docs and at least some simple tests (completeness is not the goal at this point).

  1. Docs
    It's totally ok to add tests later if you wish to do so, but what I would also suggest to include here, is at least a simplest bits of documentation in here i.e most easy way I can see now is to create a /docs/rest-api.md with a single section on Query, and a paragraph that includes those few CURL commands.

// Otherwise gitbase will be asked for the max_allowed_packet column and the
// query will fail.
// The next release should make this parameter optional for us:
// https://github.com/go-sql-driver/mysql/pull/680
type appConfig struct {
Env string `envconfig:"ENV" default:"production"`
Host string `envconfig:"HOST" default:"0.0.0.0"`
Copy link
Contributor

@bzz bzz Apr 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I missed that somehow, not very related to the change, but while we are here - did not we have issues with this before, being not a valid IP on some OSes and decided that it safer to have default as localhost or 127.0.0.1 instead?

Remember discussing it with @smacker

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not 0.0.0.0 the right one?
That's how it was configured in CAT, because that way the container is listening "everywhere"
https://github.com/src-d/code-annotation/pull/153/files

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.0.0.0 is a correct one. We use this variable for the server. And server should listen on all IPs.

}

if err != nil {
return nil, err
Copy link
Contributor

@bzz bzz Apr 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question on user input validation on API level - what is the expected behaviour of playground on /query endpoint, in case "limit":"20" is specified, in payload, instead of "limit":20 ?

As an API user, I would expect it to result in

{
    "status": 4XX,
    "errors": [
        {
            "status": 4XX,
            "title": "Error description"
        }
    ]
}

but I belive current implementation might result in 500 without any payload in response.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved in 99af6d0. Now limit is optional, and if the json unmarshall fails the reply will contain:

{
    "status": 400,
    "errors": [
        {
            "status": 400,
            "title": "Bad Request. Expected body: { \"query\": \"SQL statement\", \"limit\": 1234 }"
        }
    ]
}

@carlosms
Copy link
Contributor Author

@bzz:

As it seems to be almost ready, do you think at this point it would be reasonable to ask to squash commits before a final pass?

I was planning to squash before the merge, but left the small commits so you could follow the PR discussions easier.

I've squashed the previous commits and added 2 small new ones. Please take another look and I'll squash again before merging.

create a /docs/rest-api.md with a single section on Query, and a paragraph that includes those few CURL commands.

Added in 7dc93b3.

@bzz
Copy link
Contributor

bzz commented Apr 27, 2018

Awesome, thank you very much for taking care @carlosms !

Sorry, I can not test it locally right now due to src-d/gitbase#255 but I wander, what what would be the result on the user input with multiple queries like

$curl -d '{"query":"SELECT name, hash, is_remote(name) AS is_remote FROM refs; SELECT hash, content FROM blobs;", "limit":20}' -H "Content-Type: application/json" -X POST http://localhost:8080/query

?

As a client of API, I would expect the limit argument to be added to each of the queries.

}

// addLimit adds LIMIT to the query, performing basic tests to skip it
// for DESCRIBE TABLE, SHOW TABLES, and avoid '; limit'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be it would be enough to say that it's only applied to SELECT ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed in fe5811f

@carlosms
Copy link
Contributor Author

@bzz

what what would be the result on the user input with multiple queries like

$curl -d '{"query":"SELECT name, hash, is_remote(name) AS is_remote FROM refs; SELECT hash, content FROM blobs;", "limit":20}' -H "Content-Type: application/json" -X POST http://localhost:8080/query

This:

{
    "status": 400,
    "errors": [
        {
            "status": 400,
            "title": "unknown error: syntax error at position 66 near 'select'",
            "mysqlCode": 1105
        }
    ]
}

And any other thing would be strange in my opinion.

@bzz
Copy link
Contributor

bzz commented Apr 27, 2018

I'm a bit confused then, how do you propose handling the case of user input with multiple queries?

Having input from the user in form of like

SELECT ... ;
SELECT ... ;

is a requirement gitbase playground should handle.

Right now, we discussed and decided, that frontend side of the gitbase does not make any modifications/parsing/validation of the user input and just passes it though to the gitbase.

We also discussed that backend side does query re-writing, applying the limit clause, before passing it to gitbase server.

Please, help me understand how those requirements are going to be satisfied with current implementation.

@smacker
Copy link
Contributor

smacker commented Apr 27, 2018

@bzz sorry, I don't remember such requirement in the design document. I also don't understand how it would work. Should it open 2 tabs? We didn't discuss it at all.

@carlosms
Copy link
Contributor Author

I also don't remember any mention to multiple queries 🤔

@bzz
Copy link
Contributor

bzz commented Apr 27, 2018

@carlosms @dpordomingo not sure, what do you guys suggest on how to proceed here, but glad that we stumbled up on this. From my side, I'm totally open for discussing/updating design doc next week on details of "multiple query user input" feature, potentially handing it in subsequent PRs and merging this as is now.

Please let me know!

Copy link
Contributor

@bzz bzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, given some answer to the discussion of multiple queries feature above

@dpordomingo
Copy link
Contributor

Is gitbase prepared to accept two queries, or does it return an error?
From @carlosms message it seems that it returns a syntax error, so if playground should not parse nor pre validate the user input, three current behavior seems to follow the agreed expectations, doesn't it?

@bzz
Copy link
Contributor

bzz commented May 3, 2018

Is gitbase prepared to accept two queries, or does it return an error?

sorry, was not able to test before src-d/gitbase#255 but may be @carlosms could clarify?

.. behavior seems to follow the agreed expectations, doesn't it?

I think it does that way

@carlosms
Copy link
Contributor Author

carlosms commented May 3, 2018

Is gitbase prepared to accept two queries, or does it return an error?

sorry, was not able to test before src-d/gitbase#255 but may be @carlosms could clarify?

It will result in error. If we need to support multiple queries separated by ; we would need to separate the strings either in the backend or the frontend.

And I still think this would not be the expected behaviour for most people. One query -> one tab; one request -> one response; anything else would be confusing.

@bzz
Copy link
Contributor

bzz commented May 3, 2018

Is gitbase prepared to accept two queries, or does it return an error?

It will result in error

👍 this is a strong argument for merging this work asap and iterate lated on, based on product feedback (as noted by @dpordomingo it follows the conventions we've agreed up on).

carlosms added 3 commits May 3, 2018 17:55
Latest release for mysql driver, 1.3, does not support
ColumnType.DatabaseTypeName. See
go-sql-driver/mysql#667

Signed-off-by: Carlos Martín <[email protected]>
Signed-off-by: Carlos Martín <[email protected]>
@carlosms carlosms merged commit b92ddd4 into src-d:master May 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants