Skip to content

Add basic validation for route TLS configuration - checks that input is "syntactically" valid. #8366

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 17, 2016

Conversation

ramr
Copy link
Contributor

@ramr ramr commented Apr 6, 2016

Partial Fix for https://bugzilla.redhat.com/show_bug.cgi?id=1312292

@smarterclayton PTAL Thx

The other PR #8353 would be needed in addition to this for a complete fix but this one provides a partial fix to check input is valid PEM format.

# make release  || echo "run make, copy binary and build new docker image"  
oadm router --latest-images  --replicas=0  
oc env dc/router EXTENDED_VALIDATION=true  
oc scale  dc/router --replicas=1  

Updated with testing instructions/usage.

routeapi "github.com/openshift/origin/pkg/route/api"
)

// disableRouteTLSValidation returns whether or not to disable TLS config validation checks.
func disableRouteTLSValidation() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just create a new function and invoke it from the router in the unique_host check (or in an optional pre check step, based on a flag)?

@ramr
Copy link
Contributor Author

ramr commented Apr 6, 2016

[test]

@smarterclayton
Copy link
Contributor

Can we check the cert order by validation? We already do that in other
places in the code.

On Tue, Apr 5, 2016 at 10:19 PM, Ram Ranganathan [email protected]
wrote:

Partial Fix for https://bugzilla.redhat.com/show_bug.cgi?id=1312292

@smarterclayton https://github.com/smarterclayton PTAL Thx

The other PR #8353 #8353 would
be needed in addition to this for a complete fix but this one provides a

partial fix to check input is valid PEM format.

You can view, comment on, or merge this pull request online at:

#8366
Commit Summary

  • Add basic validation for route TLS configuration - checks that input
    is

File Changes

Patch Links:


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#8366

@smarterclayton
Copy link
Contributor

We don't have to validate in the api server - for instance, rejecting a route because the cert is malformed from the router side is sufficient. My concern is letting haproxy do it (or having to do it for every single route) if the check is simple enough for us to do.

@smarterclayton
Copy link
Contributor

We can be more strict than what haproxy allows (especially if we give a good error message), as long as we can be sure we're not less strict.

@ramr
Copy link
Contributor Author

ramr commented Apr 6, 2016

Ok - will rework this and get some bits in from the other PR.

@smarterclayton
Copy link
Contributor

Do we think that will be sufficient? Do we know of anything else beyond
certs being invalid and cert order being wrong in the PEM? Do we need to
limit the types of certs?

On Wed, Apr 6, 2016 at 2:01 PM, Ram Ranganathan [email protected]
wrote:

Ok - will rework this and get some bits in from the other PR.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#8366 (comment)

@ramr ramr force-pushed the route-tls-validation branch 3 times, most recently from 6f97439 to f60e526 Compare April 7, 2016 06:38
@ramr
Copy link
Contributor Author

ramr commented Apr 7, 2016

Ok @smarterclayton PTAL I pushed some changes. I added a check for cert verification + check key mismatches. Don't think we can do a full blown verification on cacerts or destcacerts (aka only support well known certifying authorities) as I suspect there could well be self signed ones on the edge [my testing] + private CAs on the haproxy-to-pod traffic end (re-encrypt) - i think we should allow for that.

@ramr
Copy link
Contributor Author

ramr commented Apr 7, 2016

[test]

@ramr ramr force-pushed the route-tls-validation branch from f60e526 to e3e7d76 Compare April 7, 2016 09:26
@ramr
Copy link
Contributor Author

ramr commented Apr 7, 2016

[test]

@smarterclayton
Copy link
Contributor

Define full blown verification of cacerts? What problems would openssl
have besides ordering? Does openssl have a check function we could call?
I'm asking for an enumeration of possible failure modes, whether
verification checks them, and unknowns in haproxy. If we don't know that,
we don't know whether the validation approach works.

On Thu, Apr 7, 2016 at 6:25 AM, OpenShift Bot [email protected]
wrote:

continuous-integration/openshift-jenkins/test FAILURE (
https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/2779/)


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#8366 (comment)

@ramr
Copy link
Contributor Author

ramr commented Apr 7, 2016

So what verification involves is - pem is valid, is a cert, checking the cert is valid (expiry + allowed domains), the certificate chain is valid (which would return an unknown authority for self-signed/generated CAs) and then optionally any extended key usages. We can do that for the edge certificates we have as we have the associated ca cert but not for the cacert/destination cacert themselves. imho, this seems a bit backward to do in golang code anyway when haproxy -c would just verify this for us - which is what the PR #8353 does. We are basically reverse engineering that and we cannot do any future-proofing guarantees anyway as haproxy code could change.

@smarterclayton
Copy link
Contributor

Can we test the certs in isolation rather than have to generate them into
the file? I.e. construct a simple haproxy -c that
verifies only those certs?

On Thu, Apr 7, 2016 at 2:52 PM, Ram Ranganathan [email protected]
wrote:

So what verification involves is - pem is valid, is a cert, checking the
cert is valid (expiry + allowed domains), the certificate chain is valid
(which would return an unknown authority for self-signed/generated CAs) and
then optionally any extended key usages. We can do that for the edge
certificates we have as we have the associated ca cert but not for the
cacert/destination cacert themselves. imho, this seems a bit backward to do
in golang code anyway when haproxy -c would just verify this for us -
which is what the PR #8353 #8353
does. We are basically reverse engineering that and we cannot do any
future-proofing guarantees anyway as haproxy code could change.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#8366 (comment)

@ramr
Copy link
Contributor Author

ramr commented Apr 7, 2016

So that was my original intent but then there was a point made regarding testing in isolation vs with the rest of the config as there could be potential of failures when run in "tandem". Which was the reason for doing the validate-config on the generated config (just the new certs/keys/cacerts would be written but all the existing config + the config for the new service alias and its config/certs would be tested).
Edited new certs/keys/cacerts

@smarterclayton
Copy link
Contributor

smarterclayton commented Apr 7, 2016 via email

@ramr
Copy link
Contributor Author

ramr commented Apr 7, 2016

Due to overlapping backend configs. Some error cases I can think off could be due to with content switching overlapping frontend/backend modes, duplicate keys (which we catch in the service aliases map), duplicates ids and acls (would have to be a custom config though). Not a complete set by any stretch but just from a more defensive standpoint - checking in tandem vs isolation would address that and possibly be a bit more future-proof as it does catch more error cases.

@smarterclayton
Copy link
Contributor

The first couple you mentioned seem like things we have to catch for almost
all frameworks anyway - things under our control.

My concerns with #8353 (at least, as of now) are as follows:

  • We have to check one at a time. That's going to break any high density
    case on restart.
  • We have to change the structure of our queue to where it isn't layered
    anymore (we're basically doing transactional commits, so each change
    becomes an ordered transaction)
  • Any break that is a user break needs to be communicated back to the user
    somehow eventually

On Thu, Apr 7, 2016 at 4:00 PM, Ram Ranganathan [email protected]
wrote:

Due to overlapping backend configs. Some error cases I can think off could
be due to with content switching overlapping frontend/backend modes,
duplicate keys (which we catch in the service aliases map), duplicates ids
and acls (would have to be a custom config though). Not a complete set by
any stretch but just from a more defensive standpoint - checking in tandem
vs isolation would address that and possibly be a bit more future-proof as
it does catch more error cases.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#8366 (comment)

@ramr
Copy link
Contributor Author

ramr commented Apr 7, 2016

So as re: the concerns on #8353 :

  1. We can be smarter on that. A flag on the route - check if its set, skip otherwise. Set the flag
    if the route is modified (and set on creation).
  2. We don't need a structure change to the queue really because we just apply the transactions in order. They are batched up, yes but the checks are always in order. If something fails a check, its not added to the batch.
  3. That we can do with updating the status on the route (which is similar to a duplicate route hostname).

Edited numbering

@smarterclayton
Copy link
Contributor

smarterclayton commented Apr 8, 2016 via email

@ramr
Copy link
Contributor Author

ramr commented Apr 8, 2016

sounds good.

@ramr
Copy link
Contributor Author

ramr commented Apr 8, 2016

[test]

@@ -42,6 +42,7 @@ type TemplatePluginConfig struct {
StatsUsername string
StatsPassword string
IncludeUDP bool
ValidateRouteTLSConfig bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be ExtendedValidation which more stringently tests the route against a set of known good safe configuration.

if o.ExtendedValidation {
nextPlugin = controller.NewExtendedValidator(nextPlugin, statusPlugin)
}
plugin := controller.NewUniqueHost(nextPlugin, o.RouteSelectionFunc(), statusPlugin)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast statusPlugin to RejectionRecorder here so we know we're not using its plugin functions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@ramr ramr force-pushed the route-tls-validation branch from 1b2902b to 34b0242 Compare April 20, 2016 21:50
@ramr
Copy link
Contributor Author

ramr commented Apr 20, 2016

@liggitt made the fixes as per your recommendations, cleaned up remnants of the old implementation and added more tests. @smarterclayton @knobunc fyi.

@knobunc
Copy link
Contributor

knobunc commented Apr 21, 2016

For Online Beta they are disabling custom hostnames, so that means that we don't need to allow custom certs. We need to work out how ops can disable certs (perhaps by deploying a custom router).

We also need to get this patch landed for GA.

@ramr
Copy link
Contributor Author

ramr commented May 6, 2016

@smarterclayton /bump anything else needed on this?

@smarterclayton
Copy link
Contributor

Regarding sni lookup order, is a crt-list the solution? https://rafpe.ninja/2016/04/24/haproxy-ssl-domains-in-crt-list/

@smarterclayton
Copy link
Contributor

Will do final follow up on Monday.

@liggitt
Copy link
Contributor

liggitt commented May 7, 2016

is a crt-list the solution?

Looks like it. We probably need logic to decide which route's cert to use for a host when multiple routes have the same host (in scenarios where that is allowed)

@smarterclayton
Copy link
Contributor

[test]

1 similar comment
@smarterclayton
Copy link
Contributor

[test]

@smarterclayton
Copy link
Contributor

Test failure is real

@spinolacastro
Copy link
Contributor

Do you believe it will land in time to v1.2.0 ?

@smarterclayton
Copy link
Contributor

No, this is unfortunately too big for 1.2.0. You however will be able to use a 1.3.0.alpha track router image for that.

    input is "syntactically" valid.
  o Checkpoint initial code.
  o Add support for validating route tls config.
  o Add option for validating route tls config.
  o Validation fixes.
  o Check private key + cert mismatches.
  o Add tests.
  o Record route rejection.
  o Hook into add route processing + store invalid service alias configs
    in another place - easy to check prior errors on readmission.
  o Remove entry from invalid service alias configs upon route removal.
  o Add generated completions.
  o Bug fixes.
  o Recording rejecting routes is not working completely.
  o Fix status update problem - we should set the status to admitted
    only if we had no errors handling a route.
  o Rework to use a new controller - extended_validator as per
    @smarterclayton comments.
  o Cleanup validation as per @liggitt comments.
  o Update bash completions.
  o Fixup older validation unit tests.
  o Changes as per @liggitt review comments + cleanup tests.
  o Fix failing test.
@ramr ramr force-pushed the route-tls-validation branch from 34b0242 to cdf242e Compare May 16, 2016 17:25
@ramr
Copy link
Contributor Author

ramr commented May 16, 2016

rebased and fixed tests

@ramr
Copy link
Contributor Author

ramr commented May 16, 2016

[test]

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to cdf242e

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/3852/)

@ramr
Copy link
Contributor Author

ramr commented May 16, 2016

@smarterclayton PTAL - rebased and fixed tests + all tests pass now.

@smarterclayton
Copy link
Contributor

LGTM [merge]

@openshift-bot
Copy link
Contributor

openshift-bot commented May 17, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/5924/) (Image: devenv-rhel7_4216)

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to cdf242e

@openshift-bot openshift-bot merged commit 14d77ab into openshift:master May 17, 2016
@ramr ramr deleted the route-tls-validation branch February 3, 2017 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants