Skip to content

Start function of manager object not always returns error if it happens. #2873

Closed
@Alexseij

Description

@Alexseij

In my case error happened during initialization l object:

l, err := leaderelection.NewLeaderElector(leaderelection.LeaderElectionConfig{
Lock: cm.resourceLock,
LeaseDuration: cm.leaseDuration,
RenewDeadline: cm.renewDeadline,
RetryPeriod: cm.retryPeriod,
Callbacks: leaderelection.LeaderCallbacks{
OnStartedLeading: func(_ context.Context) {
if err := cm.startLeaderElectionRunnables(); err != nil {
cm.errChan <- err
return
}
close(cm.elected)
},
OnStoppedLeading: func() {
if cm.onStoppedLeading != nil {
cm.onStoppedLeading()
}
// Make sure graceful shutdown is skipped if we lost the leader lock without
// intending to.
cm.gracefulShutdownTimeout = time.Duration(0)
// Most implementations of leader election log.Fatal() here.
// Since Start is wrapped in log.Fatal when called, we can just return
// an error here which will cause the program to exit.
cm.errChan <- errors.New("leader election lost")
},
},
ReleaseOnCancel: cm.leaderElectionReleaseOnCancel,
Name: cm.leaderElectionID,
})
if err != nil {
return err
}

Error contains this message:
leaseDuration must be greater than renewDeadline

After that error, function of manager object Start dont return it.

Because this code call here by function startLeaderElection:

// Start the leader election and all required runnables.
{
ctx, cancel := context.WithCancel(context.Background())
cm.leaderElectionCancel = cancel
go func() {
if cm.resourceLock != nil {
if err := cm.startLeaderElection(ctx); err != nil {
cm.errChan <- err
}
} else {
// Treat not having leader election enabled the same as being elected.
if err := cm.startLeaderElectionRunnables(); err != nil {
cm.errChan <- err
}
close(cm.elected)
}
}()
}

After this defers:

stopComplete := make(chan struct{})
defer close(stopComplete)
// This must be deferred after closing stopComplete, otherwise we deadlock.
defer func() {
// https://hips.hearstapps.com/hmg-prod.s3.amazonaws.com/images/gettyimages-459889618-1533579787.jpg
stopErr := cm.engageStopProcedure(stopComplete)
if stopErr != nil {
if err != nil {
// Utilerrors.Aggregate allows to use errors.Is for all contained errors
// whereas fmt.Errorf allows wrapping at most one error which means the
// other one can not be found anymore.
err = kerrors.NewAggregate([]error{err, stopErr})
} else {
err = stopErr
}
}
}()

And this defers call another defers one of which waiting for value in channel:

defer func() {
// Cancel leader election only after we waited. It will os.Exit() the app for safety.
if cm.resourceLock != nil {
// After asking the context to be cancelled, make sure
// we wait for the leader stopped channel to be closed, otherwise
// we might encounter race conditions between this code
// and the event recorder, which is used within leader election code.
cm.leaderElectionCancel()
<-cm.leaderElectionStopped
}
}()

But this channel is never being close because close function calls only when there is no error when l object initializing

if err != nil {
return err
}
// Start the leader elector process
go func() {
l.Run(ctx)
<-ctx.Done()
close(cm.leaderElectionStopped)
}()
return nil

I fix this issue in my fork repository, if this issue will be accepted I can do pull request.

Metadata

Metadata

Assignees

No one assigned

    Labels

    kind/bugCategorizes issue or PR as related to a bug.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions