Support running on non-empty database #19900

serathius · 2025-05-09T14:51:28Z

What would you like to be added?

This task is a little harder because it requires diving into the robustness linearization model. Interested contributors, beware!

Currently, we validate that the database is empty before running tests:

etcd/tests/antithesis/test-template/robustness/main.go

Lines 96 to 99 in 4976738

    
    r, err := traffic.CheckEmptyDatabaseAtStart(ctx, lg, hosts, ids, baseTime)
    if err != nil {
    	lg.Fatal("Failed empty database at start check", zap.Error(err))
    }

This is done by reading the etcd revision before any traffic is sent and checking that it is equal to 1:

etcd/tests/robustness/traffic/traffic.go

Lines 207 to 227 in 4976738

    
    func CheckEmptyDatabaseAtStart(ctx context.Context, lg *zap.Logger, endpoints []string, ids identity.Provider, baseTime time.Time) (report.ClientReport, error) {
    	c, err := client.NewRecordingClient(endpoints, ids, baseTime)
    	if err != nil {
    		return report.ClientReport{}, err
    	}
    	defer c.Close()
    	for {
    		rCtx, cancel := context.WithTimeout(ctx, RequestTimeout)
    		resp, err := c.Get(rCtx, "key")
    		cancel()
    		if err != nil {
    			lg.Warn("Failed to check if database empty at start, retrying", zap.Error(err))
    			continue
    		}
    		if resp.Header.Revision != 1 {
    			return report.ClientReport{}, validate.ErrNotEmptyDatabase
    		}
    		break
    	}
    	return c.Report(), nil
    }

Revision in etcd is a global logical clock for operations, and it starts at 1. A revision of 1 therefore ensures that no transactions that modify the keyspace were executed on etcd before, so etcd is empty.
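As a rough illustration of why the check looks at the revision rather than at the returned keys, here is a minimal Go sketch. `toyStore` and `isEmptyAtStart` are hypothetical names invented for this example and are not part of etcd or the robustness tests; the sketch only assumes that each successful write or delete advances the revision by one.

```go
package main

// toyStore is a hypothetical in-memory stand-in for etcd's keyspace.
// It only models the part that matters here: a revision counter that
// starts at 1 and advances on every change to the keyspace.
type toyStore struct {
	revision int64
	kv       map[string]string
}

func newToyStore() *toyStore {
	return &toyStore{revision: 1, kv: map[string]string{}}
}

// Put stores a value and advances the revision.
func (s *toyStore) Put(key, value string) int64 {
	s.revision++
	s.kv[key] = value
	return s.revision
}

// Delete removes a key; the revision advances only if the key existed.
func (s *toyStore) Delete(key string) int64 {
	if _, ok := s.kv[key]; ok {
		s.revision++
		delete(s.kv, key)
	}
	return s.revision
}

// Get returns the value and the current revision, like a response header would.
func (s *toyStore) Get(key string) (string, int64) {
	return s.kv[key], s.revision
}

// isEmptyAtStart mirrors the check in the issue: revision 1 means no prior writes.
func isEmptyAtStart(headerRevision int64) bool {
	return headerRevision == 1
}
```

Note that a store where a key was written and then deleted holds no keys but still fails the check, because its revision has moved past 1. The revision check is therefore stricter than "the keyspace is empty".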

This is also checked as the first step of validation:

etcd/tests/robustness/validate/validate.go

Lines 113 to 127 in 4976738

    
    func validateEmptyDatabaseAtStart(reports []report.ClientReport) error {
    	if len(reports) == 0 {
    		return nil
    	}
    	for _, r := range reports {
    		for _, op := range r.KeyValue {
    			request := op.Input.(model.EtcdRequest)
    			response := op.Output.(model.MaybeEtcdResponse)
    			if response.Revision == 1 && request.IsRead() {
    				return nil
    			}
    		}
    	}
    	return ErrNotEmptyDatabase
    }

This validation exists because an empty database is an implicit assumption of the linearization code. Without this assertion, linearization would fail with a hard-to-discern error.

We can try to improve the linearization to allow a non-empty database. So why doesn't linearization support a non-empty database? Because it needs a complete view of the database so that it can simulate it. Assuming the database is empty was just easier to implement.

etcd/tests/robustness/model/deterministic.go

Lines 99 to 108 in 4976738

    
    func freshEtcdState() EtcdState {
    	return EtcdState{
    		Revision: 1,
    		// Start from CompactRevision equal -1 as etcd allows client to compact revision 0 for some reason.
    		CompactRevision: -1,
    		KeyValues:       map[string]ValueRevision{},
    		KeyLeases:       map[string]int64{},
    		Leases:          map[int64]EtcdLease{},
    	}
    }
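To make the limitation concrete, the following Go sketch shows a toy model that starts from an empty state and cannot account for a read that returns a pre-existing key. `toyState`, `toyRead` and `explains` are hypothetical names for this example only; the real model in `deterministic.go` is considerably richer.

```go
package main

// toyState is a hypothetical, heavily simplified model state:
// just a revision and the current value of each key.
type toyState struct {
	Revision int64
	Values   map[string]string
}

// toyRead is a hypothetical observed read: the key that was requested
// and what the server returned for it.
type toyRead struct {
	Key      string
	Value    string // empty string means "key not found"
	Revision int64
}

// explains reports whether the model state can reproduce the observed read.
func explains(s toyState, r toyRead) bool {
	return s.Revision == r.Revision && s.Values[r.Key] == r.Value
}
```

With an empty initial state at revision 1, a read that observed key "a" with value "x" at revision 5 has no valid explanation, so the history is rejected even though etcd behaved correctly. The same read is accepted once the model starts from a state that already contains that key at that revision.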

To improve this, we could replace the read of revision 1 at the beginning with an operation that downloads the state of the database, and change the model to accept this state as the initial one.
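One possible shape for such a seeded initial state is sketched below in Go. The struct definitions are simplified stand-ins that only mirror the field names visible in `freshEtcdState`; the real `ValueRevision` and `EtcdLease` types may differ. `snapshotKV` and `stateFromSnapshot` are hypothetical names. Two assumptions are worth flagging: the sketch leaves leases empty, and it keeps `CompactRevision` at -1 because a plain read does not reveal the compaction point. Both would need a real answer in an actual implementation.

```go
package main

// Simplified stand-ins for the model types; field sets are assumptions.
type ValueRevision struct {
	Value       string
	ModRevision int64
}

type EtcdLease struct {
	Keys map[string]struct{}
}

type EtcdState struct {
	Revision        int64
	CompactRevision int64
	KeyValues       map[string]ValueRevision
	KeyLeases       map[string]int64
	Leases          map[int64]EtcdLease
}

// snapshotKV is a hypothetical record for one key returned by the initial read.
type snapshotKV struct {
	Key, Value  string
	ModRevision int64
}

// stateFromSnapshot is a hypothetical counterpart to freshEtcdState that
// seeds the model from the contents observed by the initial read.
func stateFromSnapshot(headerRevision int64, kvs []snapshotKV) EtcdState {
	state := EtcdState{
		Revision:        headerRevision,
		CompactRevision: -1, // assumption: unknown from a plain read, kept as in freshEtcdState
		KeyValues:       make(map[string]ValueRevision, len(kvs)),
		KeyLeases:       map[string]int64{}, // assumption: leases are not downloaded
		Leases:          map[int64]EtcdLease{},
	}
	for _, kv := range kvs {
		state.KeyValues[kv.Key] = ValueRevision{Value: kv.Value, ModRevision: kv.ModRevision}
	}
	return state
}
```

Calling `stateFromSnapshot(1, nil)` yields the same contents as `freshEtcdState`, so under these assumptions the empty-database case remains a special case of the seeded one.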

Steps:

1. We still need to make a special request before all other requests. But now, instead of checking revision == 1, we just need to read the database state. Changes in

etcd/tests/antithesis/test-template/robustness/main.go

Lines 96 to 99 in 4976738

    r, err := traffic.CheckEmptyDatabaseAtStart(ctx, lg, hosts, ids, baseTime)
    if err != nil {
    	lg.Fatal("Failed empty database at start check", zap.Error(err))
    }

2. In validation we still need to confirm that the special Read request is present. However, the conditions are different: we need to make sure that it precedes all other requests and that it reads the whole database contents (or at least the prefix of all keys used in the robustness test). Changes in

etcd/tests/robustness/validate/validate.go

Lines 113 to 127 in 4976738

    func validateEmptyDatabaseAtStart(reports []report.ClientReport) error {
    	if len(reports) == 0 {
    		return nil
    	}
    	for _, r := range reports {
    		for _, op := range r.KeyValue {
    			request := op.Input.(model.EtcdRequest)
    			response := op.Output.(model.MaybeEtcdResponse)
    			if response.Revision == 1 && request.IsRead() {
    				return nil
    			}
    		}
    	}
    	return ErrNotEmptyDatabase
    }

3. We would want to add a way for EtcdState to represent a state in which etcd is not initialized, during which the Step method allows only a read request. We could hardcode the same request as in step 2.
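The validation and model changes described in the steps above could look roughly like the Go sketch below. Every type and function in it (`kind`, `request`, `response`, `state`, `step`, `operation`, `validateInitialRead`) is hypothetical and far simpler than the real `model` and `validate` packages; timestamps are plain integers, and only get and put are modelled. Treat it as an outline of the idea rather than a proposed implementation.

```go
package main

import "errors"

// Every type and function here is hypothetical, not the real model package.
type kind int

const (
	snapshotRead kind = iota // the hardcoded "read all keys" request
	get
	put
)

type request struct {
	Kind       kind
	Key, Value string
}

type response struct {
	Revision int64
	Value    string            // result of a get
	KVs      map[string]string // result of a snapshotRead
}

type state struct {
	Initialized bool
	Revision    int64
	Values      map[string]string
}

func cloneValues(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// step returns the next state and whether resp is consistent with the model.
func step(s state, req request, resp response) (state, bool) {
	if !s.Initialized {
		// Only the snapshot read is allowed; the model adopts what it observed.
		if req.Kind != snapshotRead {
			return s, false
		}
		return state{Initialized: true, Revision: resp.Revision, Values: cloneValues(resp.KVs)}, true
	}
	switch req.Kind {
	case get:
		return s, resp.Revision == s.Revision && resp.Value == s.Values[req.Key]
	case put:
		next := state{Initialized: true, Revision: s.Revision + 1, Values: cloneValues(s.Values)}
		next.Values[req.Key] = req.Value
		return next, resp.Revision == next.Revision
	default: // a later snapshotRead must match the modelled contents exactly
		if resp.Revision != s.Revision || len(resp.KVs) != len(s.Values) {
			return s, false
		}
		for k, v := range resp.KVs {
			if got, ok := s.Values[k]; !ok || got != v {
				return s, false
			}
		}
		return s, true
	}
}

// operation pairs a request with logical call and return timestamps.
type operation struct {
	Req          request
	Call, Return int64
}

var errNoInitialRead = errors.New("snapshot read is missing or does not precede all other requests")

// validateInitialRead checks that a snapshot read returned before any other request was sent.
func validateInitialRead(ops []operation) error {
	first := -1
	for i, op := range ops {
		if op.Req.Kind == snapshotRead && (first == -1 || op.Return < ops[first].Return) {
			first = i
		}
	}
	if first == -1 {
		return errNoInitialRead
	}
	for i, op := range ops {
		if i != first && op.Call <= ops[first].Return {
			return errNoInitialRead
		}
	}
	return nil
}
```

The zero value of `state` is the uninitialized state, so a history that begins with anything other than the snapshot read is rejected by `step` immediately instead of producing a confusing linearization failure later.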

Why is this needed?

Allow multiple runs of antithesis locally.
/cc @henrybear327 @nwnt


nwnt · 2025-05-11T03:11:24Z

I can have a look at this.

serathius · 2025-05-11T07:43:26Z

/assign @nwnt

serathius · 2025-05-19T19:37:23Z

I would recommend skipping this issue for now; it's pretty complicated and the benefit is limited. I would say it's low priority.

nwnt · 2025-05-20T03:34:31Z

Roger that. I'm unassigning this from myself for now then.

nwnt · 2025-05-20T03:34:41Z

/unassign

serathius added type/feature area/robustness-testing labels May 9, 2025

k8s-ci-robot assigned nwnt May 11, 2025

serathius changed the title ~~[Antithesis] Support running on non-empty database~~ Support running on non-empty database May 14, 2025

k8s-ci-robot unassigned nwnt May 20, 2025
