Fix failing network partition test #4118

Merged
merged 3 commits into from
Mar 25, 2025

atakavci
Contributor

@atakavci atakavci commented Mar 21, 2025

In the scenario test environment (currently the only place where this test runs), we have identified a critical timing issue with failure injection. The test itself initiates the failure injection, but there is no mechanism to:

  • confirm the exact moment when the failure injection takes effect
  • validate that the injected failure has finished propagating

For a variety of reasons, failure injection can take longer than a few seconds, while our tests proceed relatively fast. As a result, tests fail prematurely, before the failure injection even takes effect.

To address this issue, we will try to use the injection API's built-in check mechanism. This lets us keep asserting and allows sufficient time for the impact to arrive.
The new implementation (using the injection API's isCompleted) keeps trying for the given time window after the propagation is done and the status has been confirmed from the calling side.
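
The polling idea described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `FaultInjectionPolling`, `firstFailure`, and all parameter names are hypothetical, the `Runnable` stands in for a Jedis round trip, and the `BooleanSupplier` stands in for the injection API's `isCompleted` check, whose real signature is not assumed here.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper; none of these names come from the Jedis test suite. */
final class FaultInjectionPolling {

    /**
     * Runs {@code command} repeatedly and returns the first exception it throws.
     * Returns null if no call fails within {@code grace} after the fault is
     * reported complete, or within {@code maxWait} overall.
     */
    static RuntimeException firstFailure(Runnable command, BooleanSupplier faultCompleted,
            Duration grace, Duration maxWait, Duration pause) {
        long start = System.nanoTime();
        boolean completed = false;
        long completedAt = 0L;

        while (System.nanoTime() - start < maxWait.toNanos()) {
            try {
                command.run(); // stand-in for a short set/get round trip
            } catch (RuntimeException e) {
                return e; // the injected failure reached the client
            }
            // Start the grace window only once the injection reports completion.
            if (!completed && faultCompleted.getAsBoolean()) {
                completed = true;
                completedAt = System.nanoTime();
            }
            if (completed && System.nanoTime() - completedAt >= grace.toNanos()) {
                return null; // fault propagated, yet no call failed in the window
            }
            try {
                Thread.sleep(pause.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", ie);
            }
        }
        return null; // overall budget exhausted
    }
}
```

A test built on such a helper would assert that the returned exception is non-null and of the expected connection-related type. The `maxWait` bound is an addition of this sketch so that the loop cannot run forever if completion is never reported.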

@atakavci atakavci requested review from tishun and ggivo March 21, 2025 14:43
@atakavci atakavci self-assigned this Mar 21, 2025
assertEquals("value", jedis.get(key));
jedis.del(key);
while (!actionResponse.isCompleted(ONE_HUNDRED_MILLISECONDS, TWO_SECONDS, FIVE_SECONDS)) {
for (int i = 0; i < 50; i++) {
Collaborator

How about using a blocking command like BLPOP? It should be more predictable than relying on multiple get/set calls.

  • issue BLPOP (guarantees that we have an active connection)
  • trigger the network failure (this should drop the connection)
  • wait for the action to complete and validate that there was a connection exception

Contributor

TBH these are really two different test scenarios:

  • what happens when there is a network failure during a blocking operation execution
  • what happens when there is a network failure while executing multiple short-lived operations

In a perfect world we should be testing both, but in this case I would go for either.

Collaborator

@ggivo ggivo Mar 24, 2025

I don't think we should test both.
Looking at the test, we are trying to simulate a network failure while the connection is actively in use and then check for exceptions. A blocking operation should keep the connection open until the network issue is simulated. My idea is to hopefully reduce the flakiness.

Contributor Author

Blocking operations remove timeouts, so if nothing shows up on the line, the test would end up hanging. As for the test content, as @ggivo suggested, the test case has only one target, which is to check the behavior in case of a network failure.

Collaborator

@atakavci
What about adding a timeout to the test itself?
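
As a rough illustration of bounding a blocking call so that a test cannot hang, the sketch below runs the call on a worker thread and waits for it with a limit. It uses only `java.util.concurrent`; `BoundedBlockingCall`, `Outcome`, and `await` are hypothetical names, and the `Callable` stands in for a blocking command such as BLPOP. In a JUnit 5 test, the `@Timeout` annotation is another way to put a limit on the whole test method.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; not part of Jedis or its test suite. */
final class BoundedBlockingCall {

    enum Outcome { RETURNED, FAILED, TIMED_OUT }

    /** Runs a blocking call on a worker thread and waits at most {@code limit} for it. */
    static Outcome await(Callable<?> blockingCall, Duration limit) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<?> future = worker.submit(blockingCall);
        try {
            future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
            return Outcome.RETURNED;  // the call completed normally
        } catch (ExecutionException e) {
            return Outcome.FAILED;    // the call threw, e.g. a connection exception
        } catch (TimeoutException e) {
            return Outcome.TIMED_OUT; // nothing happened on the line within the limit
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            // Interrupts the worker. A thread blocked on socket I/O may ignore the
            // interrupt, so a real test would also need to close the connection.
            future.cancel(true);
            worker.shutdownNow();
        }
    }
}
```

With a helper like this, the network-failure case would expect `FAILED`, while `TIMED_OUT` would indicate that the injected failure never reached the client within the limit.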

Contributor Author

Should be more predictable than relying on multiple get/set ...

I am not strongly against it, but it's not clear to me what would improve between:

  • multiple calls where we expect any one to fail in n seconds
  • single call we expect to fail in n seconds

Collaborator

I thought some flakiness might come from not throwing an exception between the calls if the server is reconnected too fast, but it seems not very likely.
Agree, let's keep it as it is.

Contributor Author

I see, then I failed to make the PR state the issue clearly enough. Let me try to improve the description.

@atakavci atakavci merged commit f8e1be3 into redis:master Mar 25, 2025
15 of 17 checks passed
svc-squareup-copybara pushed a commit to cashapp/misk that referenced this pull request May 1, 2025
| Package | Type | Package file | Manager | Update | Change |
|---|---|---|---|---|---|
| [redis.clients:jedis](https://github.com/redis/jedis) | dependencies | misk/gradle/libs.versions.toml | gradle | major | `5.2.0` -> `6.0.0` |

---

### Release Notes

<details>
<summary>redis/jedis (redis.clients:jedis)</summary>

### [`v6.0.0`](https://github.com/redis/jedis/releases/tag/v6.0.0):
6.0.0

#### Redis 8.0 support

Starting from version 8.0, Redis supports new data structures and
capabilities such as JSON, Search & Query, and TimeSeries by default.
This release improves Jedis compatibility with Redis 8.0.

##### Upgrading from previous releases

##### Search

This release introduces a client-side default dialect for Redis’ search
and query capabilities. By default, the client now overrides the
server-side dialect with version 2, automatically appending `DIALECT 2`
to commands like **FT.AGGREGATE** and **FT.SEARCH**.

**Important**: Be aware that the query dialect may impact the results
returned. If needed, you can revert to a different dialect version by
configuring the client accordingly.

```java
            UnifiedJedis jedis = new UnifiedJedis("redis://localhost:6379");
            jedis.setDefaultSearchDialect(1);  // DIALECT 1
```

You can find further details in the [query dialect
documentation](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/dialects/).

##### Discontinued features

This release also **removes** support for both **RedisGraph** and
**Triggers & Functions** (aka RedisGears v2).

#### 🔥 Breaking Changes

- Make default client side search dialect to 2
([#&#8203;4060](redis/jedis#4060))
- Remove Graph module support
([#&#8203;4073](redis/jedis#4073))
- Change FT.PROFILE to return generic object
([#&#8203;4067](redis/jedis#4067))
- Remove Triggers and Functions feature
([#&#8203;3969](redis/jedis#3969))
- COMMAND INFO reply contains subcommand detail
([#&#8203;4022](redis/jedis#4022))

#### 🧪 Experimental Features

- Support warning messages in search/aggregation query results
([#&#8203;3958](redis/jedis#3958))
- Add SslOptions
([#&#8203;3980](redis/jedis#3980))

#### 🚀 New Features

- Add tests for vector search INT8/UINT8 types
([#&#8203;4091](redis/jedis#4091))
- Support for new HFE API, hgetdel hgetex hsetex commands
([#&#8203;4095](redis/jedis#4095))
- Propagate cause for "Cluster retry deadline exceeded" exception
([#&#8203;4103](redis/jedis#4103))
- Support INFO command in UnifiedJedis (simplified)
([#&#8203;4079](redis/jedis#4079))
- \[code cleanup] Jedis client to implement CommandCommands interface
([#&#8203;4077](redis/jedis#4077))
- Extend EXECABORT with "previous errors"
[#&#8203;4084](redis/jedis#4084)
([#&#8203;4090](redis/jedis#4090))
- Add SslOptions
([#&#8203;3980](redis/jedis#3980))
- Token based authentication integration with core extension
([#&#8203;4011](redis/jedis#4011))
- Implement command (no arg)
([#&#8203;4026](redis/jedis#4026))

#### 🐛 Bug Fixes

- Fix for bug
[#&#8203;4003](redis/jedis#4003). Better
message instead of ArrayIndexOutOfBoundsExce
([#&#8203;4109](redis/jedis#4109))
- Fix pubsub when cache enabled
([#&#8203;4086](redis/jedis#4086))
- Bump org.apache.commons:commons-pool2 from 2.12.0 to 2.12.1
([#&#8203;4080](redis/jedis#4080))
- COMMAND INFO reply contains subcommand detail
([#&#8203;4022](redis/jedis#4022))

#### 🧰 Maintenance

- Bump test infra to 8.0-RC2
([#&#8203;4155](redis/jedis#4155))
- DOC-5110 added hash search examples
([#&#8203;4151](redis/jedis#4151))
- Bump org.apache.maven.plugins:maven-surefire-plugin from 3.5.2 to
3.5.3 ([#&#8203;4136](redis/jedis#4136))
- Bump org.jacoco:jacoco-maven-plugin from 0.8.12 to 0.8.13
([#&#8203;4137](redis/jedis#4137))
- Speed up cluster tests
([#&#8203;4150](redis/jedis#4150))
- Bump org.apache.httpcomponents.client5:httpclient5-fluent from 5.4.2
to 5.4.4 ([#&#8203;4153](redis/jedis#4153))
- Fix for connectionAuthWithExpiredTokenTest
([#&#8203;4142](redis/jedis#4142))
- Migrate test to JUnit5
([#&#8203;4139](redis/jedis#4139))
- Document pgp keys
([#&#8203;4125](redis/jedis#4125))
- Bump jackson.version from 2.18.2 to 2.18.3
([#&#8203;4106](redis/jedis#4106))
- Add tests for setGet with Parameters
([#&#8203;4127](redis/jedis#4127))
- Fix failing network partition test
([#&#8203;4118](redis/jedis#4118))
- Test support for DefaultAzureCredential
([#&#8203;4113](redis/jedis#4113))
- Update redis server test versions
([#&#8203;4114](redis/jedis#4114))
- Update stale issue workflow
([#&#8203;4101](redis/jedis#4101))
- Bump net.revelc.code.formatter:formatter-maven-plugin from 2.11.0 to
2.16.0 ([#&#8203;4098](redis/jedis#4098))
- Basic documention for TBA support with some examples
([#&#8203;4102](redis/jedis#4102))
- Bump org.apache.maven.plugins:maven-compiler-plugin from 3.13.0 to
3.14.0 ([#&#8203;4097](redis/jedis#4097))
- Bump org.awaitility:awaitility from 4.2.2 to 4.3.0
([#&#8203;4099](redis/jedis#4099))
- Enforce code style format
([#&#8203;4087](redis/jedis#4087))
- Update redisjson.md
([#&#8203;4083](redis/jedis#4083))
- Bump org.json:json from
[`2024030`](redis/jedis@20240303) to
[`2025010`](redis/jedis@20250107)
([#&#8203;4049](redis/jedis#4049))
- Bump com.google.code.gson:gson from 2.11.0 to 2.12.1
([#&#8203;4082](redis/jedis#4082))
- Bump org.apache.httpcomponents.client5:httpclient5-fluent from 5.4.1
to 5.4.2 ([#&#8203;4081](redis/jedis#4081))
- Bump org.apache.commons:commons-pool2 from 2.12.0 to 2.12.1
([#&#8203;4080](redis/jedis#4080))
- Fix the Java example code for Lists using RPUSH
([#&#8203;4074](redis/jedis#4074))
- Use v4 of few GitHub actions workflow artifacts
([#&#8203;4075](redis/jedis#4075))
- Change FT.PROFILE to return generic object
([#&#8203;4067](redis/jedis#4067))
- Remove SearchConfigTest
([#&#8203;4072](redis/jedis#4072))
- Test modules CONFIG support
([#&#8203;4043](redis/jedis#4043))
- Test modules ACL support
([#&#8203;4042](redis/jedis#4042))
- Test with 8.0-M04-pre
([#&#8203;4069](redis/jedis#4069))
- Fix TBA cluster integration tests
([#&#8203;4068](redis/jedis#4068))
- DOC-4445 server management command examples
([#&#8203;4056](redis/jedis#4056))
- Update actions/checkout, actions/setup-java and codecov/codecov-action
([#&#8203;4066](redis/jedis#4066))
- DOC-4732 added geo index examples
([#&#8203;4059](redis/jedis#4059))
- DOC-4440 added auth command examples using Jedis class
([#&#8203;4058](redis/jedis#4058))
- Revert failing GitHub artifacts for Publish Docs
([#&#8203;4065](redis/jedis#4065))
- Use v3 of GitHub deploy-pages for Publish Docs
([#&#8203;4064](redis/jedis#4064))
- Use v3 of GitHub upload-pages-artifact for Publish Docs
([#&#8203;4063](redis/jedis#4063))
- Upgrade GitHub Python artifact for Publish Docs
([#&#8203;4062](redis/jedis#4062))
- Use v4 of upload artifact
([#&#8203;4061](redis/jedis#4061))
- DOC-4475 examples for llen, lpop, lpush, lrange, rpop, and rpush
([#&#8203;4054](redis/jedis#4054))
- DOC-4495 sadd and smembers examples
([#&#8203;4052](redis/jedis#4052))
- Fix sporadic test failing with OOM
([#&#8203;4053](redis/jedis#4053))
- Introduces test matrix based on Redis versions \[8.0-M1, 7.4.1, 7.2.6,
6.2.16] ([#&#8203;4015](redis/jedis#4015))
- Remove List tests asserting timeouts
([#&#8203;4051](redis/jedis#4051))
- DOC-4450 added hgetall and hvals command examples
([#&#8203;4050](redis/jedis#4050))
- Minor fix with Token-Based-Authentication integration tests
([#&#8203;4044](redis/jedis#4044))
- Bump org.apache.maven.plugins:maven-javadoc-plugin from 3.11.1 to
3.11.2 ([#&#8203;4039](redis/jedis#4039))
- DOC-4560 pipe/transaction examples for docs
([#&#8203;4038](redis/jedis#4038))
- Bump jackson.version from 2.18.1 to 2.18.2
([#&#8203;4034](redis/jedis#4034))
- Make reply of COMMAND INFO compatible with older Redis versions
([#&#8203;4031](redis/jedis#4031))
- Make reply of ACL LOG compatible with older Redis versions
([#&#8203;4030](redis/jedis#4030))
- Add examples and tutorials page
([#&#8203;4024](redis/jedis#4024))
- Bump org.apache.maven.plugins:maven-javadoc-plugin from 3.10.1 to
3.11.1 ([#&#8203;4007](redis/jedis#4007))
- Bump org.apache.maven.plugins:maven-surefire-plugin from 3.5.1 to
3.5.2 ([#&#8203;4008](redis/jedis#4008))
- DOC-4345 added JSON search examples for home page
([#&#8203;4010](redis/jedis#4010))
- Bump org.apache.httpcomponents.client5:httpclient5-fluent from 5.4 to
5.4.1 ([#&#8203;4009](redis/jedis#4009))
- Bump jackson.version from 2.18.0 to 2.18.1
([#&#8203;4006](redis/jedis#4006))
- Mkdocs unify docs
([#&#8203;3999](redis/jedis#3999))
- Update links in README
([#&#8203;3974](redis/jedis#3974))
- Codecove has released beta version of Test Analytics feature
([#&#8203;3996](redis/jedis#3996))
- Fix flaky tests with 'await'
([#&#8203;3972](redis/jedis#3972))
- Bump org.apache.maven.plugins:maven-javadoc-plugin from 3.10.0 to
3.10.1 ([#&#8203;3994](redis/jedis#3994))
- Add javadoc to clear up implicit behavior
([#&#8203;3991](redis/jedis#3991))
- Fix JavaDoc warnings
([#&#8203;3990](redis/jedis#3990))
- Bump org.apache.maven.plugins:maven-surefire-plugin from 3.5.0 to
3.5.1 ([#&#8203;3989](redis/jedis#3989))
- Bump org.apache.maven.plugins:maven-gpg-plugin from 3.2.6 to 3.2.7
([#&#8203;3976](redis/jedis#3976))
- Bump com.kohlschutter.junixsocket:junixsocket-core from 2.10.0 to
2.10.1 ([#&#8203;3978](redis/jedis#3978))
- Bump jackson.version from 2.17.2 to 2.18.0
([#&#8203;3977](redis/jedis#3977))
- DOC-4317 fixed flaky tests
([#&#8203;3984](redis/jedis#3984))
- Jedis 5.2.0 is released; bump snapshot version to 5.3.0
([#&#8203;3975](redis/jedis#3975))

#### Contributors

We'd like to thank all the contributors who worked on this release!

[@&#8203;andy-stark-redis](https://github.com/andy-stark-redis),
[@&#8203;atakavci](https://github.com/atakavci),
[@&#8203;ggivo](https://github.com/ggivo),
[@&#8203;joshrotenberg](https://github.com/joshrotenberg),
[@&#8203;ozennou](https://github.com/ozennou),
[@&#8203;sanaulla123](https://github.com/sanaulla123),
[@&#8203;sazzad16](https://github.com/sazzad16),
[@&#8203;smadasu](https://github.com/smadasu),
[@&#8203;thachlp](https://github.com/thachlp),
[@&#8203;tishun](https://github.com/tishun) and
[@&#8203;uglide](https://github.com/uglide)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "after 6pm every weekday,before 2am
every weekday" in timezone Australia/Melbourne, Automerge - At any time
(no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Never, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Renovate
Bot](https://github.com/renovatebot/renovate).

GitOrigin-RevId: 12d0d485257f42dab8fcbd650e93cf770225dac5