Fix failing network partition test #4118
assert with multiple expected msg
assertEquals("value", jedis.get(key));
jedis.del(key);
while (!actionResponse.isCompleted(ONE_HUNDRED_MILLISECONDS, TWO_SECONDS, FIVE_SECONDS)) {
for (int i = 0; i < 50; i++) {
How about using a blocking command like blpop? It should be more predictable than relying on multiple get/set calls:
- issue blpop (guarantees that we have an active connection)
- trigger the network failure (this should drop the connection)
- wait for the action to complete and validate that there was a connection exception
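For illustration, the three steps above can be sketched with plain sockets. This is a stdlib-only stand-in, not Jedis code: a blocking TCP read plays the role of blpop, closing the server side of the socket plays the role of the injected network failure, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a blocking read stands in for BLPOP, and closing the
// accepted socket stands in for the injected network failure.
class BlockingReadFailureSketch {

    /** Returns true if the blocked reader observed the connection loss. */
    static boolean blockedReaderSeesDrop() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket serverSide = server.accept()) {

            // Step 1: issue the blocking call, so the connection is in active use.
            CompletableFuture<Boolean> reader = CompletableFuture.supplyAsync(() -> {
                try {
                    InputStream in = client.getInputStream();
                    return in.read() == -1; // end of stream: the peer is gone
                } catch (IOException e) {
                    return true; // a reset also counts as a connection failure
                }
            });

            // Step 2: trigger the "failure" by dropping the connection on the server side.
            Thread.sleep(100); // crude pause to let the reader block first
            serverSide.close();

            // Step 3: wait for the outcome, bounded so the sketch itself cannot hang.
            return reader.get(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reader saw connection loss: " + blockedReaderSeesDrop());
    }
}
```

In the real test the blocking call would go through the Jedis client and the failure would come from the injection API, so the exception type to assert on would be whatever the client raises for a dropped connection.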
TBH these are really two different test scenarios:
- what happens when there is a network failure during the execution of a blocking operation
- what happens when there is a network failure while executing multiple short-lived operations
In a perfect world we should be testing both, but in this case I would go for either.
I don't think we should test both.
Looking at the test, we are trying to simulate a network failure while the connection is being actively used and check for exceptions. A blocking operation should keep the connection open until the network issue is simulated. My idea is to hopefully reduce the flakiness.
Blocking operations remove timeouts, so if nothing shows up on the line, the test would end up hanging. On the other point about test content: as @ggivo suggested, the test case has only one target, which is to check the behavior in case of a network failure.
@atakavci
What about adding a timeout to the test itself?
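A minimal sketch of what a test-level timeout could look like, using only the JDK. The class and method names are hypothetical, and the never-returning body is just a stand-in for a blocking command that gets no reply.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bound a possibly blocking test body with a hard time limit.
class TestTimeoutSketch {

    /** Runs body on a worker thread; returns false if it did not finish within the limit. */
    static boolean finishesWithin(Runnable body, long limitMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> result = executor.submit(body);
            result.get(limitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // still blocked: a test would fail here instead of hanging
        } finally {
            executor.shutdownNow(); // interrupt the worker so it does not outlive the check
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a blocking command that never receives a reply.
        Runnable neverReturns = () -> {
            try {
                new CountDownLatch(1).await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("blocked body finished in time: " + finishesWithin(neverReturns, 200));
        System.out.println("quick body finished in time: " + finishesWithin(() -> { }, 200));
    }
}
```

With JUnit 5, the `@Timeout` annotation or `assertTimeoutPreemptively` would likely be the more idiomatic way to get the same effect. Note that interrupting a thread does not unblock a plain socket read, so the connection may still need to be closed explicitly.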
Should be more predictable than relying on multiple get/set ...
I am not strongly against it, but it's not clear to me what would improve between:
- multiple calls where we expect any one to fail in n seconds
- single call we expect to fail in n seconds
I thought some flakiness might come from not throwing an exception between the calls if the server is reconnected too fast, but it seems not very likely.
Agree, let's keep it as it is.
I see, then I failed to make the PR state the issue clearly enough. Let me try to improve the description.
In the scenario test environment (the only place where this test works as of today), we have identified a critical timing-related issue with the impact of failure injection. Failure injection is initiated by the test itself, but there is no mechanism to:
For a number of reasons, failure injection can take longer than a few seconds, while our tests proceed relatively fast. As a result, tests fail prematurely, before the failure injection even takes effect.
To address this issue, we will try to utilize the injection API's built-in check mechanism. This will help us keep asserting and allow sufficient time for the impact to be observed.
The new implementation (using the injection API's isCompleted) will keep trying during the given time window after the propagation is done and the status is confirmed from the calling side.
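The keep-trying loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `ActionStatus` and `Operation` are hypothetical stand-ins for the injection API handle and the client call, and this `isCompleted` takes a single interval for brevity, whereas the call in the diff passes three durations.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: keep issuing operations until the injected action
// reports completion, and remember the first failure that shows up.
class KeepAssertingSketch {

    /** Hypothetical stand-in for the injection API's status handle. */
    interface ActionStatus {
        boolean isCompleted(Duration checkInterval) throws InterruptedException;
    }

    /** Hypothetical stand-in for a client call that fails once the fault takes effect. */
    interface Operation {
        void run() throws Exception;
    }

    /** Returns the first failure seen while the action was still in progress, or null. */
    static Exception firstFailureWhileInjecting(ActionStatus action, Operation op,
                                                Duration checkInterval) throws InterruptedException {
        Exception first = null;
        while (!action.isCompleted(checkInterval)) {
            // Between status checks, keep exercising the connection.
            for (int i = 0; i < 50 && first == null; i++) {
                try {
                    op.run();
                } catch (Exception e) {
                    first = e; // the fault has reached the client
                }
            }
        }
        return first;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger polls = new AtomicInteger();
        // Simulated timing: the fault takes effect at the 3rd status check,
        // and the action reports completion at the 5th.
        ActionStatus action = interval -> {
            Thread.sleep(interval.toMillis());
            return polls.incrementAndGet() >= 5;
        };
        Operation op = () -> {
            if (polls.get() >= 3) {
                throw new IOException("simulated connection failure");
            }
        };
        Exception seen = firstFailureWhileInjecting(action, op, Duration.ofMillis(10));
        System.out.println("failure observed: " + (seen != null));
    }
}
```

The point of the design is that the assertion window is tied to the action's reported status rather than to a fixed number of attempts, so a slow injection no longer makes the test give up early.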