Shards did not rebalance immediately after hardware failure on data node

### Elasticsearch Version

5.6.16

### Installed Plugins

_No response_

### Java Version

_bundled_

### OS Version

Linux cluster-1-master-2 4.14.326-245.539.amzn2.x86_64 #1 SMP Tue Sep 26 09:59:02 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

### Problem Description

We encountered an issue where Elasticsearch did not automatically rebalance shards after a hardware failure on one of our data nodes. This caused an extended period of degraded performance until manual intervention was performed.


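Timeline:

- 2024-10-28 20:40 UTC: the EC2 instance became unhealthy.
- 2024-10-28 21:10 UTC: the EC2 instance became healthy again.
- 2024-10-28 21:10 UTC: Elasticsearch rebalanced the shards. This happened only once the instance had recovered, about 30 minutes after the failure.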

We would like to understand why rebalancing the shards took so long. The master node logs are attached:


[es_oct_28.log](https://github.com/user-attachments/files/17622677/es_oct_28.log)



Elasticsearch configuration of the master node `cluster-1-master-1`:

```yaml
## Cluster level settings.
cluster.name: fc-use1-00-conversation-cluster-1

## Discovery settings.
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: [XXX]

# discovery.zen.ping.timeout: 15s
discovery.zen.fd.ping_interval: 10s
discovery.zen.fd.ping_timeout: 10s
discovery.zen.fd.ping_retries: 6

action.auto_create_index: true

indices.memory.index_buffer_size: 30%

indices.store.throttle.max_bytes_per_sec: 100mb

## Node level settings.
node.data: false
node.master: true
node.name: cluster-1-master-1

cluster.routing.allocation.awareness.force.zone.values: zone-a,zone-b,zone-c,zone-d,zone-e,zone-f
cluster.routing.allocation.awareness.attributes: zone

http.enabled: true

## Loopback interface
```

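To put the `discovery.zen.fd.*` settings above in perspective, the sketch below estimates how long the master might need to declare a silent data node failed and compares that with the delay in the timeline. It rests on an assumption that is not confirmed by the Elasticsearch source or by the attached logs: after the failure the master waits up to one `ping_interval` before the next ping, and each of the `ping_retries` attempts then times out after the full `ping_timeout`. The helper names `estimated_detection_window_s` and `observed_gap_s` are hypothetical.

```python
from datetime import datetime, timezone

# Fault-detection values copied from the master node configuration above.
FD_PING_INTERVAL_S = 10  # discovery.zen.fd.ping_interval: 10s
FD_PING_TIMEOUT_S = 10   # discovery.zen.fd.ping_timeout: 10s
FD_PING_RETRIES = 6      # discovery.zen.fd.ping_retries: 6


def estimated_detection_window_s(interval_s: int, timeout_s: int, retries: int) -> int:
    """Rough worst-case time for the master to mark an unresponsive node as failed.

    ASSUMPTION (hypothetical model): one ping_interval may pass before the first
    failing ping, and every one of the `retries` attempts waits the full timeout.
    """
    return interval_s + retries * timeout_s


def observed_gap_s(start: str, end: str) -> int:
    """Seconds between two 'YYYY-MM-DD HH:MM' UTC timestamps from the timeline."""
    fmt = "%Y-%m-%d %H:%M"
    t0 = datetime.strptime(start, fmt).replace(tzinfo=timezone.utc)
    t1 = datetime.strptime(end, fmt).replace(tzinfo=timezone.utc)
    return int((t1 - t0).total_seconds())


if __name__ == "__main__":
    window = estimated_detection_window_s(FD_PING_INTERVAL_S, FD_PING_TIMEOUT_S, FD_PING_RETRIES)
    gap = observed_gap_s("2024-10-28 20:40", "2024-10-28 21:10")
    print(f"estimated fault-detection window: ~{window}s")
    print(f"observed delay before rebalancing: {gap}s")
```

Running it prints a window of about 70 s against an observed delay of 1800 s. If the assumption holds, fault detection with these settings would account for roughly a minute and would not by itself explain the 30-minute wait.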
### Steps to Reproduce

Steps to Reproduce:

- Simulate a hardware failure on one of the data nodes by stopping or disconnecting the instance.
- Observe the state of shard allocation and rebalancing.
- Notice that Elasticsearch does not immediately initiate shard rebalancing across available nodes.
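For the observation step, a small polling script can record cluster health and per-state shard counts while the data node is down. This is a minimal sketch using only the Python standard library. It assumes the HTTP API is reachable at a hypothetical `BASE_URL` (`http://localhost:9200`) and relies on the standard `_cluster/health` and `_cat/shards` endpoints. The function names are illustrative, not part of any Elasticsearch client.

```python
import json
import time
from collections import Counter
from urllib.request import urlopen

# Hypothetical address; point it at any node of the cluster.
BASE_URL = "http://localhost:9200"


def summarize_shard_states(cat_shards_text: str) -> Counter:
    """Count shards per state from `_cat/shards?h=index,shard,prirep,state` output.

    Each non-empty line is expected to contain: index shard prirep state.
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            counts[fields[3]] += 1
    return counts


def fetch_text(path: str) -> str:
    """GET a path from the cluster and return the response body as text."""
    with urlopen(BASE_URL + path, timeout=10) as resp:
        return resp.read().decode("utf-8")


def snapshot() -> dict:
    """One observation: cluster health counters plus per-state shard counts."""
    health = json.loads(fetch_text("/_cluster/health"))
    shards = summarize_shard_states(fetch_text("/_cat/shards?h=index,shard,prirep,state"))
    return {
        "status": health["status"],
        "unassigned_shards": health["unassigned_shards"],
        "initializing_shards": health["initializing_shards"],
        "relocating_shards": health["relocating_shards"],
        "shard_states": dict(shards),
    }


def watch(interval_s: int = 30, iterations: int = 120) -> None:
    """Print a UTC-timestamped snapshot every `interval_s` seconds."""
    for _ in range(iterations):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime())
        print(stamp, snapshot(), flush=True)
        time.sleep(interval_s)
```

Calling `watch()` just before stopping the data node, and leaving it running until all shards are `STARTED` again, gives a timestamped record of how long shards stay `UNASSIGNED` before being reallocated.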

### Logs (if relevant)

_No response_
