-
Notifications
You must be signed in to change notification settings - Fork 2.6k
feat: replace events library with shdict #12353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@@ -417,7 +359,20 @@ end | |||
|
|||
|
|||
function _M.dump_data() | |||
return {config = local_conf.discovery.nacos, services = applications or {}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config is removed from here as it's a security risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense.
local http = require("resty.http") | ||
local uri = "http://127.0.0.1:" .. ngx.var.server_port | ||
|
||
-- Wait for 2 seconds for APISIX initialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test is modified to add this sleep because: Since retry mechanism is removed, priviliged agent fills data in shdict might take some time so we want to send the request after that.
apisix/discovery/nacos/init.lua
Outdated
local value = nacos_dict:get(key) | ||
if value then | ||
local nodes = core.json.decode(value) | ||
if nodes and #nodes > 0 then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dump_data
is for debug purpose, if value in shared dict for this key is empty , we should return the real empty value for debugging instead hide it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing you missed: remove unused key from shared dict in every fetch_full_registry
run
apisix/discovery/nacos/init.lua
Outdated
local old_curr_service_in_use = curr_service_in_use | ||
curr_service_in_use = service_names | ||
-- remove services that are not in use anymore | ||
for key, _ in pairs(old_curr_service_in_use) do | ||
if not service_names[key] then | ||
nacos_dict:delete(key) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local old_curr_service_in_use = curr_service_in_use | |
curr_service_in_use = service_names | |
-- remove services that are not in use anymore | |
for key, _ in pairs(old_curr_service_in_use) do | |
if not service_names[key] then | |
nacos_dict:delete(key) | |
end | |
end | |
-- remove services that are not in use anymore | |
for key, _ in pairs(curr_service_in_use) do | |
if not service_names[key] then | |
nacos_dict:delete(key) | |
end | |
end | |
curr_service_in_use = service_names |
@bzp2010 I ran a benchmark to test whether the Configuration usedconfig.yaml apisix:
node_listen: 1984
deployment:
role: data_plane
role_data_plane:
config_provider: yaml
nginx_config:
worker_processes: 8
discovery:
nacos:
host:
- "http://nacos:[email protected]:8848"
prefix: "/nacos/v1/"
fetch_interval: 1
weight: 1
timeout:
connect: 2000
send: 2000
read: 5000 apisix.yaml routes:
-
uri: /hello
upstream:
service_name: APISIX-NACOS
discovery_type: nacos
type: roundrobin
#END Here are the results. On master❯ wrk -c 10 -t 5 -d 30s -R 50000 http://localhost:1984/hello
Running 30s test @ http://localhost:1984/hello
5 threads and 10 connections
Thread calibration: mean lat.: 2999.418ms, rate sampling interval: 10395ms
Thread calibration: mean lat.: 2885.546ms, rate sampling interval: 10125ms
Thread calibration: mean lat.: 2857.330ms, rate sampling interval: 10059ms
Thread calibration: mean lat.: 2967.000ms, rate sampling interval: 10231ms
Thread calibration: mean lat.: 2953.790ms, rate sampling interval: 10256ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 11.12s 3.06s 16.47s 57.49%
Req/Sec 4.52k 82.05 4.63k 60.00%
689405 requests in 30.00s, 110.45MB read
Requests/sec: 22980.47
Transfer/sec: 3.68MB
After changes❯ wrk -c 10 -t 5 -d 30s -R 50000 http://localhost:1984/hello
Running 30s test @ http://localhost:1984/hello
5 threads and 10 connections
Thread calibration: mean lat.: 3060.053ms, rate sampling interval: 10584ms
Thread calibration: mean lat.: 3084.401ms, rate sampling interval: 10772ms
Thread calibration: mean lat.: 3254.041ms, rate sampling interval: 11198ms
Thread calibration: mean lat.: 3093.539ms, rate sampling interval: 10985ms
Thread calibration: mean lat.: 3062.424ms, rate sampling interval: 10772ms
Thread Stats Avg Stdev Max +/- Stdev
Latency 11.89s 3.36s 17.74s 56.68%
Req/Sec 4.13k 39.11 4.18k 60.00%
623009 requests in 30.00s, 99.81MB read
Requests/sec: 20767.18
Transfer/sec: 3.33MB ResultThe QPS dropped from 22980.47 to 20767.18 with ~2k difference. I think this should be acceptable given the issue with events library this PR solves. According to me, it's an okay tradeoff |
Breakout from: #12263
Fixes #
Currently nacos discovery uses event library and during testing it was found out that when using lua-resty-events, sometimes not all workers get the events. And inconsistencies emerge. Moreover the idiomatic way to share data between workers is through a shared dict. Therefore this PR migrates nacos from older events mechanism to newer shared dict way. Now only priviliged agent will fetch data from nacos and write to shdict while all workers will read from it.
Checklist