-
Notifications
You must be signed in to change notification settings - Fork 110
Tcp long conn metrics and accesslogs (new) #1298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…p_probe_info Signed-off-by: Yash Patel <[email protected]>
Codecov ReportAttention: Patch coverage is
... and 4 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
…probe_info Signed-off-by: Yash Patel <[email protected]>
@hzxuzhonghu can you look at this pr, I have used a more improved approach |
Not sure why you open a new pr, basically it is more facilitating to update on the existing one. Which is more easy for reviewing |
I have done a lot of experiments with the previous pr, it contains lots of unnecessary commits, also this time, I’ve taken a different and cleaner approach, which will make it much easier to review and merge. |
bpf/kmesh/probes/probe.h
Outdated
@@ -3,6 +3,7 @@ | |||
|
|||
#ifndef __KMESH_BPF_PROBE_H__ | |||
#define __KMESH_BPF_PROBE_H__ | |||
#define LONG_CONN_THRESHOLD_TIME (5 * 1000000000ULL) // 5s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move down after the include
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, i have misunderstood it to move down one line !!
bpf/kmesh/probes/probe.h
Outdated
bpf_sk_storage_delete(&map_of_sock_storage, sk); | ||
} | ||
|
||
static inline void observe_on_recv(struct bpf_sock *sk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove, i donot see it is used now. I guess you are writing for recv metric later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for inconvience, it is different approach, it do not need different hooks for send and recv only one hook do the work, currently i have drafted that pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think only do in recv is not right, some tcp connections only have send no recv message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if that is the case i can use
SEC("cgroup_skb/egress")
int sendmsg_cgrp_prog(struct __sk_buff *skb)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if i you want i can also use both the hooks for send and recv
SEC("cgroup_skb/ingress")
and
SEC("cgroup_skb/egress")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAK, we have sk_msg, we donont need cgroup_skb/egress
now. cgroup_skb/ingress
may still need to acquire the relatively realtime recv bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yp969803 can you please explain about this confusion, could we duplicately calculate the send/recv bytes
bpf/kmesh/probes/tcp_probe.h
Outdated
@@ -35,18 +35,20 @@ struct tcp_probe_info { | |||
__u32 type; | |||
struct bpf_sock_tuple tuple; | |||
struct orig_dst_info orig_dst; | |||
__u32 sent_bytes; | |||
__u32 received_bytes; | |||
__u64 conn_id; /* sock_cookie */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we donot need this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using conn_id in the userspace as key for the map
tcp_conns := make(map[uint64]connMetric)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use tuple as key, sock cookie is meaningless in userspace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
different connections with same src and dest address has same tuple keys,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metrics actually are accumulated at application to application level. So it does not matter at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want, for long connections, we can also accumulate metrics at connection level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should and we must
bpf/kmesh/workload/recvmsg.c
Outdated
#include "probe.h" | ||
|
||
SEC("cgroup_skb/ingress") | ||
int recvmsg_prog(struct __sk_buff *skb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove and handle recv in the other pr. And i do not see it in the proposal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I donot see the send msg logic, do you miss it?
@hzxuzhonghu I am using a different approach here, which does not require send_msg logic, I will make changes to the proposal for this pr to make the understanding of the approach clear, also I will explain the approach in today's meet |
e5b081c
to
5b3a8ef
Compare
These are the final logs
|
Signed-off-by: Yash Patel <[email protected]> rfac: added workload include to performance_probe.h Signed-off-by: Yash Patel <[email protected]> rfac: moved is_managed_bt_kmesh_skb to bpf_common.h Signed-off-by: Yash Patel <[email protected]> chore: run make gen Signed-off-by: Yash Patel <[email protected]> rfac: changed recv.c to cgrp_skb.c Signed-off-by: Yash Patel <[email protected]> rfac: created skb.go to load and attach cgroup_skb Signed-off-by: Yash Patel <[email protected]> rfac: tcp_probe.h Signed-off-by: Yash Patel <[email protected]> rfac: skb.go Signed-off-by: Yash Patel <[email protected]>
timeInfo := fmt.Sprintf("%v", uptime) | ||
sourceInfo := fmt.Sprintf("src.addr=%s, src.workload=%s, src.namespace=%s", accesslog.sourceAddress, accesslog.sourceWorkload, accesslog.sourceNamespace) | ||
destinationInfo := fmt.Sprintf("dst.addr=%s, dst.service=%s, dst.workload=%s, dst.namespace=%s", accesslog.destinationAddress, accesslog.destinationService, accesslog.destinationWorkload, accesslog.destinationNamespace) | ||
connectionInfo := fmt.Sprintf("direction=%s, sent_bytes=%d, received_bytes=%d, srtt=%dus, min_rtt=%dus, duration=%vms", accesslog.direction, data.sentBytes, data.receivedBytes, data.srtt, data.minRtt, (float64(data.duration) / 1000000.0)) | ||
connectionInfo := fmt.Sprintf("connection_id=%d, start_time=%s, direction=%s, status=%s, sent_bytes=%d, received_bytes=%d, packet_loss=%d, retransmissions=%d, srtt=%dus, min_rtt=%dus, duration=%vms", conn_metrics.connId, startTimeInfo, accesslog.direction, accesslog.status, conn_metrics.sentBytes, conn_metrics.receivedBytes, conn_metrics.packetLost, conn_metrics.totalRetrans, data.srtt, data.minRtt, (float64(data.duration) / 1000000.0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we donot need to build accesslog periodically.
pkg/controller/telemetry/metric.go
Outdated
@@ -86,17 +102,18 @@ type statistics struct { | |||
ConnectSuccess uint32 | |||
Direction uint32 | |||
State uint32 | |||
_ uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add some comments
pkg/controller/telemetry/metric.go
Outdated
@@ -106,11 +123,13 @@ type connectionDataV4 struct { | |||
OriginalAddr uint32 | |||
OriginalPort uint16 | |||
_ uint16 | |||
_ [3]uint32 | |||
_ [4]uint32 | |||
ConnId uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
pkg/controller/telemetry/metric.go
Outdated
@@ -383,9 +431,10 @@ func (m *MetricController) Run(ctx context.Context, mapOfTcpInfo *ebpf.Map) { | |||
} | |||
} | |||
|
|||
func buildV4Metric(buf *bytes.Buffer) (requestMetric, error) { | |||
func buildV4Metric(buf *bytes.Buffer, tcp_conns map[uint64]connMetric) (requestMetric, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LiZhenCheng9527 ptal
@hzxuzhonghu i have addressed all your comments, you can look into it when you get time |
Signed-off-by: Yash Patel <[email protected]>
bpf/kmesh/probes/probe.h
Outdated
@@ -81,5 +83,26 @@ static inline void observe_on_close(struct bpf_sock *sk) | |||
} | |||
|
|||
tcp_report(sk, tcp_sock, storage, BPF_TCP_CLOSE); | |||
bpf_sk_storage_delete(&map_of_sock_storage, sk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove, this kind of map does not need to delete manually
I think i have requested many many times, we donot need |
I will remove the connection_id from the access logs
I can use tuple as a key, but then I have to declare a new struct in userspace, also this struct will have variable size for IP v4 and v6 , which is a problem, also the size of the key will also increase, I think there is no flaw in using conn_id in the userspace as a map key because it is much more smaller size than the tupple and have fix size. |
Signed-off-by: Yash Patel <[email protected]>
@nlgwcy can you review the ebpf part when you get time !! |
Can you use the src dst {ip: port}, it can not conflict during the small interval |
@hzxuzhonghu removed conn_id from everywhere as per your request, you can merge the pr now |
bpf/include/bpf_common.h
Outdated
@@ -44,6 +44,8 @@ struct { | |||
|
|||
struct sock_storage_data { | |||
__u64 connect_ns; | |||
__u64 last_report_ns; | |||
__u64 sock_cookie; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove now, right?
…ric.go, also rmoved conn_id in tcp_probe_info Signed-off-by: Yash Patel <[email protected]> rfac: rmoved conn_id from tcp_probe_info Signed-off-by: Yash Patel <[email protected]>
@nlgwcy Is there any other comments |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Some tests needed, but can be added later
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hzxuzhonghu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #1211
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Yes