Skip to content

Fix cgroup_skb/* get sk_storage failed #1350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 30, 2025

Conversation

hzxuzhonghu
Copy link
Member

What type of PR is this?
/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #1348

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@kmesh-bot kmesh-bot added the kind/bug Something isn't working label Apr 28, 2025
@@ -260,6 +260,23 @@ static inline void remove_kmesh_managed_ip(__u32 family, __u32 ip4, __u32 *ip6)
BPF_LOG(ERR, KMESH, "remove ip failed, err is %d\n", err);
}

static inline bool sock_conn_from_sim(struct __sk_buff *skb)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, kmesh simulate 0.0.0.2:930 or 0.0.0.2:931 when it need to manage the pod

@hzxuzhonghu
Copy link
Member Author

cc @yp969803

@Kuromesi

@hzxuzhonghu hzxuzhonghu changed the title Cgroup skb Fix Cgroup_skb get sk_storage failed Apr 28, 2025
@hzxuzhonghu hzxuzhonghu changed the title Fix Cgroup_skb get sk_storage failed Fix cgroup_skb/* get sk_storage failed Apr 28, 2025
Copy link

codecov bot commented Apr 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 46.26%. Comparing base (bd40bbe) to head (44253ce).
Report is 23 commits behind head on main.

see 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a07412...44253ce. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@yp969803
Copy link
Contributor

yp969803 commented Apr 28, 2025

have u checked it locally, using bpf logs, it is better if we check the stuffs locally before merging to prevent bugs.
you can use this manifest to simulate long connection, deploy in default ns

apiVersion: v1
kind: Pod
metadata:
  name: ws-server
  labels:
    app: ws-server
spec:
  containers:
  - name: ws-server
    image: node:18
    command: ["/bin/sh", "-c"]
    args:
      - |
        mkdir server 
        cd server 
        npm init -y
        npm install ws
        cat << EOF > server.js
        const WebSocket = require('ws');
        const server = new WebSocket.Server({ port: 8080 });
        server.on('listening', () => {
          console.log('🚀 WebSocket server started on ws://localhost:8080');
        });

        server.on('connection', socket => {
          console.log('Client connected');
          
          // Send data every 1 seconds
          setInterval(() => {
            socket.send(`Hello world`);
            console.log('Sent message to client');
          }, 1000);

          socket.on('message', message => {
            console.log(`Received`);
          });

          socket.on('close', () => {
            console.log('Client disconnected');
          });
        });
        EOF
        node server.js

    ports:
    - containerPort: 8080

---
apiVersion: v1
kind: Service
metadata:
  name: ws-server-service
  labels:
    app: ws-server
spec:
  selector:
    app: ws-server
  ports:
  - protocol: TCP
    port: 8080        # Port exposed by the service
    targetPort: 8080  # Port the server container listens on

--- 

apiVersion: v1
kind: Pod
metadata:
  name: ws-client
  labels:
    app: ws-client
spec:
  containers:
  - name: ws-client
    image: node:18
    command: ["/bin/sh", "-c"]
    args:
      - |
        mkdir client
        cd client
        npm init -y
        npm install ws 
        cat <<EOF > client.js
        const WebSocket = require('ws');
        const socket = new WebSocket('ws://ws-server-service.default.svc.cluster.local:8080');

        socket.on('open', () => {
          console.log('Connected to server');
          socket.send('Hello from client');
          console.log('Sent message to server');
        });

        socket.on('message', data => {
          console.log(`Received`);

        });

        socket.on('close', () => {
          console.log('Connection closed');
        });
        EOF
        node client.js

@hzxuzhonghu
Copy link
Member Author

what do you mean?

/retest

@yp969803
Copy link
Contributor

i was just asking if u have checked it locally, does the normal packets reaching to the observed_data function

@hzxuzhonghu
Copy link
Member Author

I am not sure if kmesh restart would influence the long connection metric. For this patch, it does not bring regression for long connection

@hzxuzhonghu hzxuzhonghu mentioned this pull request Apr 29, 2025
3 tasks
__u16 enable_port = ENABLE_KMESH_PORT;
__u16 disable_port = DISABLE_KMESH_PORT;
__u16 dst_port = (__u16)(skb->remote_port >> 16);
if (bpf_ntohs(dst_port) != enable_port && bpf_ntohs(dst_port) != disable_port)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (bpf_ntohs(dst_port) != enable_port && bpf_ntohs(dst_port) != disable_port)
if (bpf_ntohs(dst_port) != ENABLE_KMESH_PORT && bpf_ntohs(dst_port) != DISABLE_KMESH_PORT)

Signed-off-by: Zhonghu Xu <[email protected]>
Signed-off-by: Zhonghu Xu <[email protected]>
@hzxuzhonghu
Copy link
Member Author

/retest

@hzxuzhonghu hzxuzhonghu requested a review from nlgwcy April 30, 2025 01:55
@hzxuzhonghu
Copy link
Member Author

@nlgwcy Another look

@nlgwcy
Copy link
Contributor

nlgwcy commented Apr 30, 2025

/lgtm
/approve

@kmesh-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nlgwcy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kmesh-bot kmesh-bot merged commit 1d2122a into kmesh-net:main Apr 30, 2025
11 of 12 checks passed
@hzxuzhonghu hzxuzhonghu deleted the cgroup_skb branch April 30, 2025 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected log from cgroup_skb
4 participants