
[release-1.1] rfac: outputAccesslog func to skip accesslog at conn establishment #1382

Merged
docs/proposal/tcp_long_connection_metrics.md (2 changes: 1 addition, 1 deletion)
@@ -171,7 +171,7 @@ We will update the functions of metric.go for periodic updating the workload and

#### Exposing long connection prometheus metrics

-We will expose metrics for the connections whose duration exceeds 30 seconds. Not exposing metrics for short connection as it can lead to lot of metrics and they are also not suitable for prometheus metrics because prometheus itself has a scrape interval of maximum 15s, and short-lived connections may start and end between scrapes, resulting in incomplete or misleading data. By focusing only on longer-lived connections, we ensure the metrics are stable, meaningful, and better aligned with Prometheus’s time-series data model.
+We will expose metrics for the connections whose duration exceeds 5 seconds. Not exposing metrics for short connection as it can lead to lot of metrics and they are also not suitable for prometheus metrics because prometheus itself has a scrape interval of 5s, and short-lived connections may start and end between scrapes, resulting in incomplete or misleading data. By focusing only on longer-lived connections, we ensure the metrics are stable, meaningful, and better aligned with Prometheus’s time-series data model.

We can have a another component in future which reports realtime information about connections like cilium hubble.

pkg/controller/telemetry/accesslog.go (2 changes: 1 addition, 1 deletion)
@@ -79,7 +79,7 @@ func (l *logInfo) withDestinationService(service *workloadapi.Service) *logInfo

func OutputAccesslog(data requestMetric, conn_metrics connMetric, accesslog logInfo) {
// Skip output access log on connection establishment
-if data.state == TCP_ESTABLISHED && data.duration < LONG_CONN_METRIC_THRESHOLD {
+if data.state == TCP_ESTABLISHED && conn_metrics.totalReports == 1 {
return
}
logStr := buildAccesslog(data, conn_metrics, accesslog)
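The difference between the old and new skip conditions can be illustrated with a trimmed-down sketch. The old check suppressed every `TCP_ESTABLISHED` report whose duration was still below `LONG_CONN_METRIC_THRESHOLD`; the new one suppresses only the first report for a connection (`totalReports == 1`), which is the same condition the metric caches treat as "connection opened". The structs below are hypothetical stand-ins that model only the fields the check reads, and `skipAccesslog` is an illustrative helper, not a function in this PR.

```go
package main

import "fmt"

// State values as defined in metric.go in this diff.
const (
	TCP_ESTABLISHED = uint32(1)
	TCP_CLOSED      = uint32(7)
)

// Hypothetical, minimal versions of requestMetric and connMetric.
type requestMetric struct {
	state uint32
}

type connMetric struct {
	totalReports uint32
}

// skipAccesslog is true only for the establishment report, i.e. the first
// report of a connection that is in TCP_ESTABLISHED state.
func skipAccesslog(data requestMetric, conn connMetric) bool {
	return data.state == TCP_ESTABLISHED && conn.totalReports == 1
}

func main() {
	reports := []struct {
		data requestMetric
		conn connMetric
	}{
		{requestMetric{TCP_ESTABLISHED}, connMetric{1}}, // establishment
		{requestMetric{TCP_ESTABLISHED}, connMetric{2}}, // periodic report
		{requestMetric{TCP_CLOSED}, connMetric{3}},      // close
	}
	for _, r := range reports {
		fmt.Println(skipAccesslog(r.data, r.conn))
	}
}
```

This prints `true`, `false`, `false`: periodic reports and the final close report are logged regardless of how long the connection has been open.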
pkg/controller/telemetry/metric.go (16 changes: 8 additions, 8 deletions)
@@ -38,7 +38,7 @@ import (

const (
TCP_ESTABLISHED = uint32(1)
-TCP_CLOSTED = uint32(7)
+TCP_CLOSED = uint32(7)

connection_success = uint32(1)

@@ -512,7 +512,7 @@ func (m *MetricController) Run(ctx context.Context, mapOfTcpInfo *ebpf.Map) {
connectionLabels = m.buildConnectionMetric(&data)
}
if m.EnableAccesslog.Load() {
-// accesslogs at interval of 5 sec during connection lifecycle and at close of connection
+// accesslogs at interval of 5 sec during connection lifecycle if connectionMetrics is enabled and at close of connection
OutputAccesslog(data, tcpConns[data.conSrcDstInfo], accesslog)
}

@@ -526,7 +526,7 @@ func (m *MetricController) Run(ctx context.Context, mapOfTcpInfo *ebpf.Map) {
}
m.mutex.Unlock()

-if data.state == TCP_CLOSTED {
+if data.state == TCP_CLOSED {
delete(tcpConns, data.conSrcDstInfo)
}
}
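The `Run` loop keeps per-connection state in `tcpConns` and deletes the entry once a `TCP_CLOSED` report arrives. A rough sketch of that lifecycle follows. It assumes the map is keyed by a comparable source/destination struct and that `totalReports` is incremented once per report; `connKey`, `report`, and `track` are hypothetical names, since the increment itself is not shown in this diff.

```go
package main

import "fmt"

const (
	TCP_ESTABLISHED = uint32(1)
	TCP_CLOSED      = uint32(7)
)

// connKey is a hypothetical stand-in for the conSrcDstInfo map key.
type connKey struct {
	src, dst string
}

// connMetric models only the report counter read by OutputAccesslog.
type connMetric struct {
	totalReports uint32
}

// report is a hypothetical, minimal version of one reported entry.
type report struct {
	key   connKey
	state uint32
}

// track bumps the connection's report count and removes the entry once the
// connection reports TCP_CLOSED, so closed connections do not accumulate.
func track(tcpConns map[connKey]connMetric, r report) uint32 {
	m := tcpConns[r.key] // zero value for a connection seen for the first time
	m.totalReports++
	tcpConns[r.key] = m
	if r.state == TCP_CLOSED {
		delete(tcpConns, r.key)
	}
	return m.totalReports
}

func main() {
	tcpConns := map[connKey]connMetric{}
	k := connKey{src: "10.0.0.1:40000", dst: "10.0.0.2:80"}
	for _, st := range []uint32{TCP_ESTABLISHED, TCP_ESTABLISHED, TCP_CLOSED} {
		n := track(tcpConns, report{key: k, state: st})
		fmt.Println(n, len(tcpConns)) // reports so far, tracked connections
	}
}
```

The output is `1 1`, `2 1`, `3 0`: the first line is the establishment report that the new `OutputAccesslog` check skips, and the map is empty after the close.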
@@ -848,7 +848,7 @@ func (m *MetricController) updateWorkloadMetricCache(data requestMetric, labels
if data.state == TCP_ESTABLISHED && metric.totalReports == 1 {
v.WorkloadConnOpened = v.WorkloadConnOpened + 1
}
-if data.state == TCP_CLOSTED {
+if data.state == TCP_CLOSED {
v.WorkloadConnClosed = v.WorkloadConnClosed + 1
}
if data.success != connection_success {
@@ -863,7 +863,7 @@ func (m *MetricController) updateWorkloadMetricCache(data requestMetric, labels
if data.state == TCP_ESTABLISHED && metric.totalReports == 1 {
newWorkloadMetricInfo.WorkloadConnOpened = 1
}
-if data.state == TCP_CLOSTED {
+if data.state == TCP_CLOSED {
newWorkloadMetricInfo.WorkloadConnClosed = 1
}
if data.success != connection_success {
@@ -883,7 +883,7 @@ func (m *MetricController) updateServiceMetricCache(data requestMetric, labels s
if data.state == TCP_ESTABLISHED && metric.totalReports == 1 {
v.ServiceConnOpened = v.ServiceConnOpened + 1
}
-if data.state == TCP_CLOSTED {
+if data.state == TCP_CLOSED {
v.ServiceConnClosed = v.ServiceConnClosed + 1
}
if data.success != connection_success {
@@ -896,7 +896,7 @@ func (m *MetricController) updateServiceMetricCache(data requestMetric, labels s
if data.state == TCP_ESTABLISHED && metric.totalReports == 1 {
newServiceMetricInfo.ServiceConnOpened = 1
}
-if data.state == TCP_CLOSTED {
+if data.state == TCP_CLOSED {
newServiceMetricInfo.ServiceConnClosed = 1
}
if data.success != connection_success {
@@ -923,7 +923,7 @@ func (m *MetricController) updateConnectionMetricCache(data requestMetric, connD
newConnectionMetricInfo.ConnTotalRetrans = float64(connData.totalRetrans)
m.connectionMetricCache[labels] = &newConnectionMetricInfo
}
-if data.state == TCP_CLOSTED {
+if data.state == TCP_CLOSED {
deleteLock.Lock()
deleteConnection = append(deleteConnection, &labels)
deleteLock.Unlock()
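The workload and service cache updates share one rule: a connection counts as opened on its first `TCP_ESTABLISHED` report and as closed on its `TCP_CLOSED` report, while `updateConnectionMetricCache` queues closed connections for deletion under `deleteLock`. The sketch below models both steps under stated assumptions: `workloadCounters` and `countConn` are hypothetical, the counters are assumed to be `float64` (the value type Prometheus gauges use), and string labels stand in for the real `connectionMetricLabels`.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	TCP_ESTABLISHED = uint32(1)
	TCP_CLOSED      = uint32(7)
)

// workloadCounters is a hypothetical, trimmed-down cache entry holding only
// the opened/closed counters.
type workloadCounters struct {
	ConnOpened float64
	ConnClosed float64
}

// countConn applies the opened/closed rules from updateWorkloadMetricCache.
func countConn(c *workloadCounters, state, totalReports uint32) {
	if state == TCP_ESTABLISHED && totalReports == 1 {
		c.ConnOpened++
	}
	if state == TCP_CLOSED {
		c.ConnClosed++
	}
}

func main() {
	var c workloadCounters
	countConn(&c, TCP_ESTABLISHED, 1) // opened
	countConn(&c, TCP_ESTABLISHED, 2) // periodic report, no change
	countConn(&c, TCP_CLOSED, 3)      // closed

	// Closed connections are queued for deletion while holding a mutex,
	// mirroring deleteLock/deleteConnection, so concurrent appends are safe.
	var (
		deleteLock       sync.Mutex
		deleteConnection []string
		wg               sync.WaitGroup
	)
	for _, label := range []string{"conn-a", "conn-b"} {
		wg.Add(1)
		go func(l string) {
			defer wg.Done()
			deleteLock.Lock()
			deleteConnection = append(deleteConnection, l)
			deleteLock.Unlock()
		}(label)
	}
	wg.Wait()

	fmt.Println(c.ConnOpened, c.ConnClosed, len(deleteConnection))
}
```

This prints `1 1 2`. The lock matters because `append` may reallocate the slice, which is not safe to do from several goroutines at once.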
pkg/controller/telemetry/metric_test.go (2 changes: 1 addition, 1 deletion)
@@ -535,7 +535,7 @@ func TestBuildConnectionMetricsToPrometheus(t *testing.T) {
receivedBytes: 0x0000004,
packetLost: 0x0000001,
totalRetrans: 0x0000002,
-state: TCP_CLOSTED,
+state: TCP_CLOSED,
},
labels: connectionMetricLabels{
reporter: "source",
pkg/controller/telemetry/utils.go (2 changes: 1 addition, 1 deletion)
@@ -245,7 +245,7 @@ var (
Help: "The total number of TCP connections failed to a service.",
}, serviceLabels)

-// Metrics to track the status of long lived TCP connections (duration > 30s)
+// Metrics to track the status of long lived TCP connections
tcpConnectionTotalSendBytes = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Name: "kmesh_tcp_connection_sent_bytes_total",