Skip to content

Refactor dns #1247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 8, 2025
Merged

Refactor dns #1247

merged 11 commits into from
Apr 8, 2025

Conversation

LiZhenCheng9527
Copy link
Contributor

What type of PR is this?
/kind enhancement

What this PR does / why we need it:
Refactored the DNS resolution module to decouple it from the CDS in Kernel-Native mode. This facilitates the reuse of the DNS resolution module in Dual-Engine mode.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@LiZhenCheng9527 LiZhenCheng9527 added this to the v1.1 milestone Feb 21, 2025
@kmesh-bot kmesh-bot added the kind/enhancement New feature or request label Feb 21, 2025
Copy link

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 62.93333% with 139 lines in your changes missing coverage. Please review.

Project coverage is 45.65%. Comparing base (726da17) to head (2f054db).
Report is 99 commits behind head on main.

Files with missing lines Patch % Lines
pkg/controller/ads/dns.go 71.35% 46 Missing and 13 partials ⚠️
pkg/dns/dns.go 34.48% 52 Missing and 5 partials ⚠️
pkg/dns/utils.go 78.46% 10 Missing and 4 partials ⚠️
pkg/controller/ads/ads_controller.go 50.00% 7 Missing and 1 partial ⚠️
pkg/controller/controller.go 0.00% 1 Missing ⚠️
Files with missing lines Coverage Δ
pkg/controller/controller.go 0.00% <0.00%> (ø)
pkg/controller/ads/ads_controller.go 68.05% <50.00%> (-5.63%) ⬇️
pkg/dns/utils.go 78.46% <78.46%> (ø)
pkg/dns/dns.go 51.54% <34.48%> (-22.66%) ⬇️
pkg/controller/ads/dns.go 71.35% <71.35%> (ø)

... and 14 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 938599d...2f054db. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LiZhenCheng9527
Copy link
Contributor Author

LiZhenCheng9527 commented Feb 24, 2025

PTAL
/cc @hzxuzhonghu

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest we do provide a dns pkg fundamental capability and make ads/workload bussiness related part in their own pkg

}

func (adsResolver *AdsDnsResolver) refreshAdsWorker() {
for adsResolver.refreshAdsDns() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do believe such arch should be shared with workload

pkg/dns/dns.go Outdated
}

return ready
type PendingResolveDomain struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you plean to support dual-engine then?
If you are going to add another struct here, wouldn't that still coupling

)

// adsDnsResolver is DNS resolver of Kernel Native
type AdsDnsResolver struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
type AdsDnsResolver struct {
type DnsResolver struct {

dnsRefreshQueue workqueue.TypedDelayingInterface[any]
}

func NewAdsDnsResolver(adsCache *AdsCache) (*AdsDnsResolver, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is under ads pkg, so we can remove the redundant

Suggested change
func NewAdsDnsResolver(adsCache *AdsCache) (*AdsDnsResolver, error) {
func NewDnsResolver(adsCache *AdsCache) (*DnsResolver, error) {

@hzxuzhonghu hzxuzhonghu requested review from Copilot and removed request for Copilot February 27, 2025 06:36
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@hzxuzhonghu hzxuzhonghu requested review from Copilot and removed request for Copilot February 27, 2025 07:02
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR refactors and decouples the DNS resolution module from the CDS in Kernel-Native mode to support reuse in Dual-Engine mode. The changes introduce a fake DNS server for testing, a new ADS DNS Resolver, controller updates to use the new resolver, and significant refactoring in the DNS resolver implementation.

  • Added a fakeDNSServer implementation in pkg/dns/utils.go.
  • Created a new AdsDnsResolver in pkg/controller/ads/dns.go and updated the controller to utilize it.
  • Refactored pkg/dns/dns.go by cleaning up internal structures and updating locking and resolution logic.

Reviewed Changes

File Description
pkg/dns/utils.go Introduces a fake DNS server for testing purposes.
pkg/controller/ads/dns.go Implements a new ADS DNS Resolver for Kernel-Native mode.
pkg/controller/controller.go Updates the controller to use the new ADS DNS Resolver.
pkg/dns/dns.go Refactors the DNS resolver implementation, updating locks and API.

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment on lines +95 to +98
func (s *fakeDNSServer) SetHosts(domain string, surfix int) {
s.mu.Lock()
defer s.mu.Unlock()
s.hosts[dns.Fqdn(domain)] = surfix
Copy link
Preview

Copilot AI Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name 'surfix' appears to be a typo. It should likely be renamed to 'suffix' for clarity.

Suggested change
func (s *fakeDNSServer) SetHosts(domain string, surfix int) {
s.mu.Lock()
defer s.mu.Unlock()
s.hosts[dns.Fqdn(domain)] = surfix
func (s *fakeDNSServer) SetHosts(domain string, suffix int) {
s.mu.Lock()
defer s.mu.Unlock()
s.hosts[dns.Fqdn(domain)] = suffix

Copilot uses AI. Check for mistakes.

pkg/dns/dns.go Outdated
Comment on lines 113 to 146
r.RLock()
_, exist := r.cache[dr.domainName]
entry.addresses = addrs
r.RUnlock()
// if the domain is no longer watched, no need to refresh it
if !exist {
return true
}
r.resolve(dr)
r.adsCache.ClusterCache.Flush()
return true
}

Copy link
Preview

Copilot AI Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A write operation is performed under a read lock, which can lead to race conditions. Consider acquiring a full write lock when updating 'entry.addresses'.

Copilot uses AI. Check for mistakes.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR refactors the DNS resolution module to decouple it from the CDS for Kernel-Native mode, enabling reuse in Dual-Engine mode. Key changes include:

  • Addition of a fake DNS server in pkg/dns/utils.go for testing and simulation.
  • Introduction of an Ads DNS resolver and its integration in pkg/controller/ads/dns.go and pkg/controller/controller.go.
  • Updates to the DNS resolver implementation in pkg/dns/dns.go to simplify the interface and remove dependencies on AdsCache.

Reviewed Changes

File Description
pkg/dns/utils.go Adds a fake DNS server implementation for testing DNS responses and host mapping.
pkg/controller/ads/dns_test.go Includes tests verifying DNS cluster overwrite, concurrent cache updates, and domain handling.
pkg/controller/ads/dns.go Introduces the Ads DNS resolver that integrates with the cluster cache and DNS module.
pkg/controller/controller.go Modifies controller start-up to use the new Ads DNS resolver for Kernel-Native mode.
pkg/dns/dns.go Refactors the DNS resolver to remove unused parameters and adjust the refresh loop handling.

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment on lines +95 to +98
func (s *fakeDNSServer) SetHosts(domain string, surfix int) {
s.mu.Lock()
defer s.mu.Unlock()
s.hosts[dns.Fqdn(domain)] = surfix
Copy link
Preview

Copilot AI Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name 'surfix' appears to be a misspelling; consider renaming it to 'suffix' for clarity.

Suggested change
func (s *fakeDNSServer) SetHosts(domain string, surfix int) {
s.mu.Lock()
defer s.mu.Unlock()
s.hosts[dns.Fqdn(domain)] = surfix
func (s *fakeDNSServer) SetHosts(domain string, suffix int) {
s.mu.Lock()
defer s.mu.Unlock()
s.hosts[dns.Fqdn(domain)] = suffix

Copilot uses AI. Check for mistakes.

pkg/dns/dns.go Outdated
Comment on lines 119 to 125
_, ttl, err := r.resolve(e.Domain)
if err != nil {
log.Errorf("failed to dns resolve: %v", err)
return false
}
if ttl > e.RefreshRate {
ttl = e.RefreshRate
}
if ttl == 0 {
ttl = DeRefreshInterval
}
r.RefreshQueue.AddAfter(e, ttl)
r.AdsDnsChan <- e.Domain

Copy link
Preview

Copilot AI Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the refreshDns function, returning false upon a DNS resolution error stops the resolver loop. Consider handling errors with a retry mechanism instead of terminating the refresh loop.

Copilot uses AI. Check for mistakes.

pkg/dns/dns.go Outdated
ttl = DeRefreshInterval
}
r.RefreshQueue.AddAfter(e, ttl)
r.AdsDnsChan <- e.Domain
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notify adsDnsResolver that the domain has completed dns resolution


func (adsResolver *AdsDnsResolver) StartAdsDnsResolver(stopCh <-chan struct{}) {
go adsResolver.startAdsResolver()
go adsResolver.dnsResolver.StartDnsResolver()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is messy cand confusing, it is unclear what is AdsResolver and what is dns resolver

Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ADD SOME COMMENTS,it is very hard to review

Comment on lines 100 to 104
domains := getPendingResolveDomain(cds)
hostNames := make(map[string]struct{})

for k := range domains {
hostNames[k] = struct{}{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need convert after calling getPendingResolveDomain, why not directly return what you want

clusterName := cluster.GetName()
hostInfo := getHostInfo(cluster)
info := clusterInfo{
info: hostInfo,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why store ip address?

func (r *dnsController) startDnsController() {
for clusters := range r.Clusters {
go func(clusters []*clusterv3.Cluster) {
r.resolveDomains(clusters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not safe running async in separate routines,

think about this case:

t0: a resolveDomains thread

t1: another resolveDomains

If t1 is run before t0, even some step.The state machine would be broken

Signed-off-by: LiZhenCheng9527 <[email protected]>
Signed-off-by: LiZhenCheng9527 <[email protected]>
// Handle cds updates when a hostname completes resolution
go r.refreshAdsWorker(stopCh)
// Consumption of clusters to be resolved.
go r.startDnsController()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks confusing with StartDnsResolver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already change function name


func (r *dnsController) startDnsController() {
for clusters := range r.Clusters {
r.resolveDomains(clusters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not actually resolve domains, confusing with the resolver. This is to get the pending domains, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I will change function name.

for _, cluster := range cds {
clusterName := cluster.GetName()
info := getHostName(cluster)
r.pendingClusterInfo[clusterName] = info
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this line, i can understand pendingClusterInfo is not a good name. please rename

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to pendingHostnamesByCluster

// Already have record in dns cache
if addresses != nil {
r.updateClusters(v, addresses)
go r.cache.ClusterCache.Flush()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? You are getting one domain resolved address and then flush the cluster

domain := <-r.dnsResolver.DnsChan
pendingDomain := r.getClustersByDomain(domain)
addrs := r.dnsResolver.GetDNSAddresses(domain)
r.updateClusters(pendingDomain, addrs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many go routines aare there callling cluster flush? There is already one above in resolveDomains

Signed-off-by: LiZhenCheng9527 <[email protected]>
Signed-off-by: LiZhenCheng9527 <[email protected]>
@LiZhenCheng9527
Copy link
Contributor Author

/cc @hzxuzhonghu

@kmesh-bot kmesh-bot requested a review from hzxuzhonghu March 27, 2025 09:34
@hzxuzhonghu hzxuzhonghu requested a review from Copilot March 27, 2025 09:37
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the DNS resolution module to decouple it from the CDS in Kernel-Native mode, enabling its reuse in Dual-Engine mode.

  • Introduces a fake DNS server implementation for testing purposes.
  • Updates the DNSResolver API including channel names, caching, and refresh logic.
  • Integrates the updated DNS module with the ADS controller and adjusts related tests.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/dns/utils.go Added fakeDNSServer implementation with helper methods
pkg/dns/dns.go Refactored DNSResolver API, caching and refresh mechanisms
pkg/controller/controller.go Updated dependency injection for DNS resolver integration
pkg/controller/ads/dns*.go Modified ADS controller to use the new DNS resolver methods
pkg/controller/ads/dns_test.go Updated tests to verify the refactored DNS resolution and ADS behavior
Comments suppressed due to low confidence (1)

pkg/dns/dns.go:300

  • The parameter name 'time' conflicts with the imported 'time' package. Consider renaming it to 'delay' or 'duration' to avoid shadowing.
func (r *DNSResolver) AddDomainInQueue(info *DomainInfo, time time.Duration) {

}

func (r *dnsController) processDomains(cds []*clusterv3.Cluster) {
domains, hostNames := getPendingResolveDomain(cds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these two maps with same keys? If so can you pass domains to RemoveUnwatchDomain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pendingResolveDomain structure is in the ads package. If domains are transferred to RemoveUnwantedDomain, cyclic invoking of the package is triggered.
The pendingResolveDomain is placed in the ads package because it contains clusters, which is unique to the kernel-native mode. So I wanted to put it in my ads bag.

// Already have record in dns cache
if addresses != nil {
// k(domain) has been resolved, triggering refreshWorker.
r.dnsResolver.DnsChan <- k
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest not write a same channel in two packages

Signed-off-by: LiZhenCheng9527 <[email protected]>
Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@kmesh-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kmesh-bot kmesh-bot merged commit 5745678 into kmesh-net:main Apr 8, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants