Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft PR for #6722 -- cache TTLs for individual IP addresses in DNS responses. #6732

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

hkiiita
Copy link
Member

@hkiiita hkiiita commented Oct 11, 2024

This is a draft PR for work towards #6722 .

@hkiiita
Copy link
Member Author

hkiiita commented Oct 11, 2024

I think i should have branched off cleanly from main so that the commits i made for e2e tests wouldnt have appeared here. Sorry about that.

First commit for this branch is initial commit towards resolving issue #6722

and second commit is added test case

if _, exist := oldDNSMeta.responseIPs[ipStr]; !exist {
ipMetaDataHolder[ipStr] = ipWithTTL{
ip: ip,
ttl: time.Now().Add(time.Duration(lowestTTL) * time.Second),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better if you call time.Now() a single time, earlier in the function

// This old IP also exists in current response, so update it with new received TTl.
ipMetaDataHolder[ipStr] = ipWithTTL{
ip: ipMetaData.ip,
ttl: time.Now().Add(time.Duration(lowestTTL) * time.Second),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the new TTL (or rather expiration time) should be the max of the old value and the new value
This way we always honor the TTL that was received previously by other queries potentially from other apps.


type ipWithTTL struct {
ip net.IP
ttl time.Time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a ttl is usually a duration, so I would suggest retaining the expirationTime name for this field

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the changes from your other PR from this PR

if _, exist := oldDNSMeta.responseIPs[ipStr]; !exist {
ipMetaDataHolder[ipStr] = ipWithTTL{
ip: ip,
ttl: time.Now().Add(time.Duration(lowestTTL) * time.Second),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we don't really want the concept of "lowest TTL" from a specific DNS response any more. Each IP included in the response should have its own TTL, so parseDNSResponse should associate the correct TTL to each IP

Am I missing something @tnqn @Dyanngg ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, parseDNSResponse should probably return map[string]uint32 (ip->ttl).

Comment on lines 446 to 448
ipMetaDataHolder[ipStr] = ipWithTTL{
ip: ip,
ttl: time.Now().Add(time.Duration(lowestTTL) * time.Second),
ip: ipMeta.ip,
expirationTime: ipMeta.expirationTime,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this equivalent to ipMetaDataHolder[ipStr] = ipMeta?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not addressed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -489,22 +500,22 @@ func (f *fqdnController) onDNSResponse(
}
// The FQDN will be added to the queue only after `lowestTTL` value which
// would already have been derived using the minTTL logic.
f.dnsQueryQueue.AddAfter(fqdn, time.Duration(lowestTTL)*time.Second)
f.dnsQueryQueue.AddAfter(fqdn, maxTimeToReQuery.Sub(currentTime))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't seem right: the time after which we should automatically re-query should be a "min", not a "max"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right , i mean we need Min because when any of these IP expires its is then that we have to send DNS queries . Got your point.

 -- updated min and max logic for individual IPs and for adding items to queue
updated tests.

Signed-off-by: Hemant <[email protected]>
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
}
fqdn := strings.ToLower(msg.Question[0].Name)
lowestTTL := uint32(math.MaxUint32) // a TTL must exist in the RRs
responseIPs := map[string]net.IP{}
// Case 1: In the upcoming patch an admin would set this to a maximum value, which will be read from the configuration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that in with the minTTL feature, expirationTime will be maxTime(minTTL, r.Header().Ttl instead. However, without a minTTL set, it should still be minTime(maxConfiguredTTL, r.Header().Ttl)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dyanngg Get it now, and indeed i was actually getting a little confused as i was trying to put the upcoming patch related modifications here, which i understand now, will happen in upcoming days when i start working on the patch itself.

ipMetaDataHolder := make(map[string]ipWithTTL)
minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999)

minTime := func(t1, t2 time.Time) time.Time {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: func earlierOf(t1, t2) / laterOf(t1, t2)

Copy link
Member Author

@hkiiita hkiiita Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dyanngg Yes, the names make more meaning.

f.fqdnSelectorMutex.Lock()
defer f.fqdnSelectorMutex.Unlock()
oldDNSMeta, exist := f.dnsEntryCache[fqdn]
ipMetaDataHolder := make(map[string]ipWithTTL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move the variable definitions before the lock, and keep if exist right beneath the existence check makes it more readable

Copy link
Member Author

@hkiiita hkiiita Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dyanngg Thank you , that was helpful in sense that i did realise that the code looked more readable with ease after this.

Comment on lines 492 to 496
ipMetaDataHolder[ipStr] = ipWithTTL{
ip: ipMeta.ip,
expirationTime: ipMeta.expirationTime,
}
minTimeToReQuery = minTime(ipMeta.expirationTime, minTimeToReQuery)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since thees lines appear four times at least, might make sense to make it a helper.

Copy link
Member Author

@hkiiita hkiiita Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dyanngg Yes, that was a nice improvement.

earlierOf and laterOf functions.
reeduced verbosity by converting repeated code as anonymous function

Signed-off-by: Hemant <[email protected]>
@@ -76,11 +76,16 @@ func (fs *fqdnSelectorItem) matches(fqdn string) bool {
// expirationTime of the records, which is the DNS response
// receiving time plus lowest applicable TTL.
type dnsMeta struct {
expirationTime time.Time
//expirationTime time.Time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove if no longer needed

pkg/agent/controller/networkpolicy/fqdn.go Show resolved Hide resolved
@@ -253,8 +258,8 @@ func (f *fqdnController) getIPsForFQDNSelectors(fqdns []string) []net.IP {
}
for fqdn := range fqdnsMatched {
if dnsMeta, ok := f.dnsEntryCache[fqdn]; ok {
for _, ip := range dnsMeta.responseIPs {
matchedIPs = append(matchedIPs, ip)
for _, ipWithMetaData := range dnsMeta.responseIPs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/ipWithMetaData/ipData


currentTime := time.Now()
ipWithTTLMap := make(map[string]ipWithTTL)
minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what time is this supposed to be? At the very least there should be a comment to explain. But based on usage, I am not sure why we are not just using the zero value for time.Time:

var minTimeToReQuery time.Time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas Actually ,

minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999)

i had chosen above expression to establish a reference point for tracking the minimum time to re-query DNS requests as IPs expire. Since our goal in this PR for this bug/issue is to send requests as and when an IP expires, initializing minTimeToReQuery to a maximum time value allows us to compare and update it as we process various time values , ultimately allowing us to determine the earliest re-query time.

Copy link
Contributor

@Dyanngg Dyanngg Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you could do though is to make earlierOf take time.Time pointers as parameters, and return the non-nil value if one of t1/t2 is nil. That way you would not need to initialize minTimeToReQuery. Note that in that case the return value should also be a pointer IMO (since earlierOf(nil, nil) is nil), and when it's used eventually you should check that minTimeToReQuery is not nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dyanngg , yes i think using that approach we can avoid initialising minTimeToReQuery .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with the pointer solution, but it seems to me that we could just pick a reasonable time such as now + 24*time.Hour and use that (with an appropriate comment of course). The time is only used if len(ipWithTTLMap) > 0 anyway. And it seems reasonable to resend a query after 24h even if we don't really need to? At least it won't impact correctness.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i think this was the point i was missing . Maybe i had set it to too large a value under assumption that i needed a very large value. The 24 hour time frame seems more reasonable as a replacement to existing large value that i had put.

minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999)
addressUpdate := false

ipMapFiller := func(ip string, ipMeta ipWithTTL, minTime *time.Time) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of this function (ipMapFiller) is really hurting the readability of the code. I would suggest something like addIPtoMap / addIPToCache / addIPToNewCache / setIPInMap / ...

Additionally, minTime does not need to be a parameter to this function: you are always using &minTimeToReQuery as the argument value, and minTimeToReQuery is captured by the closure (by reference).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes , my bad. The closure captures the variable minTimeToReQuery and should not be passed as an argument.

t.Errorf("Expected TTL for %s to be %v, got %v", ipStr, expectedTTL, ipMeta.expirationTime)
}
}
println("\n\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use a print statement in a test like this

Comment on lines 604 to 605
// sample IP with time simulating expired time, i thought that using negative time will
// simulate expired time as it will always equate to before when compared to any time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be more concise in your comments:

// expired IP that should be removed from the cache when onDNSResponse is called

// we get new IP
"192.0.2.3": {ip: net.ParseIP("192.0.2.3"), expirationTime: currentTime.Add(10 * time.Second)},
// and an exisiting IP
"192.0.2.1": {ip: net.ParseIP("192.0.2.1"), expirationTime: currentTime.Add(10 * time.Second)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the test more useful if the new TTL for this IP is different from the current one? If the new TTL in the response is greater than the existing one, the new one should be sued. If it is less than the existing one, the current one should remain in use.

},
},
{
name: "old IP expired",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a strict subset of the previous test, and doesn't need to be included

there are 2 possible approaches here:

  1. have a single unit test case with multiple different IPs, ensuring that all the branches / cases of the onDNSResponse are tested
  2. have multiple test cases, each testing a different branch / scenario

Usually, we prefer the second approach because it makes test failures easier to identify / troubleshoot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these changes are unrelated to the rest of the PR, and should be removed

…ead of previous very large value.

changed the name of closure to addIPToCache
refactored addIPToCache to skip passing of minTimeToReQuery as parameter as its accessible automatically being inside scope of outer function.
updated tests to cover inidividual scenarios .

Signed-off-by: Hemant <[email protected]>
f.setFQDNMatchSelector(fqdn, selectorItem)
for ipStr, ipMeta := range newDNSresponseIPs {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not make sense to be in part of the for loop... We don't want to call addIPToCache for as many times as the number of selectorItems matching the fqdn

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dyanngg yes updated it.

if mustCacheResponse {

if len(ipWithTTLMap) > 0 {
addressUpdate = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. Isn't addressUpdate supposed to be determined by whether FQDN <-> IP mappings have changed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dyanngg updated .

@@ -405,80 +409,102 @@ func (f *fqdnController) deleteRuleSelectedPods(ruleID string) error {

func (f *fqdnController) onDNSResponse(
fqdn string,
responseIPs map[string]net.IP,
lowestTTL uint32,
newDNSresponseIPs map[string]ipWithTTL,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the camel format, it should be newDNSResponseIPs, however, I feel DNS and response is kind of duplicate with the function name, and given you name the struct ipWithTTL, I would just name the parameter newIPWithTTLs, or newIPWithTTLMap to correspond to the variable at L424.

Comment on lines 426 to 427
//minTimeToReQuery Establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
minTimeToReQuery := time.Now().Add(24 * time.Hour)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//minTimeToReQuery Establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire.
minTimeToReQuery := time.Now().Add(24 * time.Hour)
// minTimeToRequery establishes a maximum reference time for tracking the minimum requery time to DNS, as IPs expire.
minTimeToRequery := time.Now().Add(24 * time.Hour)

// This IP entry has already expired and not seen in the latest DNS response.
// It should be removed from the cache.

for cachedIpStr, cachedIpMeta := range cachedDNSMeta.responseIPs {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for cachedIpStr, cachedIpMeta := range cachedDNSMeta.responseIPs {
for cachedIPStr, cachedIPMeta := range cachedDNSMeta.responseIPs {

// It should be removed from the cache.

for cachedIpStr, cachedIpMeta := range cachedDNSMeta.responseIPs {
if _, exist := newDNSresponseIPs[cachedIpStr]; !exist {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if _, exist := newDNSresponseIPs[cachedIpStr]; !exist {
if newIPMeta, exist := newDNSresponseIPs[cachedIpStr]; !exist {

then use it at L461, instead of querying the map another time.

if mustCacheResponse {

if len(ipWithTTLMap) > 0 {
addressUpdate = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when no IP is changed, I think the map is not empty, why this is set to true?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants