-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft PR for #6722 -- cache TTLs for individual IP addresses in DNS responses. #6732
base: main
Are you sure you want to change the base?
Conversation
I think i should have branched off cleanly from main so that the commits i made for e2e tests wouldnt have appeared here. Sorry about that. First commit for this branch is initial commit towards resolving issue #6722 and second commit is added test case |
if _, exist := oldDNSMeta.responseIPs[ipStr]; !exist { | ||
ipMetaDataHolder[ipStr] = ipWithTTL{ | ||
ip: ip, | ||
ttl: time.Now().Add(time.Duration(lowestTTL) * time.Second), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's better if you call time.Now()
a single time, earlier in the function
// This old IP also exists in current response, so update it with new received TTl. | ||
ipMetaDataHolder[ipStr] = ipWithTTL{ | ||
ip: ipMetaData.ip, | ||
ttl: time.Now().Add(time.Duration(lowestTTL) * time.Second), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the new TTL (or rather expiration time) should be the max of the old value and the new value
This way we always honor the TTL that was received previously by other queries potentially from other apps.
|
||
type ipWithTTL struct { | ||
ip net.IP | ||
ttl time.Time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a ttl is usually a duration, so I would suggest retaining the expirationTime
name for this field
test/e2e/fqdn_dns_cache_test.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove the changes from your other PR from this PR
if _, exist := oldDNSMeta.responseIPs[ipStr]; !exist { | ||
ipMetaDataHolder[ipStr] = ipWithTTL{ | ||
ip: ip, | ||
ttl: time.Now().Add(time.Duration(lowestTTL) * time.Second), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, parseDNSResponse
should probably return map[string]uint32
(ip->ttl).
Signed-off-by: Hemant <[email protected]>
Signed-off-by: Hemant <[email protected]>
d491609
to
5fbf0f1
Compare
…dn_test.go Signed-off-by: Hemant <[email protected]>
ipMetaDataHolder[ipStr] = ipWithTTL{ | ||
ip: ip, | ||
ttl: time.Now().Add(time.Duration(lowestTTL) * time.Second), | ||
ip: ipMeta.ip, | ||
expirationTime: ipMeta.expirationTime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this equivalent to ipMetaDataHolder[ipStr] = ipMeta
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not addressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -489,22 +500,22 @@ func (f *fqdnController) onDNSResponse( | |||
} | |||
// The FQDN will be added to the queue only after `lowestTTL` value which | |||
// would already have been derived using the minTTL logic. | |||
f.dnsQueryQueue.AddAfter(fqdn, time.Duration(lowestTTL)*time.Second) | |||
f.dnsQueryQueue.AddAfter(fqdn, maxTimeToReQuery.Sub(currentTime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't seem right: the time after which we should automatically re-query should be a "min", not a "max"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right , i mean we need Min because when any of these IP expires its is then that we have to send DNS queries . Got your point.
-- updated min and max logic for individual IPs and for adding items to queue updated tests. Signed-off-by: Hemant <[email protected]>
} | ||
fqdn := strings.ToLower(msg.Question[0].Name) | ||
lowestTTL := uint32(math.MaxUint32) // a TTL must exist in the RRs | ||
responseIPs := map[string]net.IP{} | ||
// Case 1: In the upcoming patch an admin would set this to a maximum value, which will be read from the configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that in with the minTTL feature, expirationTime will be maxTime(minTTL, r.Header().Ttl
instead. However, without a minTTL set, it should still be minTime(maxConfiguredTTL, r.Header().Ttl)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dyanngg Get it now, and indeed i was actually getting a little confused as i was trying to put the upcoming patch related modifications here, which i understand now, will happen in upcoming days when i start working on the patch itself.
ipMetaDataHolder := make(map[string]ipWithTTL) | ||
minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999) | ||
|
||
minTime := func(t1, t2 time.Time) time.Time { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: func earlierOf(t1, t2) / laterOf(t1, t2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dyanngg Yes, the names make more meaning.
f.fqdnSelectorMutex.Lock() | ||
defer f.fqdnSelectorMutex.Unlock() | ||
oldDNSMeta, exist := f.dnsEntryCache[fqdn] | ||
ipMetaDataHolder := make(map[string]ipWithTTL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move the variable definitions before the lock, and keep if exist
right beneath the existence check makes it more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dyanngg Thank you , that was helpful in sense that i did realise that the code looked more readable with ease after this.
ipMetaDataHolder[ipStr] = ipWithTTL{ | ||
ip: ipMeta.ip, | ||
expirationTime: ipMeta.expirationTime, | ||
} | ||
minTimeToReQuery = minTime(ipMeta.expirationTime, minTimeToReQuery) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since thees lines appear four times at least, might make sense to make it a helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dyanngg Yes, that was a nice improvement.
earlierOf and laterOf functions. reeduced verbosity by converting repeated code as anonymous function Signed-off-by: Hemant <[email protected]>
@@ -76,11 +76,16 @@ func (fs *fqdnSelectorItem) matches(fqdn string) bool { | |||
// expirationTime of the records, which is the DNS response | |||
// receiving time plus lowest applicable TTL. | |||
type dnsMeta struct { | |||
expirationTime time.Time | |||
//expirationTime time.Time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove if no longer needed
@@ -253,8 +258,8 @@ func (f *fqdnController) getIPsForFQDNSelectors(fqdns []string) []net.IP { | |||
} | |||
for fqdn := range fqdnsMatched { | |||
if dnsMeta, ok := f.dnsEntryCache[fqdn]; ok { | |||
for _, ip := range dnsMeta.responseIPs { | |||
matchedIPs = append(matchedIPs, ip) | |||
for _, ipWithMetaData := range dnsMeta.responseIPs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/ipWithMetaData/ipData
|
||
currentTime := time.Now() | ||
ipWithTTLMap := make(map[string]ipWithTTL) | ||
minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what time is this supposed to be? At the very least there should be a comment to explain. But based on usage, I am not sure why we are not just using the zero value for time.Time
:
var minTimeToReQuery time.Time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoninbas Actually ,
minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999)
i had chosen above expression to establish a reference point for tracking the minimum time to re-query DNS requests as IPs expire. Since our goal in this PR for this bug/issue is to send requests as and when an IP expires, initializing minTimeToReQuery
to a maximum time value allows us to compare and update it as we process various time values , ultimately allowing us to determine the earliest re-query time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you could do though is to make earlierOf
take time.Time
pointers as parameters, and return the non-nil value if one of t1/t2 is nil. That way you would not need to initialize minTimeToReQuery
. Note that in that case the return value should also be a pointer IMO (since earlierOf(nil, nil) is nil), and when it's used eventually you should check that minTimeToReQuery
is not nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dyanngg , yes i think using that approach we can avoid initialising minTimeToReQuery
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with the pointer solution, but it seems to me that we could just pick a reasonable time such as now + 24*time.Hour
and use that (with an appropriate comment of course). The time is only used if len(ipWithTTLMap) > 0
anyway. And it seems reasonable to resend a query after 24h even if we don't really need to? At least it won't impact correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, i think this was the point i was missing . Maybe i had set it to too large a value under assumption that i needed a very large value. The 24 hour time frame seems more reasonable as a replacement to existing large value that i had put.
minTimeToReQuery := time.Unix(1<<63-62135596801, 999999999) | ||
addressUpdate := false | ||
|
||
ipMapFiller := func(ip string, ipMeta ipWithTTL, minTime *time.Time) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name of this function (ipMapFiller
) is really hurting the readability of the code. I would suggest something like addIPtoMap
/ addIPToCache
/ addIPToNewCache
/ setIPInMap
/ ...
Additionally, minTime
does not need to be a parameter to this function: you are always using &minTimeToReQuery
as the argument value, and minTimeToReQuery
is captured by the closure (by reference).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes , my bad. The closure captures the variable minTimeToReQuery
and should not be passed as an argument.
t.Errorf("Expected TTL for %s to be %v, got %v", ipStr, expectedTTL, ipMeta.expirationTime) | ||
} | ||
} | ||
println("\n\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use a print statement in a test like this
// sample IP with time simulating expired time, i thought that using negative time will | ||
// simulate expired time as it will always equate to before when compared to any time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be more concise in your comments:
// expired IP that should be removed from the cache when onDNSResponse is called
// we get new IP | ||
"192.0.2.3": {ip: net.ParseIP("192.0.2.3"), expirationTime: currentTime.Add(10 * time.Second)}, | ||
// and an exisiting IP | ||
"192.0.2.1": {ip: net.ParseIP("192.0.2.1"), expirationTime: currentTime.Add(10 * time.Second)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't the test more useful if the new TTL for this IP is different from the current one? If the new TTL in the response is greater than the existing one, the new one should be sued. If it is less than the existing one, the current one should remain in use.
}, | ||
}, | ||
{ | ||
name: "old IP expired", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a strict subset of the previous test, and doesn't need to be included
there are 2 possible approaches here:
- have a single unit test case with multiple different IPs, ensuring that all the branches / cases of the
onDNSResponse
are tested - have multiple test cases, each testing a different branch / scenario
Usually, we prefer the second approach because it makes test failures easier to identify / troubleshoot.
test/e2e/framework.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these changes are unrelated to the rest of the PR, and should be removed
…ead of previous very large value. changed the name of closure to addIPToCache refactored addIPToCache to skip passing of minTimeToReQuery as parameter as its accessible automatically being inside scope of outer function. updated tests to cover inidividual scenarios . Signed-off-by: Hemant <[email protected]>
f.setFQDNMatchSelector(fqdn, selectorItem) | ||
for ipStr, ipMeta := range newDNSresponseIPs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not make sense to be in part of the for loop... We don't want to call addIPToCache
for as many times as the number of selectorItems matching the fqdn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dyanngg yes updated it.
if mustCacheResponse { | ||
|
||
if len(ipWithTTLMap) > 0 { | ||
addressUpdate = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. Isn't addressUpdate
supposed to be determined by whether FQDN <-> IP mappings have changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dyanngg updated .
@@ -405,80 +409,102 @@ func (f *fqdnController) deleteRuleSelectedPods(ruleID string) error { | |||
|
|||
func (f *fqdnController) onDNSResponse( | |||
fqdn string, | |||
responseIPs map[string]net.IP, | |||
lowestTTL uint32, | |||
newDNSresponseIPs map[string]ipWithTTL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the camel format, it should be newDNSResponseIPs
, however, I feel DNS and response is kind of duplicate with the function name, and given you name the struct ipWithTTL
, I would just name the parameter newIPWithTTLs
, or newIPWithTTLMap
to correspond to the variable at L424.
//minTimeToReQuery Establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire. | ||
minTimeToReQuery := time.Now().Add(24 * time.Hour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//minTimeToReQuery Establishes a maximum reference time for tracking the minimum re-query time to DNS, as IPs expire. | |
minTimeToReQuery := time.Now().Add(24 * time.Hour) | |
// minTimeToRequery establishes a maximum reference time for tracking the minimum requery time to DNS, as IPs expire. | |
minTimeToRequery := time.Now().Add(24 * time.Hour) |
// This IP entry has already expired and not seen in the latest DNS response. | ||
// It should be removed from the cache. | ||
|
||
for cachedIpStr, cachedIpMeta := range cachedDNSMeta.responseIPs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for cachedIpStr, cachedIpMeta := range cachedDNSMeta.responseIPs { | |
for cachedIPStr, cachedIPMeta := range cachedDNSMeta.responseIPs { |
// It should be removed from the cache. | ||
|
||
for cachedIpStr, cachedIpMeta := range cachedDNSMeta.responseIPs { | ||
if _, exist := newDNSresponseIPs[cachedIpStr]; !exist { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if _, exist := newDNSresponseIPs[cachedIpStr]; !exist { | |
if newIPMeta, exist := newDNSresponseIPs[cachedIpStr]; !exist { |
then use it at L461, instead of querying the map another time.
if mustCacheResponse { | ||
|
||
if len(ipWithTTLMap) > 0 { | ||
addressUpdate = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when no IP is changed, I think the map is not empty, why this is set to true?
Signed-off-by: Hemant <[email protected]>
This is a draft PR for work towards #6722 .