replace shared-informers' methods with cached client #461

Merged
32 changes: 17 additions & 15 deletions cmd/main.go
@@ -11,7 +11,6 @@ import (
     toolchainv1alpha1 "github.com/codeready-toolchain/api/api/v1alpha1"
     "github.com/codeready-toolchain/registration-service/pkg/auth"
     "github.com/codeready-toolchain/registration-service/pkg/configuration"
-    "github.com/codeready-toolchain/registration-service/pkg/informers"
     "github.com/codeready-toolchain/registration-service/pkg/log"
     "github.com/codeready-toolchain/registration-service/pkg/proxy"
     "github.com/codeready-toolchain/registration-service/pkg/proxy/metrics"
@@ -85,12 +84,7 @@ func main() {
         }
     }

-    informer, informerShutdown, err := informers.StartInformer(cfg)
-    if err != nil {
-        panic(err.Error())
-    }
-
-    app, err := server.NewInClusterApplication(*informer)
+    app, err := server.NewInClusterApplication(cl)
     if err != nil {
         panic(err.Error())
     }
@@ -121,11 +115,6 @@ func main() {
     }
     proxySrv := p.StartProxy(proxy.DefaultPort)

-    // stop the informer when proxy server shuts down
-    proxySrv.RegisterOnShutdown(func() {
-        informerShutdown <- struct{}{}
-    })
-
     // ---------------------------------------------
     // Registration Service
     // ---------------------------------------------
@@ -202,9 +191,22 @@ func newCachedClient(ctx context.Context, cfg *rest.Config) (client.Client, erro

     // populate the cache backed by shared informers that are initialized lazily on the first call
     // for the given GVK with all resources we are interested in from the host-operator namespace
-    objectsToList := []client.ObjectList{&toolchainv1alpha1.ToolchainConfigList{}, &corev1.SecretList{}}
-    for i := range objectsToList {
-        if err := hostCluster.GetClient().List(ctx, objectsToList[i], client.InNamespace(configuration.Namespace())); err != nil {
+    objectsToList := map[string]client.ObjectList{
Contributor

I wonder how we could make it easier to NOT forget to update this list when we need to access more resource kinds via the client... Maybe add a comment to https://github.com/codeready-toolchain/host-operator/blob/master/deploy/registration-service/registration-service.yaml#L20 ?

Contributor Author

We list only those objects we want to cache upfront. As mentioned in the comment above this code, the cache is lazy: it starts caching resources of a given GVK only after that GVK is fetched for the first time.
In other words, the cache for a GVK is populated only after the first call for that GVK.

If we didn't do the list here, the very first call to the reg-service/proxy would take a long time because the client would only then start populating the cache.
By listing the resources before the reg-service pod becomes ready, we ensure that the cache is fully synced with all the resources we need to have cached.
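
To illustrate the lazy behaviour described above, here is a minimal, stdlib-only sketch. It does not use controller-runtime; `lazyCache`, `newLazyCache`, and `warmUp` are hypothetical names, and the backend fetch is simulated. It only shows the idea that a kind is fetched on first access and that listing every kind upfront moves that cost to startup.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyCache is a hypothetical stand-in for an informer-backed cache:
// objects of a kind are fetched from the (simulated) backend only on first access.
type lazyCache struct {
	mu      sync.Mutex
	fetch   func(kind string) []string // simulated expensive API call
	byKind  map[string][]string
	fetches int // number of times the backend was hit
}

func newLazyCache(fetch func(kind string) []string) *lazyCache {
	return &lazyCache{fetch: fetch, byKind: map[string][]string{}}
}

// List returns the cached objects of a kind, populating the cache on a miss.
func (c *lazyCache) List(kind string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if objs, ok := c.byKind[kind]; ok {
		return objs
	}
	c.fetches++
	objs := c.fetch(kind)
	c.byKind[kind] = objs
	return objs
}

// warmUp lists every kind once so that later requests do not pay for the miss.
func warmUp(c *lazyCache, kinds []string) {
	for _, k := range kinds {
		c.List(k)
	}
}

func main() {
	backend := func(kind string) []string { return []string{kind + "-1", kind + "-2"} }
	c := newLazyCache(backend)

	warmUp(c, []string{"UserSignup", "Space"})
	before := c.fetches

	c.List("UserSignup") // already warm, served from the cache
	fmt.Println(before, c.fetches) // prints "2 2"
}
```

A kind that is missing from the warm-up list still works in this sketch; it just triggers one extra fetch on its first use, which mirrors the first-request slowdown discussed in this thread.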

Contributor

@alexeykazakov alexeykazakov Sep 19, 2024

I got that. My point is that it's easy to forget to add a new resource to this list. Let's say we introduce a change to the proxy so that it now needs to access a new resource of kind X. If we don't add it to this list, the proxy performance may degrade for the first request. It will degrade again for every replica each time the proxy is restarted. So, for example, if there are 10 replicas of reg-service/proxy, then up to 10 requests per deployment/version would be impacted.
You can't access a new kind without adding this X resource to the operator role in https://github.com/codeready-toolchain/host-operator/blob/master/deploy/registration-service/registration-service.yaml#L20
So I was wondering if that yaml is a good place to add a reminder to update the cache initialization if desired.

Contributor Author

As discussed in the call:

  1. I'll add a comment to the Role template
  2. I'll drop the ConfigMap from the Role
  3. I'll try to make the list for loop a bit more generic so it goes over all toolchain kinds except for some specific ones like TierTemplate which we don't want to cache (there can be many of them and they are huge resources)

Contributor Author

As for 1. and 2., see the PRs:
codeready-toolchain/host-operator#1089
#463

As for item 3, it's easily doable to go over all kinds of the toolchain API, but there is one problem. The scheme contains the APIs of both host and member, and at the level of the scheme we cannot tell which kinds are host resources and which belong to the member. We could list the available API groups from the cluster and exclude the kinds that are not available, but this wouldn't work in single-cluster environments. In addition, the host-operator SA doesn't have permission to read/list member-operator CRDs, so it would fail on authorization as well.
In other words, there are too many complications in doing it generically, so it's not worth it.
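
A rough, stdlib-only sketch of the generic approach from item 3, only to show where it breaks down. `kindsToWarm` is a hypothetical helper and the registered kinds are a hand-written illustration, not read from the real scheme. Filtering by an exclusion set is simple, but nothing in the input says which kinds live on the host, so member-side kinds end up in the result as well.

```go
package main

import (
	"fmt"
	"sort"
)

// kindsToWarm returns every registered kind that is not explicitly excluded,
// sorted so the result is stable.
func kindsToWarm(registered []string, excluded map[string]bool) []string {
	var out []string
	for _, k := range registered {
		if !excluded[k] {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Illustrative scheme contents: host and member kinds are registered side by side.
	registered := []string{"UserSignup", "Space", "TierTemplate", "NSTemplateSet", "MemberStatus"}
	// Kinds we deliberately do not want to cache (many and large resources).
	excluded := map[string]bool{"TierTemplate": true}

	// Member-side kinds are still included, because the scheme alone
	// cannot tell host kinds from member kinds.
	fmt.Println(kindsToWarm(registered, excluded))
}
```

Listing such member-side kinds through the host client would then fail or be denied, which is the complication described above.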

+        "MasterUserRecord": &toolchainv1alpha1.MasterUserRecordList{},
+        "Space":            &toolchainv1alpha1.SpaceList{},
+        "SpaceBinding":     &toolchainv1alpha1.SpaceBindingList{},
+        "ToolchainStatus":  &toolchainv1alpha1.ToolchainStatusList{},
+        "UserSignup":       &toolchainv1alpha1.UserSignupList{},
+        "ProxyPlugin":      &toolchainv1alpha1.ProxyPluginList{},
+        "NSTemplateTier":   &toolchainv1alpha1.NSTemplateTierList{},
+        "ToolchainConfig":  &toolchainv1alpha1.ToolchainConfigList{},
+        "BannedUser":       &toolchainv1alpha1.BannedUserList{},
+        "Secret":           &corev1.SecretList{}}
Contributor

Suggested change
-        "Secret": &corev1.SecretList{}}
+        "Secret": &corev1.SecretList{},
+        "Config": &corev1.ConfigMapList{}}

While I don't see that we actually access CMs in reg-service, they are listed in the Role here:
https://github.com/codeready-toolchain/host-operator/blob/master/deploy/registration-service/registration-service.yaml#L43
So I wonder whether we should either add them to the cache initialization or remove them from the role.

Contributor

I believe Matous added these resources based on the informers we were creating in informers.go. If my memory serves me correctly, the resources listed there are the ones needed in the proxy flow, so we pre-populate the cache for them so that the proxy is as fast as possible even the first time a user uses it. I don't remember whether ConfigMaps are part of that flow, but I don't think it hurts to add them.

Contributor Author

@rajivnathan is correct, we want to cache only those resources we access in reg-service. Caching anything else doesn't make much sense.

We could theoretically drop the listing of some resources like Secrets, NSTemplateTiers, ProxyPlugins, ToolchainStatuses, and ToolchainConfigs, because we don't expect many of them to be present in the namespace, so the very first call wouldn't take much longer than when they are already cached. But I thought that it's also completely fine to pre-populate the cache with all resources the reg-service and proxy touch.

Contributor Author

As for the ConfigMap, if I'm not mistaken, the reg-service doesn't touch any ConfigMap in the toolchain-host-operator namespace. The permission in the Role is most likely a leftover from the time before we introduced ToolchainConfig and used CM to configure the service and operators.
That being said, while it wouldn't theoretically hurt (as we cache resources only in the host-operator namespace) I would rather keep the cache minimal and focused only on those resources that are really accessed by the reg-service.
I would rather avoid a situation where we keep adding more and more resources to the cache without knowing why we are doing that.

Contributor

The cache can be crucial for performance, especially for the proxy. And it's easy to miss this during development (i.e. to forget to add specific resources to the initialization step). We don't have any performance tests in our CI.
If neither reg-service nor proxy touches CMs, then IMO we should remove them from the role.
And I would argue that it would be safer to keep everything reg-service & proxy access in the cache initialization.

I checked, and it looks like CMs are not used by reg-service or proxy. So let's create a separate PR to remove them from the role.


+    for resourceName := range objectsToList {
+        log.Infof(nil, "Syncing informer cache with %s resources", resourceName)
+        if err := hostCluster.GetClient().List(ctx, objectsToList[resourceName], client.InNamespace(configuration.Namespace())); err != nil {
+            log.Errorf(nil, err, "Informer cache sync failed for %s", resourceName)
             return nil, err
         }
     }
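
One side note on the loop above: Go does not specify map iteration order, so the "Syncing informer cache" log lines can appear in a different order on each start. If a stable order were ever wanted, a slice of named entries would give it. The sketch below is stdlib-only and purely illustrative; `namedKind` and `syncOrder` are hypothetical names and the `list` field is a placeholder for the real `client.ObjectList`.

```go
package main

import "fmt"

// namedKind pairs a log-friendly name with a placeholder for the list object.
type namedKind struct {
	name string
	list any // would be a client.ObjectList in the real code (assumption)
}

// syncOrder calls sync for each entry in slice order and stops at the first error.
// It returns the names that were synced successfully, in order.
func syncOrder(kinds []namedKind, sync func(namedKind) error) ([]string, error) {
	var done []string
	for _, k := range kinds {
		if err := sync(k); err != nil {
			return done, fmt.Errorf("cache sync failed for %s: %w", k.name, err)
		}
		done = append(done, k.name)
	}
	return done, nil
}

func main() {
	kinds := []namedKind{{name: "UserSignup"}, {name: "Space"}, {name: "Secret"}}
	done, err := syncOrder(kinds, func(k namedKind) error {
		fmt.Printf("Syncing informer cache with %s resources\n", k.name)
		return nil
	})
	fmt.Println(done, err)
}
```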
11 changes: 11 additions & 0 deletions go.mod
@@ -46,6 +46,10 @@ require (
     cloud.google.com/go/auth v0.3.0 // indirect
     cloud.google.com/go/auth/oauth2adapt v0.2.2 // indirect
     cloud.google.com/go/compute/metadata v0.3.0 // indirect
+    github.com/BurntSushi/toml v0.4.1 // indirect
+    github.com/Masterminds/goutils v1.1.1 // indirect
+    github.com/Masterminds/semver/v3 v3.1.1 // indirect
+    github.com/Masterminds/sprig/v3 v3.2.2 // indirect
     github.com/ProtonMail/go-crypto v0.0.0-20230217124315-7d5c6f04bbb8 // indirect
     github.com/beorn7/perks v1.0.1 // indirect
     github.com/bytedance/sonic v1.11.2 // indirect
@@ -67,6 +71,7 @@ require (
     github.com/googleapis/enterprise-certificate-proxy v0.3.2 // indirect
     github.com/googleapis/gax-go/v2 v2.12.3 // indirect
     github.com/gorilla/mux v1.8.0 // indirect
+    github.com/huandu/xstrings v1.3.1 // indirect
     github.com/klauspost/cpuid/v2 v2.2.7 // indirect
     github.com/kr/pretty v0.3.1 // indirect
     github.com/lestrrat-go/backoff/v2 v2.0.8 // indirect
@@ -77,7 +82,12 @@ require (
github.com/mattn/go-colorable v0.1.13 // indirect
github.com/matttproud/golang_protobuf_extensions v1.0.4 // indirect
github.com/migueleliasweb/go-github-mock v0.0.18 // indirect
github.com/mitchellh/copystructure v1.0.0 // indirect
github.com/mitchellh/reflectwalk v1.0.0 // indirect
github.com/prometheus/procfs v0.9.0 // indirect
github.com/redhat-cop/operator-utils v1.3.3-0.20220121120056-862ef22b8cdf // indirect
github.com/shopspring/decimal v1.2.0 // indirect
github.com/spf13/cast v1.3.1 // indirect
github.com/twitchyliquid64/golang-asm v0.15.1 // indirect
github.com/valyala/bytebufferpool v1.0.0 // indirect
github.com/valyala/fasttemplate v1.2.2 // indirect
@@ -94,6 +104,7 @@ require (
     google.golang.org/grpc v1.63.2 // indirect
     k8s.io/apiextensions-apiserver v0.25.0 // indirect
     k8s.io/component-base v0.25.0 // indirect
+    k8s.io/kubectl v0.24.0 // indirect
     k8s.io/utils v0.0.0-20220728103510-ee6ede2d64ed // indirect
 )
