This repository has been archived by the owner on Aug 17, 2023. It is now read-only.

Create dev mode instructions for operator deployment #467

Open
Tomcli opened this issue Dec 8, 2020 · 15 comments

Comments

@Tomcli
Member

Tomcli commented Dec 8, 2020

Currently, the operator is always watching the Kubeflow resources so it can reconcile when something is missing. This is good for production environments, but not very friendly when we need to remove and test resources in our development and testing setup. It would be nice to have a dev_mode flag to disable the operator watcher for development.

/cc @moficodes

@vpavlin
Member

vpavlin commented Dec 8, 2020

How would the reconciliation be triggered then? (I.e., what would the operator do if not watch? :) )

@moficodes
Contributor

I think the goal is to make the operator behave more like kfctl. In dev mode, the operator just wraps kfctl, running the command once, and that's it.

It's useful for quickly iterating on and testing the operator deployment.

@moficodes
Contributor

I can take a look at it.

@moficodes
Contributor

/assign

@vpavlin
Member

vpavlin commented Dec 8, 2020

Why not just use kfctl then?

Or even better, use the operator-sdk tooling for development - https://github.com/operator-framework/getting-started#2-run-locally-outside-the-cluster

@Tomcli
Member Author

Tomcli commented Dec 8, 2020

This is coming from one of our users who doesn't have much DevOps experience. We probably don't have to disable all the watchers; we only want to disable the watcher that monitors the k8s resources: https://github.com/kubeflow/kfctl/blob/master/pkg/controller/kfdef/kfdef_controller.go#L119

@Tomcli
Member Author

Tomcli commented Dec 8, 2020

Also, this is an opt-out feature, so it shouldn't change the behavior of the current operator deployment.

@Tomcli
Member Author

Tomcli commented Dec 8, 2020

> Why not just use kfctl then?
>
> Or even better, use the operator-sdk tooling for development - https://github.com/operator-framework/getting-started#2-run-locally-outside-the-cluster

For most of our users, kfctl is sufficient in this case. However, we have some users who are on Windows or have very little experience with the terminal, so being able to use the operator for development would be nice for them.

@vpavlin
Member

vpavlin commented Dec 8, 2020

Can you help me to understand the use case again - maybe with more details? It sounds like there is a very specific case which would get treatment in the operator where it should rather be treated by educating the user(s).

@Tomcli
Member Author

Tomcli commented Dec 8, 2020

Since the default behavior of the operator now is to reapply the kfdef if there is a delete event from any kfctl resource, users who made changes to the Kubeflow deployment with kubectl edit instead of updating kubeflow/manifests will lose their configuration. I do agree that educating the users is the right approach, but I'm seeing that some users are afraid to use the operator when they see a big learning curve for deployment.

I suggest using this flag only for users who are deploying Kubeflow by themselves in a dev setup. That way, those who are interested in the Kubeflow project will be more committed to learning about kustomize and kfdef so they can deploy Kubeflow with the operator in the right way.

@nakfour
Member

nakfour commented Dec 10, 2020

@Tomcli I don't think it is a good idea to override the normal operator workflow to satisfy a small set of users. Another option, as @tumido pointed out, is to install the operator, install Kubeflow, and then scale the operator pod instances down to 0. This stops the operator pod from watching and reconciling. I am absolutely not a fan of adding code that breaks the fundamental function of an operator.

@Tomcli
Member Author

Tomcli commented Dec 10, 2020

Thanks @nakfour, scaling the operator pod instances down to 0 can be a good option. Then we probably want to add some instructions for:

  1. How to stop watching the Kubeflow deployment (using kubectl or the k8s/OCP UI, to cover different audiences)
  2. When to resume watching (e.g., deleting Kubeflow, updating the kfdef)

Hopefully this way we should be able to help out our users without changing the operator's behavior.
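
For the kubectl route in item 1, a minimal sketch could look like the following. The function names are made up for illustration, and the deployment name and namespace are passed in as arguments because they depend on how the operator was installed. It relies only on `kubectl scale` and `kubectl get`.

```shell
#!/usr/bin/env bash
# Sketch only: pause/resume an operator by scaling its Deployment.
# Function names are hypothetical; supply your own deployment and namespace.

# Usage: pause_operator <deployment> <namespace>
# Scales the operator to 0 replicas so nothing watches or reconciles.
pause_operator() {
  kubectl scale deployment "$1" -n "$2" --replicas=0
}

# Usage: resume_operator <deployment> <namespace>
# Scales the operator back to 1 replica so it resumes watching.
resume_operator() {
  kubectl scale deployment "$1" -n "$2" --replicas=1
}

# Usage: operator_replicas <deployment> <namespace>
# Prints the desired replica count (0 means the operator is paused).
operator_replicas() {
  kubectl get deployment "$1" -n "$2" -o jsonpath='{.spec.replicas}'
}
```

A typical flow would be to call `pause_operator` before making manual edits, and `resume_operator` before deleting Kubeflow or updating the kfdef so the operator can act on the change.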

@Tomcli Tomcli changed the title Create a dev mode for operator deployment Create a dev mode instructions for operator deployment Dec 10, 2020
@Tomcli Tomcli changed the title Create a dev mode instructions for operator deployment Create dev mode instructions for operator deployment Dec 10, 2020
@tumido

tumido commented Dec 10, 2020

I was looking for this issue and couldn't find it.😁

Precisely as @nakfour says. In my experience with a dev setup, when working on adjusting ODH components, I've found that only running kfctl manually or scaling down the operator after the initial deploy gives me the control I need.

If you need to test the operator's interaction with your kfdef, the best way is to let it operate. And if you need to manually modify the manifests after the initial deploy, you should pause the operator - scaling it down is by far the easiest option.

This way you also have control over the updated manifests from the repositories specified in the kfdef, since the operator holds the repository cache in its pods: when you scale it up again, you have the freshest manifests available.

I think that if you need to make manual adjustments, you need to turn the autopilot off first.

@tumido

tumido commented Dec 11, 2020

btw, @Tomcli this way the whole "dev mode" toggle experience can be as simple as this:

Disable operator

oc patch deployment opendatahub-operator -n openshift-operators -p '{"spec":{"replicas":0}}'

Enable operator

oc patch deployment opendatahub-operator -n openshift-operators -p '{"spec":{"replicas":1}}'

You can also alias it in your bash to something shorter, which makes it even more convenient to use. 🙂
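
As a rough sketch of that idea, the two patch commands above could be wrapped in small shell functions in `~/.bashrc`. The names `odh_set_replicas`, `odh_pause`, and `odh_resume` are hypothetical; the deployment name and namespace are the ones used in the commands above and may differ in other installs.

```shell
# Sketch only: short wrappers around the two `oc patch` commands.
# The function names are hypothetical; adjust the deployment and namespace
# if your operator is installed elsewhere.

# Usage: odh_set_replicas <count>
odh_set_replicas() {
  oc patch deployment opendatahub-operator -n openshift-operators \
    -p "{\"spec\":{\"replicas\":$1}}"
}

# Disable the operator (dev mode on).
odh_pause() { odh_set_replicas 0; }

# Enable the operator (dev mode off).
odh_resume() { odh_set_replicas 1; }
```

After sourcing the file, `odh_pause` and `odh_resume` act as the toggle.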

@Tomcli
Member Author

Tomcli commented Dec 11, 2020

Thanks @tumido, I can add these instructions to the kubeflow/website and close this issue.

@moficodes moficodes removed their assignment Feb 5, 2021