From 286d0055d99eadcf775b53223aebed8743526df7 Mon Sep 17 00:00:00 2001
From: Joao Pedro Silva
Date: Wed, 31 Jul 2024 17:52:21 +0200
Subject: [PATCH] FIX: Reconcile traefik service with canary at 0

Setting the weight to 100 on both services sends 50% of the traffic to
each service. This made our canary enter an infinite loop while promoting
a new version, and the traefik service got altered. The traefik service
should not be changed, as it is managed by flagger, but getting stuck in
an infinite loop is not great.

The loop happened because, during promotion with `StepWeightPromotion`,
the weights are reset whenever the traefik service gets reconciled. After
that, `GetRoutes` makes [this calculation](https://github.com/fluxcd/flagger/blob/9a224a0c906354fcfcbc01d4d2df987389301e68/pkg/router/traefik.go#L163-L164)
for the weights, which returns 0 for the canary, and the canary is then
unable to exit [this code path](https://github.com/fluxcd/flagger/blob/v1.36.1/pkg/controller/scheduler.go#L491-L546).

Besides this change, do you know why we are treating the weights as
percentages? Should I also change `GetRoutes` to calculate the percentage
from the weights, or is it coded like that because flagger is expected to
keep the weights within those constraints?

Signed-off-by: Joao Pedro Silva
---
 pkg/router/traefik.go | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pkg/router/traefik.go b/pkg/router/traefik.go
index 0625fecbe..a87981ae9 100644
--- a/pkg/router/traefik.go
+++ b/pkg/router/traefik.go
@@ -108,7 +108,7 @@ func (tr *TraefikRouter) Reconcile(canary *flaggerv1.Canary) error {
 				Name:      canaryName,
 				Namespace: canary.Namespace,
 				Port:      canary.Spec.Service.Port,
-				Weight:    100,
+				Weight:    0,
 			},
 		)
 	}
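
The weight arithmetic described in the commit message can be illustrated with a minimal standalone sketch. It is not flagger code: `trafficShare` and `routesFromPrimary` are hypothetical helpers, and the "canary = 100 - primary" rule is an assumption inferred from the message's statement that the linked calculation returns 0 for the canary when the primary weight is 100.

```go
package main

import "fmt"

// trafficShare returns the percentage of requests a backend receives under
// weighted round robin: its weight divided by the sum of all weights.
// Hypothetical helper, not part of flagger or traefik.
func trafficShare(weight, total int) float64 {
	if total == 0 {
		return 0
	}
	return float64(weight) * 100 / float64(total)
}

// routesFromPrimary models the assumed behaviour of the linked calculation:
// only the primary weight is read, and the canary weight is derived as
// 100 minus that value, i.e. weights are treated as percentages.
func routesFromPrimary(primaryWeight int) (primary, canary int) {
	return primaryWeight, 100 - primaryWeight
}

func main() {
	// Before the patch: both services are reconciled with weight 100.
	fmt.Printf("actual split at 100/100: primary %.0f%%, canary %.0f%%\n",
		trafficShare(100, 200), trafficShare(100, 200))

	// What the percentage-style reading reports for the same state.
	p, c := routesFromPrimary(100)
	fmt.Printf("reported weights: primary %d, canary %d\n", p, c)

	// After the patch: the canary is reconciled with weight 0.
	fmt.Printf("actual split at 100/0: primary %.0f%%, canary %.0f%%\n",
		trafficShare(100, 100), trafficShare(0, 100))
}
```

Under these assumptions, the pre-patch state routes half of the traffic to the canary while the reported canary weight is 0; with the canary reconciled at 0, the actual split and the reported weights agree.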