Snowbridge v3 #1068
Note that we may wish to hold off on merging until such a time as we're ready to announce

As of Snowbridge 2.4.2, the kinesis target does not treat kinesis write throughput exceptions as this type of failure. Rather it has an in-built backoff and retry, which will persist until each event in the batch is either successful, or fails for a different reason.

Before verst 3.0.0, Snowbridge treats every kind of target failure the same - it will retry 5 times. If all 5 attempts fail, it will be reported as a 'MsgFailed' for monitoring purposes, and will proceed without acking the failed Messages. As long as the source's acking model allows for it, these will be re-processed through Snowbridge again.
Suggested change:
- Before verst 3.0.0, Snowbridge treats every kind of target failure the same - it will retry 5 times. If all 5 attempts fail, it will be reported as a 'MsgFailed' for monitoring purposes, and will proceed without acking the failed Messages. As long as the source's acking model allows for it, these will be re-processed through Snowbridge again.
+ Before version 3.0.0, Snowbridge treats every kind of target failure the same - it will retry 5 times. If all 5 attempts fail, it will be reported as a 'MsgFailed' for monitoring purposes, and will proceed without acking the failed Messages. As long as the source's acking model allows for it, these will be re-processed through Snowbridge again.
> If all 5 attempts fail, it will be reported as a 'MsgFailed' for monitoring purposes,

Is it true though? It sounds like `MsgFailed` is reported only when ALL 5 attempts fail, but I think it's reported after each write failure, no?
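
To make the two readings in this question concrete, here is a minimal sketch in Python. It is not Snowbridge's actual code: `write_with_retries` and both counters are hypothetical names, and only the limit of 5 attempts comes from the paragraph under review. One counter goes up on every failed write, the other only once all attempts are exhausted.

```python
from typing import Callable, Dict


def write_with_retries(write: Callable[[], bool], max_attempts: int = 5) -> Dict[str, object]:
    """Call `write` up to `max_attempts` times and count failures two ways."""
    per_attempt_failures = 0  # reading 1: reported after each failed write
    succeeded = False
    for _ in range(max_attempts):
        if write():
            succeeded = True
            break
        per_attempt_failures += 1
    # Reading 2: reported once, and only if every attempt failed.
    exhausted_failures = 0 if succeeded else 1
    return {
        "succeeded": succeeded,
        "per_attempt_failures": per_attempt_failures,
        "exhausted_failures": exhausted_failures,
    }


if __name__ == "__main__":
    # A write that fails twice and then succeeds (illustrative data only).
    outcomes = iter([False, False, True])
    print(write_with_retries(lambda: next(outcomes)))
```

For a write that fails twice and then succeeds, the per-attempt counter ends at 2 while the exhausted counter stays at 0, so the two readings would produce different monitoring numbers for the same traffic.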
A transient failure is a failure which we expect to succeed again on retry. For example, a temporary network error, or when we encounter throttling. Typically you would configure a short backoff for this type of failure. When we encounter a transient failure, we keep processing the rest of the data as normal, under the expectation that everything is operating as normal. The failed data is retried after a backoff.

A setup failure is one which we don't expect to be immediately resolved, for example an incorrect address, or an invalid API Key. Typically you would configure a long backoff for this type of failure, under the assumption that the issue needs to be fixed with either a configuration change or a change to the target itself (e.g. permissions need to be granted). When we encounter a setup error, we stop attempting to process any data, and the whole app waits for the backoff period before trying again. Setup errors will be retried 5 times, before the app crashes.
> we stop attempting to process any data

We don't do anything explicit to stop processing in this case. Right now setup is pretty much like transient, but with much longer backoff. In the future we might add monitoring/alerts/health toggle for setup errors, but it's not there now.
In practice, if you mark your HTTP response as setup error in config, it probably means nothing gets through and we indeed 'stop' processing anything. But there is no code in Snowbridge that would say `stop pulling from source now, we hit setup error!`.
Theoretically it's possible to have both: setup and transient simultaneously. Then it means your setup error probably shouldn't be configured as setup error.
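
The behaviour described in this comment (setup handled like transient, only with a longer backoff) could be sketched roughly as follows. This is a Python illustration under stated assumptions, not Snowbridge's implementation: `RetryPolicy`, `TargetError`, `send_with_retry` and the delay values are all hypothetical, and the limit of 5 is read here as 5 attempts per failure type.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RetryPolicy:
    delay_s: float      # backoff between attempts
    max_attempts: int   # attempts allowed before giving up


# Hypothetical values: short backoff for transient, long backoff for setup.
POLICIES: Dict[str, RetryPolicy] = {
    "transient": RetryPolicy(delay_s=1.0, max_attempts=5),
    "setup": RetryPolicy(delay_s=30.0, max_attempts=5),
}


class TargetError(Exception):
    """A failed write, tagged with its failure type."""

    def __init__(self, kind: str):
        super().__init__(kind)
        self.kind = kind  # "transient" or "setup"


def send_with_retry(
    send: Callable[[], str],
    policies: Dict[str, RetryPolicy] = POLICIES,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry `send`, choosing the backoff by failure type.

    Both types go through the same loop; nothing here pauses the source.
    """
    attempts = {kind: 0 for kind in policies}
    while True:
        try:
            return send()
        except TargetError as err:
            attempts[err.kind] += 1
            policy = policies[err.kind]
            if attempts[err.kind] >= policy.max_attempts:
                raise  # attempts exhausted; the caller decides what happens next
            sleep(policy.delay_s)
```

Passing a stub for `sleep` (for example `delays.append`) makes the chosen backoffs observable without waiting. The only difference between the two failure types in this sketch is the policy looked up, which mirrors the point that no separate "stop processing" path exists.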
As of v3.0.0, only the http target can be configured to return setup errors, via the response rules feature - configuration details for response rules can be found in [the http target configuration section](/docs/destinations/forwarding-events/snowbridge/configuration/targets/http/index.md). For all other targets, all errors returned will be considered transient, and behaviour can be configured using the `tranisent` block of the retry configuration.
Suggested change:
- As of v3.0.0, only the http target can be configured to return setup errors, via the response rules feature - configuration details for response rules can be found in [the http target configuration section](/docs/destinations/forwarding-events/snowbridge/configuration/targets/http/index.md). For all other targets, all errors returned will be considered transient, and behaviour can be configured using the `tranisent` block of the retry configuration.
+ As of v3.0.0, only the http target can be configured to return setup errors, via the response rules feature - configuration details for response rules can be found in [the http target configuration section](/docs/destinations/forwarding-events/snowbridge/configuration/targets/http/index.md). For all other targets, all errors returned will be considered transient, and behaviour can be configured using the `transient` block of the retry configuration.
`setup` means that this error is not retryable, but is something which can only be resolved by a change in configuration or a change to the target. An example of this is an authentication failure - retrying will not fix the issue; the resolution is to grant the appropriate permissions, or provide the correct API key.

Data that matches a setup response rule is handled by a retey as determined in the `setup` configuration block of [retry configuration](/docs/destinations/forwarding-events/snowbridge/configuration/retries/index.md).
Suggested change:
- Data that matches a setup response rule is handled by a retey as determined in the `setup` configuration block of [retry configuration](/docs/destinations/forwarding-events/snowbridge/configuration/retries/index.md).
+ Data that matches a setup response rule is handled by a retry as determined in the `setup` configuration block of [retry configuration](/docs/destinations/forwarding-events/snowbridge/configuration/retries/index.md).
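
As a rough illustration of how a response rule might route an HTTP response to the `setup` retry block, here is a Python sketch. The rule shape (`http_codes`, `body_contains`), the `classify` function, and the fallback of unmatched errors to `transient` are assumptions made for this example; they are not taken from the Snowbridge configuration reference.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SetupRule:
    """Hypothetical rule: match on status code and, optionally, on body text."""
    http_codes: FrozenSet[int]
    body_contains: str = ""  # an empty string matches any body


def classify(status: int, body: str, setup_rules: List[SetupRule]) -> str:
    """Return "success", "setup" or "transient" for an HTTP response."""
    if 200 <= status < 300:
        return "success"
    for rule in setup_rules:
        if status in rule.http_codes and rule.body_contains in body:
            return "setup"
    # Assumption: any error that matches no setup rule is treated as transient.
    return "transient"


if __name__ == "__main__":
    # Illustrative rule: authentication failures are setup errors.
    rules = [SetupRule(http_codes=frozenset({401, 403}))]
    print(classify(401, "invalid API key", rules))
    print(classify(503, "service unavailable", rules))
```

With the illustrative rule above, a 401 is classified as `setup` and a 503 falls through to `transient`, so each would be retried under its own block of the retry configuration.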
`jq` runs a jq command on the message data, and outputs the result of the command. While jq supports multi-element results, commands must output only a single element - this single element can be an array data type.

The provided command must return a boolean result. `false` filters the message out, `true` keeps it.
I guess this line shouldn't be here as it's only for filter, right?
PR to update docs for Snowbridge v3
Things to check:
@pondzix - I would appreciate your paying close attention to whether everything is accurate. Specifically I'm not 100% sure of the part that states that setup errors mean we'll stop processing all data for the retry period. Is that accurate?
@stanch - in addition to a normal PR review, I have marked some features as beta, wdyt about this? Happy to discuss to explain!