Snowbridge v3 #1068

Open, wants to merge 15 commits into base: main

colmsnowplow

PR to update docs for Snowbridge v3

Things to check:

@pondzix - I would appreciate your paying close attention to whether everything is accurate. Specifically I'm not 100% sure of the part that states that setup errors mean we'll stop processing all data for the retry period. Is that accurate?

@stanch - in addition to a normal PR review, I have marked some features as beta, wdyt about this? Happy to discuss to explain!


netlify bot commented Nov 7, 2024

Deploy Preview for snowplow-docs ready!

Name Link
🔨 Latest commit c1109ca
🔍 Latest deploy log https://app.netlify.com/sites/snowplow-docs/deploys/672de6474619650008584d52
😎 Deploy Preview https://deploy-preview-1068--snowplow-docs.netlify.app

@colmsnowplow

Note that we may wish to hold off on merging until such a time as we're ready to announce


As of Snowbridge 2.4.2, the kinesis target does not treat kinesis write throughput exceptions as this type of failure. Rather it has an in-built backoff and retry, which will persist until each event in the batch is either successful, or fails for a different reason.

Before verst 3.0.0, Snowbridge treats every kind of target failure the same - it will retry 5 times. If all 5 attempts fail, it will be reported as a 'MsgFailed' for monitoring purposes, and will proceed without acking the failed Messages. As long as the source's acking model allows for it, these will be re-processed through Snowbridge again.

Suggested change
Before verst 3.0.0, Snowbridge treats every kind of target failure the same - it will retry 5 times. If all 5 attempts fail, it will be reported as a 'MsgFailed' for monitoring purposes, and will proceed without acking the failed Messages. As long as the source's acking model allows for it, these will be re-processed through Snowbridge again.
Before version 3.0.0, Snowbridge treats every kind of target failure the same - it will retry 5 times. If all 5 attempts fail, it will be reported as a 'MsgFailed' for monitoring purposes, and will proceed without acking the failed Messages. As long as the source's acking model allows for it, these will be re-processed through Snowbridge again.


If all 5 attempts fail, it will be reported as a 'MsgFailed' for monitoring purposes,

Is it true though? It sounds like MsgFailed is reported only when ALL 5 attempts fail, but I think it's reported after each write failure, no?


A transient failure is a failure which we expect to succeed again on retry. For example some temporary network error, or when we encounter throttling. Typically you would configure a short backoff for this type of failure. When we encounter a transient failure, we keep processing the rest of the data as normal, under the expectation that everything is operating as normal. The failed data is retried after a backoff.

A setup failure is one which we don't expect to be immediately resolved, for example an incorrect address, or an invalid API Key. Typically you would configure a long backoff for this type of failure, under the assumption that the issue needs to be fixed with either a configuration change or a change to the target itself (e.g. permissions need to be granted). When we encounter a setup error, we stop attempting to process any data, and the whole app waits for the backoff period before trying again. Setup errors will be retried 5 times, before the app crashes.

we stop attempting to process any data

We don't do anything explicit to stop processing in this case. Right now setup is pretty much like transient, but with much longer backoff. In the future we might add monitoring/alerts/health toggle for setup errors, but it's not there now.

In practice, if you mark your HTTP response as a setup error in config, it probably means nothing gets through and we indeed 'stop' processing anything. But there is no code in Snowbridge that would say "stop pulling from source now, we hit a setup error!".

Theoretically it's possible to have both: setup and transient simultaneously. Then it means your setup error probably shouldn't be configured as setup error.
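
To make the distinction above concrete, here is a minimal Python sketch (not Snowbridge's actual Go implementation) in which setup and transient failures share one retry loop and differ only in their backoff. The names `RetryPolicy`, `POLICIES` and `write_with_retry`, and all delay and attempt values, are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional
import time


@dataclass(frozen=True)
class RetryPolicy:
    delay_s: float     # wait before each retry (illustrative value)
    max_attempts: int  # retries allowed before giving up


# Assumption: setup differs from transient only by a much longer backoff.
POLICIES = {
    "transient": RetryPolicy(delay_s=1.0, max_attempts=5),
    "setup": RetryPolicy(delay_s=20.0, max_attempts=5),
}


def write_with_retry(
    write: Callable[[], Optional[str]],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `write` until it succeeds or one failure kind runs out of retries.

    `write` returns None on success, or "transient" / "setup" on failure.
    Returns True on success and False once retries are exhausted.
    Nothing here pauses the source: both kinds simply wait and retry.
    """
    retries = {"transient": 0, "setup": 0}
    while True:
        kind = write()
        if kind is None:
            return True
        policy = POLICIES[kind]
        if retries[kind] >= policy.max_attempts:
            return False
        retries[kind] += 1
        sleep(policy.delay_s)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. What should happen once retries are exhausted (crash, report, or carry on) is deliberately left to the caller, since that is part of what this thread is still settling.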


A setup failure is one which we don't expect to be immediately resolved, for example an incorrect address, or an invalid API Key. Typically you would configure a long backoff for this type of failure, under the assumption that the issue needs to be fixed with either a configuration change or a change to the target itself (e.g. permissions need to be granted). When we encounter a setup error, we stop attempting to process any data, and the whole app waits for the backoff period before trying again. Setup errors will be retried 5 times, before the app crashes.

As of v3.0.0, only the http target can be configured to return setup errors, via the response rules feature - configuration details for response rules can be found in [the http target configuration section](/docs/destinations/forwarding-events/snowbridge/configuration/targets/http/index.md). For all other targets, all errors returned will be considered transient, and behaviour can be configured using the `tranisent` block of the retry configuration.

Suggested change
As of v3.0.0, only the http target can be configured to return setup errors, via the response rules feature - configuration details for response rules can be found in [the http target configuration section](/docs/destinations/forwarding-events/snowbridge/configuration/targets/http/index.md). For all other targets, all errors returned will be considered transient, and behaviour can be configured using the `tranisent` block of the retry configuration.
As of v3.0.0, only the http target can be configured to return setup errors, via the response rules feature - configuration details for response rules can be found in [the http target configuration section](/docs/destinations/forwarding-events/snowbridge/configuration/targets/http/index.md). For all other targets, all errors returned will be considered transient, and behaviour can be configured using the `transient` block of the retry configuration.


`setup` means that this error is not retryable, but is something which can only be resolved by a change in configuration or a change to the target. An example of this is an authentication failure - retrying will not fix the issue, the resolution is to grant the appropriate permissions, or provide the correct API key.

Data that matches a setup response rule is handled by a retey as determined in the `setup` configuration block of [retry configuration](/docs/destinations/forwarding-events/snowbridge/configuration/retries/index.md).

Suggested change
Data that matches a setup response rule is handled by a retey as determined in the `setup` configuration block of [retry configuration](/docs/destinations/forwarding-events/snowbridge/configuration/retries/index.md).
Data that matches a setup response rule is handled by a retry as determined in the `setup` configuration block of [retry configuration](/docs/destinations/forwarding-events/snowbridge/configuration/retries/index.md).
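
For readers following the thread, a rough Python sketch of how a response rule might map an HTTP response to an error type. This is an illustration under assumptions, not the http target's real rule syntax or matching logic: the `ResponseRule` fields, the `classify` helper, and the choice to treat any 2xx status as success before consulting rules are all hypothetical. The linked http target configuration section has the actual options.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ResponseRule:
    kind: str                  # error type the rule assigns, e.g. "setup"
    http_codes: Sequence[int]  # status codes the rule applies to
    body: str = ""             # optional substring the response body must contain


def classify(status: int, body: str, rules: Sequence[ResponseRule]) -> str:
    """Return "success", the kind of the first matching rule, or "transient"."""
    if 200 <= status < 300:  # assumption: 2xx is success regardless of rules
        return "success"
    for rule in rules:
        # An empty rule body matches any response body.
        if status in rule.http_codes and rule.body in body:
            return rule.kind
    # Anything no rule claims falls back to the transient retry settings.
    return "transient"


# Usage: treat authentication failures as setup errors.
rules = [ResponseRule(kind="setup", http_codes=[401, 403])]
print(classify(401, "invalid API key", rules))
print(classify(503, "service unavailable", rules))
```

The fallback to "transient" mirrors the statement above that errors not configured otherwise are considered transient.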


`jq` runs a jq command on the message data, and outputs the result of the command. While jq supports multi-element results, commands must output only a single element - this single element can be an array data type.

The provided command must return a boolean result. `false` filters the message out, `true` keeps it.

I guess this line shouldn't be here as it's only for filter, right?
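
To illustrate the difference this comment points at, a small Python sketch contrasting the two behaviours. Plain Python callables stand in for jq commands here, and `jq_transform` and `jq_filter` are hypothetical names, not Snowbridge functions: the transformation passes the command's result through, while only the filter requires a boolean.

```python
from typing import Any, Callable, Optional


def jq_transform(command: Callable[[dict], Any], message: dict) -> Any:
    """Transformation: output whatever single element the command returns."""
    return command(message)


def jq_filter(command: Callable[[dict], Any], message: dict) -> Optional[dict]:
    """Filter: the command must return a boolean; True keeps, False drops."""
    result = command(message)
    if not isinstance(result, bool):
        raise TypeError("filter command must return a boolean")
    return message if result else None


# Usage with a made-up message.
msg = {"app_id": "web", "value": 3}
print(jq_transform(lambda m: [m["app_id"], m["value"]], msg))  # array result is fine
print(jq_filter(lambda m: m["app_id"] == "web", msg))          # message is kept
```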
