-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding source-hubspot-native connector #1438
Conversation
at this point in time, all tests passes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made some initial comments!
source-hubspot-native/test.flow.yaml
Outdated
- "-m" | ||
- source_hubspot_native | ||
config: config.yaml | ||
bindings: | ||
- resource: | ||
name: companies | ||
interval: PT420S |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for adding this explicit interval to the test configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If i got it right, when a user does not generate their own bindings, we generate then by the default common.Resource
resource, and the default wait value for estuary is 7 minutes. That's why this specific value at this place ( since discovery would see that no interval was set, i've added that ) .
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that does make sense. I am wondering how/why these bindings were originally set to have no explicit interval
. I don't 100% know what happens in that case, and have raised the question in the #saas-connectors slack channel.
And 7 minutes is a pretty weird value, lol. We use 30 seconds in a lot of other places, or maybe 5 minutes, but I'm not sure about a default wait value for estuary of 7 minutes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know why but in my head 7 minutes was the default interval. Really don't remember where that came from ( probably need my meds ). @williamhbaker Should i leave the default interval of 30 seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
30 seconds seems like a reasonable default interval for bindings that do not otherwise have restrictive rate limits, and are incremental in nature. We wouldn't want to re-fetch a huge snapshot every 30 seconds for example, so bindings like that should have larger intervals. I'll leave it to you to decide on the intervals based on your knowledge of how hubspot works, since I'm not very familiar with it other than these general statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not leave the interval out for everything that can do incremental @Luishfs ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dyaffe @williamhbaker Hubspot limits us by 100 requests every 10 seconds, and given that we have 27+ streams, i think having a 30+ seconds breather for each stream would really help to not reach the limit. I will test with some values and get more data on this matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@williamhbaker Added intervals in 4171ca4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still seeing this PT420S
interval in test.flow.yaml
, which I don't think is what we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@williamhbaker Forgot to run discover, rebased the commit here fda3378
...spot-native/tests/snapshots/source_hubspot_native_tests_test_snapshots__capture__stdout.json
Show resolved
Hide resolved
...pot-native/tests/snapshots/source_hubspot_native_tests_test_snapshots__discover__stdout.json
Show resolved
Hide resolved
to set 'row_id' from BaseDocument meta as required, a simple modification to the BaseDocument was made by removing the default value from the field and adding it to the Field() method
Raised pagination limit number
@williamhbaker I believe i've answer all the initial comments! |
4171ca4
to
da3558d
Compare
Small streams have 30 seconds interval longer streams have a 1 minute interval between syncs Removed unused custo stream Venues
da3558d
to
fda3378
Compare
@williamhbaker Tests are now passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. See comment about interval on one of the streams. I didn't go through and check every single one line by line against the table in the PR description, but I did find that discrepancy at a glance. I trust that you'll make sure things match up before merging.
- resource: | ||
name: properties | ||
interval: P1D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just spot checking: The table in the PR description says that this is supposed to be 1 day. But its 30 seconds here. Which one is it supposed to be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for spotting that!
Can i get the docs PR merged? I will add then to the spec and address this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following how the docs PR merging is a condition to addressing this. But, I made some comments on the docs PR. @jonwihl may want to take a look at it too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated at ae65e51
Description:
Adding
source-hubspot-native
connectorDocumentation links affected:
estuary/flow#1440
Notes for reviewers:
Tests were made using a custom Hubspot account.
Each stream was tested one at a time with each stream. In order
to allow for the cursor to catch recent data, my
cut_off
/start_date
variableswere set using
- timedelta(days=365)
.Pro streams data could not be validated. Because of that, they were passed on the base V3
model which should handle all of their pagination cases. ( workflows needed a special pagination function
and resource).
Pro streams are:
Streams Interval
Hubspot imposes a API rate limit of 100 requests per 10 seconds.
V3 Objects can require more than one request per stream, this happens because of the "Associations" and "batch" options, which uses more API requests than usual.
Because of that, a interval of 30 seconds was added, so that smaller streams ( like Companies or Campaigns ) don't add-up to the rate limit.
V1 Objects that can require a longer capture time were set to 60 seconds so that on general their capture can happen after the quicker V3 Object and dont trigger 429 responses that often. Since they can usually take hours to occur, real-time transfers are already blocked, but a longer wait time period can harm smaller clients that dont require long hours of capture.
Finally, Properties were set to 1 day. They are a really small stream that does not require real-time processing and wont affect usage.
This change is