Adding source-hubspot-native connector #1438

Merged
merged 17 commits into from
Apr 12, 2024

@Luishfs Luishfs commented Apr 1, 2024

Description:

Adding source-hubspot-native connector

Documentation links affected:
estuary/flow#1440

Notes for reviewers:

Tests were run against a custom HubSpot account.
Each stream was tested one at a time. To let
the cursor catch recent data, my cut_off/start_date variables
were set using - timedelta(days=365).
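
A minimal sketch of that cutoff calculation, for illustration only. The names `cut_off` and `start_date` come from the note above; the `lookback_start` helper, the fixed "now" timestamp, and setting both variables to the same value are assumptions, not the connector's actual code.

```python
from datetime import datetime, timedelta, timezone


def lookback_start(now: datetime, days: int = 365) -> datetime:
    """Return the timestamp `days` before `now` (hypothetical helper)."""
    return now - timedelta(days=days)


# A fixed "now" keeps the example reproducible; a real test run would
# more likely use datetime.now(tz=timezone.utc).
now = datetime(2024, 4, 1, tzinfo=timezone.utc)

# Assumption: both variables are set to the same one-year lookback.
start_date = lookback_start(now)
cut_off = start_date

print(start_date.isoformat())
```

Starting a year back means the incremental cursor has existing records to pick up on a test account that sees little new activity.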

Pro stream data could not be validated. Because of that, these streams use the base V3
model, which should handle all of their pagination cases (workflows needed a special pagination function
and resource).
Pro streams are:

  • goal_targets
  • feedback_submissions
  • workflows

Streams Interval

| Stream | Interval |
| --- | --- |
| Companies | 30 seconds |
| Contacts | 30 seconds |
| Deals | 30 seconds |
| Engagements | 30 seconds |
| Contact Lists | 1 minute |
| Contact Lists Membership | 1 minute |
| Subscription Changes | 1 minute |
| Email Events | 1 minute |
| Ticket Pipelines | 1 minute |
| Deal Pipelines | 1 minute |
| Campaigns | 1 minute |
| Engagements Calls | 30 seconds |
| Engagements Emails | 30 seconds |
| Engagements Tasks | 30 seconds |
| Engagements Notes | 30 seconds |
| Goal Targets | 30 seconds |
| Line Items | 30 seconds |
| Tickets | 30 seconds |
| Email Subscriptions | 30 seconds |
| Marketing Forms | 30 seconds |
| Owners | 30 seconds |
| Properties | 1 day |
| Feedback Submissions | 30 seconds |
| Marketing Emails | 30 seconds |
| Workflows | 30 seconds |
| All Custom Objects | 1 minute |

HubSpot imposes an API rate limit of 100 requests per 10 seconds.
V3 objects can require more than one request per stream because of the "Associations" and "batch" options, which use more API requests than usual.
Because of that, an interval of 30 seconds was added so that smaller streams (like Companies or Campaigns) don't add up to the rate limit.

V1 objects, which can take longer to capture, were set to 60 seconds so that in general their capture happens after the quicker V3 objects and doesn't trigger 429 responses as often. Since these captures can take hours, real-time transfer is already ruled out for them, but a longer wait period could hurt smaller clients that don't need hours of capture.

Finally, Properties was set to 1 day. It is a very small stream that does not require real-time processing, so the long interval won't affect usage.
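
The review snippets in this PR write intervals as ISO 8601 durations (PT420S, P1D). As a rough sketch, a few rows of the table could be translated like this. The `contact_lists` key and the `to_iso8601_duration` helper are hypothetical, and the connector may well serialize intervals differently.

```python
from datetime import timedelta

# A few rows from the interval table, expressed as timedeltas.
# "companies" and "properties" match resource names shown in the review;
# "contact_lists" is an assumed name for the Contact Lists stream.
STREAM_INTERVALS = {
    "companies": timedelta(seconds=30),
    "contact_lists": timedelta(minutes=1),
    "properties": timedelta(days=1),
}


def to_iso8601_duration(interval: timedelta) -> str:
    """Format a positive whole-second timedelta as an ISO 8601 duration.

    Whole days become "PnD"; everything else is emitted in seconds ("PTnS").
    """
    total = int(interval.total_seconds())
    if total <= 0 or interval.microseconds:
        raise ValueError("interval must be a positive whole number of seconds")
    if total % 86400 == 0:
        return f"P{total // 86400}D"
    return f"PT{total}S"


for name, interval in STREAM_INTERVALS.items():
    print(f"{name}: {to_iso8601_duration(interval)}")
```

Emitting everything below a day in seconds keeps the helper short; "PT60S" and "PT1M" describe the same duration.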


@Luishfs Luishfs added the "docs pending" and "python" labels Apr 1, 2024
@Luishfs Luishfs self-assigned this Apr 1, 2024
@Luishfs Luishfs linked an issue Apr 2, 2024 that may be closed by this pull request
Member

@williamhbaker williamhbaker left a comment

Made some initial comments!

      - "-m"
      - source_hubspot_native
    config: config.yaml
bindings:
  - resource:
      name: companies
      interval: PT420S
Member

What's the reason for adding this explicit interval to the test configuration?

Collaborator Author

If I got it right, when a user does not generate their own bindings, we generate them from the default common.Resource resource, and the default wait value for Estuary is 7 minutes. That's why this specific value is here (since discovery would see that no interval was set, I've added it).
Does that make sense?

Member

Yeah, that does make sense. I am wondering how/why these bindings were originally set to have no explicit interval. I don't 100% know what happens in that case, and have raised the question in the #saas-connectors slack channel.

And 7 minutes is a pretty weird value, lol. We use 30 seconds in a lot of other places, or maybe 5 minutes, but I'm not sure about a default wait value for estuary of 7 minutes?

Collaborator Author

I don't know why, but in my head 7 minutes was the default interval. I really don't remember where that came from (probably need my meds). @williamhbaker Should I leave the default interval of 30 seconds?

Member

30 seconds seems like a reasonable default interval for bindings that do not otherwise have restrictive rate limits, and are incremental in nature. We wouldn't want to re-fetch a huge snapshot every 30 seconds for example, so bindings like that should have larger intervals. I'll leave it to you to decide on the intervals based on your knowledge of how hubspot works, since I'm not very familiar with it other than these general statements.

Member

Why not leave the interval out for everything that can do incremental @Luishfs ?

Collaborator Author

@dyaffe @williamhbaker HubSpot limits us to 100 requests every 10 seconds, and given that we have 27+ streams, I think a 30+ second breather for each stream would really help us stay under the limit. I will test with some values and get more data on this.
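
A back-of-the-envelope sketch of that reasoning. The limit and the stream count come from the comment above; the figure of 5 requests per poll is made up for illustration, and `average_requests_per_window` is a hypothetical helper, not part of the connector.

```python
# HubSpot limit quoted in this thread: 100 requests per 10 seconds.
LIMIT_REQUESTS = 100
WINDOW_SECONDS = 10


def average_requests_per_window(
    streams: int, requests_per_poll: int, interval_seconds: float
) -> float:
    """Average number of requests issued per rate-limit window.

    Assumes every stream polls once per interval and each poll costs
    `requests_per_poll` requests. Real polls vary, so this is only an average.
    """
    return streams * requests_per_poll * WINDOW_SECONDS / interval_seconds


# 27 streams (from the comment above); 5 requests per poll is an assumption.
for interval in (10, 30, 60):
    load = average_requests_per_window(
        streams=27, requests_per_poll=5, interval_seconds=interval
    )
    status = "over" if load > LIMIT_REQUESTS else "within"
    print(f"interval={interval}s: {load:.1f} requests per window ({status} the limit)")
```

This only estimates the average load. Polls that happen to line up in the same window can still burst past the limit, which is why the values need testing against a real account.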

Collaborator Author

@williamhbaker Added intervals in 4171ca4

Member

I'm still seeing this PT420S interval in test.flow.yaml, which I don't think is what we want.

Collaborator Author

@williamhbaker Forgot to run discover, rebased the commit here fda3378

source-hubspot-native/tests/test_snapshots.py (resolved)
source-hubspot-native/source_hubspot_native/resources.py (outdated, resolved)
To set 'row_id' from the BaseDocument meta as required, a simple modification was made to
BaseDocument: the default value was removed from the field and
added to the Field() call
Raised pagination limit number
Collaborator Author

Luishfs commented Apr 3, 2024

@williamhbaker I believe I've answered all the initial comments!

@Luishfs Luishfs force-pushed the luis/source-hubspot-native branch from 4171ca4 to da3558d Compare April 9, 2024 16:20
Small streams have a 30-second interval
Longer streams have a 1-minute interval between syncs
Removed unused custom stream Venues
@Luishfs Luishfs force-pushed the luis/source-hubspot-native branch from da3558d to fda3378 Compare April 9, 2024 16:33
Collaborator Author

Luishfs commented Apr 9, 2024

@williamhbaker Tests are now passing

Member

@williamhbaker williamhbaker left a comment

LGTM. See comment about interval on one of the streams. I didn't go through and check every single one line by line against the table in the PR description, but I did find that discrepancy at a glance. I trust that you'll make sure things match up before merging.

- resource:
    name: properties
    interval: P1D
Member

Just spot checking: the table in the PR description says that this is supposed to be 1 day, but it's 30 seconds here. Which one is it supposed to be?
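
One way to catch this kind of mismatch without reading line by line is to diff the documented intervals against the configured ones. This is only a sketch: both dictionaries are hand-written stand-ins for the PR description table and the connector's resource configuration, and `find_interval_mismatches` is a hypothetical helper.

```python
from typing import Optional


def find_interval_mismatches(
    documented: dict[str, str], configured: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {stream: (documented, configured)} for every stream that differs.

    A stream missing from one side is reported with None on that side.
    """
    mismatches = {}
    for name in sorted(documented.keys() | configured.keys()):
        doc_value = documented.get(name)
        cfg_value = configured.get(name)
        if doc_value != cfg_value:
            mismatches[name] = (doc_value, cfg_value)
    return mismatches


# Stand-in data mirroring the discrepancy raised in this comment.
documented = {"companies": "PT30S", "properties": "P1D"}
configured = {"companies": "PT30S", "properties": "PT30S"}

print(find_interval_mismatches(documented, configured))
```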

Collaborator Author

Thanks for spotting that!
Can I get the docs PR merged? I will add them to the spec and address this.

Member

I'm not following how the docs PR merging is a condition to addressing this. But, I made some comments on the docs PR. @jonwihl may want to take a look at it too.

Collaborator Author

Updated at ae65e51

@Luishfs Luishfs added the "docs complete / NA" label and removed the "docs pending" label Apr 12, 2024
@dyaffe dyaffe merged commit 44fe0d0 into main Apr 12, 2024
52 of 55 checks passed

Successfully merging this pull request may close these issues.

new connector: source-hubspot-native