Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Activities of type Meeting impact the bookmark and stop extraction #20

Open
tconbeer opened this issue Feb 20, 2020 · 7 comments
Open

Comments

@tconbeer
Copy link

tconbeer commented Feb 20, 2020

It's possible for an activity to be created when a Meeting is set in CloseIO. That activity has a date_created that is equal to the scheduled date of the meeting, which is likely in the future. These records also have a status of upcoming and a source of calendar.

The Close API probably shouldn't work this way, but if that won't change, we need to update the tap to either use a different timestamp (there is a date_updated field that is probably better anyway) or special-case these Meeting activities to exclude them from either the sync or the bookmark calculation (which should be a simple comparison of record in the method here)

cc @apbenn

@tconbeer
Copy link
Author

If I open a PR against this issue, would it be accepted? This tap seems unmaintained.

@pmauro
Copy link

pmauro commented Mar 5, 2021

Why aren't we using the date_updated field for the activities table\Stream? This is the only Stream that uses date_updated instead of date_created as its BOOK_KEY (line 29 of tap-closeio/streams.py). I'd make the change and issue a PR but others may be better suited to test the fix. (I'm also curious why the BOOK_KEY for activities is the only one that doesn't use date_updated. This seems intentional.)

@pmauro
Copy link

pmauro commented Mar 5, 2021

It looks like @cosimon tried to fix this:

https://github.com/singer-io/tap-closeio/compare/fix/ignore-future-dated-activities

I still think that keying on date_updated is more correct, but both changes should probably be pulled.

@pmauro
Copy link

pmauro commented Mar 5, 2021

More digging. @cdlethem committed the change I'm suggesting to his/her fork. We could just pull request from that if the QA's been done...

singer-io:master...cdlethem:master

@cdlethem
Copy link

cdlethem commented Mar 5, 2021

For what it's worth, I couldn't find a reason that activities used the date_created field as a key, perhaps the author had in mind activities like opportunity_status_changed which wouldn't ever be altered. I chose to change it to date_updated instead of filtering out future activities, because for my use case, I am concerned with how many future meetings a user has scheduled, for instance.

@pmauro
Copy link

pmauro commented Mar 5, 2021

Ha. Well the future meetings -- this is redundant in 99% of cases -- are what are killing me. It breaks the ETL.

@rahimnathwani
Copy link

@cdlethem
"I couldn't find a reason that activities used the date_created field as a key"

I was curious about this so looked at the paginated_sync() function. It looks like:

  1. date_created is being used to filter activities we get from the endpoint, but
  2. we're not passing date_created as a parameter to the Close API

Is #2 correct? If so, it seems like (i) the README's rationale for not using date_created doesn't make sense, and (ii) you are right, in that we could just change that to any other field that exists in the response objects.

Have I understood that correctly, that the tap is requesting all activities ever, even when doing an incremental sync for the last day?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants