Add select_streams option (to be added in the config file) to generate catalog file with some stream(s) pre-selected #9
+7
−4
Hi,
First of all, apologies if I should have waited for an issue to be approved before creating this PR. It's just that I really need this feature, and I was going to use my forked repo; but then I thought maybe it would be helpful for everyone. I would love your opinion on this.
Description of change
It's very weird to me that one needs to run the `discovery` mode and then somehow (manually, programmatically, or even with tools like singer-discover) change the `catalog` file in order to select which streams to retrieve data from in the `sync` step (which will probably be sheets, in this case).
Singer taps are a great tool and I think they should be as plug-and-play as possible; if one uses a tap for periodic tasks (which is my case), it just doesn't make sense to manually change any file. Personally, I don't like the idea of programmatically changing the `catalog` file (as, to be honest, I've seen some people doing) if there's a way to generate it with the desired stream already selected.
The idea here is to add the possibility of including an option in the `config` file, so that the discovery step generates `catalog.json` with one or more streams pre-selected. This way, there's no intermediate step between the discovery and the sync in order to get the data you want.
The solution is based on the fact that:
`sync`: https://github.com/singer-io/tap-google-sheets/blob/4d4082c829/tap_google_sheets/sync.py#L362
Relates to #8
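The idea described above, a `select_streams` entry in the config deciding which streams discovery marks as selected, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `preselect_streams` is a hypothetical helper, the catalog is modeled as plain dicts rather than the tap's own objects, accepting either a single name or a list of names is an assumption, and the flag is placed in the stream's schema only because that is how the QA steps word it.

```python
import copy
import json


def preselect_streams(catalog, select_streams):
    """Return a copy of `catalog` with "selected": True on the named streams.

    Hypothetical helper: `catalog` is a plain dict shaped like catalog.json,
    and `select_streams` is a stream name, a list of names, or None.
    """
    if select_streams is None:
        wanted = set()
    elif isinstance(select_streams, str):
        # Assumption: a single name may be given instead of a list.
        wanted = {select_streams}
    else:
        wanted = set(select_streams)

    result = copy.deepcopy(catalog)  # leave the caller's catalog untouched
    for entry in result.get("streams", []):
        if entry.get("stream") in wanted:
            # Flag placement follows the wording of the QA steps.
            entry.setdefault("schema", {})["selected"] = True
    return result


# Example values only; a real config file would hold other settings too.
config = {"select_streams": ["Sheet 1"]}
catalog = {
    "streams": [
        {"stream": "file_metadata", "schema": {"type": "object"}},
        {"stream": "Sheet 1", "schema": {"type": "object"}},
    ]
}

selected_catalog = preselect_streams(catalog, config.get("select_streams"))
print(json.dumps(selected_catalog, indent=2))
```

With a helper like this, leaving `select_streams` out of the config keeps discovery output unchanged, so existing setups that edit the catalog afterwards would not be affected.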
Manual QA steps
Set `select_streams` in the config file to the name of a stream (e.g. `file_metadata`) or the name of the sheet (e.g. `Sheet 1`).
Running discovery (`tap-google-sheets --config config.json --discover > catalog.json`) should generate the catalog file with the option `"selected": true` in the schema corresponding to the stream defined in the `select_streams` option.
Risks
It is unclear whether `select_streams` is the best name.
Rollback steps