New table does not always load (all) data #37

Open
skjbulcher opened this issue Nov 19, 2020 · 1 comment

@skjbulcher

Hi,

I'm playing around with singer.io for the first time. I've set up a tap-mysql to target-bigquery script. My test database contains two rows that I'm using as a playground.

I noticed that target-bigquery is capable of creating the BigQuery tables based on the input from tap-mysql, which I love 😍 I wasn't looking forward to creating tables manually. However, when I examined the contents of a new table after running target-bigquery, it was consistently missing one or both records from the tap. The target, however, always reports that it "Loaded 2 row(s) into testganger:template /projects/ebs-it/datasets/testganger/tables/template", regardless of the actual contents of the table.

Curious, I ran the target several times against an existing database, and two records were created every time. The issue appears to be limited to the creation of new tables.

Steps to reproduce

  1. Set up tap-mysql with a small table - two rows.
  2. Run tap-mysql and write its output to a file (let's call it state.log). A quick way to sanity-check how many RECORD messages that file contains is sketched after this list.
  3. Run target-bigquery -c config.json < state.log to load the data into BigQuery
  4. Run target-bigquery -c config.json < state.log to load the data into BigQuery a second time.
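
For reference, this is roughly how I sanity-check the tap output before loading it. A minimal sketch, assuming state.log holds the raw Singer messages written by tap-mysql in step 2 (the file name is just the one used above):

    import json

    # Count the Singer RECORD messages in the tap output; for my two-row
    # test table this should print 2.
    record_count = 0
    with open("state.log") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            message = json.loads(line)
            if message.get("type") == "RECORD":
                record_count += 1

    print(f"RECORD messages in tap output: {record_count}")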

What happened
The table is created properly in BigQuery. Running select * from <table> returns anywhere from 2 to 4 records.

What I expected
Running select * from <table> returns exactly 4 records every time.
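
For what it's worth, this is how I've been counting the rows after each run. A minimal sketch using the google-cloud-bigquery client; the project, dataset, and table names are placeholders, not my real ones:

    from google.cloud import bigquery

    # Count the rows that actually landed in the target table after a run.
    client = bigquery.Client(project="my-project")
    query = "SELECT COUNT(*) AS n FROM `my-project.my_dataset.my_table`"
    result = list(client.query(query).result())
    print(f"Rows in table after load: {result[0].n}")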

@skjbulcher
Author

Based on the symptoms, it looks like target-bigquery tries to send records to a newly created table immediately after creating it, without confirming that the table actually exists yet. I'm not familiar with the BigQuery API, but might this be fixable with a delay, or with a query that verifies the table has been created before the records are sent?
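
To illustrate the idea (this is not a patch against target-bigquery itself, and the table name and rows below are placeholders), here is a minimal sketch of waiting for the new table to become visible before streaming rows into it, using the google-cloud-bigquery client:

    import time
    from google.cloud import bigquery
    from google.api_core.exceptions import NotFound

    client = bigquery.Client()
    table_id = "my-project.my_dataset.my_table"

    # Poll until the freshly created table is visible to the API,
    # backing off a little between attempts.
    for attempt in range(10):
        try:
            table = client.get_table(table_id)
            break
        except NotFound:
            time.sleep(0.5 * (2 ** attempt))
    else:
        raise RuntimeError(f"Table {table_id} never became visible")

    # Only stream the records once the table is known to exist.
    rows = [{"id": 1}, {"id": 2}]  # placeholder records
    errors = client.insert_rows_json(table, rows)
    if errors:
        raise RuntimeError(f"Streaming insert reported errors: {errors}")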
