
Validation and setup script fixes #68

Open
wants to merge 10 commits into base: sk/waveform-dev

Conversation

jeremyestein (Collaborator)

Fixes #56: waveform is turned off in production (but optional in validation).


PR checklist

Default guide for a PR (if there are multiple PRs for the work, keep only one copy of this checklist and link to it from the other PRs)

  • From the UCLH data science desktop, a validation run has been set off
  • The load times document in UCL Teams has been populated with the run information
  • During the run, Glowroot has been checked for any queries that take a substantial proportion of the
    total processing time. This can be useful for identifying indexes that are required.
  • After the run, look for any unexpected errors in the etl_per_message_logging table; the error_search.sql file
    on the shared drive can be used for this (\\sharefs6\UCLH6\EMAP\Shared\EmapSqlScripts\devops\error_search.sql).
    Create an issue if you find an unexpected exception that is not related to the changes you've made; otherwise
    fix it!
  • After the run, populate the end time in the load times document
  • Let Aasiyah know about the completed validation and give her information on the changes and where to start
    with the validation
  • Check the validation report and give feedback to Aasiyah if any changes are needed on her side;
    iterate until the validation matches at least 99% (validation and emap code).

@jeremyestein marked this pull request as ready for review October 10, 2024 22:10
List<IdsMaster> msg = idsSession.createQuery(
-    "select i from IdsMaster i where i.unid >= :fromUnid and i.persistdatetime >= :fromDatetime order by i.unid",
+    "select i from IdsMaster i where i.persistdatetime >= :fromDatetime order by i.persistdatetime",
Contributor


I had a feeling that when we didn't specify this, we wouldn't get them ordered by unid in production. Might be worth testing this against the IDS.

Collaborator Author


Ah, good point. If I've got this query right, it should be doing the check:

select * from (
	select
	    ids.unid,
	    ids.persistdatetime,
	    row_number() over (order by unid) as by_unid,
	    row_number() over (order by persistdatetime) as by_dt
	from public.tbl_ids_master ids
) x
where
x.by_unid != x.by_dt
;

So far, the only ones that fail the test occur between 1 and 2 am on the last Sunday of each October 🙄 (at least as far back as 2022)

Might suggest that the persistdatetime is not in fact timezone aware, as it claims to be :(
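(A small standalone demo of why that window is ambiguous; the class name and the 2022 date are just for illustration.)

import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Demo only: when BST ends, the same local wall-clock time occurs twice, so a column
// that stores only the local value cannot distinguish the two instants.
public class ProgrammersHalloween {
    public static void main(String[] args) {
        ZoneId london = ZoneId.of("Europe/London");
        LocalDateTime ambiguous = LocalDateTime.of(2022, 10, 30, 1, 30);
        ZonedDateTime bst = ambiguous.atZone(london).withEarlierOffsetAtOverlap();
        ZonedDateTime gmt = ambiguous.atZone(london).withLaterOffsetAtOverlap();
        System.out.println(bst); // 2022-10-30T01:30+01:00[Europe/London]
        System.out.println(gmt); // 2022-10-30T01:30Z[Europe/London], one real hour later
        // So a unid assigned in the second pass through 01:00-02:00 can carry a
        // persistdatetime that is earlier than one assigned in the first pass.
    }
}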

I think this doesn't make any difference when we use a date that is not in this ambiguous window. It is already a bug if hl7-reader happens to record progress that falls in this window.

I will have a think about what's best here...

Collaborator Author


TLDR:

  • The IDS timestamp field is broken and we have been imperfectly working around it; this change makes that workaround worse
  • In practice it likely only affects validation runs, not production
  • It only applies when requesting a start/end date that falls in the ambiguous zone between 1am and 2am at the end of October (which I sometimes refer to as "programmer's halloween" ;) )
  • It doesn't affect the order of message delivery in any case
  • It could be worked around with a little manual binary searching, but it's probably not worth the effort
  • The advantage of the new way is that I don't have to wait an hour every time I do a validation run!

Details:
Note that this doesn't affect the order in which we process the messages; it just affects the starting/ending unid when a date bound has been requested (which I believe only happens in validation rather than production?). The current unid is then incremented in the expected way.

However, I think it has a ~50% chance of missing up to an hour of messages, if and only if the given date bound occurs between 1am and 2am on this day of the year.

In this diagram the arrowed line represents a series of messages with monotonically increasing unids, where there is a jump back of an hour because the local datetime is incorrectly being used.
[diagram not reproduced: timeline showing bounds b1/b2 and messages m1-m5]

Imagine that the requested bound is b1; using my query it will choose message m1, because it's slightly nearer in time to b1, when it should have chosen m2 (which has a lower unid).
If the bound is b2, then it will correctly choose m3, because the message that occurred in BST happened to be nearer to b2 in local time.

The old query gives the correct unid for times b1 and b2, but would fail for the short window between the last message before 2am and 2am itself. In that case it would wrongly pick m4 instead of m5.

When the bug occurs, it just means that the unid is unstable with respect to small changes in the input date. However, given that the input date is inherently ambiguous due to the IDS' use of local time, it's not clear what the user would be hoping for anyway.

You could fix this more robustly by keeping the old query, but first doing a binary search for a better :fromUnid (targeting a timestamp at least one hour before what the user actually asked for, to avoid the ambiguity). That would at least give it stable (but still wrong) behaviour.
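(To make that suggestion concrete, a rough sketch of the binary search, not implemented in this PR; persistDatetimeOf is a hypothetical lookup of persistdatetime by unid, and unids are assumed to be contiguous.)

import java.time.LocalDateTime;
import java.util.function.LongFunction;

// Sketch only: find a fromUnid whose persistdatetime is at least an hour before the
// requested bound, then let the old unid-ordered query take over from there.
static long findSafeFromUnid(long minUnid, long maxUnid, LocalDateTime requestedBound,
                             LongFunction<LocalDateTime> persistDatetimeOf) {
    LocalDateTime safeBound = requestedBound.minusHours(1);
    long lo = minUnid, hi = maxUnid;
    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        if (persistDatetimeOf.apply(mid).isBefore(safeBound)) {
            lo = mid + 1;   // mid is still before the safe bound; look at later unids
        } else {
            hi = mid;       // mid is at or after the safe bound; look at earlier unids
        }
    }
    // Because the DST jump is at most an hour, no message at or after the requested
    // bound can have a unid lower than lo, so lo is a safe (if slightly early) start.
    return lo;
}

The old query's persistdatetime filter then trims off the extra hour, so the net effect should be a stable starting point even when the requested bound is ambiguous.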
