feat: Matching algo performance + Exclude duplicate matchings #1142
Conversation
Run the matching algorithm every n days while simulating match requests of real users.
As the diff shows, this has only a small impact on the matched subjects while improving the state match rate from 10% to 30%.
Nice work!
Keeps pupils a bit longer in the match pool to find a better match. This increases the average wait time from 4 days to 17 days, but increases both the number of subjects and states matched. Not sure if the trade-off is worth it, but now we have a mechanism for it that can be tuned.
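Such a retention knob could look roughly like the sketch below. All names here (`PoolEntry`, `MIN_DAYS_IN_POOL`, `splitPool`) are illustrative assumptions, not the actual implementation in this PR:

```typescript
// Hypothetical sketch: hold recent requests back from matching so the
// algorithm can wait for a better counterpart. Tuning MIN_DAYS_IN_POOL
// trades wait time against match quality.
interface PoolEntry {
  id: string;
  requestedAt: Date;
}

const MIN_DAYS_IN_POOL = 7; // tunable: higher = longer waits, better matches

function daysInPool(entry: PoolEntry, now: Date): number {
  return (now.getTime() - entry.requestedAt.getTime()) / (1000 * 60 * 60 * 24);
}

// Split the pool into entries that are matched now and entries that wait
// at least one more run of the algorithm.
function splitPool(pool: PoolEntry[], now: Date) {
  const ready = pool.filter((e) => daysInPool(e, now) >= MIN_DAYS_IN_POOL);
  const waiting = pool.filter((e) => daysInPool(e, now) < MIN_DAYS_IN_POOL);
  return { ready, waiting };
}
```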
    'new',
    1,
    {
      matchCountSum: 1044,
As the new algorithm is able to create more matches at about the same "quality" (roughly the same number of matching subjects and states), I think it is worth switching.
    'old',
    1000,
    {
      matchCountSum: 883,
It's interesting that the old algorithm prioritizes the number of matching states over the number of matching subjects and the number of matches created ...
    [
      'new',
      1000,
      {
Using such a fixed time series as a benchmark is not ideal - it can easily lead to overfitting on this particular series. Some sort of shuffling of the input would be good, but on the other hand that might lose time effects that are actually there (e.g. pupils / students signing up in bulk after marketing campaigns).
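One way to implement the suggested shuffling is a Fisher–Yates shuffle over the simulated requests, with an injectable random source so benchmark runs stay reproducible. This is a generic sketch, not code from this PR:

```typescript
// Fisher–Yates shuffle. Pass a seeded `random` function (returning [0, 1))
// to make benchmark runs deterministic; Math.random is only the default.
function shuffle<T>(items: T[], random: () => number = Math.random): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}
```

Note the trade-off raised above still applies: shuffling removes real temporal clustering (e.g. sign-up spikes after campaigns), so it may make sense to benchmark both the original and the shuffled series.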
Splits the tests into small unit tests that are fast to run (`npm run unit`) and perf tests that might take longer (`npm run perf`); `npm run test` runs all of them.
Adds a performance test for the matching algorithm, which simulates real-world matching: pupils and students continuously request matches, every N days the matching algorithm is run on the current pools, and matched users are removed from the pools. At the end, it reports how many matches were created, how many subjects and states match, and how long pupils and students had to wait for their match.
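The simulation loop described above can be sketched as follows. This is a simplified illustration with made-up names and a trivial pairing step standing in for the real algorithm, assuming constant sign-up rates:

```typescript
// Simplified benchmark loop: users join the pools every day, the matcher
// runs every N days, and matched users leave the pools. The real perf test
// additionally scores subject and state overlap per match.
type User = { id: number; signupDay: number };

function simulate(days: number, runEveryNDays: number, signupsPerDay: number) {
  let pupilPool: User[] = [];
  let studentPool: User[] = [];
  let matches = 0;
  let totalWaitDays = 0;
  let nextId = 0;

  for (let day = 0; day < days; day++) {
    // Pupils and students continuously request matches:
    for (let i = 0; i < signupsPerDay; i++) {
      pupilPool.push({ id: nextId++, signupDay: day });
      studentPool.push({ id: nextId++, signupDay: day });
    }
    // Every N days, run the matching over the current pools
    // (here: naive first-come-first-served pairing):
    if (day % runEveryNDays === 0) {
      const pairs = Math.min(pupilPool.length, studentPool.length);
      for (let i = 0; i < pairs; i++) {
        totalWaitDays += day - pupilPool[i].signupDay;
        totalWaitDays += day - studentPool[i].signupDay;
      }
      matches += pairs;
      // Matched users are removed from the pools:
      pupilPool = pupilPool.slice(pairs);
      studentPool = studentPool.slice(pairs);
    }
  }
  return { matches, avgWaitDays: matches > 0 ? totalWaitDays / (2 * matches) : 0 };
}
```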
Extends the Matching Algorithm to exclude duplicate matches (as the old implementation does). This is currently implemented as a post-filter after the matching; I don't think it happens often enough to be worth considering in the assignment itself.
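A duplicate-exclusion post-filter along these lines could work as sketched below. The types and function names are assumptions for illustration, not the PR's actual code:

```typescript
// Post-filter: drop proposed matches whose pupil/student pair was already
// matched before. Runs after the assignment, as described in the PR.
interface Match {
  pupilId: number;
  studentId: number;
}

function excludeDuplicates(proposed: Match[], previous: Match[]): Match[] {
  // Encode each pair as a string key for O(1) lookup.
  const seen = new Set(previous.map((m) => `${m.pupilId}:${m.studentId}`));
  return proposed.filter((m) => !seen.has(`${m.pupilId}:${m.studentId}`));
}
```

A post-filter is the simpler design: the assignment stays unchanged, at the cost that a filtered-out pair leaves both users unmatched until the next run, which is acceptable if duplicates are rare.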
https://github.com/corona-school/project-user/issues/1316