Seeded Random Number Generator #11

Merged


@basic-bgnr (Contributor) commented Apr 20, 2024

Added an optional command-line parameter, -s/--seed, to seed the random number generator; useful for verifying test cases.

The program always returns the same output when run with the same SEEDVALUE and the same input, i.e. the result becomes deterministic. This is useful for verifying the correctness of the program: when changes are introduced to the algorithm, the output can be tested against the same input and the same SEEDVALUE.

When the --seed parameter is not provided, the program reverts to Python's default implementation for generating random numbers.
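A minimal sketch of how such a flag can be wired to the generator (argparse-based; the parser description and variable names here are illustrative, not the PR's actual code):

    import argparse
    import random

    # -s/--seed is the flag described above; omitting it leaves seed as None.
    parser = argparse.ArgumentParser(description="school center allocation")
    parser.add_argument("-s", "--seed", type=int, default=None,
                        help="seed for the random number generator; "
                             "omit for a different result on every run")
    args = parser.parse_args()

    # Random(None) seeds from system time / OS entropy, matching the old
    # behaviour; Random(n) makes runs with identical inputs reproducible.
    rng = random.Random(args.seed)

    # All randomness then goes through this instance, e.g.:
    # rng.shuffle(schools) or rng.choice(candidate_centers)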

@sapradhan (Collaborator) commented Apr 21, 2024

For this problem, there is no single "expected" or "correct" center allocation. Tests should focus on whether the constraints mentioned in #4 are satisfied.

Even if we constrain the random sequence, the output is inherently dependent on implementation details. It may turn out that you need to change the test assertions every time you tweak something in the implementation. Introducing determinism here will produce rigid, brittle tests that make maintenance difficult, IMO. More tests are not necessarily always a good thing.

@basic-bgnr (Contributor, Author) commented:

the output is inherently dependent on the implementation detail.

The output is, and always will be, dependent on the implementation details.
But consider this: for a fixed set of constraints (#4), and assuming the program satisfies all of them correctly, the output of each run depends only on the value of the random seed (given the same constraints and input values). If the constraints are ever revised or amended, the old test cases must of course be discarded in favor of new ones.

My reason for introducing testing is to guard against unrelated feature pull requests that change the program's output and still get merged. Right now we have no way of verifying that the program behaves as intended other than reviewing the code manually.

@sapradhan (Collaborator) commented:

I agree there should be tests, but my request is not to write overly specific ones.

Let's take a scenario: School A has 416 students, and with a fixed seed the program allocates them as Center A 200, Center B 200, Center C 16, all within the given distance threshold, which is an acceptable solution. You are probably hinting that, for this input, the test should assert that the output EXACTLY matches that allocation. Now say somebody implements #13 and the output becomes Center B 200, Center C 216, which is another valid solution. The test fails and must be updated to match. And what about Center B 216, Center C 200, or Center C 200, Center D 216? Those are also valid solutions, yet the test fails on them too.

  • You put in the effort to write the test in the first place.
  • The test fails, and the contributor must assess whether their code or the test is the problem.
  • They need to update the code, and the cycle repeats; time and effort are wasted.

One of two things will happen:

  • Contributors will be deterred from making any change, especially newer ones who lack a deep understanding of the domain.
  • Devs will simply take their output and update the tests to match it so that the tests pass, just for the sake of driving up code-coverage numbers.

These are not hypothetical scenarios; I have actually seen both happen in real life.

Rant over; so what is my proposal?
Write tests only as specific as the requirements. A few specific examples from #4 (a test sketch follows the list):

  • A school must not be assigned itself as its own center:
    for each row in the output, check that scode and cscode differ.
  • The examinees of two schools must not be assigned to each other's schools, i.e. if the examinees of one school are sent to a second school as their center, the examinees of that second school must not be sent back to the first:
    check that there are no two rows in the output where one has school=scode1, center=cscode2 and another has school=cscode2, center=scode1.
  • Every run of the center-assignment program must be randomized so that it produces different results: randomness is actually part of the requirement, so repeat the run, say 10 or 100 times, verify the outputs differ, and check that all the other tests pass on each run.
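A sketch of what such constraint-level tests could look like in pytest. The scode/cscode column names come from this discussion; the output file name and program entry point are hypothetical placeholders:

    import csv
    import subprocess

    def read_allocation(path="output.csv"):
        # Read the allocation output into (scode, cscode) pairs.
        with open(path, newline="", encoding="utf-8") as f:
            return [(row["scode"], row["cscode"]) for row in csv.DictReader(f)]

    def test_school_is_never_its_own_center():
        for scode, cscode in read_allocation():
            assert scode != cscode

    def test_no_mutual_allocation():
        pairs = set(read_allocation())
        for scode, cscode in pairs:
            assert (cscode, scode) not in pairs

    def test_unseeded_runs_are_randomized():
        results = set()
        for _ in range(10):
            # hypothetical entry point; each run rewrites output.csv
            subprocess.run(["python", "school_center.py"], check=True)
            results.add(tuple(read_allocation()))
        assert len(results) > 1  # at least two distinct outcomes across runs

Note that these tests assert only the requirements themselves, so any valid allocation passes and no fixed seed is needed.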

@justfoolingaround commented Apr 22, 2024

I believe a seeded random generator is a great idea. Just for a different reason than writing a static test case upstream.

Every run of the center-assignment program must be randomized so that it produces different results.

Since the argument is optional, the results still vary on every ordinary execution.

The seed may be used during development: if a developer is making code optimizations that should be independent of the results, they can run with a specific seed to check that the results are actually unaffected by their changes.
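For example (a hypothetical workflow; the flag is the one from this PR, the file names are placeholders): run the program with the same seed before and after the change, save both outputs, and compare them.

    import filecmp

    # before.csv was produced by "python school_center.py --seed 42" on the
    # old code, after.csv by the same command on the new code. A change that
    # is truly independent of the results leaves the files identical.
    assert filecmp.cmp("before.csv", "after.csv", shallow=False)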

Of course, in the early stages code optimizations are not to be expected, as algorithms and approaches may vary until the codebase reaches some level of stability.

Furthermore, a seed could also be used to conveniently share a result across devices, although this may be a bit far-fetched.

Contributors will be deterred from making any change, especially newer ones who lack a deep understanding of the domain.

I believe this argument is against a static test case upstream (which I'm against as well), but shouldn't the project coerce contributors into keeping their results valid? Contributors need to be coerced into justifying their breaking changes to any codebase.

Devs simply take their output and update the tests to match it to make the test pass.

Again, this is against a static test case upstream, but still, this isn't how open-source development works. Even with a static test case upstream, any change is visible to everyone, so you cannot just quietly edit the test file. Not to mention that such tests should be run by an automated system.

@sapradhan (Collaborator) commented:

@justfoolingaround agreed on reproducibility for development; my only reservation was against static tests.

@sumanashrestha (Collaborator) commented:

Please confirm that random = random.Random(None) is equivalent to not initializing random in the existing code.

@basic-bgnr (Contributor, Author) commented:

Please confirm random = random.Random(None) is equivalent to not initializing random in the existing code

Every PRNG needs a seed value to initialize itself, and most implementations use the system time for seeding because it is always changing. In Python, when the seed argument is set to None, the generator likewise falls back to using the system time, which is also its default behaviour.

The docs are here (https://docs.python.org/3/library/random.html#random.Random):

class random.Random([seed])
Class that implements the default pseudo-random number generator used by the random module.

and here (https://docs.python.org/3/library/random.html#random.seed):

random.seed(a=None, version=2)
Initialize the random number generator.

If a is omitted or None, the current system time is used. […]
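To confirm the equivalence, a quick standard-library check (runnable in any Python REPL):

    import random

    # Same explicit seed: identical streams, i.e. deterministic output.
    a, b = random.Random(42), random.Random(42)
    assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]

    # Seed of None: seeded from system time / OS entropy, exactly as the
    # module-level random functions are on import, so two unseeded
    # generators almost surely diverge.
    c, d = random.Random(None), random.Random(None)
    assert [c.random() for _ in range(5)] != [d.random() for _ in range(5)]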

@sumanashrestha merged commit b654b2e into moest-np:main on Apr 23, 2024.