Hi,

In the `DataLoader` class there is an argument `seed_for_shuffle` which provides reproducibility when using `DataLoader` with `infinite=False` and `shuffle=True` (between two different training runs, the data fed to the network will always be the same). But why is the same not done when `infinite=True`? Even if the batch generation is infinite, you may want reproducibility between two different training runs.
So, there are two alternatives (both sketched in code after the list):
1. When doing `return np.random.choice(self.indices, self.batch_size, replace=True, p=self.sampling_probabilities)` in line 118 of `DataLoader`, do it using `self.rs` (i.e. `return self.rs.choice(self.indices, self.batch_size, replace=True, p=self.sampling_probabilities)`).
2. Instead of setting `self.rs` at init of the `DataLoader` instance, call `np.random.seed(seed)` there.
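A minimal sketch of both alternatives (the class body here is assumed for illustration; only `self.rs`, `seed_for_shuffle` and the `choice` call come from the issue):

```python
import numpy as np

class DataLoader:
    """Minimal sketch, not the real batchgenerators class."""

    def __init__(self, data, batch_size, seed_for_shuffle=None):
        self.indices = np.arange(len(data))
        self.batch_size = batch_size
        self.sampling_probabilities = None  # None -> uniform sampling

        # Alternative 1: keep a dedicated RandomState so that index
        # sampling is isolated from the global np.random stream.
        self.rs = np.random.RandomState(seed_for_shuffle)

        # Alternative 2: seed the global NumPy RNG instead
        # (and keep using np.random.choice below):
        # np.random.seed(seed_for_shuffle)

    def get_indices(self):
        # Alternative 1 (the change proposed for line 118): draw from
        # self.rs instead of the global np.random stream.
        return self.rs.choice(self.indices, self.batch_size,
                              replace=True, p=self.sampling_probabilities)
        # Alternative 2 would leave the original line untouched:
        # return np.random.choice(self.indices, self.batch_size,
        #                         replace=True, p=self.sampling_probabilities)
```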
Thanks for your time!
To ensure reproducibility even when `infinite=True`, the first option must be done. With the second one alone, if other lines of code calling `np.random` are executed, this reproducibility is lost, because the global stream is consumed. For example, if the transformation we pass to the `SingleThreadedAugmenter`/`MultiThreadedAugmenter` is `MirrorTransform`, then with `axes=(0,)` vs `axes=(0, 1)` we will not get the same data back from the `DataLoader` having done only `np.random.seed(seed)`, because inside `MirrorTransform` methods involving `np.random` are executed once and twice, respectively. I have observed this in exactly this case.
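A small self-contained demonstration of this effect (plain NumPy; the extra `np.random.uniform()` call stands in for a transform's `np.random` usage):

```python
import numpy as np

seed = 12

# Global seed only (option 2): an extra np.random call made between
# seeding and sampling, e.g. inside MirrorTransform, shifts the
# global stream and changes the indices the loader draws.
np.random.seed(seed)
a = np.random.choice(10, 4)   # no transform call in between

np.random.seed(seed)
_ = np.random.uniform()       # stands in for a transform's np.random call
b = np.random.choice(10, 4)
print(np.array_equal(a, b))   # almost surely False: stream was consumed

# Dedicated RandomState (option 1): external np.random calls cannot
# touch self.rs, so index sampling stays reproducible.
rs1 = np.random.RandomState(seed)
c = rs1.choice(10, 4)

rs2 = np.random.RandomState(seed)
_ = np.random.uniform()       # transform consumes the global stream
d = rs2.choice(10, 4)
print(np.array_equal(c, d))   # True: isolated stream unaffected
```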
Moreover, the two options must be combined to ensure full reproducibility! This way, between different trainings where we only want to change, for example, the optimiser, we ensure that the same random transformations are always applied in both trainings.
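Putting both together (reusing the hypothetical `DataLoader` sketch from above), a training script would fix the global stream for the transforms and rely on the loader's own `RandomState` for index sampling:

```python
import numpy as np

seed = 12

# Option 2: seed the global NumPy RNG so transforms such as
# MirrorTransform draw the same random numbers in every training run.
np.random.seed(seed)

# Option 1: the loader keeps its own RandomState (self.rs) built from
# seed_for_shuffle, so index sampling stays reproducible no matter how
# many np.random calls the transforms make.
loader = DataLoader(data=list(range(100)), batch_size=4, seed_for_shuffle=seed)
print(loader.get_indices())  # identical across runs
```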