Any way to integrate this within a PyTorch DataLoader/Dataset? #17
Hey, thanks for the issue. I have two thoughts about this:
Let me know if this answers your question, and feel free to ask more.
@iejMac Thanks for your reply. In my case I have a large number of video files from the WebVid-2M dataset (https://m-bain.github.io/webvid-dataset/): around 2M videos of variable length, about 18 seconds long on average. After initial preprocessing to resize the videos to 256x256, the dataset is around 1.6 TB. Your suggestion of converting the mp4 links to numpy arrays for all videos in a first preprocessing pass would likely require a very large amount of disk space, which I can't afford. My current data pipeline uses a dataset (torch.utils.data.Dataset) that (i) reads the video file (without decoding), (ii) decodes specific frames, (iii) applies preprocessing to them, and (iv) returns the corresponding tensor. I then wrap the dataset in a DataLoader. The main bottleneck is the decoding step. I am currently using decord for this, which in initial testing seems slightly faster than PyAV and imageio-ffmpeg. I do see your point about poor scaling w.r.t. target FPS, but I don't know whether that is the main bottleneck. The ideal case to me seems to be your first thought, but done for every mini-batch of videos at training/inference time: read all the mp4 files (without decoding) very quickly, then pass them to the video decoder. Unfortunately, decoding is essentially sequential (one worker decodes one video), so the same bottleneck will likely remain. I'm not sure there is an easy way to mitigate this.
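The four-step pipeline described above could look roughly like the following minimal sketch. It assumes decord's `VideoReader` / `get_batch` API for decoding and evenly spaced sampling of 4 frames per clip; `VideoFrameDataset`, `open_with_decord`, and `uniform_indices` are hypothetical names, not part of video2numpy or the commenter's actual code. A map-style dataset only needs `__len__` and `__getitem__`, so the class can be handed to `torch.utils.data.DataLoader` as-is.

```python
from typing import Callable, List, Optional, Sequence, Tuple

import numpy as np

# A frame getter maps a list of frame indices to a (T, H, W, C) uint8 array.
FrameGetter = Callable[[List[int]], np.ndarray]


def open_with_decord(path: str) -> Tuple[int, FrameGetter]:
    """(i) Open a video without decoding all of it; return its length and a frame getter."""
    from decord import VideoReader, cpu  # assumes decord is installed

    reader = VideoReader(path, ctx=cpu(0))
    return len(reader), lambda indices: reader.get_batch(indices).asnumpy()


def uniform_indices(num_total: int, num_frames: int) -> List[int]:
    """Pick the centre frame of num_frames equal-length segments of the clip."""
    if num_total <= 0:
        raise ValueError("video has no frames")
    step = num_total / num_frames
    return [min(int(step * i + step / 2), num_total - 1) for i in range(num_frames)]


class VideoFrameDataset:
    """Map-style dataset (hypothetical): one item is one clip of num_frames frames."""

    def __init__(
        self,
        paths: Sequence[str],
        num_frames: int = 4,
        open_video: Callable[[str], Tuple[int, FrameGetter]] = open_with_decord,
        transform: Optional[Callable[[np.ndarray], np.ndarray]] = None,
    ):
        self.paths = list(paths)
        self.num_frames = num_frames
        self.open_video = open_video
        self.transform = transform

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> np.ndarray:
        num_total, get_frames = self.open_video(self.paths[idx])  # (i) open the file
        indices = uniform_indices(num_total, self.num_frames)
        frames = get_frames(indices)  # (ii) decode only the selected frames
        if self.transform is not None:
            frames = self.transform(frames)  # (iii) optional preprocessing
        return frames  # (iv) DataLoader's default collate converts this to a tensor
```

With fixed-size 256x256 clips, `DataLoader(VideoFrameDataset(paths), batch_size=32, num_workers=10)` would yield uint8 tensors of shape (32, 4, 256, 256, 3); the default collate function can only stack clips of identical shape.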
Hmmm, good point. For context, I hadn't thought about that because we get around the storage problem by storing only lower-dimensional frame embeddings. Additionally, the reason poor scaling w.r.t. target FPS is the bottleneck for us is that a target FPS of 1 is acceptable, since we're looking into longer-context video understanding. Anyway, thanks a ton for describing your pipeline. It seems you're doing a few things that we aren't, so I'll look into them to see if they can improve video2numpy. Could you point me to the code for this? As for the ideal case, I think I agree. As I said before, I'm getting to the point of considering calculating the theoretical maximum decoding speed given the way video is compressed. I don't want to waste time hyper-optimizing something if we're already close to the theoretical max. Have you looked into this yet? Also, why do you want to integrate this within a PyTorch DataLoader? The input and output of the FrameReader class in video2numpy are exactly the same (see this example + API section in README). Are you getting better performance with your approach? I guess our FrameReader doesn't allow custom preprocessing functions, but this should be easy enough to add. Please let me know what video2numpy is missing for your use case.
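A custom preprocessing function of the kind mentioned above could be any callable applied to each decoded clip. As a hypothetical example (the name `to_float_cthw` and the [0, 1] scaling are assumptions, not video2numpy API), here is a hook that turns a (T, H, W, C) uint8 clip into the channels-first (C, T, H, W) float layout used by, for example, torchvision's video classification models:

```python
import numpy as np


def to_float_cthw(frames: np.ndarray) -> np.ndarray:
    """Convert a (T, H, W, C) uint8 clip to a (C, T, H, W) float32 clip in [0, 1]."""
    if frames.dtype != np.uint8 or frames.ndim != 4:
        raise ValueError("expected a (T, H, W, C) uint8 array")
    scaled = frames.astype(np.float32) / 255.0
    # Move channels first; ascontiguousarray avoids handing a strided view downstream.
    return np.ascontiguousarray(scaled.transpose(3, 0, 1, 2))
```

Keeping the hook a plain function of one array means it can be passed unchanged to whichever reader (a DataLoader-based one or a FrameReader extension) ends up calling it.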
@iejMac I believe that even for videos with a longer time horizon, having access to the full videos is often useful. For instance, you can sample frames with a bit of temporal jittering (e.g., instead of sampling frame 45, you sample frame 47), which can help generalization. I haven't looked into the maximum decoding speed on my end. The main reason to use a PyTorch DataLoader is the convenience of porting existing code: many existing frameworks, such as PyTorch Lightning, expect a DataLoader to be passed to their trainer. If video2numpy could easily be instantiated as a DataLoader, that would be really helpful. Using a DataLoader definitely improves performance, but I don't have a good benchmark against video2numpy. On my side, with the current DataLoader, I get around 2 it/s on 8 A100 GPUs with a batch size of 32 per GPU and num_workers=10 on each rank.
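Temporal jittering as described above can be sketched as adding a small random offset to each evenly spaced frame index and clamping the result to the clip. The helper `jittered_indices` is hypothetical, and `max_jitter=2` is chosen only to match the frame 45 to frame 47 example:

```python
import random
from typing import List, Optional


def jittered_indices(
    num_total: int,
    num_frames: int,
    max_jitter: int = 2,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Evenly spaced frame indices, each shifted by a random offset in [-max_jitter, max_jitter]."""
    if num_total <= 0:
        raise ValueError("video has no frames")
    if rng is None:
        rng = random.Random()
    step = num_total / num_frames
    indices = []
    for i in range(num_frames):
        base = int(step * i + step / 2)  # centre of the i-th segment
        shifted = base + rng.randint(-max_jitter, max_jitter)
        indices.append(min(max(shifted, 0), num_total - 1))  # stay inside the clip
    return indices
```

A typical choice would be to jitter only during training and pass `max_jitter=0` at evaluation time, which gives deterministic, evenly spaced indices.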
@TheShadow29 I'm a bit confused about what you mean by "full videos" and how this relates to temporal jittering. Downsampling from, say, 25 FPS to 1 FPS still gets you the full video, just more coarsely grained, which fits our method of using semantic CLIP embeddings. Because of how CLIP is trained, I don't think there would be much difference between the embeddings of frame n and frame n+1 at full FPS. In general, though, I do agree that full FPS would be useful even for longer time horizons; I just don't know how useful. OK, I see. In that case I think it would be useful to make our video2numpy FrameReader class work with PyTorch DataLoaders, maybe by making it easily wrappable. I'll look into this when I have some time and let you know. If you'd like to investigate this sooner, you're welcome to give it a shot and make a PR. What do you mean by an iteration in this context: a frame or a video?
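Downsampling from 25 FPS to 1 FPS, as described above, amounts to keeping one source frame per 1/target_fps seconds. A minimal sketch (the helper `fps_indices` is hypothetical) that computes which frame indices to keep:

```python
import math
from typing import List


def fps_indices(num_total: int, src_fps: float, target_fps: float) -> List[int]:
    """Indices of source frames to keep when resampling a clip from src_fps to target_fps."""
    if target_fps >= src_fps:
        return list(range(num_total))  # no downsampling needed
    step = src_fps / target_fps  # source frames between consecutive kept frames
    count = math.ceil(num_total / step)  # kept frames whose timestamp falls inside the clip
    return [min(int(k * step), num_total - 1) for k in range(count)]
```

The number of frames to decode grows linearly with `target_fps`: an 18-second clip recorded at 25 FPS gives 18 frames at 1 FPS but all 450 at full frame rate.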
@iejMac I agree that for CLIP it is likely not that useful, but I am not training CLIP; I'm trying a different model. Video models such as SlowFast often expect 8 or 32 frames (for the slow and fast pathways). For videos with a lot of action, such as those sourced from Kinetics, fps=1 wouldn't really work. Yes, making it wrappable in a DataLoader would be really useful. I doubt I have the bandwidth, but if I do find time, I will update you on the PR. In this context, each element of a batch is a video (4 frames per video, which I forgot to mention earlier).
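The reported numbers can be turned into a decoding rate: 2 it/s on 8 GPUs, 32 videos per GPU, 4 frames per video, 10 workers per rank. The sketch below assumes that each iteration consumes one batch of 32 videos on every rank (as in typical data-parallel training) and that the workers share the load evenly; both are assumptions, not measurements, and `loader_throughput` is a hypothetical helper.

```python
def loader_throughput(it_per_s, num_gpus, batch_per_gpu, frames_per_video, workers_per_rank):
    """Convert training iterations per second into aggregate video/frame decoding rates."""
    videos_per_s = it_per_s * num_gpus * batch_per_gpu
    frames_per_s = videos_per_s * frames_per_video
    total_workers = num_gpus * workers_per_rank
    return {
        "videos_per_s": videos_per_s,
        "frames_per_s": frames_per_s,
        "videos_per_s_per_worker": videos_per_s / total_workers,
    }


stats = loader_throughput(2, 8, 32, 4, 10)
print(stats)
# {'videos_per_s': 512, 'frames_per_s': 2048, 'videos_per_s_per_worker': 6.4}
```

Under those assumptions, any replacement reader would need to sustain roughly 2,048 decoded frames per second in aggregate (about 6.4 videos, or 25.6 frames, per worker per second) to keep up with the current pipeline.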
@TheShadow29 I made a PR with a wrapper that might work for you: #18. A simple example that should work:
Credit to https://github.com/ClashLuke for the code.
I think it would be nice to add this as examples/pytorch.py.
An examples/jax.py and examples/tf.py definitely wouldn't hurt either.
Hello, is there an easy way to use the frame extraction method within a PyTorch Dataset/DataLoader? My understanding is that it is not trivial, given the use of multiprocessing here: https://github.com/iejMac/video2numpy/blob/main/video2numpy/frame_reader.py#L39, which could conflict with the multiprocessing already used by the PyTorch DataLoader.
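One way to sidestep the conflict raised above is to avoid a separate process pool inside the dataset: decode in-process and let the DataLoader's own workers supply the parallelism, with each worker taking a disjoint shard of the video list. This is an illustrative sketch under that assumption, not the wrapper from #18; `shard_for_worker`, `make_frame_iterable`, and the `decode_fn` callable are hypothetical, while `IterableDataset` and `get_worker_info` are standard `torch.utils.data` APIs.

```python
import math
from typing import Callable, Iterator, Sequence


def shard_for_worker(items: Sequence, worker_id: int, num_workers: int) -> Sequence:
    """Contiguous, non-overlapping slice of items assigned to one DataLoader worker."""
    per_worker = math.ceil(len(items) / num_workers)
    start = worker_id * per_worker
    return items[start:start + per_worker]


def make_frame_iterable(paths: Sequence[str], decode_fn: Callable):
    # torch is imported lazily so the sharding helper above has no torch dependency.
    from torch.utils.data import IterableDataset, get_worker_info

    class FrameIterable(IterableDataset):
        def __iter__(self) -> Iterator:
            info = get_worker_info()
            if info is None:  # num_workers=0: the main process reads everything
                shard = paths
            else:  # inside a worker: read only this worker's slice
                shard = shard_for_worker(paths, info.id, info.num_workers)
            for path in shard:
                yield decode_fn(path)  # single-process decode, no nested pool

    return FrameIterable()
```

Iterating `DataLoader(make_frame_iterable(paths, decode_fn), batch_size=32, num_workers=10)` would then give each worker its own slice. Without the sharding step, every worker of an `IterableDataset` yields the full list, duplicating data.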