Vision Transformers for Image Segmentation
This is my attempt to adapt Vision Transformers (ViT) for semantic image segmentation.
Because of time constraints I took a shortcut and did not set up proper multiclass data. The labels are color-coded images and so is the output, rather than images with one channel per class. That said, I imagine it would not take much work to turn this into a proper multiclass system.
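As a rough idea of what the multiclass conversion might involve, here is a minimal NumPy sketch that turns a color-coded label image into a class-index map, expands it into one channel per class, and maps predictions back to colors. It is not part of this project's code: the palette, its colors, and the function names are all hypothetical, and it assumes every pixel in a label image exactly matches one palette color.

```python
import numpy as np

# Hypothetical palette: one RGB color per class. These colors are an
# assumption for illustration, not the ones used by this project's data.
PALETTE = np.array([
    [0, 0, 0],      # class 0, e.g. background
    [255, 0, 0],    # class 1
    [0, 255, 0],    # class 2
], dtype=np.uint8)


def color_mask_to_class_ids(mask_rgb, palette=PALETTE):
    """Map an (H, W, 3) color-coded label image to an (H, W) array of class ids."""
    # Compare every pixel with every palette color -> (H, W, num_classes) booleans.
    matches = (mask_rgb[:, :, None, :] == palette[None, None, :, :]).all(axis=-1)
    if not matches.any(axis=-1).all():
        raise ValueError("label image contains a color that is not in the palette")
    return matches.argmax(axis=-1)


def class_ids_to_one_hot(class_ids, num_classes):
    """Expand an (H, W) class-id map into (num_classes, H, W) channels."""
    return np.eye(num_classes, dtype=np.float32)[class_ids].transpose(2, 0, 1)


def class_ids_to_color(class_ids, palette=PALETTE):
    """Turn an (H, W) class-id map back into an (H, W, 3) color image."""
    return palette[class_ids]
```

With labels in this form, the network would output one channel per class and could be trained with a per-pixel classification loss. Taking the argmax over the class channels and passing the result to `class_ids_to_color` would recover a colored image for viewing.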