Researchers introduce ViTok, a Vision Transformer-based auto-encoder that scales visual tokenization to enhance image and video generation while reducing computational costs.