Researchers introduce ViTok, a Vision Transformer-based auto-encoder that scales visual tokenization to enhance image and video generation while reducing computational costs.