Researchers introduce ViTok, a Vision Transformer-based auto-encoder that scales visual tokenization to enhance image and video generation while reducing computational costs.