DALL-E uses a discrete VAE (Variational Autoencoder) to encode images into discrete tokens, and a Transformer to generate images from those tokens. Suppose the model uses a vocabulary of V = 8192 discrete tokens, each represented by an embedding vector of dimensionality d = 512.
Calculate the total number of parameters in the embedding matrix that maps each of these tokens to its embedding vector.
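
The arithmetic can be checked with a short sketch. It assumes the embedding matrix is a plain lookup table with one d-dimensional row per token and no bias or extra special tokens, so the count is simply V × d. The function name `embedding_param_count` is hypothetical and chosen only for illustration.

```python
def embedding_param_count(vocab_size: int, embed_dim: int) -> int:
    """Parameter count of a lookup-table embedding matrix.

    Assumes shape (vocab_size, embed_dim) with no bias term.
    """
    return vocab_size * embed_dim


V = 8192  # vocabulary size from the problem statement
d = 512   # embedding dimensionality from the problem statement

total = embedding_param_count(V, d)
print(f"{total:,}")  # 4,194,304
```

Because 8192 = 2^13 and 512 = 2^9, the product is 2^22 = 4,194,304 parameters, or roughly 4.2 million.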