From words to 3D faces: a real-time pipeline for avatar expression animation
Syed, Abdurrehman (2025-06-12)
© 2025 Abdurrehman Syed. Unless otherwise stated, reuse is permitted under the Creative Commons Attribution 4.0 International (CC-BY 4.0) licence (https://creativecommons.org/licenses/by/4.0/). Reuse is allowed provided that the source is properly cited and any modifications are indicated. The use or reproduction of parts that are not the property of the author(s) may require permission directly from the respective rights holders.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:oulu-202506124400
Abstract
Generating realistic and controllable 3D facial animations from textual descriptions remains a challenging task, often requiring complex mappings from text embeddings to the latent space of 3D Morphable Models. This thesis demonstrates that state-of-the-art image diffusion models and 3D face reconstruction can be integrated to generate text-driven animatable avatars, and examines the practical challenges of doing so. My approach takes as input a neutral 2D facial image and a text prompt describing a desired expression. These are fed into a fine-tuned text-to-image inpainting model (based on DiffEdit, with LoRA fine-tuning on the KDEF dataset) to generate a 2D image of the face exhibiting the target expression. To ensure texture consistency, a skin tone matching technique aligns the generated image with the original neutral face. Both the original and the generated 2D images are then processed with a 3D reconstruction network (DECA) to obtain textured 3D meshes. To enhance visual realism, a post-processing phase refines the reconstructed 3D model features. Finally, continuous animation is achieved by interpolating among the 3D meshes corresponding to different textual inputs, enabling real-time text-based control over 3D facial expressions, demonstrated in a Unity context. This hybrid 2D-3D approach offers a practical and potentially more natural method for generating dynamic 3D facial animation from text, with potential applications in virtual reality, character animation, and human-computer interaction.
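The final interpolation step relies on the fact that DECA reconstructions share a fixed mesh topology (the FLAME head template), so expressions can be blended per vertex. The sketch below illustrates that idea under stated assumptions; the function name, array shapes, and placeholder meshes are illustrative and not taken from the thesis implementation.

```python
import numpy as np

def interpolate_meshes(verts_a: np.ndarray, verts_b: np.ndarray, t: float) -> np.ndarray:
    """Linearly blend two face meshes that share the same topology.

    verts_a, verts_b: (N, 3) vertex arrays from a FLAME-based
    reconstructor such as DECA (same vertex ordering in both).
    t: blend weight, clamped to [0, 1]; 0 yields verts_a, 1 yields verts_b.
    """
    t = float(np.clip(t, 0.0, 1.0))
    return (1.0 - t) * verts_a + t * verts_b

# Illustrative placeholders: a "neutral" mesh and a displaced "expression" mesh.
# FLAME meshes have 5023 vertices; real data would come from DECA output.
neutral = np.zeros((5023, 3))
expression = np.full((5023, 3), 0.01)
halfway = interpolate_meshes(neutral, expression, 0.5)
```

Driving `t` from 0 to 1 per frame produces the continuous transition between expressions; in Unity the same effect maps naturally onto blend shape weights on a `SkinnedMeshRenderer`.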