GenericAvatar: Generic Human Modeling from Monocular Video Based on Mesh-guided Gaussians

We propose GenericAvatar, a universal human avatar modeling framework that leverages mesh-guided Gaussian splatting to achieve personalized, high-fidelity reconstruction of human bodies or heads from monocular videos. Our method consists of two steps. First, a Gaussian initialization module based on explicit triangular meshes embeds Gaussian splats onto the mesh surface and then transforms them into the global coordinate system, which stably captures the low-frequency motions and surface deformations of human avatars. Second, a Gaussian adjustment module employs a triplane representation to encode the 3D Gaussian splats, followed by a spatial-posture cross-attention module and an MLP module that adjust the Gaussian attributes. This second module overcomes the limitations of traditional linear blend skinning (LBS) in modeling complex non-rigid deformations, enabling precise modeling of high-frequency details such as clothing wrinkles and dynamic hair. By integrating the geometric priors provided by explicit meshes with implicit Gaussian representations, GenericAvatar achieves high-fidelity reconstruction on PeopleSnapshot, ZJU-Mocap, and a monocular head dataset while preserving complex texture details. Experimental results indicate that GenericAvatar outperforms state-of-the-art methods on both human body reconstruction and head reconstruction.
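The first step, embedding Gaussians on the mesh surface and transforming them into the global coordinate system, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes one common form of mesh binding: each Gaussian stores its mean in the local frame of a parent triangle, and the frame (centroid, rotation built from an edge and the face normal, and an edge-length scale) is recomputed from the posed mesh at every frame. The function names `triangle_frames` and `local_to_global` and the particular choice of scale are hypothetical.

```python
import numpy as np

def triangle_frames(vertices, faces):
    """Per-triangle local frame (an assumed construction, not taken from the paper).

    Returns the centroid (F, 3), a rotation (F, 3, 3) whose columns are the
    frame axes, and a scalar scale (F,) derived from two edge lengths.
    """
    tri = vertices[faces]                          # (F, 3, 3)
    v0, v1, v2 = tri[:, 0], tri[:, 1], tri[:, 2]
    origin = tri.mean(axis=1)                      # triangle centroid
    e1 = v1 - v0
    e2 = v2 - v0
    x_axis = e1 / np.linalg.norm(e1, axis=1, keepdims=True)
    normal = np.cross(e1, e2)
    z_axis = normal / np.linalg.norm(normal, axis=1, keepdims=True)
    y_axis = np.cross(z_axis, x_axis)              # completes a right-handed frame
    rotation = np.stack([x_axis, y_axis, z_axis], axis=-1)
    scale = 0.5 * (np.linalg.norm(e1, axis=1) + np.linalg.norm(e2, axis=1))
    return origin, rotation, scale

def local_to_global(local_mean, face_idx, vertices, faces):
    """Map Gaussian means from their parent-triangle frames to global space."""
    origin, rotation, scale = triangle_frames(vertices, faces)
    R = rotation[face_idx]                         # (N, 3, 3)
    s = scale[face_idx][:, None]                   # (N, 1)
    return np.einsum("nij,nj->ni", R, s * local_mean) + origin[face_idx]

# One triangle in the xy-plane with one Gaussian hovering above its centroid.
faces = np.array([[0, 1, 2]])
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
local_mean = np.array([[0.0, 0.0, 0.1]])
face_idx = np.array([0])

rest_pos = local_to_global(local_mean, face_idx, rest, faces)

# Re-pose the triangle (rotated about the x-axis); the Gaussian follows the surface.
posed = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
posed_pos = local_to_global(local_mean, face_idx, posed, faces)
```

Because the local mean is fixed and only the triangle frame changes, the Gaussian rides rigidly with the mesh, which is the kind of low-frequency motion the initialization step is described as capturing. Finer, non-rigid corrections are left to the second (adjustment) module.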


Framework of GenericAvatar.
