GenericAvatar: Generic Human Modeling from Monocular Video Based on Mesh-guided Gaussians

Method

Framework of GenericAvatar.

Abstract

We propose GenericAvatar, a universal human avatar modeling framework that leverages mesh-guided Gaussian splatting to achieve personalized, high-fidelity reconstruction of human bodies or heads from monocular videos. Our method consists of two steps. First, a Gaussian initialization module based on explicit triangular meshes embeds Gaussian splats onto the mesh surface and then transforms them into the global coordinate system, thereby stably capturing the low-frequency motions and surface deformations of human avatars. Second, a Gaussian adjustment module employs a triplane representation to encode the 3D Gaussian splats, followed by a spatial-posture cross-attention module and an MLP module that adjust the Gaussian attributes. This second module overcomes the limitations of traditional linear blend skinning (LBS) in modeling complex non-rigid deformations, enabling precise modeling of high-frequency details such as clothing wrinkles and dynamic hair. By integrating the geometric priors provided by explicit meshes with implicit Gaussian representations, GenericAvatar achieves high-fidelity reconstruction on PeopleSnapshot, ZJU-Mocap, and a monocular head dataset while preserving complex texture details. Experimental results indicate that GenericAvatar outperforms state-of-the-art methods on both human body reconstruction and head reconstruction.
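To make the first step more concrete, the following is a minimal sketch of how a Gaussian center could be embedded on a mesh triangle and then mapped into global coordinates. It assumes a barycentric parameterization plus a signed offset along the triangle normal, which is a common choice for mesh-embedded Gaussians but is not confirmed by the abstract. The function name `embed_to_global` and all helper names are hypothetical.

```python
import math

def sub(a, b):
    """Component-wise difference of two 3D vectors."""
    return [a[i] - b[i] for i in range(3)]

def cross(a, b):
    """Cross product of two 3D vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def embed_to_global(triangle, bary, offset):
    """Map a mesh-embedded Gaussian center to global coordinates.

    triangle: three vertices of the (posed) triangle, each [x, y, z].
    bary:     barycentric weights (u, v, w), assumed to sum to 1.
    offset:   signed displacement along the triangle normal (assumption).
    """
    v0, v1, v2 = triangle
    normal = normalize(cross(sub(v1, v0), sub(v2, v0)))
    u, v, w = bary
    return [
        u * v0[i] + v * v1[i] + w * v2[i] + offset * normal[i]
        for i in range(3)
    ]

# The same embedding (bary, offset) follows the triangle as the mesh moves.
bary = (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)
offset = 0.1

rest_triangle = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Rest triangle rotated 90 degrees about z and lifted by 2 along z.
posed_triangle = [[0.0, 0.0, 2.0], [0.0, 1.0, 2.0], [-1.0, 0.0, 2.0]]

rest_pos = embed_to_global(rest_triangle, bary, offset)
posed_pos = embed_to_global(posed_triangle, bary, offset)
print(rest_pos, posed_pos)
```

Because the embedding is stored relative to the triangle, re-evaluating it on the posed mesh moves the Gaussian with the surface. This is the sense in which the mesh can carry low-frequency motion, leaving finer corrections to a later adjustment stage.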

Comparison Experiments

Novel view synthesis on PeopleSnapshot: Our method reconstructs intricate texture details.

Novel view synthesis on the monocular head dataset: Our method reconstructs complicated hair textures.

Avatar animation on out-of-distribution poses: Our method generates consistent avatar representations in challenging poses.

Related Links

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians.

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting.

MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar.