
Performance issue in hy3dgen/shapegen/models/denoisers GELU(nn.Module): 1.55x module speedup observed on RTX 3090 #236

Open · David-Dingle opened this issue May 5, 2025 · 0 comments


David-Dingle commented May 5, 2025

Sys env:

python=3.11.9
Hunyuan3D-2 : b8d6b65
OS: Ubuntu 22.04
GPU: NVIDIA RTX 3090
PyTorch: 2.6.0 (CUDA 12.4)

Description:

The GELU module's source code forces every non-contiguous tensor passed into it to make a contiguous copy before the activation is applied.

torch.nn.functional.gelu accepts both contiguous and non-contiguous tensors as input. Contiguous operands do shorten kernel execution time, but the .contiguous() copy itself is expensive, and its overhead is observed to outweigh the time the contiguous operands save.
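For reference, the module in question looks roughly like this (a sketch based on the description above, not a verbatim copy; the constructor details may differ from the actual source):

import torch.nn as nn

class GELU(nn.Module):
    def __init__(self, approximate: str = 'tanh'):
        super().__init__()
        self.approximate = approximate

    def forward(self, x):
        # Current behavior: copy non-contiguous inputs before the activation.
        # Proposed fix: drop the .contiguous() call, since F.gelu handles
        # non-contiguous tensors directly.
        x = x.contiguous()
        return nn.functional.gelu(x, approximate=self.approximate)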

Reproduce:

import time

import torch
from PIL import Image

from hy3dgen.rembg import BackgroundRemover
from hy3dgen.shapegen import Hunyuan3DDiTFlowMatchingPipeline

images = {
    "front": "assets/example_mv_images/1/front.png",
    "left": "assets/example_mv_images/1/left.png",
    "back": "assets/example_mv_images/1/back.png"
}

for key in images:
    image = Image.open(images[key])
    # Remove the background only for RGB images; the result carries
    # an alpha channel.
    if image.mode == 'RGB':
        rembg = BackgroundRemover()
        image = rembg(image)
    images[key] = image

pipeline = Hunyuan3DDiTFlowMatchingPipeline.from_pretrained(
    'tencent/Hunyuan3D-2mv',
    subfolder='hunyuan3d-dit-v2-mv',
    variant='fp16'
)

start_time = time.time()
mesh = pipeline(
    image=images,
    num_inference_steps=50,
    octree_resolution=380,
    num_chunks=20000,
    generator=torch.manual_seed(12345),
    output_type='trimesh'
)[0]
print(f"Inference took {time.time() - start_time:.2f} s")

During inference, data passes through this GELU module many times. Profiling the kernel execution time of the following scope:

x = x.contiguous()  # candidate for removal
nn.functional.gelu(x, approximate=self.approximate)

shows that removing the .contiguous() call shortens the total kernel execution time from 1148.51 ms to 740.17 ms, a 1.55x module speedup.
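As a standalone check, a small microbenchmark along these lines reproduces the effect on a CUDA device (a sketch; the tensor shape and iteration count are arbitrary, not the ones profiled above):

import torch
import torch.nn.functional as F

def bench_gelu(x, make_contiguous, iters=1000):
    # Time the GELU kernels with CUDA events; a warm-up pass avoids
    # measuring one-time setup cost.
    F.gelu(x.contiguous() if make_contiguous else x, approximate='tanh')
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        y = x.contiguous() if make_contiguous else x
        F.gelu(y, approximate='tanh')
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end)  # milliseconds

# A non-contiguous view, e.g. as produced by a transpose upstream.
x = torch.randn(64, 512, 1024, device='cuda', dtype=torch.float16).transpose(1, 2)
print(f"with .contiguous():    {bench_gelu(x, True):.2f} ms")
print(f"without .contiguous(): {bench_gelu(x, False):.2f} ms")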
