DynamicScaler

Seamless and Scalable Video Generation for Panoramic Scenes

Jinxiu Liu*, Shaoheng Lin*, Yinxiao Li, Ming-Hsuan Yang
UC Merced • Google DeepMind • SCUT
(* Equal Contribution, † Corresponding Author)
Paper

Abstract

The increasing demand for immersive AR/VR applications and spatial intelligence has heightened the need to generate high-quality scene-level and 360° panoramic video. However, most video diffusion models are constrained by limited resolution and aspect ratio, which restricts their applicability to scene-level dynamic content synthesis. In this work, we propose DynamicScaler, which addresses these challenges by enabling spatially scalable, panoramic dynamic scene synthesis that preserves coherence across panoramic scenes of arbitrary size. Specifically, we introduce an Offset Shifting Denoiser that denoises panoramic dynamic scenes efficiently, synchronously, and coherently, using a fixed-resolution diffusion model applied through a seamlessly rotating window. This ensures seamless boundary transitions and consistency across the entire panoramic space while accommodating varying resolutions and aspect ratios. Additionally, we employ a Global Motion Guidance mechanism to ensure both local detail fidelity and global motion continuity. Extensive experiments demonstrate that our method achieves superior content and motion quality in panoramic scene-level video generation, offering a training-free, efficient, and scalable solution for immersive dynamic scene creation with constant VRAM consumption regardless of the output video resolution.

Loopable and Long video

DynamicScaler can generate infinitely looping videos: the last frame connects seamlessly back to the first, so the video plays as a continuous loop. Each loop lasts 8 seconds (64 frames, 4x longer than the base model's duration).
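The page does not describe how the loop is closed. As a rough sketch only, and by analogy with the spatial rotating window, one plausible reading is that fixed-length frame windows tile the 64-frame loop and wrap around its end, so that some window always spans the last and first frames. The helper name `circular_frame_windows` and the offset value are hypothetical; the 8 fps rate and 16-frame base length are simply derived from the stated numbers, assuming "4x longer" means four times the base clip length.

```python
def circular_frame_windows(total_frames, window_len, offset):
    """Tile a loop of total_frames frames with windows of window_len frames.

    The window grid starts at `offset`, and indices wrap modulo the loop
    length, so a window can span the last and first frames of the loop.
    """
    assert total_frames % window_len == 0, "sketch assumes an exact tiling"
    return [
        [(offset + start + k) % total_frames for k in range(window_len)]
        for start in range(0, total_frames, window_len)
    ]


total_frames, seconds, ratio = 64, 8, 4      # numbers stated on this page
fps = total_frames / seconds                 # frames per second of the loop
base_len = total_frames // ratio             # implied base-model clip length
windows = circular_frame_windows(total_frames, base_len, offset=8)
print(fps, base_len, windows[-1])
```

With a non-zero offset, the final window contains both the end and the start of the loop, which is the property a seamless loop needs.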

Fireworks above city skyline...

Astronaut floating on Mars...

An eagle flying in the sky...

Input

A band on stage with audiences cheering in a concert...

Input

Underwater world with various marine creatures swimming ...

360 degree Panorama T2V

Clouds swirled over the green grass ...

(perspective view ports)

Aurora over a lake...

(perspective view ports)

360 degree Panorama I2V

Input

Fireworks show above city skyline...

(perspective view ports)

Input

Ocean waves with white foam floating ...

(perspective view ports)

Regular (perspective) Panorama T2V

Balloons float over the meadow...

Boat on the stream of a river...

Sakura blossoms fall by the lake...

Thunderstorm coming, lightning bolts spark in the dark sky...

Regular (perspective) Panorama I2V

Input

Waves rush towards the beach...

Input

On the road, cars come and go...

Method Overview

Pipeline Diagram

Our pipeline is divided into two stages:
The low-resolution stage establishes a coarse motion structure. In the 360-degree setting (the yellow block), Panoramic Projecting Denoise initializes motion that fits a spherical panorama, while in the regular perspective setting (the blue block), Offset Shifting with overlap is used for the early denoising steps; the remaining denoising steps are then completed by our Offset Shifting Denoise.
The up-scaling stage (the green block) uses more shifting windows to produce a refined, high-resolution panorama, with Global Motion Guidance from the low-resolution video.
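The rotating-window idea behind Offset Shifting Denoise can be illustrated with a toy NumPy sketch. This is not the authors' implementation: `shifted_window_pass`, `toy_denoise`, and the offset schedule are hypothetical stand-ins, and the "denoiser" is a simple averaging function rather than a diffusion model. The sketch only shows how a fixed-size window, shifted by a different offset at each step and wrapped around the horizontal boundary, can cover a wider canvas while every call sees the same window size.

```python
import numpy as np


def shifted_window_pass(latent, window_w, offset, denoise_fn):
    """Apply denoise_fn to fixed-width windows on a horizontally wrapped grid.

    The window grid is shifted by `offset` columns. Columns that fall off
    one edge re-enter at the other, so one window straddles the seam
    whenever offset is non-zero.
    """
    width = latent.shape[-1]
    assert width % window_w == 0, "sketch assumes width is a multiple of window_w"
    # Rotate the canvas so the window grid starts at column `offset`.
    rolled = np.roll(latent, -offset, axis=-1)
    out = np.empty_like(rolled)
    for start in range(0, width, window_w):
        out[..., start:start + window_w] = denoise_fn(rolled[..., start:start + window_w])
    # Undo the rotation so every column returns to its original position.
    return np.roll(out, offset, axis=-1)


def toy_denoise(window):
    # Stand-in for one fixed-resolution denoising step (assumption):
    # pull each row of the window halfway toward its mean.
    return 0.5 * window + 0.5 * window.mean(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 8, 64))      # (channels, height, width)
window_w, num_steps = 16, 8
for step in range(num_steps):
    offset = (step * 5) % window_w        # hypothetical shift schedule
    latent = shifted_window_pass(latent, window_w, offset, toy_denoise)
```

Because `denoise_fn` only ever receives a `window_w`-wide slice, the per-call tensor size stays fixed as the canvas grows, which is consistent with the constant-VRAM claim in the abstract; changing the offset between steps moves the window borders so that no column is always a window edge.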

T2V Comparison

We performed a text-conditioned 360-degree panorama video generation comparison with 360DVD.
The results are projected to perspective view ports for comparison. DynamicScaler shows better text alignment than the previous T2V panorama generation method.
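Projecting a 360-degree frame to a perspective view port is standard equirectangular-to-pinhole resampling. The sketch below is a minimal NumPy version with nearest-neighbour sampling; the function name `perspective_view` and the field-of-view, yaw, and pitch parameters are assumptions for illustration, since the page does not state which projection settings were used.

```python
import numpy as np


def perspective_view(pano, fov_deg, yaw_deg, pitch_deg, out_h, out_w):
    """Sample a pinhole-camera view from an equirectangular frame (H, W, C)."""
    pano_h, pano_w = pano.shape[:2]
    # Focal length in pixels, derived from the horizontal field of view.
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)
    # Pixel grid centred on the optical axis: x right, y down, camera looks along +z.
    xs = np.arange(out_w) - (out_w - 1) / 2
    ys = np.arange(out_h) - (out_h - 1) / 2
    x, y = np.meshgrid(xs, ys)
    dirs = np.stack([x, y, np.full_like(x, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate the rays: pitch about the x axis first, then yaw about the y axis.
    p, t = np.radians(pitch_deg), np.radians(yaw_deg)
    rot_x = np.array([[1, 0, 0],
                      [0, np.cos(p), -np.sin(p)],
                      [0, np.sin(p), np.cos(p)]])
    rot_y = np.array([[np.cos(t), 0, np.sin(t)],
                      [0, 1, 0],
                      [-np.sin(t), 0, np.cos(t)]])
    dirs = dirs @ (rot_y @ rot_x).T
    # Convert ray directions to longitude/latitude, then to panorama pixels.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])             # range [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))        # positive is down
    u = ((lon / (2 * np.pi) + 0.5) * pano_w).astype(int) % pano_w
    v = np.clip(((lat / np.pi + 0.5) * pano_h).astype(int), 0, pano_h - 1)
    return pano[v, u]
```

Applying the same call to every frame with a few fixed yaw angles gives a set of view ports; positive yaw turns the camera to the right and positive pitch tilts it upward under the conventions chosen here.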

Aurora swirling across the sky over lake...

Titanic slicing through the waves...

Landscape of the snowy mountains...

Volcano erupting with smoke rolling ...

I2V Comparison

We performed an image-to-360-degree panorama video generation comparison with 4K4DGen.
The input images and their results come from the project page of 4K4DGen. The results are projected to perspective view ports for comparison. Compared to the previous I2V panorama generation method, DynamicScaler produces more natural and sufficient motion.

Input

Input

Input

Input

Seamlessness

We concatenated the generated panorama video horizontally to demonstrate the smooth boundary transitions of DynamicScaler, which is important for immersive panoramic content.
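The same horizontal concatenation can be turned into a rough numerical check. The sketch below places two copies of a frame side by side and compares the pixel jump across the junction with the typical jump between neighbouring interior columns. The `seam_ratio` helper and this metric are illustrative assumptions, not an evaluation used by the authors.

```python
import numpy as np


def seam_ratio(frame):
    """Compare the jump across the wrap-around seam with interior jumps.

    Returns the mean absolute difference between the last and first columns
    divided by the mean absolute difference between neighbouring interior
    columns. Values well above 1 indicate a visible seam.
    """
    frame = frame.astype(np.float64)
    tiled = np.concatenate([frame, frame], axis=1)   # two copies side by side
    w = frame.shape[1]
    seam = np.abs(tiled[:, w] - tiled[:, w - 1]).mean()
    interior = np.abs(np.diff(frame, axis=1)).mean()
    return seam / interior


height, width = 32, 128
cols = np.arange(width)
# A pattern that is periodic in the horizontal direction wraps cleanly.
periodic = np.tile(np.cos(2 * np.pi * cols / width), (height, 1))
# A left-to-right ramp has a hard jump where the copies meet.
ramp = np.tile(cols / width, (height, 1))
print(seam_ratio(periodic), seam_ratio(ramp))
```

For video, the ratio could be averaged over frames; a horizontally periodic panorama should score at or below roughly 1, while content generated without wrap-around consistency would tend to score much higher.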

BibTeX

@misc{2024dynamicscaler,
      title={DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes},
      author={Liu, Jinxiu and Lin, Shaoheng and Li, Yinxiao and Yang, Ming-Hsuan},
      year={2024},
      eprint={2412.11100},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}