ControlVideo: Training-Free Controllable Text-to-Video Generation

Paper: Zhang Y, Wei Y, Jiang D, et al. Controlvideo: Training-free controllable text-to-video generation[J]. arXiv preprint arXiv:2305.13077, 2023.
Introduction: https://controlvideov1.github.io/
Code: https://github.com/YBYBZhang/ControlVideo


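ControlVideo is a training-free framework for text-to-video generation. It builds on ControlNet: the input motion sequence provides coarse structural consistency, and three modules are added to improve the quality of the generated video. First, to keep appearance consistent across frames, ControlVideo adds fully cross-frame interaction to the self-attention modules. Second, to reduce flicker, it introduces an interleaved-frame smoother that applies frame interpolation to alternating frames. Finally, to generate long videos efficiently, it uses a hierarchical sampler that synthesizes each short clip separately while keeping the whole video coherent.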
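To make the "alternating frames" idea more concrete, here is a minimal sketch of an interleaved smoothing pass. It is not the authors' implementation: the names `interpolate` and `smooth_interleaved` are hypothetical, frames are toy lists of numbers, and a plain average of the two neighbours stands in for whatever learned interpolation model is actually used.

```python
from typing import List

Frame = List[float]  # toy frame: a flat list of pixel values (illustrative only)


def interpolate(prev: Frame, nxt: Frame) -> Frame:
    """Stand-in for a learned interpolation model: average of the two neighbours."""
    return [(a + b) / 2.0 for a, b in zip(prev, nxt)]


def smooth_interleaved(frames: List[Frame], parity: int) -> List[Frame]:
    """Replace every interior frame whose index has the given parity
    (0 = even, 1 = odd) with the interpolation of its two neighbours.
    Frames of the other parity, and the first and last frame, are kept."""
    out = [list(f) for f in frames]
    for i in range(1, len(frames) - 1):
        if i % 2 == parity:
            out[i] = interpolate(frames[i - 1], frames[i + 1])
    return out


if __name__ == "__main__":
    # Five one-pixel frames; frame 1 flickers.
    video = [[0.0], [9.0], [2.0], [3.0], [4.0]]
    step_a = smooth_interleaved(video, parity=1)   # odd frames at one step
    step_b = smooth_interleaved(step_a, parity=0)  # even frames at the next step
    print(step_a)
    print(step_b)
```

Alternating the parity between two consecutive steps means every interior frame is smoothed once, and each interpolated frame is always computed from neighbours that were left untouched in that pass.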

Reproduction

Platform: local server
GPU: RTX 4090 24G
Image: PyTorch 2.0.0, Python 3.8 (ubuntu20.04), CUDA 11.8
Source code: https://github.com/YBYBZhang/ControlVideo

Procedure:

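After cloning the repository, create the controlvideo virtual environment and install the dependencies as described in the README, then download the required model weights.
While installing the dependencies, pip failed with the error "python setup.py egg_info did not run successfully."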

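As the message suggests, add the --use-pep517 option so that pip installs the package through the build process defined by PEP 517. If that still fails, switch to another package index and reinstall.
After switching the index, a new error appeared: Could not find a version that satisfies the requirement tb-nightly (from basicsr) (from versions: none).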

The cause is that the tb-nightly package could not be found through pip or the Tsinghua mirror. Using the Aliyun mirror fixes it: -i https://mirrors.aliyun.com/pypi/simple
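The two pip fixes so far can be combined into one command. The sketch below only builds the argument list; the helper name `pip_install_cmd` is hypothetical, and `requirements.txt` is an assumption about how the repository lists its dependencies. The `--use-pep517` and `-i` options are standard pip options.

```python
import sys
from typing import List, Optional

ALIYUN_INDEX = "https://mirrors.aliyun.com/pypi/simple"  # mirror named in the text


def pip_install_cmd(requirements: str = "requirements.txt",  # assumed file name
                    index_url: Optional[str] = ALIYUN_INDEX,
                    use_pep517: bool = True) -> List[str]:
    """Build a `pip install` argument list with the workarounds described above."""
    cmd = [sys.executable, "-m", "pip", "install", "-r", requirements]
    if use_pep517:
        cmd.append("--use-pep517")   # build through the PEP 517 process
    if index_url:
        cmd += ["-i", index_url]     # install from the given package index
    return cmd


if __name__ == "__main__":
    print(" ".join(pip_install_cmd()))
    # To actually run it:
    #   import subprocess; subprocess.run(pip_install_cmd(), check=True)
```

Calling pip through `sys.executable -m pip` makes sure the packages land in the interpreter of the active virtual environment.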
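Before running, also edit the model paths in inference.py so that they point to where the model weights are stored.
At run time, the error ModuleNotFoundError: No module named 'torch._six' was raised.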

This happens because torch 2.0 removed this API. Replacing every from torch._six import inf with from torch import inf fixes it ¹.
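If the old import appears in several files, the replacement can be scripted. This is a small sketch rather than a prescribed fix: the function name `patch_torch_six` is hypothetical, and which directory to point it at (the installed package that raises the error) is left to the reader.

```python
from pathlib import Path
from typing import List

OLD = "from torch._six import inf"
NEW = "from torch import inf"


def patch_torch_six(root: str) -> List[Path]:
    """Rewrite the removed import in every .py file under `root`.
    Returns the list of files that were changed."""
    changed = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8
        if OLD in text:
            path.write_text(text.replace(OLD, NEW), encoding="utf-8")
            changed.append(path)
    return changed
```

The returned list shows exactly which files were touched, which helps if the change needs to be reverted later.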
The next error at run time was ValueError: depth is not a valid processor id.

The cause is that controlnet-aux has been updated. Changing the value to depth_midas fixes it ².
After that, a Hugging Face connection error occurred.

Download the lllyasviel/Annotators model to the local machine. Then, in lib/python3.10/site-packages/controlnet_aux/processor.py under the virtual environment, change processor = processor.from_pretrained("lllyasviel/Annotators") to processor = processor.from_pretrained("../../pretrain_models/Annotators").
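Instead of hard-coding the local path, one option is to fall back to the Hub id when the local copy is missing. The helper below is a hypothetical sketch using only the standard library; it does not call controlnet_aux itself, it only decides which string to pass to from_pretrained.

```python
from pathlib import Path

HUB_ID = "lllyasviel/Annotators"  # repository id from the text


def annotator_source(local_dir: str, hub_id: str = HUB_ID) -> str:
    """Return `local_dir` if it exists and is non-empty, otherwise the Hub id."""
    path = Path(local_dir)
    if path.is_dir() and any(path.iterdir()):
        return str(path)
    return hub_id


# Intended use inside the patched file (illustrative, not executed here):
#   processor = processor.from_pretrained(
#       annotator_source("../../pretrain_models/Annotators"))
```

Note that a relative path such as ../../pretrain_models/Annotators is resolved against the current working directory, so the script has to be launched from the directory the path was written for.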
If a CUDA out of memory error is raised partway through inference, add the --is_long_video flag. Inference then runs normally:
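A plausible reason the flag helps is that the hierarchical sampler processes short clips instead of attending over all frames at once. The sketch below only illustrates the index bookkeeping: `split_into_clips` is a hypothetical name, and the default clip length (the integer square root of the video length) is an assumption for illustration, not a value taken from the repository.

```python
import math
from typing import List, Optional, Tuple


def split_into_clips(video_length: int,
                     clip_len: Optional[int] = None
                     ) -> Tuple[List[int], List[List[int]]]:
    """Return (key_frame_indices, clips).
    Key frames are spaced `clip_len` apart and always include the last frame.
    Each clip spans two consecutive key frames, both ends included."""
    if video_length < 1:
        raise ValueError("video_length must be positive")
    if clip_len is None:
        clip_len = max(1, math.isqrt(video_length))  # assumed default
    key_frames = list(range(0, video_length, clip_len))
    if key_frames[-1] != video_length - 1:
        key_frames.append(video_length - 1)
    clips = [list(range(a, b + 1)) for a, b in zip(key_frames, key_frames[1:])]
    return key_frames, clips


if __name__ == "__main__":
    keys, clips = split_into_clips(15)
    print(keys)   # key frames would be generated jointly first
    print(clips)  # each clip is then filled in between its two key frames
```

With 15 frames, the largest group processed together in this sketch is the six key frames, rather than all 15 frames at once.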

Results:

mallard-water: python inference.py --prompt "A striking mallard floats effortlessly on the sparkling pond." --condition "depth_midas" --video_path "data/mallard-water.mp4" --output_path "outputs/mallard-water/" --video_length 15 --smoother_steps 19 20 --width 512 --height 512 --frame_rate 2 --version v10 --is_long_video

ControlVideo mallard-water source video

ControlVideo mallard-water edited video

man-spraying-fire: python inference.py --prompt "A man is spraying fire from a fire-breathing device, surrounded by vehicles and a basketball hoop." --condition "depth_midas" --video_path "data/man-spraying-fire.mp4" --output_path "outputs/man-spraying-fire/" --video_length 15 --smoother_steps 19 20 --width 512 --height 512 --frame_rate 2 --version v10 --is_long_video

ControlVideo spraying-fire source video

ControlVideo spraying-fire edited video

man-spraying-water: python inference.py --prompt "A man is spraying water from a water gun, surrounded by vehicles and a basketball hoop." --condition "depth_midas" --video_path "data/man-spraying-fire.mp4" --output_path "outputs/man-spraying-water/" --video_length 15 --smoother_steps 19 20 --width 512 --height 512 --frame_rate 2 --version v10 --is_long_video

ControlVideo spraying-water source video

ControlVideo spraying-water edited video

car-moving: python inference.py --prompt "A green car moving on the road" --condition "depth_midas" --video_path "data/car-moving.mp4" --output_path "outputs/car-moving/" --video_length 15 --smoother_steps 19 20 --width 512 --height 512 --frame_rate 2 --version v10 --is_long_video

ControlVideo car-moving source video

ControlVideo car-moving edited video
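The four runs above differ only in the prompt, the source video and the output directory, so they can be generated from a small table. This is a convenience sketch: `RUNS`, `COMMON` and `build_cmd` are hypothetical names, while every flag and value is copied from the commands above.

```python
import shlex
from typing import Dict, List, Tuple

# Flags shared by all four runs, copied from the commands above.
COMMON = ["--condition", "depth_midas", "--video_length", "15",
          "--smoother_steps", "19", "20", "--width", "512", "--height", "512",
          "--frame_rate", "2", "--version", "v10", "--is_long_video"]

# run name -> (prompt, source video)
RUNS: Dict[str, Tuple[str, str]] = {
    "mallard-water": ("A striking mallard floats effortlessly on the sparkling pond.",
                      "data/mallard-water.mp4"),
    "man-spraying-fire": ("A man is spraying fire from a fire-breathing device, "
                          "surrounded by vehicles and a basketball hoop.",
                          "data/man-spraying-fire.mp4"),
    "man-spraying-water": ("A man is spraying water from a water gun, "
                           "surrounded by vehicles and a basketball hoop.",
                           "data/man-spraying-fire.mp4"),  # same source video, new prompt
    "car-moving": ("A green car moving on the road", "data/car-moving.mp4"),
}


def build_cmd(name: str, prompt: str, video_path: str) -> List[str]:
    """Assemble the inference.py argument list for one run."""
    return ["python", "inference.py", "--prompt", prompt,
            "--video_path", video_path,
            "--output_path", f"outputs/{name}/", *COMMON]


if __name__ == "__main__":
    for name, (prompt, video) in RUNS.items():
        print(shlex.join(build_cmd(name, prompt, video)))
        # To execute: subprocess.run(build_cmd(name, prompt, video), check=True)
```

Passing the argument list to subprocess.run directly avoids shell quoting issues with the long prompts.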

¹ Fixing the No module named 'torch._six' error
² controlnet-aux==0.0.6 ValueError: depth is not a valid processor id #22
