Training and Deploying a VLA with the Open-Source phospho-app

Published: 2025-08-31

1. Introduction

In my experience it works very well: it wraps much of LeRobot's functionality behind a visual interface, is easy to pick up, and the app is updated frequently.

GitHub

It is an application with the following capabilities:

  1. Control a robot
  2. Record data
  3. Train and run VLA models

Key features:

  • Currently supports ACT and GR00T-N1.5
  • Works with SO100, SO101, AgileX arms, and more
  • Integrates tightly with LeRobot and Hugging Face
  • Runs on macOS, Linux, and Windows
  • Supports a wide range of cameras
  • Open source
  • Supports writing your own controller to drive your own arm

To update the app:

sudo apt update && sudo apt install --only-upgrade phosphobot

2. Getting Started

First you need a robot; mine is an SO101.

2.1 Installation

# Install uv: https://docs.astral.sh/uv/
curl -LsSf https://astral.sh/uv/install.sh | sh

# Run phosphobot
uvx phosphobot run

2.2 Controlling the Robot

Once in the dashboard you can choose to control the robot; here I use keyboard control.

The default selection is the leader arm.

The follower arm can be selected as well.

Other control modes are also available here, such as a gamepad or leader-follower teleoperation.

There are a few other small features, too.

For example, swapping leader and follower, mirrored control, and recording a motion and replaying it.
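Record-and-replay can be sketched as sampling joint angles at a fixed rate and sending them back with the original timing. This is a minimal illustration only; the `read_joints`/`write_joints` callables are hypothetical stand-ins, not phosphobot's actual API:

```python
import time

def record(read_joints, duration_s=5.0, hz=30):
    """Sample joint angles at a fixed rate; returns a list of (t, angles)."""
    trajectory, t0 = [], time.monotonic()
    while (now := time.monotonic()) - t0 < duration_s:
        trajectory.append((now - t0, read_joints()))
        time.sleep(1.0 / hz)
    return trajectory

def replay(trajectory, write_joints):
    """Send the recorded angles back, respecting the original timestamps."""
    t0 = time.monotonic()
    for t, angles in trajectory:
        # Busy-wait (coarsely) until this sample's original timestamp
        while time.monotonic() - t0 < t:
            time.sleep(0.001)
        write_joints(angles)
```

The same idea scales down to a "repeat this motion" button: record once, then call `replay` as many times as needed.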

You can also check the servo temperatures.

2.3 Recording a Dataset

You can start and stop recording.

The recorded datasets are stored here:

Recorded datasets are uploaded straight to Hugging Face.

You can also set the dataset name and the task instruction here.

The datasets all display correctly, and most importantly: no more jitter! This may be because the camera image is not shown in real time here, which suggests the earlier jitter during camera teleoperation came from the live preview overloading the processing loop. Either way, the recorded videos come out fine.

You can also browse datasets directly inside phospho, which is a handy feature.

You can also download other datasets from Hugging Face and merge datasets together.

phospho wraps up a very practical set of dataset-management tools (merge / split / repair / delete) for us.

Splitting is mainly for dividing a dataset into a training set and a test set.
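A minimal sketch of what an episode-level train/test split might look like (illustrative only; phosphobot's own implementation and dataset layout may differ):

```python
import random

def split_episodes(episode_ids, test_ratio=0.2, seed=42):
    """Shuffle episode IDs and partition them into train/test sets.

    Splitting at the episode level (rather than the frame level) keeps each
    trajectory intact, so no single episode leaks across the two sets.
    """
    ids = list(episode_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_ratio))
    return ids[n_test:], ids[:n_test]  # (train, test)

train, test = split_episodes(range(50))
print(len(train), len(test))  # 40 10
```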

3. Training a VLA

No rush on these two; I wanted to try the official lerobot-pi and GR00T pipelines first before coming back to this.

You need to create an account. Here I ran into:

ValueError: Unknown scheme for proxy URL URL('socks://127.0.0.1:7891/')

Fix: pip install httpx-socks and restart the service.

When trying to fetch the dataset I got the following error, even though the dataset really is public:

Error fetching training info: 1 validation error for TrainingRequest dataset_name Value error, Dataset wantobcm/youliangtan_so101_strawberry_grape is not a valid, public Hugging Face dataset. Please check the URL and try again. Your dataset name should be in the format <username>/<dataset_name> [type=value_error, input_value='wantobcm/youliangtan_so101_strawberry_grape', input_type=str] For further information visit https://errors.pydantic.dev/2.11/v/value_error

I also found that the connection drops easily after logging in, which makes the training and control features hard to use reliably.

On top of that, training through the app is capped at 2 hours unless you upgrade to a paid plan, so I opted for local training scripts instead.

That said, I later got the connection fixed!

At that point the uploaded dataset displayed correctly and the training parameters could be edited.

If you do not set:

# Set proxy environment variables
export HTTP_PROXY=http://127.0.0.1:7890
export HTTPS_PROXY=http://127.0.0.1:7890

the following problem appears:

So these two environment variable settings are necessary; they can go into your .bashrc.
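These variables work because most HTTP clients, including Python's urllib (which many libraries build on), read proxy settings from the environment. A quick check:

```python
import os
import urllib.request

# Point HTTP(S) traffic at the local proxy (address taken from this setup)
os.environ["HTTP_PROXY"] = "http://127.0.0.1:7890"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"

# urllib picks the settings up from the environment at call time
proxies = urllib.request.getproxies()
print(proxies["http"], proxies["https"])
```

Libraries that bypass the environment (or only speak SOCKS, as with the httpx-socks issue above) need their own configuration, which is why both fixes were required here.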

This command pins a specific version to run:

uvx phosphobot@0.3.78 run

That said, their hosted training flow is not my main route; I fine-tune with local scripts.

4. Controlling the Robot with the VLA

For control, it appears to call their cloud GPUs, so local inference is preferable, and for that you don't need their app.

So there is nothing to agonize over: I will call local scripts, and this section is only a demo.

If you use the official nvidia/GR00T-N1.5, note that their model mainly targets humanoids, so it doesn't match this arm; you can use the single-arm model below instead:

Clicking Control runs into a problem: after the arm moves to the initial position, the AI state gets stuck waiting forever.

But with the official model the cameras fail to load; see the next section for details. For now, we wait for upstream fixes.

5. Deploying Phosphobot Locally

The official build can run into compatibility issues, so I built it locally. The trigger was the following error with the RealSense D405 camera:

2025-08-06 19:05:02.303 | INFO     | phosphobot.camera:initialize_realsense_camera:1117 - Found 1 RealSense device(s)
2025-08-06 19:05:02.333 | DEBUG    | phosphobot.camera:initialize_realsense_camera:1125 - Attempting to initialize RealSense device 0: Intel RealSense D405 (Serial: 230322271611)
2025-08-06 19:05:02.338 | ERROR    | phosphobot.camera:__init__:780 - RealsenseCamera 0 (230322271611): Failed to initialize - enable_stream(): incompatible function arguments. The following argument types are supported:
    1. (self: pyrealsense2.pyrealsense2.config, stream_type: pyrealsense2.pyrealsense2.stream, stream_index: int, width: int, height: int, format: pyrealsense2.pyrealsense2.format, framerate: int) -> None
    2. (self: pyrealsense2.pyrealsense2.config, stream_type: pyrealsense2.pyrealsense2.stream) -> None
    3. (self: pyrealsense2.pyrealsense2.config, stream_type: pyrealsense2.pyrealsense2.stream, stream_index: int) -> None
    4. (self: pyrealsense2.pyrealsense2.config, stream_type: pyrealsense2.pyrealsense2.stream, format: pyrealsense2.pyrealsense2.format, framerate: int) -> None
    5. (self: pyrealsense2.pyrealsense2.config, stream_type: pyrealsense2.pyrealsense2.stream, width: int, height: int, format: pyrealsense2.pyrealsense2.format, framerate: int) -> None
    6. (self: pyrealsense2.pyrealsense2.config, stream_type: pyrealsense2.pyrealsense2.stream, stream_index: int, format: pyrealsense2.pyrealsense2.format, framerate: int) -> None

Invoked with: <pyrealsense2.pyrealsense2.config object at 0x7f866dc3bd70>; kwargs: stream_type=<stream.color: 2>, format=<format.bgr8: 6>
2025-08-06 19:05:02.362 | WARNING  | phosphobot.camera:initialize_realsense_camera:1150 - RealSense camera 0 failed to connect properly
2025-08-06 19:05:02.362 | INFO     | phosphobot.camera:initialize_realsense_camera:1166 - No RealSense cameras initialized
2025-08-06 19:05:02.362 | INFO     | phosphobot.camera:detect_video_indexes:272 - (Linux) Found possible ports through scanning '/dev/video*': [0, 1, 2, 3, 4, 5, 6, 7]
2025-08-06 19:05:02.362 | INFO     | phosphobot.camera:detect_video_indexes:277 - Ignoring possible ports: [] (index > 10)
[ WARN:1@351.860] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video0): can't open camera by index
[ERROR:1@351.860] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
[ WARN:1@351.860] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video0): can't open camera by index
[ERROR:1@351.861] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
2025-08-06 19:05:02.391 | SUCCESS  | phosphobot.camera:_find_cameras:227 - Camera found at index 0
[ WARN:1@351.861] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video1): can't open camera by index
[ERROR:1@351.862] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
[ WARN:1@351.866] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video2): can't open camera by index
[ERROR:1@351.867] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
[ WARN:1@351.867] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video3): can't open camera by index
[ERROR:1@351.867] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
[ WARN:1@351.869] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video4): can't open camera by index
[ERROR:1@351.870] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
[ WARN:1@351.870] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video4): can't open camera by index
[ERROR:1@351.870] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
2025-08-06 19:05:02.400 | SUCCESS  | phosphobot.camera:_find_cameras:227 - Camera found at index 4
[ WARN:1@351.870] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video5): can't open camera by index
[ERROR:1@351.871] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
[ WARN:1@351.872] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video6): can't open camera by index
[ERROR:1@351.873] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
[ WARN:1@351.873] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video6): can't open camera by index
[ERROR:1@351.873] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
2025-08-06 19:05:02.403 | SUCCESS  | phosphobot.camera:_find_cameras:227 - Camera found at index 6
[ WARN:1@351.873] global cap_v4l.cpp:913 open VIDEOIO(V4L2:/dev/video7): can't open camera by index
[ERROR:1@351.874] global obsensor_uvc_stream_channel.cpp:158 getStreamChannelGroup Camera index out of range
2025-08-06 19:05:02.928 | DEBUG    | phosphobot.camera:detect_cameras:1023 - Ignoring camera 4: realsense
2025-08-06 19:05:02.928 | DEBUG    | phosphobot.camera:detect_cameras:1023 - Ignoring camera 6: realsense

My two cameras at indices 4 and 6 were ignored because they failed to initialize, so I went looking for answers in the code, which of course means creating a fresh virtual environment and installing from source:

conda create -n phosphobot python=3.10
conda activate phosphobot

It turned out to be an argument problem: the code calls enable_stream with a keyword combination (stream_type plus format only) that matches none of the overloads listed in the error above:

                # Configure streams
                config.enable_stream(
                    stream_type=rs.stream.color,
                    format=rs.format.bgr8,
                )
                config.enable_stream(
                    stream_type=rs.stream.depth,
                    format=rs.format.z16,
                )

The fix is to use one of the supported positional signatures (stream type, width, height, format, framerate):

                config.enable_stream(
                    rs.stream.color,
                    640, 480, rs.format.bgr8, 30
                )
                config.enable_stream(
                    rs.stream.depth,
                    640, 480, rs.format.z16, 30
                )

Building the frontend yourself...

# Install nvm
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash

# Install Node.js 18
export NVM_DIR="$HOME/.nvm"
nvm install 18

# Build the frontend
export NVM_DIR="$HOME/.nvm"
make build_frontend

# Fix a typo in app.py: "posphobot" -> "phosphobot"
sed -i 's|posphobot/resources/dist|phosphobot/resources/dist|' phosphobot/phosphobot/app.py

Then:

# Reinstall the project in editable mode
pip install -e .

# Start the server
phosphobot run
(phosphobot) strawberry@strawberry-E500-G9-WS760T:~/zzy/project/phosphobot$ phosphobot run
2025-08-06 20:03:24.289 | INFO     | phosphobot.main:<module>:4 - Starting phosphobot...
sys.stdout.encoding = utf-8

    ░█▀█░█░█░█▀█░█▀▀░█▀█░█░█░█▀█░█▀▄░█▀█░▀█▀
    ░█▀▀░█▀█░█░█░▀▀█░█▀▀░█▀█░█░█░█▀▄░█░█░░█░
    ░▀░░░▀░▀░▀▀▀░▀▀▀░▀░░░▀░▀░▀▀▀░▀▀░░▀▀▀░░▀░

    phosphobot 0.3.83
    Copyright (c) 2025 phospho https://phospho.ai
            
2025-08-06 20:03:24.745 | WARNING  | phosphobot.main:run:319 - Port 80 is unavailable. Trying next...
pybullet build time: Jan 29 2025 23:16:28
2025-08-06 20:03:25.125 | DEBUG    | phosphobot.hardware.sim:init_simulation:42 - Simulation: headless mode enabled
INFO:     Started server process [537267]
INFO:     Waiting for application startup.
2025-08-06 20:03:26.512 | DEBUG    | phosphobot.utils:login_to_hf:211 - Successfully logged in to Hugging Face.
2025-08-06 20:03:26.848 | DEBUG    | phosphobot.utils:login_to_hf:216 - HF username or org ID: wantobcm
2025-08-06 20:03:26.848 | SUCCESS  | phosphobot.app:lifespan:81 - Startup complete. Go to the phosphobot dashboard here: http://192.168.1.112:8020

Back in the dashboard, the cameras are now captured successfully:

You can open the cameras right here and record data while watching the feed:

After this, all I need to do is activate the phosphobot environment and run phosphobot run; the other services all work normally as well.

5.1 Recording a Dataset

Set the HF token and record a dataset; finished recordings are uploaded to Hugging Face automatically.

5.2 Fine-Tuning GR00T

cd phosphobot

Note: gr00t1.5 is the environment I created earlier when reproducing GR00T; now install phosphobot's dependencies into it:

git clone https://github.com/phospho-app/Isaac-GR00T.git
conda activate gr00t1.5
pip install --upgrade setuptools
pip install -e Isaac-GR00T
cd phosphobot
pip install -e .
export HF_TOKEN=<your_token>

Change the model path to the local checkpoint path.

Adjust batch_size as needed.

Then run the following script:

python scripts/gr00t/train.py --dataset-name wantobcm/box2bowl

==================================================
GR00T FINE-TUNING CONFIGURATION:
==================================================
dataset_path: ['data']
validation_dataset_path: None
output_dir: outputs
data_config: so100
num_arms: 1
num_cams: 2
batch_size: 32
max_steps: 10000
num_epochs: 20
num_gpus: 1
save_steps: 2500
base_model_path: /home/strawberry/zzy/VLA/GR00T-N1.5-3B
tune_llm: False
tune_visual: False
tune_projector: True
tune_diffusion_model: True
resume: False
learning_rate: 0.0002
weight_decay: 1e-05
warmup_ratio: 0.05
lora_rank: 0
lora_alpha: 16
lora_dropout: 0.1
lora_full_model: False
dataloader_num_workers: 8
report_to: tensorboard
embodiment_tag: new_embodiment
video_backend: torchvision_av
train_test_split: 1
balance_dataset_weights: True
balance_trajectory_weights: True
==================================================

Dataset details:

  • Total frames: 11005
  • Epochs: 20
  • Batch size: 32

So the actual step count is ceil(11005 / 32) = 344 steps per epoch, times 20 epochs = 6880 steps, a sensible training length derived from the dataset size.
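As a quick sanity check of that arithmetic (assuming the trainer keeps the last partial batch of each epoch, which is what the 6880 figure implies):

```python
import math

frames, epochs, batch_size = 11005, 20, 32

# The last partial batch of each epoch still counts as a step
steps_per_epoch = math.ceil(frames / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 344 6880
```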

Monitor the GPU:

watch -n 1 nvidia-smi

View the training curves:

tensorboard --logdir=outputs/runs

Then open:

http://localhost:6006/

The final training curves at ~6k steps:

5.3 Deploying on the Real Robot

Start the inference server:

python scripts/gr00t/serve.py --output-dir ./outputs

In another terminal, activate the phosphobot environment and run:

phosphobot run

Then in a third terminal, start:

python scripts/quickstart_ai_gr00t.py

The script needs a few code changes; the official script isn't great as-is (the cloud-connected service in the official dashboard should be fine, though). Here is what I changed:

  1. Changed the host from "localhost" to "127.0.0.1" to avoid the ValueError
  2. Fixed the API response key from "angles_rad" to "angles"
  3. Added proper handling for when cameras are not available by using dummy images
  4. Changed the image resolution from 320x240 to 224x224 to match the model's expectations
  5. Updated the state key from "state.arm" to "state.arm_0" to match the model's metadata
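Fixes 3-5 boil down to shaping the observation payload correctly. A rough sketch of what that assembly might look like (the helper and the `video.cam_*` key names are my own inventions for illustration; the real keys come from the fine-tuned model's metadata):

```python
import numpy as np

def build_observation(frames, joint_angles, img_size=(224, 224)):
    """Assemble one inference request from camera frames and joint state.

    frames: list of HxWx3 uint8 arrays, or None where a camera is missing.
    joint_angles: list of floats read from the arm.
    """
    # Fix 5: the state key must match the model's metadata ("state.arm_0")
    obs = {"state.arm_0": np.asarray(joint_angles, dtype=np.float32)}
    for i, frame in enumerate(frames):
        if frame is None:
            # Fix 3: substitute a dummy image when a camera is unavailable
            frame = np.zeros((*img_size, 3), dtype=np.uint8)
        # Fix 4: the model expects 224x224 inputs (actual resizing omitted)
        obs[f"video.cam_{i}"] = frame
    return obs

obs = build_observation([None, None], [0.0] * 6)
print(sorted(obs))  # ['state.arm_0', 'video.cam_0', 'video.cam_1']
```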

In short:

1. Terminal 1: in the gr00t1.5 environment, run python scripts/gr00t/serve.py --output-dir ./outputs

2. Terminal 2: in the phosphobot environment, run phosphobot run

3. Terminal 3: in the phosphobot environment, run python scripts/quickstart_ai_gr00t.py

The success rate is not great, maybe only around 30%, and the arm's behavior is fairly stereotyped.

Interestingly, when both the box and the bowl are in the scene, the model sometimes moves straight above the bowl and opens and closes the gripper; when only the box is present, it does try to grasp the box.

I will try out phosphobot's other newly added features in a follow-up.

6. phosphobot + pi0

To be continued.

