Official documentation: Deploy StarRocks with Docker | StarRocks
If the image pull stalls at "Downloading", stop it and start it again.
# Start StarRocks
docker run -p 9030:9030 -p 8030:8030 -p 8040:8040 -itd --name quickstart starrocks/allin1-ubuntu
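Once the container is up, it can help to confirm that the three published ports are reachable before going further. The following is a minimal sketch, not part of the official quickstart; the names `STARROCKS_PORTS`, `port_is_open` and `check_ports` are made up for illustration, and an open port only shows that something is listening, not that StarRocks has finished starting.

```python
import socket

# Ports published by the docker run command above.
# The labels are illustrative; only the port numbers come from the command.
STARROCKS_PORTS = {"mysql": 9030, "fe_http": 8030, "be_http": 8040}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable.
        return False


def check_ports(host: str = "127.0.0.1") -> dict:
    """Map each labelled port to whether it currently accepts connections."""
    return {name: port_is_open(host, port) for name, port in STARROCKS_PORTS.items()}


if __name__ == "__main__":
    for name, is_open in check_ports().items():
        print(f"{name}: {'open' if is_open else 'closed'}")
```

If any port shows as closed, `docker ps` and `docker logs quickstart` are the usual next places to look.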
# Download the datasets
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/72505394728.csv
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/NYPD_Crash_Data.csv
# Connect with the MySQL client
docker exec -it quickstart mysql -P 9030 -h 127.0.0.1 -u root --prompt="StarRocks > "
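## --prompt is a MySQL client option that customizes the command-line prompt.
Connection succeeded:
Alternatively, connect with any MySQL GUI tool as user root with an empty password.
Create the database and tables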
CREATE DATABASE IF NOT EXISTS quickstart;
USE quickstart;
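Error: curl: (3) URL using bad/illegal format or missing URL
Cause: the command was typed into PowerShell interactively, one line at a time, which easily makes curl fail to parse its arguments. PowerShell does not treat a trailing backslash as a line continuation (its continuation character is the backtick), so curl.exe runs with whatever it has "seen" so far instead of waiting for the complete command.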
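One way to keep using curl while avoiding shell line continuation altogether is to hand it the whole command as a single argument list. The sketch below assumes curl is on the PATH and reuses the URL, file name and a few of the header values from these notes (the long columns header is left out for brevity); `build_stream_load_command` and `run_stream_load` are hypothetical helper names.

```python
import subprocess


def build_stream_load_command(url, csv_path, headers, user="root"):
    """Build the curl call as a list with one element per argument.

    Because no shell parses the list, quoting and line continuation
    differences between bash and PowerShell do not matter.
    """
    # "user:" (trailing colon) tells curl the password is empty.
    cmd = ["curl", "--location-trusted", "-u", f"{user}:", "-T", csv_path]
    for name, value in headers.items():
        cmd += ["-H", f"{name}:{value}"]
    cmd += ["-XPUT", url]
    return cmd


def run_stream_load(cmd):
    """Run the prepared command and return curl's standard output."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


EXAMPLE_COMMAND = build_stream_load_command(
    "http://localhost:8030/api/quickstart/crashdata/_stream_load",
    "./NYPD_Crash_Data.csv",
    {
        "label": "crashdata-0",
        "column_separator": ",",
        "skip_header": "1",
        "enclose": '"',
        "max_filter_ratio": "1",
    },
)
```

Calling `run_stream_load(EXAMPLE_COMMAND)` would then perform the upload; nothing is executed until that call is made.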
Workaround: load the data with a Python script instead
import os

import requests
from requests.auth import HTTPBasicAuth

# Configuration
STARROCKS_URL = "http://localhost:8030/api/quickstart/crashdata/_stream_load"
CSV_FILE_PATH = "./NYPD_Crash_Data.csv"
HEADERS = {
    "label": "crashdata-0",
    "column_separator": ",",
    "skip_header": "1",
    "enclose": '"',
    "max_filter_ratio": "1",
    "columns": (
        "tmp_CRASH_DATE, tmp_CRASH_TIME, "
        "CRASH_DATE=str_to_date(concat_ws(' ', tmp_CRASH_DATE, tmp_CRASH_TIME), '%m/%d/%Y %H:%i'),"
        "BOROUGH,ZIP_CODE,LATITUDE,LONGITUDE,LOCATION,"
        "ON_STREET_NAME,CROSS_STREET_NAME,OFF_STREET_NAME,"
        "NUMBER_OF_PERSONS_INJURED,NUMBER_OF_PERSONS_KILLED,"
        "NUMBER_OF_PEDESTRIANS_INJURED,NUMBER_OF_PEDESTRIANS_KILLED,"
        "NUMBER_OF_CYCLIST_INJURED,NUMBER_OF_CYCLIST_KILLED,"
        "NUMBER_OF_MOTORIST_INJURED,NUMBER_OF_MOTORIST_KILLED,"
        "CONTRIBUTING_FACTOR_VEHICLE_1,CONTRIBUTING_FACTOR_VEHICLE_2,"
        "CONTRIBUTING_FACTOR_VEHICLE_3,CONTRIBUTING_FACTOR_VEHICLE_4,"
        "CONTRIBUTING_FACTOR_VEHICLE_5,COLLISION_ID,"
        "VEHICLE_TYPE_CODE_1,VEHICLE_TYPE_CODE_2,VEHICLE_TYPE_CODE_3,"
        "VEHICLE_TYPE_CODE_4,VEHICLE_TYPE_CODE_5"
    ),
    "Expect": "100-continue",
}
USER = "root"
PASSWORD = ""  # Fill in if a password has been set (e.g. 'your_password')


def upload_to_starrocks():
    # Bail out early if the CSV file is missing.
    if not os.path.exists(CSV_FILE_PATH):
        print(f"❌ File {CSV_FILE_PATH} does not exist")
        return
    print("⏳ Uploading file...")
    with open(CSV_FILE_PATH, "rb") as f:
        try:
            # Stream the file as the body of a Stream Load PUT request.
            response = requests.put(
                STARROCKS_URL,
                auth=HTTPBasicAuth(USER, PASSWORD),
                headers=HEADERS,
                data=f,
                timeout=6000,  # timeout in seconds
            )
        except requests.exceptions.Timeout:
            print("❌ Request timed out; check the network and whether StarRocks is running")
            return
        except Exception as e:
            print(f"❌ An exception occurred: {e}")
            return
    print("✅ Response status code:", response.status_code)
    try:
        print("📄 Response body:\n", response.json())
    except Exception:
        # The body was not valid JSON; show it as raw text.
        print("📄 Raw response body:\n", response.text)


if __name__ == "__main__":
    upload_to_starrocks()
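Because the script sets max_filter_ratio to 1, a load can report success even when rows were rejected, so it is worth looking at the row counts in the response and not only the status code. The sketch below assumes the Stream Load JSON response carries the fields `Status`, `NumberTotalRows`, `NumberLoadedRows`, `NumberFilteredRows` and `ErrorURL`, as I understand the StarRocks documentation to describe; `summarize_stream_load` is a hypothetical helper and the sample dictionary is invented for illustration, not real output.

```python
def summarize_stream_load(result: dict) -> str:
    """Condense a Stream Load JSON response into a one-line summary."""
    status = result.get("Status", "Unknown")
    total = result.get("NumberTotalRows", 0)
    loaded = result.get("NumberLoadedRows", 0)
    filtered = result.get("NumberFilteredRows", 0)
    summary = f"{status}: {loaded}/{total} rows loaded, {filtered} filtered"
    # With max_filter_ratio=1 a "Success" can still hide rejected rows.
    if status == "Success" and filtered > 0:
        summary += " (check ErrorURL for rejected rows)"
    return summary


# Invented sample values, used only to show the output format.
sample = {
    "Status": "Success",
    "NumberTotalRows": 100,
    "NumberLoadedRows": 97,
    "NumberFilteredRows": 3,
}
print(summarize_stream_load(sample))
# -> Success: 97/100 rows loaded, 3 filtered (check ErrorURL for rejected rows)
```

In the script above, the helper could be applied to `response.json()` after the upload returns.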
This script turned out to be very slow. Copying the file into the container and loading it from there worked instead:
docker cp ../weather/output/isd_lite_2021_china_with_station_info.csv quickstart:/data/tmp
root@46b4bd1c3a6a:/data/tmp# curl --location-trusted -u root \
> -T ./isd_lite_2021_china_with_station_info.csv \
> -H "label:gz-weather-0" \
> -H "column_separator:," \
> -H "skip_header:1" \
> -H "enclose:\"" \
> -H "max_filter_ratio:1" \
> -H "columns:year,month,day,hour,temp,dew_point,slp,wind_dir,wind_speed,sky_cover,precip_1hr,precip_6hr,station_id,station_name,country,latitude,longitude,elevation,datetime" \
20 million rows were imported successfully, and the load was extremely fast.