Uploading Large Videos and Files (over 5 GB or 10 GB): How It Works, with Implementations in 5 Languages - 优雅草卓伊凡 | 深蓝 - EW帮帮网


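The 优雅草 (Youyacao) team has been working on audio/video, live streaming, and short video since 2019, and one step we can never skip is uploading. So how is large-file upload actually implemented? One of Youyacao's more popular Discuz plugins is itself a large-video upload plugin, and today we also received an inquiry from another client who needs to upload large videos.

A large-video upload solution mainly comes down to the following points: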

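Core Techniques for Large File Upload

1. Chunked Upload

How it works

Chunked upload splits a large file into small pieces (e.g. 5 MB per chunk) and uploads each one separately. The core flow is:

On the front end, cut the file into multiple Blob objects with File.slice() (inherited from Blob)
Give each chunk a unique identifier (usually the file hash plus the chunk number)
Upload the chunks sequentially or in parallel
The server receives the chunks and stores them temporarily
Once every chunk has arrived, the server merges them into the final file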
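As a rough illustration of the slicing step, here is a Python sketch that computes the byte range of each chunk and reads one chunk from a file, mirroring what `File.slice()` does in the browser. The 5 MB default and the 1-based chunk numbering (matching the server examples later in this article) are assumptions of the sketch, and the function names are hypothetical.

```python
import io
import math
from typing import BinaryIO, List, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # assumed default: 5 MB per chunk


def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int, int]]:
    """Return (chunk_number, start, end) triples; end is exclusive, numbering starts at 1."""
    total_chunks = math.ceil(file_size / chunk_size)
    return [
        (i + 1, i * chunk_size, min((i + 1) * chunk_size, file_size))
        for i in range(total_chunks)
    ]


def read_chunk(f: BinaryIO, start: int, end: int) -> bytes:
    """Read bytes [start, end) from an open binary file, like Blob.slice(start, end)."""
    f.seek(start)
    return f.read(end - start)


if __name__ == "__main__":
    # A 12 MB file yields two full 5 MB chunks and one 2 MB tail chunk
    for number, start, end in chunk_ranges(12 * 1024 * 1024):
        print(number, start, end - start)
    # Reading a small in-memory "file" in 4-byte chunks
    data = io.BytesIO(b"abcdefghij")
    print([read_chunk(data, s, e) for _, s, e in chunk_ranges(10, 4)])
```

Only the last chunk can be shorter than `chunk_size`, which is why the end offset is clamped to the file size.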

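Why chunked upload is needed

Solves the large-file problem: works around per-request size limits on web servers (e.g. Nginx's client_max_body_size defaults to 1 MB)
Improves reliability: if one chunk fails, only that chunk is retried, not the whole file
Copes with unstable networks: small chunks are more likely to get through on a weak connection
Saves bandwidth: data the server has already received is never sent again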

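2. Resumable Upload

How it works

Resumable upload builds on chunked upload. Its main mechanisms are:

Chunk records: the server records the numbers of the chunks it has received successfully
Client query: before uploading, the client asks the server which chunks it already has
Resume logic: the client uploads only the missing chunks
Unique identifier: a hash of the file content, or "file name + size + last-modified time", serves as the unique ID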
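The two identifier strategies above trade speed for robustness: a metadata-based ID is instant to compute but changes whenever the file is touched, while a content hash is stable but requires reading the data. Below is a minimal Python sketch of the metadata variant; the function name, the `|` separator, and truncating the digest to 16 hex characters are illustrative choices, not part of any standard.

```python
import hashlib
import os


def metadata_file_id(path: str) -> str:
    """Derive an upload ID from "file name + size + modification time".

    Hashing the combined string keeps the ID a fixed length and safe to use
    in file names on the server (hypothetical helper, not a library API).
    """
    st = os.stat(path)
    key = f"{os.path.basename(path)}|{st.st_size}|{st.st_mtime_ns}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```

Because the size and modification time are part of the key, editing the file produces a new ID, so the server never resumes a stale set of chunks for changed content.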

Key data structure:

// Resume state record
{
  "fileId": "abc123",
  "fileName": "video.mp4",
  "totalSize": 104857600, // 100 MB, i.e. 20 chunks of 5 MB
  "totalChunks": 20,
  "uploadedChunks": [1,2,3,5,6], // chunks already uploaded (chunk 4 is missing)
  "createdAt": "2023-07-20T08:00:00Z"
}
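Given a record like the one above, the resume logic reduces to set arithmetic: the client fetches the record and uploads only the chunk numbers that are absent. The Python sketch below keeps the record in memory purely for illustration (the article suggests a database or Redis on the server side); the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ResumeRecord:
    """In-memory version of the resume-state record (fields follow the JSON above)."""
    file_id: str
    file_name: str
    total_size: int
    total_chunks: int
    uploaded_chunks: Set[int] = field(default_factory=set)  # 1-based chunk numbers

    def mark_uploaded(self, chunk_number: int) -> None:
        if not 1 <= chunk_number <= self.total_chunks:
            raise ValueError(f"chunk {chunk_number} out of range")
        self.uploaded_chunks.add(chunk_number)

    def missing_chunks(self) -> List[int]:
        """Chunks the client still has to send, in ascending order."""
        return [n for n in range(1, self.total_chunks + 1) if n not in self.uploaded_chunks]

    def is_complete(self) -> bool:
        return len(self.uploaded_chunks) == self.total_chunks


if __name__ == "__main__":
    record = ResumeRecord("abc123", "video.mp4", 104857600, 20, {1, 2, 3, 5, 6})
    print(record.missing_chunks())  # chunk 4 and chunks 7-20 are still missing
```

Using a set makes a duplicate report of the same chunk harmless, which matters when a retried request succeeds twice.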

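Why resumable upload is needed

Recovery from network drops: after reconnecting, the user can continue where the upload stopped
Recovery from page reloads: an accidental refresh does not force the user to start over
Saves time and bandwidth: parts that were already transferred are not uploaded again
Better user experience: uploading feels more stable and reliable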

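3. Parallel Upload

How it works

Parallel upload raises overall speed by uploading several chunks at the same time:

Concurrency control: use a sensible degree of parallelism (typically 3-5)
Chunk independence: each chunk must be uploadable on its own and in any order
Bandwidth allocation: adjust the degree of parallelism dynamically based on current network conditions

Implementation example: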

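// Parallel upload control: a fixed pool of MAX_PARALLEL workers pulls chunks
// from a shared queue, so at most MAX_PARALLEL uploads are in flight at once
const MAX_PARALLEL = 3;

async function uploadChunks(chunks) {
  const queue = [...chunks]; // copy so the caller's array is left untouched

  async function worker() {
    while (queue.length > 0) {
      const chunk = queue.shift();
      await uploadChunk(chunk); // uploadChunk(chunk) must return a Promise
    }
  }

  // Start the workers and resolve only when every chunk has finished;
  // if any upload rejects, uploadChunks rejects as well
  const workerCount = Math.min(MAX_PARALLEL, queue.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}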
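The same "at most N in flight" rule can also be expressed with a semaphore. The asyncio sketch below uses a hypothetical `upload_chunk` coroutine that only sleeps, standing in for the real HTTP request, and records the peak number of concurrent uploads so the limit can be checked.

```python
import asyncio
import random
from typing import Iterable, List, Tuple

MAX_PARALLEL = 3  # same limit as the JavaScript example above


async def upload_chunk(chunk_number: int) -> int:
    """Hypothetical stand-in for a real HTTP chunk upload: it only sleeps briefly."""
    await asyncio.sleep(random.uniform(0.01, 0.03))
    return chunk_number


async def upload_all(chunk_numbers: Iterable[int],
                     limit: int = MAX_PARALLEL) -> Tuple[List[int], int]:
    """Upload every chunk with at most `limit` in flight; return (results, peak concurrency)."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def guarded(n: int) -> int:
        nonlocal in_flight, peak
        async with semaphore:  # waits while `limit` uploads are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await upload_chunk(n)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(n) for n in chunk_numbers))
    return list(results), peak


if __name__ == "__main__":
    results, peak = asyncio.run(upload_all(range(1, 11)))
    print(results)  # gather() keeps submission order, regardless of completion order
    print("peak concurrency:", peak)
```

Unlike the worker-pool version, this creates one task per chunk up front and lets the semaphore do the throttling; for a few thousand chunks (a 10 GB file at 5 MB per chunk is about 2,000) that is still cheap.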

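Why parallel upload is needed

Makes full use of bandwidth: over HTTP/1.1, mainstream browsers open up to 6 concurrent connections per host, so a single sequential request leaves capacity unused
Shorter upload time: parallelism cuts down the time spent waiting on one request after another
Suits high-latency networks: the advantage of parallel upload grows when round trips are slow
Higher throughput: especially for large files on fast networks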

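4. Integrity Verification

How it works

Integrity verification makes sure the uploaded file arrived intact. The main methods are:

Per-chunk verification:

- Each chunk is uploaded together with its MD5/SHA-1 checksum
- The server verifies each chunk as soon as it arrives

Whole-file verification:

- After merging, the server computes a hash of the whole file
- It compares that hash with the one the client computed at the start

Implementation example: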

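// Front end: compute the file hash
// Note: file.arrayBuffer() loads the whole file into memory and crypto.subtle.digest
// cannot hash incrementally, so this only suits small files; for multi-GB files,
// hash chunk by chunk with an incremental hashing library instead
async function calculateFileHash(file) {
  const buffer = await file.arrayBuffer();
  const hashBuffer = await crypto.subtle.digest('SHA-256', buffer);
  return Array.from(new Uint8Array(hashBuffer))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}

// Server side (Node.js): stream the merged file through the hash instead of
// readFileSync, so a multi-GB file is never loaded into memory at once
const { createHash } = require('crypto');
const fs = require('fs');

function hashFile(filePath) {
  return new Promise((resolve, reject) => {
    const hash = createHash('sha256');
    fs.createReadStream(filePath)
      .on('data', (data) => hash.update(data))
      .on('end', () => resolve(hash.digest('hex')))
      .on('error', reject);
  });
}

async function verifyFile(filePath, clientFileHash) {
  const serverFileHash = await hashFile(filePath);
  if (clientFileHash !== serverFileHash) {
    throw new Error('File integrity check failed');
  }
}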
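The per-chunk and whole-file checks can share a single pass over the data: the server verifies each chunk against the hash the client sent with it, and during the merge it feeds the same bytes into a running whole-file hash, so the merged file never has to be re-read or held in memory. A Python sketch with `hashlib`; MD5 per chunk and SHA-256 for the whole file simply follow the algorithms named in this section, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable


def verify_chunk(data: bytes, expected_md5: str) -> bool:
    """Check one chunk against the MD5 hex digest the client sent with it."""
    return hashlib.md5(data).hexdigest() == expected_md5.lower()


def merged_sha256(chunks: Iterable[bytes]) -> str:
    """SHA-256 of the concatenated chunks, computed incrementally.

    Chunks must be supplied in chunk-number order (e.g. during the merge step),
    because the digest depends on the order in which bytes are fed in.
    """
    h = hashlib.sha256()
    for data in chunks:
        h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    chunks = [b"first chunk|", b"second chunk|", b"last"]
    assert all(verify_chunk(c, hashlib.md5(c).hexdigest()) for c in chunks)
    # Hashing chunk by chunk gives the same digest as hashing the whole file at once
    print(merged_sha256(chunks) == hashlib.sha256(b"".join(chunks)).hexdigest())
```

Since parallel uploads arrive out of order, the running hash belongs in the merge loop rather than in the upload handler.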

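Why integrity verification is needed

Catches transmission errors: data can be corrupted in transit
Detects tampering: confirms the file was not modified by a man in the middle (this only holds if the client's hash itself travels over a trusted channel such as HTTPS)
Verifies integrity: particularly important for business-critical files
Establishes trusted storage: later use of the file can rely on it being correct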

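5. Progress Monitoring

How it works

Progress monitoring gives real-time feedback on the upload. Key points:

Chunk-level progress: track the upload progress of each chunk
Overall progress:

Overall progress = (Σ bytes uploaded so far across all chunks) / total file size

Visualization: progress bar, percentage, speed estimate, and so on

Implementation example: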

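// Overall progress: keep the bytes sent per chunk, so the total stays correct
// even when chunks upload in parallel and finish out of order
const loadedPerChunk = new Map(); // chunkIndex -> bytes sent so far

function updateProgress(chunkIndex, loadedBytes) {
  loadedPerChunk.set(chunkIndex, loadedBytes);
  let totalLoaded = 0;
  for (const bytes of loadedPerChunk.values()) {
    totalLoaded += bytes;
  }
  const totalProgress = Math.min(100, (totalLoaded / file.size) * 100);
  progressBar.style.width = `${totalProgress}%`;
}

// Monitor one chunk's upload with XMLHttpRequest. event.total also counts the
// multipart overhead, so the ratio is converted back to this chunk's own size.
xhr.upload.onprogress = (event) => {
  if (event.lengthComputable) {
    updateProgress(chunkIndex, (event.loaded / event.total) * chunk.size);
  }
};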
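Percentage is only one of the figures listed above; upload speed and remaining time follow from the same byte counts. The Python sketch below keeps the latest byte count per chunk (so it works with parallel uploads) and takes timestamps as arguments so it can be exercised deterministically. The class name and the simple average-speed estimate are assumptions; a real UI would usually smooth the speed over a recent time window.

```python
from typing import Dict, Optional


class ProgressTracker:
    """Aggregates per-chunk byte counts into percent, average speed and ETA."""

    def __init__(self, total_size: int, start_time: float) -> None:
        self.total_size = total_size
        self.start_time = start_time
        self.loaded: Dict[int, int] = {}  # chunk number -> bytes sent so far

    def update(self, chunk_number: int, loaded_bytes: int) -> None:
        self.loaded[chunk_number] = loaded_bytes

    def uploaded(self) -> int:
        return sum(self.loaded.values())

    def percent(self) -> float:
        if self.total_size == 0:
            return 100.0
        return min(100.0, 100.0 * self.uploaded() / self.total_size)

    def speed(self, now: float) -> float:
        """Average bytes per second since the upload started."""
        elapsed = now - self.start_time
        return self.uploaded() / elapsed if elapsed > 0 else 0.0

    def eta(self, now: float) -> Optional[float]:
        """Estimated seconds remaining, or None while the speed is still unknown."""
        speed = self.speed(now)
        if speed == 0:
            return None
        return max(0, self.total_size - self.uploaded()) / speed


if __name__ == "__main__":
    mb = 1024 * 1024
    tracker = ProgressTracker(total_size=100 * mb, start_time=0.0)
    tracker.update(1, 5 * mb)  # chunk 1 finished
    tracker.update(2, 5 * mb)  # chunk 2 finished
    tracker.update(3, 2 * mb)  # chunk 3 is partly sent
    print(f"{tracker.percent():.1f}%", tracker.speed(now=6.0) / mb, "MB/s", tracker.eta(now=6.0), "s")
```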

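Why progress monitoring is needed

User experience: users know clearly how the upload is going and how much time is left
Troubleshooting: helps spot uploads that are stuck or failing
Decision support: users can decide from the progress whether to pause or cancel
Expectations: reduces the anxiety of waiting
Debugging aid: developers can observe the upload's performance characteristics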

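How the techniques fit together

These techniques are usually used together:

Chunked upload is the foundation: every other feature requires the file to be split first
Resumable upload depends on chunk records: it needs to know which chunks have already arrived
Parallel upload makes chunked upload faster, at the cost of more load on the server
Integrity verification guarantees the final result: the whole-file check runs after all chunks have been merged
Progress monitoring spans the whole process, from the first chunk to the last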

Practical recommendations

Chunk size:

- LAN: 1-5 MB
- Mobile network: 0.5-1 MB
- Weak network: 0.1-0.5 MB

Storing resume state:

- Server side: a database or Redis
- Client side: localStorage or IndexedDB
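The article names a database or Redis for the server side. As a stand-in that needs no extra service, here is a Python sketch that stores each resume record as a JSON file, written to a temporary file and then renamed so a crash mid-write cannot leave a half-written record. The directory layout and function names are hypothetical, and one file per upload does not handle two requests updating the same record at once, which is one reason Redis or a database is the better choice in production.

```python
import json
import os
from typing import Optional


def save_state(state_dir: str, file_id: str, state: dict) -> None:
    """Persist the resume record: write a temp file, then rename it over the old one."""
    os.makedirs(state_dir, exist_ok=True)
    path = os.path.join(state_dir, f"{file_id}.json")
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # the rename is atomic on POSIX filesystems


def load_state(state_dir: str, file_id: str) -> Optional[dict]:
    """Return the saved record, or None if this upload has never been seen."""
    path = os.path.join(state_dir, f"{file_id}.json")
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```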

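Optimizing integrity checks:

- For large files, use sampled verification (e.g. hash only the head, the tail, and some random chunks)
- For business-critical files, use full verification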
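Sampling trades certainty for speed: only the sampled regions are hashed, so corruption elsewhere goes undetected, which is why full verification remains the choice for critical files. Below is a Python sketch of one possible scheme (first chunk, last chunk, plus a few chunks chosen from a seed both sides agree on). The sample count, the seeding, and the function name are hypothetical, and client and server must use exactly the same parameters for their digests to be comparable.

```python
import hashlib
import io
import random
from typing import BinaryIO, List


def sampled_hash(f: BinaryIO, file_size: int, chunk_size: int,
                 samples: int = 3, seed: str = "") -> str:
    """Hash the first chunk, the last chunk and `samples` pseudo-random chunks.

    The random choice is derived from `seed` and the file size, so the client
    and the server pick the same chunks without exchanging the list.
    """
    total_chunks = max(1, -(-file_size // chunk_size))  # ceiling division
    rng = random.Random(f"{seed}:{file_size}")
    middle = list(range(1, total_chunks - 1))
    picked: List[int] = sorted({0, total_chunks - 1, *rng.sample(middle, min(samples, len(middle)))})

    h = hashlib.sha256(str(file_size).encode())  # include the size in the digest
    for index in picked:
        f.seek(index * chunk_size)
        h.update(f.read(chunk_size))
    return h.hexdigest()


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of sample data
    a = sampled_hash(io.BytesIO(data), len(data), 64 * 1024, seed="abc123")
    b = sampled_hash(io.BytesIO(data), len(data), 64 * 1024, seed="abc123")
    print(a == b)
```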

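Optimizing progress display:

- Show upload speed and time remaining
- Distinguish states such as "uploading" and "verifying"

Error recovery:

- Retry failed chunk uploads automatically (2-3 times)
- If the upload fails as a whole, offer a manual resume option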
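Automatic retry is commonly combined with a growing delay between attempts (exponential backoff) so that a struggling server or network is not hammered. A Python sketch, assuming three attempts in total as suggested above; the function name, the delay values, and the injectable `sleep` parameter (which lets the example run instantly) are illustrative choices.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_retry(action: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, retrying on exception with delays of base_delay, 2*base_delay, ...

    The last attempt lets its exception propagate; that is the point where the
    UI would offer the manual resume option.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts):
        try:
            return action()
        except Exception:
            sleep(base_delay * 2 ** (attempt - 1))
    return action()  # final attempt: any exception propagates to the caller


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_upload() -> str:
        # Hypothetical chunk upload that fails twice, then succeeds
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("network dropped")
        return "chunk_uploaded"

    delays: List[float] = []
    print(with_retry(flaky_upload, sleep=delays.append), delays)
```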

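Large Video Upload Solutions in Detail

How large video upload works

The main difference between uploading a large video and uploading an ordinary small file is how the data is handled. Large video upload usually requires:

Chunked upload: split the file into small pieces (e.g. 5 MB each) and upload them one by one
Resumable upload: record which chunks have been uploaded, so that after a network interruption the upload continues from where it stopped
Parallel upload: upload several chunks at the same time for higher speed
Integrity verification: verify the file after the upload completes
Progress monitoring: display upload progress in real time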

Implementations by Language

PHP implementation

Recommended components:

Plupload (front end) + PHP back end
Resumable.js (front end) + PHP back end

Example code:

<?php
// Chunked upload handler
$targetDir = "uploads";
$chunkDir = "uploads/chunks";

// Make sure the directories exist
if (!file_exists($chunkDir)) {
    mkdir($chunkDir, 0755, true);
}

// Parameters sent by the front end; cast and sanitize them before building paths
$chunkNumber = (int)$_POST['chunkNumber'];
$totalChunks = (int)$_POST['totalChunks'];
$identifier = preg_replace('/[^A-Za-z0-9._-]/', '', $_POST['identifier']);
$filename = basename($_POST['filename']); // blocks path traversal such as "../../x.php"

// Move the uploaded temp file into the chunk directory
$chunkFile = $chunkDir . '/' . $identifier . '.part' . $chunkNumber;
if (!move_uploaded_file($_FILES['file']['tmp_name'], $chunkFile)) {
    http_response_code(500);
    echo json_encode(['status' => 'error']);
    exit;
}

// Check whether all chunks have been uploaded
$uploadComplete = true;
for ($i = 1; $i <= $totalChunks; $i++) {
    if (!file_exists($chunkDir . '/' . $identifier . '.part' . $i)) {
        $uploadComplete = false;
        break;
    }
}

// Merge the chunks
// Note: two requests finishing at the same moment could both reach this point;
// production code should guard the merge with a lock (e.g. flock)
if ($uploadComplete) {
    $targetFile = $targetDir . '/' . $filename;
    $out = fopen($targetFile, "wb"); // "wb" truncates any existing file
    for ($i = 1; $i <= $totalChunks; $i++) {
        $chunkFile = $chunkDir . '/' . $identifier . '.part' . $i;
        $in = fopen($chunkFile, "rb");
        stream_copy_to_stream($in, $out); // copy the entire chunk into the output file
        fclose($in);
        unlink($chunkFile); // delete the chunk
    }
    fclose($out);

    echo json_encode(['status' => 'done']);
} else {
    echo json_encode(['status' => 'chunk_uploaded']);
}

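Java implementation

Recommended components:

Apache Commons FileUpload
Spring Web MVC's MultipartFile
Or a third-party library for tus (an open resumable-upload protocol) such as tus-java-client (note that this is a client library; the server side needs a tus server implementation)

Example code: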

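// Spring Boot example
// Note: Spring Boot's default spring.servlet.multipart.max-file-size is 1MB,
// so it must be raised (e.g. to 10MB) before 5MB chunks are accepted
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/upload")
public class UploadController {

    @PostMapping("/chunk")
    public ResponseEntity<String> uploadChunk(
            @RequestParam("file") MultipartFile file,
            @RequestParam("chunkNumber") int chunkNumber,
            @RequestParam("totalChunks") int totalChunks,
            @RequestParam("identifier") String identifier,
            @RequestParam("filename") String filename) throws IOException {

        Path uploadDir = Paths.get("uploads").toAbsolutePath();
        Path chunkDir = uploadDir.resolve("chunks");

        // Make sure the directories exist
        Files.createDirectories(chunkDir);

        // Keep only safe characters so the identifier cannot escape chunkDir
        String safeId = identifier.replaceAll("[^A-Za-z0-9._-]", "");

        // Save the chunk; copying the stream to an absolute path avoids transferTo()
        // resolving a relative path against the servlet container's temp directory
        Path chunkPath = chunkDir.resolve(safeId + ".part" + chunkNumber);
        try (InputStream in = file.getInputStream()) {
            Files.copy(in, chunkPath, StandardCopyOption.REPLACE_EXISTING);
        }

        // Check whether every chunk has arrived
        for (int i = 1; i <= totalChunks; i++) {
            if (!Files.exists(chunkDir.resolve(safeId + ".part" + i))) {
                return ResponseEntity.ok("Chunk uploaded");
            }
        }

        // Merge the chunks. Use the "filename" field: a Blob chunk's
        // getOriginalFilename() is typically just "blob". getFileName() drops directories.
        Path outputFile = uploadDir.resolve(Paths.get(filename).getFileName());
        try (OutputStream out = Files.newOutputStream(outputFile)) {
            for (int i = 1; i <= totalChunks; i++) {
                Path part = chunkDir.resolve(safeId + ".part" + i);
                Files.copy(part, out);
                Files.delete(part);
            }
        }
        return ResponseEntity.ok("Upload complete");
    }
}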

Go implementation

Recommended components:

Gin web framework
gorilla/mux router
The standard library's net/http

Example code:

package main

import (
    "fmt"
    "io"
    "os"
    "path/filepath"
    "strconv"

    "github.com/gin-gonic/gin"
)

func main() {
    r := gin.Default()
    r.POST("/upload/chunk", handleChunkUpload)
    r.Run(":8080")
}

func handleChunkUpload(c *gin.Context) {
    uploadDir := "uploads"
    chunkDir := filepath.Join(uploadDir, "chunks")

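    // Make sure the directories exist
    if err := os.MkdirAll(chunkDir, 0755); err != nil {
        c.JSON(500, gin.H{"error": err.Error()})
        return
    }

    // Read and validate the parameters
    chunkNumber, err1 := strconv.Atoi(c.PostForm("chunkNumber"))
    totalChunks, err2 := strconv.Atoi(c.PostForm("totalChunks"))
    if err1 != nil || err2 != nil {
        c.JSON(400, gin.H{"error": "invalid chunkNumber or totalChunks"})
        return
    }
    // filepath.Base drops directory components, preventing path traversal
    identifier := filepath.Base(c.PostForm("identifier"))
    filename := filepath.Base(c.PostForm("filename"))

    // Save the chunk (the multipart header is unused, so it is discarded with _;
    // an unused named variable would not compile)
    file, _, err := c.Request.FormFile("file")
    if err != nil {
        c.JSON(400, gin.H{"error": err.Error()})
        return
    }
    defer file.Close()

    chunkPath := filepath.Join(chunkDir, fmt.Sprintf("%s.part%d", identifier, chunkNumber))
    out, err := os.Create(chunkPath)
    if err != nil {
        c.JSON(500, gin.H{"error": err.Error()})
        return
    }

    _, err = io.Copy(out, file)
    out.Close() // close right away (not deferred) so the merge below reads a finished file
    if err != nil {
        c.JSON(500, gin.H{"error": err.Error()})
        return
    }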

    // Check whether all chunks have arrived
    complete := true
    for i := 1; i <= totalChunks; i++ {
        _, err := os.Stat(filepath.Join(chunkDir, fmt.Sprintf("%s.part%d", identifier, i)))
        if os.IsNotExist(err) {
            complete = false
            break
        }
    }

    if complete {
        // Merge the chunks in order
        outputPath := filepath.Join(uploadDir, filename)
        out, err := os.Create(outputPath)
        if err != nil {
            c.JSON(500, gin.H{"error": err.Error()})
            return
        }
        defer out.Close()

        for i := 1; i <= totalChunks; i++ {
            chunkPath := filepath.Join(chunkDir, fmt.Sprintf("%s.part%d", identifier, i))
            in, err := os.Open(chunkPath)
            if err != nil {
                c.JSON(500, gin.H{"error": err.Error()})
                return
            }

            _, err = io.Copy(out, in)
            in.Close()
            if err != nil {
                c.JSON(500, gin.H{"error": err.Error()})
                return
            }

            os.Remove(chunkPath)
        }

        c.JSON(200, gin.H{"status": "done"})
    } else {
        c.JSON(200, gin.H{"status": "chunk_uploaded"})
    }
}

Node.js implementation

Recommended components:

multer or busboy for handling file uploads
express as the web framework
Or tus-node-server

Example code:

const express = require('express');
const multer = require('multer');
const fs = require('fs');
const path = require('path');

const app = express();
const upload = multer({ dest: 'uploads/chunks/' });

const UPLOAD_DIR = 'uploads';
const CHUNK_DIR = path.join(UPLOAD_DIR, 'chunks');

// Make sure the directory exists
if (!fs.existsSync(CHUNK_DIR)) {
    fs.mkdirSync(CHUNK_DIR, { recursive: true });
}

app.post('/upload/chunk', upload.single('file'), (req, res) => {
    // Form fields arrive as strings; parse the numbers explicitly
    const chunkNumber = parseInt(req.body.chunkNumber, 10);
    const totalChunks = parseInt(req.body.totalChunks, 10);
    // path.basename drops directory parts, preventing path traversal
    const identifier = path.basename(req.body.identifier);
    const filename = path.basename(req.body.filename);

    // Rename the temp file to its chunk file name
    const chunkFilename = `${identifier}.part${chunkNumber}`;
    const oldPath = req.file.path;
    const newPath = path.join(CHUNK_DIR, chunkFilename);

    fs.rename(oldPath, newPath, (err) => {
        if (err) {
            return res.status(500).json({ error: err.message });
        }

        // Check whether every chunk has been uploaded
        let allChunksUploaded = true;
        for (let i = 1; i <= totalChunks; i++) {
            const chunkPath = path.join(CHUNK_DIR, `${identifier}.part${i}`);
            if (!fs.existsSync(chunkPath)) {
                allChunksUploaded = false;
                break;
            }
        }

        if (allChunksUploaded) {
            // Merge the chunks in order
            const outputPath = path.join(UPLOAD_DIR, filename);
            const writeStream = fs.createWriteStream(outputPath);

            const mergeChunks = (i) => {
                if (i > totalChunks) {
                    // The end() callback fires once all data is flushed, so respond there
                    writeStream.end(() => {
                        // Clean up the chunks
                        for (let j = 1; j <= totalChunks; j++) {
                            fs.unlinkSync(path.join(CHUNK_DIR, `${identifier}.part${j}`));
                        }
                        res.json({ status: 'done' });
                    });
                    return;
                }

                const chunkPath = path.join(CHUNK_DIR, `${identifier}.part${i}`);
                const readStream = fs.createReadStream(chunkPath);

                readStream.pipe(writeStream, { end: false });
                readStream.on('end', () => {
                    mergeChunks(i + 1);
                });
                readStream.on('error', (err) => {
                    writeStream.end();
                    res.status(500).json({ error: err.message });
                });
            };

            mergeChunks(1);
        } else {
            res.json({ status: 'chunk_uploaded' });
        }
    });
});

app.listen(3000, () => {
    console.log('Server running on port 3000');
});

Python implementation

Recommended components:

Django or Flask as the web framework
django-chunked-upload (Django only)
Flask-Reuploaded or Flask-Uploads

Example code (Flask):

from flask import Flask, request, jsonify
import os
import shutil

app = Flask(__name__)

UPLOAD_DIR = 'uploads'
CHUNK_DIR = os.path.join(UPLOAD_DIR, 'chunks')

# Make sure the directories exist
os.makedirs(CHUNK_DIR, exist_ok=True)

@app.route('/upload/chunk', methods=['POST'])
def upload_chunk():
    chunk_number = int(request.form.get('chunkNumber'))
    total_chunks = int(request.form.get('totalChunks'))
    # os.path.basename drops directory parts, preventing path traversal
    identifier = os.path.basename(request.form.get('identifier'))
    filename = os.path.basename(request.form.get('filename'))

    # Save the chunk
    file = request.files['file']
    chunk_filename = f"{identifier}.part{chunk_number}"
    chunk_path = os.path.join(CHUNK_DIR, chunk_filename)
    file.save(chunk_path)

    # Check whether every chunk has been uploaded
    all_chunks_uploaded = True
    for i in range(1, total_chunks + 1):
        if not os.path.exists(os.path.join(CHUNK_DIR, f"{identifier}.part{i}")):
            all_chunks_uploaded = False
            break

    if all_chunks_uploaded:
        # Merge the chunks in order, streaming each one into the output file
        output_path = os.path.join(UPLOAD_DIR, filename)
        with open(output_path, 'wb') as output_file:
            for i in range(1, total_chunks + 1):
                chunk_path = os.path.join(CHUNK_DIR, f"{identifier}.part{i}")
                with open(chunk_path, 'rb') as chunk_file:
                    shutil.copyfileobj(chunk_file, output_file)
                os.unlink(chunk_path)  # delete the chunk

        return jsonify({'status': 'done'})
    else:
        return jsonify({'status': 'chunk_uploaded'})

if __name__ == '__main__':
    app.run(debug=True)  # debug mode is for local development only

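Front-end implementation suggestions

Whatever language the back end uses, a front-end implementation of large-file upload usually needs:

File slicing: the slice method of the File API
Concurrency control: limit how many chunks upload at the same time
Progress display: track the upload progress of each chunk
Resumable upload: record which chunks have been uploaded

Simple front-end example (JavaScript):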

async function uploadFile(file) {
    const chunkSize = 5 * 1024 * 1024; // 5MB
    const totalChunks = Math.ceil(file.size / chunkSize);
    // Date.now() makes the ID unique per attempt, which rules out resuming after
    // a page reload; a resumable design would use a content hash instead
    const identifier = `${file.name}-${file.size}-${Date.now()}`;
    const MAX_RETRIES = 3; // attempts per chunk before giving up

    for (let i = 0; i < totalChunks; i++) {
        const start = i * chunkSize;
        const end = Math.min(start + chunkSize, file.size);
        const chunk = file.slice(start, end);

        const formData = new FormData();
        formData.append('file', chunk);
        formData.append('chunkNumber', i + 1);
        formData.append('totalChunks', totalChunks);
        formData.append('identifier', identifier);
        formData.append('filename', file.name);

        for (let attempt = 1; ; attempt++) {
            try {
                const response = await fetch('/upload/chunk', {
                    method: 'POST',
                    body: formData
                });
                if (!response.ok) {
                    throw new Error(`HTTP ${response.status}`);
                }

                const result = await response.json();
                if (result.status === 'done') {
                    console.log('Upload complete!');
                } else {
                    console.log(`Uploaded chunk ${i + 1} of ${totalChunks}`);
                }
                break; // this chunk succeeded; move on to the next one
            } catch (error) {
                console.error(`Error uploading chunk ${i + 1} (attempt ${attempt}):`, error);
                // Give up after MAX_RETRIES attempts instead of retrying forever
                if (attempt >= MAX_RETRIES) {
                    throw error;
                }
            }
        }
    }
}

// Usage example
document.getElementById('file-input').addEventListener('change', (e) => {
    const file = e.target.files[0];
    if (file) {
        uploadFile(file).catch((err) => console.error('Upload failed:', err));
    }
});

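Optimization suggestions

Chunk size: adjust the chunk size dynamically to network conditions
Concurrent upload: upload several chunks at once for higher speed
Resumable upload: record uploaded chunks and support continuing from the point of interruption
Compression: compress the video before uploading (where applicable; already-encoded video gains little from general-purpose compression such as gzip)
CDN: use a CDN to accelerate uploads
Direct-to-cloud upload: consider uploading straight to cloud storage such as S3 or OSS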
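For the first suggestion, one simple heuristic (an assumption here, not a standard algorithm) is to size the next chunk so that it takes roughly a target duration at the throughput just measured, clamped to a range. The Python sketch below targets about 2 seconds per chunk and clamps between 0.1 MB and 5 MB, matching the weak-network and LAN figures given earlier; all names and constants are illustrative.

```python
MB = 1024 * 1024
MIN_CHUNK = MB // 10    # 0.1 MB, the weak-network lower bound above
MAX_CHUNK = 5 * MB      # 5 MB, the LAN upper bound above
TARGET_SECONDS = 2.0    # hypothetical: aim for about 2 s per chunk


def next_chunk_size(last_chunk_bytes: int, last_duration_s: float) -> int:
    """Pick the next chunk size from the throughput of the previous chunk."""
    if last_duration_s <= 0:
        return MAX_CHUNK
    throughput = last_chunk_bytes / last_duration_s  # bytes per second
    size = int(throughput * TARGET_SECONDS)
    return max(MIN_CHUNK, min(MAX_CHUNK, size))


if __name__ == "__main__":
    print(next_chunk_size(1 * MB, 0.25))  # fast link: capped at 5 MB
    print(next_chunk_size(1 * MB, 8.0))   # slow link: 256 KB
    print(next_chunk_size(1 * MB, 60.0))  # very slow link: floor of 0.1 MB
```

The server examples in this article merge parts by chunk number, so chunks of different sizes still concatenate correctly; a server that computed byte offsets as chunkNumber × chunkSize would break under this scheme.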

In short, the core points are chunked upload, resumable upload, parallel upload, integrity verification, and progress monitoring.
