Gesture recognition is a popular application at the intersection of computer vision and human-computer interaction. This article walks through building a camera-based rock-paper-scissors game with MediaPipe and Python, showing the recognized gesture and the game result in real time.
1. Project Overview
This project implements a camera-based interactive game: rock-paper-scissors plus an OK gesture for control. The player simply makes a gesture in front of the camera; the system recognizes it and decides the round automatically. No mouse or keyboard is needed, the whole interaction is gesture-driven.
The main features are:
Real-time gesture detection
Using the MediaPipe Hand Landmarker model, the system detects hand landmarks in real time and classifies the current gesture with simple geometric rules. Supported gestures:
Rock
Scissors
Paper
OK (used to reset the game)
Game logic
When the player shows Rock/Scissors/Paper, the computer randomly picks its own move and the winner is determined immediately.
When the player shows the OK gesture, the game resets: the previous round's result is cleared and the system waits for the next round.
Visualization
Hand landmarks and their connections are drawn on the camera feed in real time, so the player can see what the detector sees.
A text label near the hand shows the recognized gesture.
The top-left corner of the screen dynamically displays:
the computer's move
the result of the round (player wins, computer wins, or draw)
2. About MediaPipe
MediaPipe is an open-source, cross-platform machine learning framework from Google Research, focused on real-time processing of multimodal data such as video, audio, and sensor streams. It provides a collection of pretrained models and high-performance compute pipelines, and runs efficiently on mobile, desktop, and even the Web.
In this project we use MediaPipe's Hand Landmarker model, which detects 21 hand landmarks (finger joints and fingertips) in a single image or a video stream. The positions of these landmarks are the foundation of our gesture recognition.
MediaPipe's main strengths:
High performance: optimized for CPU/GPU, stable on real-time video streams.
Cross-platform: supports Android, iOS, Windows, Linux, and the Web, well suited for building cross-platform applications quickly.
Modular: provides many ready-made components (Face Mesh, Pose, Hand Tracking, and more) that can be combined flexibly.
Easy to use: the Python API is concise and intuitive, a good fit for research, teaching, and rapid prototyping.
For these reasons, MediaPipe is not only a common choice for gesture recognition but is also widely used in face recognition, pose estimation, expression analysis, augmented reality (AR), and virtual try-on.
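The Hand Landmarker returns its 21 landmarks in a fixed order, and the gesture rules later in this article rely on specific indices. As a reference, the fingertip indices in the standard MediaPipe hand landmark layout are:

```python
# Fingertip indices in the MediaPipe 21-point hand landmark layout.
FINGERTIPS = {
    "thumb": 4,
    "index": 8,
    "middle": 12,
    "ring": 16,
    "pinky": 20,
}

# The PIP joints used below as bend references sit two indices before each tip
# (the thumb is special and uses its lower joints, indices 2 and 1, instead).
PIP_JOINTS = {name: tip - 2 for name, tip in FINGERTIPS.items() if name != "thumb"}
print(PIP_JOINTS)  # {'index': 6, 'middle': 10, 'ring': 14, 'pinky': 18}
```

These are exactly the indices `hand_landmarks[4]`, `hand_landmarks[8]`, and so on refer to in the code below.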
3. The Core Class: SimpleHandGestureGame
```python
class SimpleHandGestureGame:
    def __init__(self, model_path="hand_landmarker.task", num_hands=1):
        # Initialize the Mediapipe HandLandmarker
        base_options = python.BaseOptions(model_asset_path=model_path)
        options = vision.HandLandmarkerOptions(base_options=base_options, num_hands=num_hands)
        self.detector = vision.HandLandmarker.create_from_options(options)
        self.computer_choice = None
        self.round_result = ""
        self.round_played = False
```
3.1 Drawing Landmarks: _draw_landmarks
```python
    def _draw_landmarks(self, rgb_image, detection_result):
        annotated_image = np.copy(rgb_image)
        if detection_result.hand_landmarks:
            for hand_landmarks in detection_result.hand_landmarks:
                proto_landmarks = landmark_pb2.NormalizedLandmarkList()
                proto_landmarks.landmark.extend(
                    [landmark_pb2.NormalizedLandmark(x=lm.x, y=lm.y, z=lm.z) for lm in hand_landmarks]
                )
                solutions.drawing_utils.draw_landmarks(
                    image=annotated_image,
                    landmark_list=proto_landmarks,
                    connections=mp.solutions.hands.HAND_CONNECTIONS,
                    landmark_drawing_spec=solutions.drawing_styles.get_default_hand_landmarks_style(),
                    connection_drawing_spec=solutions.drawing_styles.get_default_hand_connections_style()
                )
        return annotated_image
```
3.2 Gesture Recognition: _judge_gesture
We classify gestures by checking which fingers are extended:
Rock: all fingers bent
Scissors: index and middle fingers extended, the rest bent
Paper: all five fingers extended
OK: thumb and index fingertip form a circle, the other three fingers extended
```python
    def _judge_gesture(self, hand_landmarks):
        def is_straight(tip, pip, mcp=None):
            if mcp:
                a, b, c = np.array([tip.x, tip.y]), np.array([pip.x, pip.y]), np.array([mcp.x, mcp.y])
                ba, bc = a - b, c - b
                cos_angle = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-6)
                return np.arccos(np.clip(cos_angle, -1, 1)) * 180 / np.pi > 160
            else:
                return tip.y < pip.y

        thumb_straight = is_straight(hand_landmarks[4], hand_landmarks[2], hand_landmarks[1])
        index_straight = is_straight(hand_landmarks[8], hand_landmarks[6])
        middle_straight = is_straight(hand_landmarks[12], hand_landmarks[10])
        ring_straight = is_straight(hand_landmarks[16], hand_landmarks[14])
        pinky_straight = is_straight(hand_landmarks[20], hand_landmarks[18])
        total = sum([thumb_straight, index_straight, middle_straight, ring_straight, pinky_straight])

        # OK gesture: thumb tip close to index tip, other three fingers extended
        thumb_tip = np.array([hand_landmarks[4].x, hand_landmarks[4].y])
        index_tip = np.array([hand_landmarks[8].x, hand_landmarks[8].y])
        if np.linalg.norm(thumb_tip - index_tip) < 0.05 and middle_straight and ring_straight and pinky_straight:
            return "OK"

        if total == 0:
            return "Rock"
        if total == 2 and index_straight and middle_straight:
            return "Scissors"
        if total == 5:
            return "Paper"
        return "Undefined"
```
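To see the straightness test in isolation, here is the same angle check run on hand-made points (the `Landmark` tuple is a stand-in for MediaPipe's landmark objects; note that normalized image coordinates grow downward, so `tip.y < pip.y` means the finger points up):

```python
import numpy as np
from collections import namedtuple

# Minimal stand-in for a MediaPipe normalized landmark.
Landmark = namedtuple("Landmark", ["x", "y"])

def is_straight(tip, pip, mcp=None):
    if mcp:
        # Angle at the pip joint between the tip and mcp directions.
        a, b, c = np.array([tip.x, tip.y]), np.array([pip.x, pip.y]), np.array([mcp.x, mcp.y])
        ba, bc = a - b, c - b
        cos_angle = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-6)
        return np.arccos(np.clip(cos_angle, -1, 1)) * 180 / np.pi > 160
    # Without an mcp reference, fall back to "tip above pip".
    return tip.y < pip.y

# Collinear tip-pip-mcp: joint angle near 180 degrees, counts as straight.
print(is_straight(Landmark(0.0, 0.0), Landmark(0.0, 0.1), Landmark(0.0, 0.2)))  # True
# A 90-degree bend at the pip joint: not straight.
print(is_straight(Landmark(0.1, 0.1), Landmark(0.0, 0.1), Landmark(0.0, 0.2)))  # False
# Fallback branch: tip.y (0.3) is above pip.y (0.5).
print(is_straight(Landmark(0.0, 0.3), Landmark(0.0, 0.5)))  # True
```

The `1e-6` term in the denominator guards against division by zero when two landmarks coincide, and `np.clip` keeps floating-point noise from pushing the cosine outside `[-1, 1]` before `arccos`.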
3.3 Game Logic: _play_game
```python
    def _play_game(self, player_choice):
        choices = ["Rock", "Scissors", "Paper"]
        if self.computer_choice is None:
            self.computer_choice = random.choice(choices)
        if player_choice == self.computer_choice:
            self.round_result = "Draw"
        elif (player_choice == "Rock" and self.computer_choice == "Scissors") or \
             (player_choice == "Scissors" and self.computer_choice == "Paper") or \
             (player_choice == "Paper" and self.computer_choice == "Rock"):
            self.round_result = "You Win"
        else:
            self.round_result = "Computer Wins"
        self.round_played = True
```
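The three `elif` conditions encode a cyclic relation: Rock beats Scissors beats Paper beats Rock. The same rule can also be written as a lookup table, which is shorter and easier to test; `judge` below is a hypothetical standalone helper, not part of the class:

```python
# Each key beats its value: Rock > Scissors > Paper > Rock.
BEATS = {"Rock": "Scissors", "Scissors": "Paper", "Paper": "Rock"}

def judge(player, computer):
    """Decide a rock-paper-scissors round from the two moves."""
    if player == computer:
        return "Draw"
    return "You Win" if BEATS[player] == computer else "Computer Wins"

print(judge("Rock", "Scissors"))   # You Win
print(judge("Paper", "Scissors"))  # Computer Wins
print(judge("Rock", "Rock"))       # Draw
```

A table like this makes the symmetry of the rules explicit and keeps the decision logic in one place.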
3.4 Frame Processing: do
Finally, the do method is responsible for:
receiving a camera frame
running Mediapipe hand detection on it
drawing the hand landmarks and the gesture label
displaying the computer's move and the round result
```python
    def do(self, frame, device=None):
        if frame is None:
            return None
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        detection_result = self.detector.detect(mp_image)
        annotated = self._draw_landmarks(mp_image.numpy_view(), detection_result)
        # ...draw the gesture label and the game result...
        return cv2.cvtColor(annotated, cv2.COLOR_RGB2BGR)
```
4. Quick Start
```python
import random

import cv2
import numpy as np
import mediapipe as mp
from mediapipe import solutions
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision


class SimpleHandGestureGame:
    def __init__(self, model_path="path/to/hand_landmarker.task", num_hands=1):
        """Initialize Mediapipe HandLandmarker and game state"""
        base_options = python.BaseOptions(model_asset_path=model_path)
        options = vision.HandLandmarkerOptions(base_options=base_options, num_hands=num_hands)
        self.detector = vision.HandLandmarker.create_from_options(options)
        self.computer_choice = None
        self.round_result = ""
        self.round_played = False

    def _draw_landmarks(self, rgb_image, detection_result):
        annotated_image = np.copy(rgb_image)
        if detection_result.hand_landmarks:
            for hand_landmarks in detection_result.hand_landmarks:
                proto_landmarks = landmark_pb2.NormalizedLandmarkList()
                proto_landmarks.landmark.extend(
                    [landmark_pb2.NormalizedLandmark(x=lm.x, y=lm.y, z=lm.z) for lm in hand_landmarks]
                )
                solutions.drawing_utils.draw_landmarks(
                    image=annotated_image,
                    landmark_list=proto_landmarks,
                    connections=mp.solutions.hands.HAND_CONNECTIONS,
                    landmark_drawing_spec=solutions.drawing_styles.get_default_hand_landmarks_style(),
                    connection_drawing_spec=solutions.drawing_styles.get_default_hand_connections_style()
                )
        return annotated_image

    def _judge_gesture(self, hand_landmarks):
        """Determine hand gesture: Rock-Paper-Scissors + OK"""
        def is_straight(tip, pip, mcp=None):
            if mcp:
                a, b, c = np.array([tip.x, tip.y]), np.array([pip.x, pip.y]), np.array([mcp.x, mcp.y])
                ba, bc = a - b, c - b
                cos_angle = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-6)
                return np.arccos(np.clip(cos_angle, -1, 1)) * 180 / np.pi > 160
            else:
                return tip.y < pip.y

        thumb = is_straight(hand_landmarks[4], hand_landmarks[2], hand_landmarks[1])
        index = is_straight(hand_landmarks[8], hand_landmarks[6])
        middle = is_straight(hand_landmarks[12], hand_landmarks[10])
        ring = is_straight(hand_landmarks[16], hand_landmarks[14])
        pinky = is_straight(hand_landmarks[20], hand_landmarks[18])
        total = sum([thumb, index, middle, ring, pinky])

        # OK gesture
        thumb_tip = np.array([hand_landmarks[4].x, hand_landmarks[4].y])
        index_tip = np.array([hand_landmarks[8].x, hand_landmarks[8].y])
        if np.linalg.norm(thumb_tip - index_tip) < 0.05 and middle and ring and pinky:
            return "OK"

        # Rock-Paper-Scissors
        if total == 0:
            return "Rock"
        if total == 2 and index and middle:
            return "Scissors"
        if total == 5:
            return "Paper"
        return "Undefined"

    def _play_game(self, player_choice):
        """Determine the result of a Rock-Paper-Scissors round"""
        choices = ["Rock", "Scissors", "Paper"]
        if self.computer_choice is None:
            self.computer_choice = random.choice(choices)
        if player_choice == self.computer_choice:
            self.round_result = "Draw"
        elif (player_choice == "Rock" and self.computer_choice == "Scissors") or \
             (player_choice == "Scissors" and self.computer_choice == "Paper") or \
             (player_choice == "Paper" and self.computer_choice == "Rock"):
            self.round_result = "You Win"
        else:
            self.round_result = "Computer Wins"
        self.round_played = True

    def do(self, frame, device=None):
        """Process a single frame, overlay hand gesture and game result"""
        if frame is None:
            return None
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        detection_result = self.detector.detect(mp_image)
        annotated = self._draw_landmarks(mp_image.numpy_view(), detection_result)
        gesture_display = ""
        if detection_result.hand_landmarks:
            for hand_landmarks in detection_result.hand_landmarks:
                gesture = self._judge_gesture(hand_landmarks)
                if gesture == "OK":
                    # Reset the game and pre-pick the computer's next move
                    self.computer_choice = random.choice(["Rock", "Scissors", "Paper"])
                    self.round_result = ""
                    self.round_played = False
                    gesture_display = "Game Ready..."
                elif gesture in ["Rock", "Scissors", "Paper"] and not self.round_played:
                    self._play_game(gesture)
                    gesture_display = gesture
                else:
                    gesture_display = gesture
                # Label the gesture near the index fingertip
                h, w, _ = annotated.shape
                index_finger_tip = hand_landmarks[8]
                cx, cy = int(index_finger_tip.x * w), int(index_finger_tip.y * h)
                cv2.putText(annotated, gesture_display, (cx, cy - 20), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        if self.round_result:
            # Show the computer's move and the result in the top-left corner
            start_x, start_y, line_height = 30, 50, 40
            lines = [f"Computer Choice: {self.computer_choice}", f"Result: {self.round_result}"]
            for i, line in enumerate(lines):
                cv2.putText(annotated, line, (start_x, start_y + i * line_height), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 3)
        return cv2.cvtColor(annotated, cv2.COLOR_RGB2BGR)
```
5. Summary
This article showed how to build a real-time gesture recognition game with the Mediapipe HandLandmarker. Using landmark geometry and a few simple rules, we recognized Rock, Scissors, Paper, and the OK gesture, and wired them into the game logic to produce live results.
Interested in PiscTrace or PiscCode? Head over to the official site for more: 🔗 PiscTrace