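Project Overview
What Is Windows-MCP.Net?
Windows MCP.Net is a Windows desktop automation MCP (Model Context Protocol) server built on .NET 10.0. It gives AI assistants a way to interact with the Windows desktop environment: through the standardized MCP protocol, an assistant can operate a Windows system directly and automate desktop tasks.
Project Highlights
🚀 Current technology stack: built on the .NET 10.0 framework
🔧 Modular design: a clear layered architecture that is easy to extend and maintain
🎯 Broad feature set: covers desktop operations, the file system, OCR, system control, and more
📊 Standardized protocol: follows the MCP specification, so it integrates with MCP-capable AI clients
🛡️ Reliable: thorough error handling and logging
Technical Architecture in Depth
Overall Architecture
Windows MCP.Net uses a classic layered architecture made up of the following layers: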
┌─────────────────────────────────────┐
│         MCP Protocol Layer          │ ← protocol communication
├─────────────────────────────────────┤
│             Tools Layer             │ ← tool implementations
├─────────────────────────────────────┤
│           Services Layer            │ ← business services
├─────────────────────────────────────┤
│           Interface Layer           │ ← interface definitions
├─────────────────────────────────────┤
│          Windows API Layer          │ ← system APIs
└─────────────────────────────────────┘
Core Components
1. Interface Layer
The project defines clear service interfaces, which keeps the tools decoupled from the implementations. Most methods return a (Response, Status) tuple, where a status of 0 means success:

// Desktop service interface
public interface IDesktopService
{
    Task<string> GetDesktopStateAsync(bool useVision = false);
    Task<(string Response, int Status)> ClickAsync(int x, int y, string button = "left", int clickCount = 1);
    Task<(string Response, int Status)> TypeAsync(int x, int y, string text, bool clear = false, bool pressEnter = false);
    // ... more methods
}

// File system service interface
public interface IFileSystemService
{
    Task<(string Response, int Status)> CreateFileAsync(string path, string content);
    Task<(string Content, int Status)> ReadFileAsync(string path);
    Task<(string Response, int Status)> WriteFileAsync(string path, string content, bool append = false);
    // ... more methods
}

// System control service interface
public interface ISystemControlService
{
    Task<string> SetVolumeAsync(bool increase);
    Task<string> SetVolumePercentAsync(int percent);
    Task<string> SetBrightnessAsync(bool increase);
    // ... more methods
}
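2. Services Layer
The services layer is the core of the project and implements the actual business logic:

using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public class DesktopService : IDesktopService
{
    private readonly ILogger<DesktopService> _logger;

    public DesktopService(ILogger<DesktopService> logger) => _logger = logger;

    // Windows API declarations
    [DllImport("user32.dll")]
    private static extern bool SetCursorPos(int x, int y);

    // dwExtraInfo is a ULONG_PTR in the Win32 signature, so it must be pointer-sized
    [DllImport("user32.dll")]
    private static extern void mouse_event(uint dwFlags, uint dx, uint dy, uint dwData, UIntPtr dwExtraInfo);

    // Mouse event flags from the Win32 API
    private const uint MOUSEEVENTF_LEFTDOWN = 0x0002;
    private const uint MOUSEEVENTF_LEFTUP = 0x0004;
    private const uint MOUSEEVENTF_RIGHTDOWN = 0x0008;
    private const uint MOUSEEVENTF_RIGHTUP = 0x0010;
    private const uint MOUSEEVENTF_MIDDLEDOWN = 0x0020;
    private const uint MOUSEEVENTF_MIDDLEUP = 0x0040;

    // Desktop operation logic
    public async Task<(string Response, int Status)> ClickAsync(int x, int y, string button = "left", int clickCount = 1)
    {
        try
        {
            SetCursorPos(x, y);
            await Task.Delay(50); // Short delay so the cursor move completes

            uint mouseDown, mouseUp;
            switch (button.ToLowerInvariant())
            {
                case "left":
                    mouseDown = MOUSEEVENTF_LEFTDOWN;
                    mouseUp = MOUSEEVENTF_LEFTUP;
                    break;
                case "right":
                    mouseDown = MOUSEEVENTF_RIGHTDOWN;
                    mouseUp = MOUSEEVENTF_RIGHTUP;
                    break;
                case "middle":
                    mouseDown = MOUSEEVENTF_MIDDLEDOWN;
                    mouseUp = MOUSEEVENTF_MIDDLEUP;
                    break;
                default:
                    return ("Invalid button type", 1);
            }

            for (int i = 0; i < clickCount; i++)
            {
                mouse_event(mouseDown, 0, 0, 0, UIntPtr.Zero);
                mouse_event(mouseUp, 0, 0, 0, UIntPtr.Zero);
                if (i < clickCount - 1) await Task.Delay(100); // Pause between repeated clicks
            }

            return ($"Successfully clicked at ({x}, {y}) with {button} button {clickCount} time(s)", 0);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error clicking at ({X}, {Y})", x, y);
            return ($"Error: {ex.Message}", 1);
        }
    }
}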
3. Tools Layer
The tools layer wraps service functionality as MCP tools and exposes it through a standardized interface:

using System.ComponentModel;
using System.Text.Json;
using Microsoft.Extensions.Logging;
using ModelContextProtocol.Server;

[McpServerToolType]
public class ClickTool
{
    private readonly IDesktopService _desktopService;
    private readonly ILogger<ClickTool> _logger;

    public ClickTool(IDesktopService desktopService, ILogger<ClickTool> logger)
    {
        _desktopService = desktopService;
        _logger = logger;
    }

    [McpServerTool, Description("Click at specific coordinates on the screen")]
    public async Task<string> ClickAsync(
        [Description("X coordinate")] int x,
        [Description("Y coordinate")] int y,
        [Description("Mouse button: left, right, or middle")] string button = "left",
        [Description("Number of clicks: 1=single, 2=double, 3=triple")] int clickCount = 1)
    {
        _logger.LogInformation("Clicking at ({X}, {Y}) with {Button} button, {ClickCount} times", x, y, button, clickCount);

        var (response, status) = await _desktopService.ClickAsync(x, y, button, clickCount);

        // Wrap the service result in a JSON envelope for the MCP client
        var result = new
        {
            success = status == 0,
            message = response,
            coordinates = new { x, y },
            button,
            clickCount
        };
        return JsonSerializer.Serialize(result, new JsonSerializerOptions { WriteIndented = true });
    }
}
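The tool above builds its JSON envelope inline and allocates a new JsonSerializerOptions on every call. As a minimal sketch, assuming a hypothetical ToolResponse helper that is not part of the project, the envelope could be produced in one place with a shared options instance:

```csharp
using System.Text.Json;

// Hypothetical helper, not part of Windows-MCP.Net: turns the
// (Response, Status) convention into the JSON envelope the tools return.
public static class ToolResponse
{
    // One shared instance; System.Text.Json caches type metadata per options object.
    private static readonly JsonSerializerOptions Options = new() { WriteIndented = true };

    // A status of 0 is treated as success, matching the service excerpts.
    public static string Create(int status, string message, object? details = null) =>
        JsonSerializer.Serialize(new { success = status == 0, message, details }, Options);
}
```

A tool method could then end with something like `return ToolResponse.Create(status, response, new { x, y, button, clickCount });`. Note that this nests the extra fields under `details`, which differs from the flat shape in the excerpt.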
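Core Feature Modules
1. Desktop Tools
The desktop module is the heart of the project and provides a wide range of Windows desktop interactions:
Mouse operations
ClickTool: single, double, and triple clicks with the left, right, or middle button
DragTool: drag operations, such as dragging files or moving windows
MoveTool: precise positioning of the mouse cursor
ScrollTool: vertical and horizontal scrolling
Keyboard operations
TypeTool: text input, with options to clear existing content first and press Enter afterwards
KeyTool: single key presses, covering all keyboard keys
ShortcutTool: key combinations such as Ctrl+C and Alt+Tab
Application management
LaunchTool: launches applications from the Start menu, with support for multilingual environments
SwitchTool: window switching with fuzzy matching on window titles
ResizeTool: adjusts window size and position
2. FileSystem Tools
The file system module provides file and directory operations: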

// File operation example
[McpServerTool, Description("Write content to a file")]
public async Task<string> WriteFileAsync(
    [Description("The file path to write to")] string path,
    [Description("The content to write to the file")] string content,
    [Description("Whether to append to existing content (true) or overwrite (false)")] bool append = false)
{
    try
    {
        _logger.LogInformation("Writing to file: {Path}, Append: {Append}", path, append);

        var (response, status) = await _fileSystemService.WriteFileAsync(path, content, append);

        var result = new
        {
            success = status == 0,
            message = response,
            path,
            contentLength = content?.Length ?? 0,
            append
        };
        return JsonSerializer.Serialize(result, new JsonSerializerOptions { WriteIndented = true });
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error in WriteFileAsync");

        // Report the failure in the same envelope shape as the success case
        var errorResult = new
        {
            success = false,
            message = $"Error writing to file: {ex.Message}",
            path,
            append
        };
        return JsonSerializer.Serialize(errorResult, new JsonSerializerOptions { WriteIndented = true });
    }
}
3. SystemControl Tools
The system control module exposes Windows system-level controls:
Volume control

[McpServerTool, Description("Set system volume to a specific percentage")]
public async Task<string> SetVolumePercentAsync(
    [Description("Volume percentage (0-100)")] int percent)
{
    _logger.LogInformation("Setting volume to {Percent}%", percent);
    return await _systemControlService.SetVolumePercentAsync(percent);
}

Brightness control

[McpServerTool, Description("Set screen brightness to a specific percentage")]
public async Task<string> SetBrightnessPercentAsync(
    [Description("Brightness percentage (0-100)")] int percent)
{
    _logger.LogInformation("Setting brightness to {Percent}%", percent);
    return await _systemControlService.SetBrightnessPercentAsync(percent);
}

Resolution control

[McpServerTool, Description("Set screen resolution")]
public async Task<string> SetResolutionAsync(
    [Description("Resolution type: \"high\", \"medium\", or \"low\"")] string type)
{
    _logger.LogInformation("Setting resolution to: {Type}", type);
    return await _systemControlService.SetResolutionAsync(type);
}
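The percentage tools above document a 0-100 range but pass the value straight to the service, and the excerpts do not show whether the service validates it. A minimal sketch of a guard, assuming a hypothetical PercentGuard helper that is not part of the project, might look like this:

```csharp
using System;

// Hypothetical helper, not part of Windows-MCP.Net.
public static class PercentGuard
{
    // Forces the value into the documented 0-100 range.
    public static int Clamp(int percent) => Math.Clamp(percent, 0, 100);

    // Rejects out-of-range input with a message instead of clamping silently.
    public static bool TryValidate(int percent, out string error)
    {
        if (percent < 0 || percent > 100)
        {
            error = $"Percent must be between 0 and 100, got {percent}.";
            return false;
        }

        error = string.Empty;
        return true;
    }
}
```

Clamping is the more forgiving choice for an AI client that overshoots, while validation makes the mistake visible in the tool response. Which one fits better depends on how the client is expected to recover.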
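4. OCR Tools
The OCR module provides text recognition, including extracting text from the screen and locating it:
ExtractTextFromScreenTool: extracts text from the full screen
ExtractTextFromRegionTool: extracts text from a specified region
FindTextOnScreenTool: searches for text on the screen
GetTextCoordinatesTool: returns the coordinates of a piece of text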
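The article does not show how the OCR tools map recognized text to coordinates. As a rough sketch of the idea only, assuming a hypothetical OcrWord record with a bounding box (the project's real OCR types may look quite different), a lookup that returns a clickable center point could be written as:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for one recognized word and its bounding box in screen pixels.
public sealed record OcrWord(string Text, int X, int Y, int Width, int Height);

public static class OcrLocator
{
    // Returns the center of the first word containing the query
    // (case-insensitive), or null when nothing matches.
    public static (int X, int Y)? FindCenter(IEnumerable<OcrWord> words, string query)
    {
        var match = words.FirstOrDefault(w =>
            w.Text.Contains(query, StringComparison.OrdinalIgnoreCase));

        if (match is null) return null;
        return (match.X + match.Width / 2, match.Y + match.Height / 2);
    }
}
```

The returned point is the kind of value that could be passed on to a click operation, which is how text lookup and mouse control would combine in practice.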
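Code Implementation Analysis
Dependency Injection and Service Registration
The project uses the .NET dependency injection container, which keeps the components decoupled:

// Service registration in Program.cs
using System.Reflection;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

var builder = Host.CreateApplicationBuilder(args);

// Send log output to stderr (stdout carries the MCP protocol messages)
builder.Logging.AddConsole(o => o.LogToStandardErrorThreshold = LogLevel.Trace);

// Register the services, the MCP server, and the tools
builder.Services
    .AddSingleton<IDesktopService, DesktopService>()
    .AddSingleton<IFileSystemService, FileSystemService>()
    .AddSingleton<IOcrService, OcrService>()
    .AddSingleton<ISystemControlService, SystemControlService>()
    .AddMcpServer()
    .WithStdioServerTransport()
    .WithToolsFromAssembly(Assembly.GetExecutingAssembly());

// Build the host and run until the client disconnects
await builder.Build().RunAsync();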
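Error Handling and Logging
The project uses one error-handling pattern throughout: exceptions are logged and converted into a (message, status) tuple, with 0 for success and 1 for failure:

try
{
    // Business logic
    var result = await SomeOperation();
    return ("Success message", 0);
}
catch (Exception ex)
{
    _logger.LogError(ex, "Error in operation with parameters {Param1}, {Param2}", param1, param2);
    return ($"Error: {ex.Message}", 1);
}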
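Because the same try/catch shape recurs in each service method, it can be factored into a wrapper. The following is a minimal sketch, assuming a hypothetical OperationRunner helper that is not part of the project. The onError callback stands in for the ILogger call so that the example has no package dependencies.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of Windows-MCP.Net.
public static class OperationRunner
{
    // Runs the operation and converts any exception into the
    // (Response, Status) convention: 0 for success, 1 for failure.
    public static async Task<(string Response, int Status)> RunAsync(
        Func<Task<string>> operation, Action<Exception>? onError = null)
    {
        try
        {
            var message = await operation();
            return (message, 0);
        }
        catch (Exception ex)
        {
            onError?.Invoke(ex); // e.g. ex => logger.LogError(ex, "...")
            return ($"Error: {ex.Message}", 1);
        }
    }
}
```

A service method could then delegate to `OperationRunner.RunAsync(...)` and keep only its own logic in the lambda, at the cost of losing the per-method structured log parameters shown in the pattern above.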
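Windows API Integration
The project relies heavily on Windows API calls (P/Invoke) for its low-level functionality:

using System;
using System.Runtime.InteropServices;
using System.Text;

// Windows API declarations
[DllImport("user32.dll")]
private static extern bool SetCursorPos(int x, int y);

// dwExtraInfo is a ULONG_PTR in the Win32 signature, so it must be pointer-sized
[DllImport("user32.dll")]
private static extern void mouse_event(uint dwFlags, uint dx, uint dy, uint dwData, UIntPtr dwExtraInfo);

[DllImport("user32.dll")]
private static extern IntPtr GetForegroundWindow();

// CharSet.Unicode selects GetWindowTextW so non-ASCII window titles are read correctly
[DllImport("user32.dll", CharSet = CharSet.Unicode)]
private static extern int GetWindowText(IntPtr hWnd, StringBuilder text, int count);

// Mouse event flag constants
private const uint MOUSEEVENTF_LEFTDOWN = 0x0002;
private const uint MOUSEEVENTF_LEFTUP = 0x0004;
private const uint MOUSEEVENTF_RIGHTDOWN = 0x0008;
private const uint MOUSEEVENTF_RIGHTUP = 0x0010;
private const uint MOUSEEVENTF_MIDDLEDOWN = 0x0020;
private const uint MOUSEEVENTF_MIDDLEUP = 0x0040;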
Usage Scenarios and Practical Examples
Scenario 1: Automating Office Tasks

{
  "tool": "launch_app",
  "params": {
    "name": "notepad"
  }
}

{
  "tool": "type",
  "params": {
    "x": 400,
    "y": 300,
    "text": "This is an automatically generated report\n\nDate: January 15, 2024\nContent: System running normally",
    "clear": true
  }
}

{
  "tool": "key",
  "params": {
    "key": "ctrl+s"
  }
}

Scenario 2: Batch File Processing

{
  "tool": "list_directory",
  "params": {
    "path": "C:\\Documents",
    "includeFiles": true,
    "recursive": false
  }
}

{
  "tool": "search_files_by_extension",
  "params": {
    "directory": "C:\\Documents",
    "extension": ".txt",
    "recursive": true
  }
}

{
  "tool": "copy_file",
  "params": {
    "source": "C:\\Documents\\report.txt",
    "destination": "C:\\Backup\\report_backup.txt",
    "overwrite": true
  }
}

Scenario 3: System Monitoring and Control

{
  "tool": "get_desktop_state",
  "params": {
    "useVision": false
  }
}

{
  "tool": "set_volume_percent",
  "params": {
    "percent": 50
  }
}

{
  "tool": "set_brightness_percent",
  "params": {
    "percent": 80
  }
}
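The scenario snippets above use a simplified {"tool", "params"} shape. On the wire, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request whose params carry `name` and `arguments`. The sketch below converts one shape into the other. McpRequestBuilder is a hypothetical helper, and whether the server exposes exactly the tool names used in the scenarios is an assumption carried over from the article.

```csharp
using System.Text.Json.Nodes;

// Hypothetical helper, not part of Windows-MCP.Net.
public static class McpRequestBuilder
{
    // Wraps the simplified {"tool", "params"} object in an MCP
    // "tools/call" JSON-RPC 2.0 request. The request id is chosen by the caller.
    public static string ToToolsCall(string simplifiedJson, int id)
    {
        var source = JsonNode.Parse(simplifiedJson)!.AsObject();

        // A JsonNode can have only one parent, so the arguments are re-parsed
        // into a fresh node instead of being moved out of the source object.
        JsonNode? arguments = source["params"] is JsonNode p
            ? JsonNode.Parse(p.ToJsonString())
            : new JsonObject();

        var request = new JsonObject
        {
            ["jsonrpc"] = "2.0",
            ["id"] = id,
            ["method"] = "tools/call",
            ["params"] = new JsonObject
            {
                ["name"] = source["tool"]!.GetValue<string>(),
                ["arguments"] = arguments
            }
        };
        return request.ToJsonString();
    }
}
```

In normal use an MCP client library builds this request itself; the conversion is shown only to make the relationship between the scenario snippets and the protocol explicit.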
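Performance Optimization and Best Practices
1. Asynchronous Programming
The project uses asynchronous programming throughout, so threads are not blocked while waiting on I/O:

public async Task<string> ProcessLargeFileAsync(string filePath)
{
    // Asynchronous file read
    var content = await File.ReadAllTextAsync(filePath);

    // Asynchronous processing
    var processedContent = await ProcessContentAsync(content);

    // Asynchronous file write
    await File.WriteAllTextAsync(filePath + ".processed", processedContent);

    return "Processing completed";
}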
2. Resource Management

public class DesktopService : IDesktopService, IDisposable
{
    private bool _disposed = false;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (!_disposed)
        {
            if (disposing)
            {
                // Release managed resources
            }

            // Release unmanaged resources
            _disposed = true;
        }
    }
}
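3. Caching Strategy

private readonly ConcurrentDictionary<string, WindowInfo> _windowCache = new();

public Task<WindowInfo> GetWindowInfoAsync(string windowTitle)
{
    // GetOrAdd is synchronous, so the result is wrapped in a completed task
    // instead of marking the method async without an await
    var info = _windowCache.GetOrAdd(windowTitle, title =>
    {
        // Expensive lookup of the window information
        return GetWindowInfoFromSystem(title);
    });
    return Task.FromResult(info);
}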
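The cache above keeps every entry indefinitely, which can return stale data once a window moves, is renamed, or closes. One possible refinement is a time-to-live per entry. The sketch below assumes a hypothetical ExpiringCache class that is not part of the project. The clock is injectable so that expiry can be tested without waiting.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical expiring cache, not part of Windows-MCP.Net.
public sealed class ExpiringCache<TValue>
{
    private readonly ConcurrentDictionary<string, (TValue Value, DateTimeOffset Expires)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;

    public ExpiringCache(TimeSpan ttl, Func<DateTimeOffset>? now = null)
    {
        _ttl = ttl;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    // Returns the cached value while it is fresh; otherwise calls the factory
    // and stores the new value. Concurrent callers may each run the factory
    // for the same key, so the factory should be safe to repeat.
    public TValue GetOrAdd(string key, Func<string, TValue> factory)
    {
        var now = _now();
        if (_entries.TryGetValue(key, out var entry) && entry.Expires > now)
            return entry.Value;

        var value = factory(key);
        _entries[key] = (value, now + _ttl);
        return value;
    }
}
```

A short TTL, on the order of a second or two, would likely suit window metadata, though the right value depends on how often the desktop state changes during a task.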
Extension Guide
1. Adding a New Tool
To add a new MCP tool, follow these steps:

// 1. Add a method to the relevant service interface
public interface IDesktopService
{
    Task<(string Response, int Status)> NewOperationAsync(string parameter);
}

// 2. Add the logic to the service implementation
public class DesktopService : IDesktopService
{
    public async Task<(string Response, int Status)> NewOperationAsync(string parameter)
    {
        try
        {
            // Implement the operation here
            return ("Operation completed", 0);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error in NewOperation");
            return ($"Error: {ex.Message}", 1);
        }
    }
}

// 3. Create the MCP tool class
[McpServerToolType]
public class NewOperationTool
{
    private readonly IDesktopService _desktopService;
    private readonly ILogger<NewOperationTool> _logger;

    public NewOperationTool(IDesktopService desktopService, ILogger<NewOperationTool> logger)
    {
        _desktopService = desktopService;
        _logger = logger;
    }

    [McpServerTool, Description("Description of the new operation")]
    public async Task<string> ExecuteAsync(
        [Description("Parameter description")] string parameter)
    {
        _logger.LogInformation("Executing new operation with parameter: {Parameter}", parameter);

        var (response, status) = await _desktopService.NewOperationAsync(parameter);

        var result = new
        {
            success = status == 0,
            message = response,
            parameter
        };
        return JsonSerializer.Serialize(result, new JsonSerializerOptions { WriteIndented = true });
    }
}
2. Writing Unit Tests

public class NewOperationToolTest
{
    private readonly IDesktopService _desktopService;
    private readonly ILogger<NewOperationTool> _logger;
    private readonly NewOperationTool _tool;

    public NewOperationToolTest()
    {
        var services = new ServiceCollection();
        services.AddLogging(builder => builder.AddConsole());
        services.AddSingleton<IDesktopService, DesktopService>();

        var serviceProvider = services.BuildServiceProvider();
        _desktopService = serviceProvider.GetRequiredService<IDesktopService>();
        _logger = serviceProvider.GetRequiredService<ILogger<NewOperationTool>>();
        _tool = new NewOperationTool(_desktopService, _logger);
    }

    [Fact]
    public async Task ExecuteAsync_ValidParameter_ReturnsSuccess()
    {
        // Arrange
        var parameter = "test";

        // Act
        var result = await _tool.ExecuteAsync(parameter);

        // Assert
        Assert.NotNull(result);
        var jsonResult = JsonSerializer.Deserialize<JsonElement>(result);
        Assert.True(jsonResult.GetProperty("success").GetBoolean());
    }
}
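The test above resolves the real DesktopService, so tests of mouse or keyboard tools written this way would drive the actual desktop. One way to isolate the tool layer is a hand-written fake that records calls. The sketch below is an assumption-laden illustration: it redeclares a trimmed IDesktopService with only ClickAsync so that it stands alone, and FakeDesktopService is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Trimmed copy of the article's interface, reduced to the one method used here.
public interface IDesktopService
{
    Task<(string Response, int Status)> ClickAsync(int x, int y, string button = "left", int clickCount = 1);
}

// Hand-written fake: records each call instead of touching the real desktop.
public sealed class FakeDesktopService : IDesktopService
{
    public List<(int X, int Y, string Button, int ClickCount)> Clicks { get; } = new();

    public Task<(string Response, int Status)> ClickAsync(int x, int y, string button = "left", int clickCount = 1)
    {
        Clicks.Add((x, y, button, clickCount));
        return Task.FromResult(("Fake click recorded", 0));
    }
}
```

In a test, the fake would be passed to the tool's constructor in place of the DI-resolved service, and the assertions would inspect `Clicks` to confirm that the tool forwarded its arguments unchanged.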
3. Configuration Management

// appsettings.json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Hosting.Lifetime": "Information"
    }
  },
  "WindowsMcp": {
    "DefaultTimeout": 5000,
    "MaxRetries": 3,
    "EnableCaching": true
  }
}

// Options class bound to the "WindowsMcp" section
public class WindowsMcpOptions
{
    public int DefaultTimeout { get; set; } = 5000;
    public int MaxRetries { get; set; } = 3;
    public bool EnableCaching { get; set; } = true;
}

// Register the configuration in Program.cs
builder.Services.Configure<WindowsMcpOptions>(builder.Configuration.GetSection("WindowsMcp"));
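The article does not show how MaxRetries is consumed. As one possible use, assuming a hypothetical Retry helper that is not part of the project, an operation that follows the (Response, Status) convention could be retried until it reports success:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of Windows-MCP.Net.
public static class Retry
{
    // Runs the operation once, then up to maxRetries more times while it
    // keeps reporting a non-zero status. Returns the last result either way.
    public static async Task<(string Response, int Status)> RunAsync(
        Func<Task<(string Response, int Status)>> operation, int maxRetries, TimeSpan delay)
    {
        (string Response, int Status) last = ("Not attempted", 1);

        for (var attempt = 0; attempt <= maxRetries; attempt++)
        {
            last = await operation();
            if (last.Status == 0) return last;

            // Wait before the next attempt, but not after the final one.
            if (attempt < maxRetries) await Task.Delay(delay);
        }

        return last;
    }
}
```

A service that receives `IOptions<WindowsMcpOptions>` through its constructor could pass `options.Value.MaxRetries` as the second argument. Retrying is only safe for operations that can be repeated without side effects, so it would suit a window lookup better than a click.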
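Summary and Outlook
Project Strengths
Modern technology: built on .NET 10.0 and current C# language features
Sound architecture: clear layering with good extensibility
Feature coverage: addresses the main areas of desktop automation
Standardization: follows the MCP protocol, which gives it good interoperability
Code quality: consistent error handling, logging, and unit tests
Technical Innovations
MCP integration: applies the MCP protocol to Windows desktop automation
Multi-module design: modular tools that can be used as needed
Asynchronous design: asynchronous programming throughout to improve performance
Intelligent recognition: combines OCR to identify UI elements
Future Directions
Stronger AI integration:
Integrate more AI models to make automation smarter
Support converting natural-language instructions into operation sequences
Add machine learning capabilities to optimize operation paths automatically
Cross-platform support:
Extend to Linux and macOS
Provide a unified cross-platform API
Add an adaptation layer for platform-specific features
Cloud integration:
Support cloud deployment and remote control
Distributed task execution
Integration with cloud AI services
Security enhancements:
Fine-grained control over operation permissions
Operation auditing and compliance checks
Data encryption and secure transport
Performance optimization:
GPU-accelerated image processing
More efficient memory management
Better parallel processing
Value for Developers
Windows MCP.Net is not only a capable desktop automation tool but also a good case study of a .NET project. By studying it, developers can:
Learn architectural patterns for modern .NET applications
Learn how to integrate and use the Windows API
Understand how the MCP protocol is implemented and applied
Gain hands-on experience with desktop automation development
Community Contributions
The project is open source and welcomes community contributions:
Feature extensions: add new tools and feature modules
Performance optimization: improve the performance of existing features
Documentation: improve the project documentation and usage guides
Test coverage: add unit tests and integration tests
Bug fixes: find and fix problems in the project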
Project repository: Windows-MCP.Net on GitHub, https://github.com/AIDotNet/Windows-MCP.Net
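This article is based on an analysis of the Windows MCP.Net source code and is intended as a technical reference on desktop automation for .NET developers. Questions and suggestions are welcome in the comments.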