Python File Operations: From Beginner to Mastery - EW帮帮网




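Abstract

File handling is a basic, indispensable skill in Python programming. Reading configuration files, processing logs, analyzing datasets, and saving program results all depend on it. This article walks through file operations in Python, from the basic concepts to more advanced usage.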

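I. What Is a File?

A file is a named sequence of data kept on persistent storage such as a hard disk or SSD. At bottom, every file is a sequence of bytes; depending on the encoding and format conventions applied, those bytes are interpreted as text, images, audio, video, and so on.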

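1. Basic File Attributes

A file typically has the following attributes:

File name: the name that identifies the file, usually including an extension (such as .txt, .py, or .jpg)
Path: the file's location in the file system
Size: the amount of storage the file occupies
Creation time: when the file was created
Modification time: when the file was last modified
Access permissions: who may read, write, or execute the file

In Python, these attributes can be read with the os module: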

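import os
import time

file_path = "example.txt"

# Get the file size in bytes
size = os.path.getsize(file_path)
print(f"File size: {size} bytes")

# Get the file timestamps.
# Note: getctime() is the creation time on Windows, but on Unix it is the
# time of the last metadata change, not the creation time.
ctime = os.path.getctime(file_path)
mtime = os.path.getmtime(file_path)
print(f"Creation/metadata-change time: {time.ctime(ctime)}")
print(f"Modification time: {time.ctime(mtime)}")

# Check the permissions of the current process on the file
print(f"Readable: {os.access(file_path, os.R_OK)}")
print(f"Writable: {os.access(file_path, os.W_OK)}")
print(f"Executable: {os.access(file_path, os.X_OK)}")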
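As a complement, here is a minimal sketch of reading the same attributes through pathlib's Path.stat(). The temporary directory and the file name example.txt are stand-ins chosen for this illustration so that the snippet runs anywhere.

```python
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical sample file, created only for this demonstration
    path = Path(tmp_dir) / "example.txt"
    path.write_text("hello", encoding="utf-8")

    info = path.stat()                      # one call returns all metadata
    size = info.st_size                     # size in bytes
    modified = datetime.fromtimestamp(info.st_mtime)
    name, suffix = path.name, path.suffix   # file name and extension

print(f"{name} ({suffix}): {size} bytes, modified {modified:%Y-%m-%d %H:%M:%S}")
```

A single stat() call avoids querying the file system once per attribute, which is one reason it can be preferable to several os.path.get*() calls.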

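2. File Types

Python code mainly deals with two kinds of files:

Text files: files containing readable characters, such as .txt, .csv, .json, and .py
Binary files: files containing raw binary data, such as images, audio, video, and executables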
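The difference between the two kinds shows up in what open() hands back. The sketch below writes a short Chinese string to a temporary file (a stand-in for any text file) and reads it once in text mode and once in binary mode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")

    # newline="" disables newline translation so the byte count is the
    # same on every platform
    with open(demo_path, "w", encoding="utf-8", newline="") as f:
        f.write("你好\n")

    # Text mode decodes the stored bytes into str
    with open(demo_path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode returns the raw bytes exactly as stored
    with open(demo_path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, len(as_text))    # str 3
print(type(as_bytes).__name__, len(as_bytes))  # bytes 7
```

The same file holds three characters but seven bytes, because each Chinese character takes three bytes in UTF-8.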

Figure 1: File type classification (flowchart) - shows the common file types handled in Python and how they are grouped

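II. File Paths

Before working with files, we need the concept of a file path: an address that points to a specific file or directory in the file system.

1. Absolute and Relative Paths

Absolute path: the full path starting from the root of the file system
- Windows: C:\Users\username\Documents\file.txt
- Unix/Linux/macOS: /home/username/Documents/file.txt
Relative path: a path relative to the current working directory
- file.txt (a file in the current directory)
- data/file.txt (a file in the data subdirectory of the current directory)
- ../file.txt (a file in the parent directory)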
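A relative path becomes absolute once it is anchored at the current working directory. The following sketch illustrates this with pathlib and os.path; the data/file.txt path is only an example and does not need to exist.

```python
import os
from pathlib import Path

relative = Path("data") / "file.txt"
absolute = Path.cwd() / relative        # anchor at the current working directory

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True

# os.path.abspath performs the same anchoring and also collapses ".." segments
collapsed = os.path.abspath(os.path.join("data", "..", "file.txt"))
print(collapsed)
```

Because the result depends on the working directory at run time, scripts that are launched from different locations often build paths from a known base directory instead.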

2. Path Operations

Python's os.path and pathlib modules provide the tools for manipulating paths:

import os
from pathlib import Path

# Using os.path
file_path = os.path.join("data", "input", "example.txt")
print(file_path)  # Output on Unix: data/input/example.txt (Windows uses backslashes)

# Get the directory name and the file name
dirname = os.path.dirname(file_path)
filename = os.path.basename(file_path)
print(f"Directory: {dirname}, file name: {filename}")

# Check whether the path exists
exists = os.path.exists(file_path)
print(f"Path exists: {exists}")

# Using pathlib (Python 3.4+)
path = Path("data") / "input" / "example.txt"
print(path)  # Output on Unix: data/input/example.txt (Windows uses backslashes)

# Get the directory name and the file name
print(f"Directory: {path.parent}, file name: {path.name}")

# Check whether the path exists
exists = path.exists()
print(f"Path exists: {exists}")

The pathlib module (Python 3.4+) offers a more modern, object-oriented way to work with paths and is the recommended choice for new code.

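III. File Basics: Opening, Closing, Reading, and Writing

1. Opening a File

In Python, files are opened with the built-in open() function:

file = open('example.txt', 'r')

The two most commonly used parameters of open() are:

The file path
The open mode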
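1.1 File Open Modes

Mode	Description
`'r'`	Read-only (default); fails if the file does not exist
`'w'`	Write; creates the file or truncates existing content
`'a'`	Append; writes are added at the end of the file
`'x'`	Exclusive creation; fails if the file already exists
`'b'`	Binary mode (combined with another mode, e.g. 'rb', 'wb')
`'t'`	Text mode (default; combined with another mode, e.g. 'rt', 'wt')
`'+'`	Read and write (combined with another mode, e.g. 'r+', 'w+')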
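The modes in the table can be compared side by side. This is a small sketch using a temporary directory; the file name modes_demo.txt is made up for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")

    # 'x' creates the file and fails if it already exists
    with open(path, "x", encoding="utf-8") as f:
        f.write("first\n")
    try:
        open(path, "x", encoding="utf-8")
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True

    # 'a' keeps the existing content and adds to the end
    with open(path, "a", encoding="utf-8") as f:
        f.write("second\n")
    with open(path, "r", encoding="utf-8") as f:
        after_append = f.read()

    # 'w' discards the existing content before writing
    with open(path, "w", encoding="utf-8") as f:
        f.write("replaced\n")
    with open(path, "r", encoding="utf-8") as f:
        after_write = f.read()

print(second_create_failed)  # True
print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_write))     # 'replaced\n'
```

Mode 'x' is a reasonable choice when overwriting an existing file by accident would be costly, since 'w' truncates without warning.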

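2. Closing a File

A file that has been opened should be closed once you are done with it:

file = open('example.txt', 'r')
# Work with the file...
file.close()

Closing a file matters because it:

Releases system resources
Flushes buffered writes so the data actually reaches the file
Lets other programs access the file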

3. Reading a File

Python offers several ways to read a file's contents:

# Read the entire file
file = open('example.txt', 'r')
content = file.read()
print(content)
file.close()

# Read all lines at once
file = open('example.txt', 'r')
lines = file.readlines()  # returns a list containing every line
for line in lines:
    print(line.strip())  # strip() removes leading/trailing whitespace, including the newline
file.close()

# Iterate line by line (more memory-efficient)
file = open('example.txt', 'r')
for line in file:
    print(line.strip())
file.close()

4. Writing a File

Writing to a file is just as simple:

# Write mode (overwrites existing content)
file = open('output.txt', 'w')
file.write('Hello, World!\n')
file.write('This is a new line.')
file.close()

# Append mode
file = open('output.txt', 'a')
file.write('\nThis line is appended.')
file.close()

# Write several lines (each string carries its own newline)
lines = ['Line 1\n', 'Line 2\n', 'Line 3\n']
file = open('output.txt', 'w')
file.writelines(lines)
file.close()
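One detail worth seeing in action: writelines() does not add line endings on its own. The sketch below, with made-up sample data and a temporary file, contrasts the two outcomes.

```python
import os
import tempfile

names = ["alpha", "beta", "gamma"]   # sample data without trailing newlines

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.txt")

    # writelines() writes each string as-is, so the items run together
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(names)
    with open(path, "r", encoding="utf-8") as f:
        joined = f.read()

    # Adding "\n" to each item yields one line per item
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(name + "\n" for name in names)
    with open(path, "r", encoding="utf-8") as f:
        separated = f.read()

print(repr(joined))     # 'alphabetagamma'
print(repr(separated))  # 'alpha\nbeta\ngamma\n'
```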

Figure 2: File operation sequence diagram (sequenceDiagram) - shows the interaction between a Python program and the file system

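IV. Working with Chinese Text Files

Files that contain Chinese characters require particular attention to encoding.

1. Encoding Basics

An encoding is a rule for converting characters into bytes. Common encodings include:

ASCII: covers only English letters, digits, and some symbols
UTF-8: variable-length, ASCII-compatible, and able to represent every Unicode character
GBK/GB2312/GB18030: Chinese encoding standards, common on Chinese-language Windows systems
Latin-1 (ISO-8859-1): an encoding for Western European languages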
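To make the idea concrete, the same two-character string can be encoded under different rules. This is a small illustration only; it needs no files.

```python
text = "中文"

utf8_bytes = text.encode("utf-8")   # 3 bytes per character here
gbk_bytes = text.encode("gbk")      # 2 bytes per character here
print(len(text), len(utf8_bytes), len(gbk_bytes))  # 2 6 4

# ASCII characters encode to the same single byte in UTF-8
ascii_compatible = "A".encode("utf-8") == "A".encode("ascii")
print(ascii_compatible)  # True

# Decoding with the wrong codec fails here (in other cases it may
# silently produce garbled text instead)
try:
    gbk_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
print(wrong_codec_failed)  # True
```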

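2. Specifying an Encoding When Reading and Writing

The encoding parameter of open() sets the encoding. When it is omitted, the default has historically depended on the platform locale, so passing it explicitly is the safer habit:

# Read a Chinese file as UTF-8
file = open('chinese.txt', 'r', encoding='utf-8')
content = file.read()
print(content)
file.close()

# Write a Chinese file as GBK
file = open('chinese_gbk.txt', 'w', encoding='gbk')
file.write('这是中文内容')  # sample Chinese text: "This is Chinese content"
file.close()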

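3. Handling Encoding Errors

When a file contains bytes that cannot be decoded, the errors parameter controls what happens:

# Replace undecodable bytes with the replacement character U+FFFD
file = open('mixed_encoding.txt', 'r', encoding='utf-8', errors='replace')
content = file.read()
print(content)
file.close()

# Silently drop undecodable bytes
file = open('mixed_encoding.txt', 'r', encoding='utf-8', errors='ignore')
content = file.read()
print(content)
file.close()

Common error handlers:

'strict': the default; raises an exception on error
'replace': substitutes a replacement marker (U+FFFD when decoding, '?' when encoding)
'ignore': drops the data that cannot be decoded
'surrogateescape': maps undecodable bytes to lone surrogate code points so they can be restored when encoding; useful for data of unknown encoding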
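The handlers are easiest to compare on a tiny input. The sketch below decodes one valid UTF-8 character followed by a single invalid byte; the byte string is constructed just for this demonstration.

```python
raw = "中".encode("utf-8") + b"\xff"   # valid UTF-8 followed by one invalid byte

replaced = raw.decode("utf-8", errors="replace")
ignored = raw.decode("utf-8", errors="ignore")
escaped = raw.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")

try:
    raw.decode("utf-8")               # errors="strict" is the default
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

print(ascii(replaced))   # '\u4e2d\ufffd'
print(ascii(ignored))    # '\u4e2d'
print(ascii(escaped))    # '\u4e2d\udcff'
print(restored == raw)   # True
print(strict_raised)     # True
```

Only 'surrogateescape' is lossless: the invalid byte survives the round trip, whereas 'replace' and 'ignore' discard the original information.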

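4. Detecting a File's Encoding

Sometimes the encoding of a file is unknown. The third-party chardet library can make an educated guess:

import chardet  # third-party: pip install chardet

# Read the raw bytes
with open('unknown_encoding.txt', 'rb') as f:
    raw_data = f.read()

# Detect the encoding
result = chardet.detect(raw_data)
encoding = result['encoding']  # may be None if detection fails
confidence = result['confidence']

print(f"Detected encoding: {encoding}, confidence: {confidence}")

# Read the file using the detected encoding
with open('unknown_encoding.txt', 'r', encoding=encoding) as f:
    content = f.read()
    print(content)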
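When installing a third-party library is not an option, a common alternative is to try a short list of likely encodings in order. The helper below is a hypothetical sketch of that idea, not part of chardet or the standard library; the candidate list of UTF-8 followed by GB18030 is an assumption suited to Chinese text.

```python
def decode_with_fallback(raw, candidates=("utf-8", "gb18030")):
    """Hypothetical helper: return (text, encoding_used) for raw bytes."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Nothing matched: decode lossily and report that no encoding fit
    return raw.decode("utf-8", errors="replace"), None

from_gbk = decode_with_fallback("中文".encode("gbk"))
from_utf8 = decode_with_fallback("中文".encode("utf-8"))
print(from_gbk)   # ('中文', 'gb18030')
print(from_utf8)  # ('中文', 'utf-8')
```

The order matters: UTF-8 is strict and rejects most non-UTF-8 input, so it goes first, while permissive multi-byte codecs are better placed later in the list.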

Figure 3: Usage share of Chinese text encodings (pie) - shows the relative popularity of the different Chinese encoding schemes

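V. Using Context Managers (the with Statement)

In the earlier examples we had to close each file by hand, which is easy to forget and can leak resources. Python's with statement uses a context manager to acquire and release the resource automatically: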

# Use the with statement to close the file automatically
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)
# The file is closed automatically here

# Open several files at once
with open('input.txt', 'r') as infile, open('output.txt', 'w') as outfile:
    for line in infile:
        outfile.write(line.upper())
# Both files are closed automatically

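1. Advantages of the with Statement

Automatic resource management: no need to call close() yourself
Exception safety: the file is closed even if an exception is raised
More concise code: less boilerplate
Better readability: the scope in which the resource is used is explicit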
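The exception-safety point can be checked directly. In this sketch an error is raised on purpose inside the with block; the temporary file and the RuntimeError are illustrative choices.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("partial data")
            raise RuntimeError("simulated failure inside the with block")
    except RuntimeError as exc:
        error_message = str(exc)

    # The with statement closed (and flushed) the file before the
    # exception reached the except clause
    closed_after_error = handle.closed
    with open(path, "r", encoding="utf-8") as f:
        saved = f.read()

print(closed_after_error)  # True
print(saved)               # partial data
```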

2. Custom Context Managers

We can write our own context manager by implementing the __enter__ and __exit__ methods:

class MyFileManager:
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode
        self.file = None

    def __enter__(self):
        self.file = open(self.filename, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.file:
            self.file.close()
        # Returning False lets exceptions propagate; returning True suppresses them
        return False

# Use the custom context manager
with MyFileManager('example.txt', 'r') as file:
    content = file.read()
    print(content)
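For simple cases the standard library's contextlib.contextmanager decorator offers a shorter route to the same behavior. The function name managed_file below is hypothetical, and the temporary directory is only there to make the sketch runnable.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def managed_file(filename, mode, encoding="utf-8"):
    """Hypothetical helper: open a file and guarantee it is closed."""
    handle = open(filename, mode, encoding=encoding)
    try:
        yield handle          # the value bound by "as" in the with statement
    finally:
        handle.close()        # runs on normal exit and on exceptions

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")
    with managed_file(path, "w") as f:
        f.write("hello")
    with managed_file(path, "r") as f:
        content = f.read()
    closed_after = f.closed

print(content, closed_after)  # hello True
```

The code before yield plays the role of __enter__, and the finally clause plays the role of __exit__.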

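VI. Advanced File Operations

1. File Position and Seeking

A file object keeps a pointer to the current read/write position. The seek() and tell() methods inspect and move it: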

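# Note: in text mode, tell() returns an opaque value and seek() should only
# be given 0 or a value returned by tell(). The seek(10) call below assumes
# a plain ASCII file; open with 'rb' when exact byte offsets are needed.
with open('example.txt', 'r') as file:
    # Get the current position
    position = file.tell()
    print(f"Current position: {position}")

    # Read the first 5 characters
    content = file.read(5)
    print(f"Content read: {content}")

    # Get the new position
    position = file.tell()
    print(f"New position: {position}")

    # Move back to the start of the file
    file.seek(0)
    print(f"Position after rewinding: {file.tell()}")

    # Move to a position further into the file
    file.seek(10)
    print(f"Position after seeking: {file.tell()}")
    print(f"Content read: {file.read(5)}")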

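seek() accepts two parameters:

offset: the offset to move by
whence: the reference point
- 0: start of the file (default)
- 1: current position
- 2: end of the file

In text mode, offsets relative to the current position or the end must be 0; arbitrary relative offsets require binary mode.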
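The three whence values behave as follows on a file opened in binary mode. The ten-byte sample file is made up for this sketch, and the os.SEEK_* constants are the named equivalents of 0, 1, and 2.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "digits.bin")
    with open(path, "wb") as f:
        f.write(b"0123456789")

    with open(path, "rb") as f:
        f.seek(-3, os.SEEK_END)      # whence=2: three bytes before the end
        tail = f.read()
        f.seek(2, os.SEEK_SET)       # whence=0: absolute offset 2
        f.seek(3, os.SEEK_CUR)       # whence=1: move 3 forward from there
        position = f.tell()
        following = f.read(2)

    # Text mode rejects non-zero offsets relative to the end
    with open(path, "r", encoding="ascii") as f:
        try:
            f.seek(-3, os.SEEK_END)
            text_seek_allowed = True
        except io.UnsupportedOperation:
            text_seek_allowed = False

print(tail, position, following)  # b'789' 5 b'56'
print(text_seek_allowed)          # False
```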

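2. Temporary Files

Sometimes we need a temporary file to hold intermediate data:

import os
import tempfile

# Create a temporary file
with tempfile.TemporaryFile() as temp:
    # Write data
    temp.write(b'Hello temporary file')

    # Go back to the start of the file
    temp.seek(0)

    # Read the data
    data = temp.read()
    print(data)  # Output: b'Hello temporary file'
# The file is deleted automatically here

# Create a named temporary file
with tempfile.NamedTemporaryFile(delete=False) as temp:
    print(f"Temporary file name: {temp.name}")
    temp.write(b'Named temporary file')
# With delete=False the file is not deleted automatically; remove it yourself

# Create a temporary directory
with tempfile.TemporaryDirectory() as temp_dir:
    print(f"Temporary directory: {temp_dir}")
    # Create a file inside the temporary directory
    temp_file_path = os.path.join(temp_dir, 'temp.txt')
    with open(temp_file_path, 'w') as f:
        f.write('File in temporary directory')
# The directory and its contents are deleted automatically here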

3. Compressing and Decompressing Files

Python ships several modules for working with compressed files:

import gzip
import shutil
import tarfile
import zipfile

# Compress with gzip (copyfileobj streams the data in chunks)
with open('example.txt', 'rb') as f_in:
    with gzip.open('example.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

# Read a gzip-compressed file
with gzip.open('example.txt.gz', 'rb') as f:
    content = f.read()
    print(content)

# Create a ZIP archive
with zipfile.ZipFile('archive.zip', 'w') as zipf:
    zipf.write('example.txt')
    zipf.write('another_file.txt')

# Extract a ZIP archive (only extract archives you trust)
with zipfile.ZipFile('archive.zip', 'r') as zipf:
    zipf.extractall('extracted_folder')

# Create a gzip-compressed TAR archive
with tarfile.open('archive.tar.gz', 'w:gz') as tar:
    tar.add('example.txt')
    tar.add('another_file.txt')

# Extract a TAR archive (only extract archives you trust)
with tarfile.open('archive.tar.gz', 'r:gz') as tar:
    tar.extractall('extracted_folder')

4. In-Memory File Objects

The io module can create file objects that live entirely in memory:

import io

# Create an in-memory text file
text_io = io.StringIO()
text_io.write('Hello, ')
text_io.write('World!')
text_io.seek(0)  # go back to the start
print(text_io.read())  # Output: Hello, World!
text_io.close()

# Create an in-memory binary file
binary_io = io.BytesIO()
binary_io.write(b'Binary data')
binary_io.seek(0)
print(binary_io.read())  # Output: b'Binary data'
binary_io.close()
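In-memory files are handy wherever an API expects a file object but no file on disk is wanted. As one possible illustration, the sketch below lets the csv module write into a StringIO buffer and reads the result back with getvalue(); the row data is invented for the example.

```python
import csv
import io

# newline="" leaves the csv module in charge of line endings
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["name", "sales"])
writer.writerow(["widget", 12.5])

csv_text = buffer.getvalue()   # whole buffer, regardless of the current position
buffer.close()

# Parse the text again through a second in-memory file
rows = list(csv.reader(io.StringIO(csv_text, newline="")))

print(repr(csv_text))  # 'name,sales\r\nwidget,12.5\r\n'
print(rows)            # [['name', 'sales'], ['widget', '12.5']]
```

Unlike read(), getvalue() does not require a seek(0) first.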

VII. Practical Examples

Case 1: Log Processing

def process_log_file(log_path):
    error_count = 0
    warning_count = 0
    info_count = 0

    with open(log_path, 'r', encoding='utf-8') as log_file:
        for line in log_file:
            if '[ERROR]' in line:
                error_count += 1
            elif '[WARNING]' in line:
                warning_count += 1
            elif '[INFO]' in line:
                info_count += 1

    print(f"Log summary: {error_count} errors, {warning_count} warnings, {info_count} info")

    # Write the summary to a new file
    with open('log_summary.txt', 'w', encoding='utf-8') as summary:
        summary.write(f"Log file: {log_path}\n")
        summary.write(f"Errors: {error_count}\n")
        summary.write(f"Warnings: {warning_count}\n")
        summary.write(f"Info: {info_count}\n")
        # Lines without a recognized level tag are not included in this total
        summary.write(f"Matched lines: {error_count + warning_count + info_count}\n")

# Example usage
process_log_file('application.log')

Case 2: Processing CSV Data

import csv

def process_csv_data(input_path, output_path):
    # Read the CSV data
    with open(input_path, 'r', encoding='utf-8', newline='') as csvfile:
        reader = csv.DictReader(csvfile)

        # Output columns: the input columns plus the computed 'tax' column
        # (taken from the header so an input with no data rows still works)
        fieldnames = list(reader.fieldnames or [])
        if 'tax' not in fieldnames:
            fieldnames.append('tax')

        # Process the data
        processed_data = []
        for row in reader:
            # Example: convert sales to a number and compute a 10% tax
            row['sales'] = float(row['sales'])
            row['tax'] = row['sales'] * 0.1
            processed_data.append(row)

    # Write the processed data
    with open(output_path, 'w', encoding='utf-8', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

        writer.writeheader()
        for row in processed_data:
            writer.writerow(row)

# Example usage
process_csv_data('sales_data.csv', 'processed_sales.csv')

Case 3: Reading and Writing Configuration Files

import json
import configparser

# Configuration stored as JSON
def load_json_config(config_path):
    with open(config_path, 'r', encoding='utf-8') as config_file:
        config = json.load(config_file)
    return config

def save_json_config(config, config_path):
    with open(config_path, 'w', encoding='utf-8') as config_file:
        # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the file
        json.dump(config, config_file, indent=4, ensure_ascii=False)

# Configuration stored as INI
def load_ini_config(config_path):
    config = configparser.ConfigParser()
    config.read(config_path, encoding='utf-8')
    return config

def save_ini_config(config, config_path):
    with open(config_path, 'w', encoding='utf-8') as config_file:
        config.write(config_file)

# Example usage
# JSON configuration
json_config = load_json_config('app_config.json')
json_config['debug'] = True
save_json_config(json_config, 'app_config.json')

# INI configuration (values are always strings)
ini_config = load_ini_config('app_config.ini')
ini_config['DEFAULT']['debug'] = 'true'
save_ini_config(ini_config, 'app_config.ini')

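VIII. Best Practices and Caveats

1. Best Practices for File Operations

Always use the with statement: it guarantees the file is closed, even when an exception occurs
Specify the correct encoding: especially when handling non-ASCII text
Use the appropriate open mode: choose 'r', 'w', 'a', and so on as needed
Handle exceptions: file operations can raise FileNotFoundError, PermissionError, and others
Process large files in chunks: avoid reading a large file into memory all at once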

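def process_large_file(file_path, process_chunk):
    # process_chunk is a caller-supplied function that handles one chunk
    chunk_size = 1024 * 1024  # in text mode this counts characters (about 1M), not bytes
    with open(file_path, 'r', encoding='utf-8') as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:
                break
            # Process this chunk of data
            process_chunk(chunk)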
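Chunked reading works the same way in binary mode, where the chunk size is an exact byte count. As one illustrative use, the sketch below computes a SHA-256 digest without loading the whole file; the function name sha256_of_file, the 64 KB chunk size, and the generated sample data are all assumptions made for this example.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=64 * 1024):
    """Hypothetical helper: hash a file while holding one chunk in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter(callable, sentinel) calls f.read until it returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

data = b"x" * 200_000   # sample payload spanning several chunks
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "big.bin")
    with open(path, "wb") as f:
        f.write(data)
    chunked = sha256_of_file(path)

matches = chunked == hashlib.sha256(data).hexdigest()
print(matches)  # True
```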

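2. Common Exceptions and How to Handle Them

Exceptions commonly encountered during file operations:

try:
    with open('file.txt', 'r', encoding='utf-8') as file:
        content = file.read()
except FileNotFoundError:
    print("The file does not exist")
except PermissionError:
    print("No permission to access the file")
except IsADirectoryError:
    print("The path is a directory, not a file")
except UnicodeDecodeError:
    print("Encoding error: the file could not be decoded")
except Exception as e:
    print(f"Another error occurred: {e}")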

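"In Python, files are the bridge between a program and the outside world. Master file operations and you hold the key to data persistence."

Summary

This article covered file operations in Python from the basic concepts to more advanced usage: what a file is, how to work with file paths, how to open, close, read, and write files, how to deal with encoding when handling Chinese text, and how context managers simplify file handling.

File handling is a foundational Python skill. Knowing it well helps you work with data, configuration, and resources more effectively, whether the task is simple text processing or complex data analysis.

Hopefully this article helps you understand and apply Python's file facilities. In real code, follow the best practices above and handle exceptions sensibly; your programs will be more robust and reliable for it.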
