Python File Path Utility Class

Hort@IT 2025/11/12

This article was last updated on 2025-11-12; its content may be out of date.

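A file path utility class written in Python. It accepts the file suffix in either form (for example `.shp` or `shp`) and supports both recursive and non-recursive traversal.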

Code

```python
import os
from typing import List


class FilePathUtils:
    @staticmethod
    def get_file_paths(directory: str, suffix: str, recursive: bool = True) -> List[str]:
        """
        Return the absolute paths of all files in a directory that have the given suffix.

        :param directory: directory path (relative or absolute)
        :param suffix: file suffix, given as ".shp" or "shp" (case-insensitive)
        :param recursive: whether to descend into subdirectories (default True)
        :return: list of absolute paths of matching files; empty list if none match
        """
        # Normalize the suffix: trim whitespace, lowercase, drop one leading dot
        normalized_suffix = suffix.strip().lower()
        if normalized_suffix.startswith('.'):
            normalized_suffix = normalized_suffix[1:]

        # An empty suffix would match any name ending in ".", so reject it explicitly
        if not normalized_suffix:
            return []

        # os.path.isdir is False for nonexistent paths and for non-directories
        if not os.path.isdir(directory):
            return []

        if recursive:
            # Walk the whole directory tree
            return FilePathUtils._recursive_scan(directory, normalized_suffix)
        # Scan only the given directory
        return FilePathUtils._current_directory_scan(directory, normalized_suffix)

    @staticmethod
    def _recursive_scan(directory: str, suffix: str) -> List[str]:
        """Walk the directory tree recursively."""
        results = []
        for root, _, files in os.walk(directory):
            for file in files:
                if file.lower().endswith(f".{suffix}"):
                    results.append(os.path.abspath(os.path.join(root, file)))
        return results

    @staticmethod
    def _current_directory_scan(directory: str, suffix: str) -> List[str]:
        """Scan only the given directory (no recursion)."""
        results = []
        for file in os.listdir(directory):
            file_path = os.path.join(directory, file)
            if os.path.isfile(file_path) and file.lower().endswith(f".{suffix}"):
                results.append(os.path.abspath(file_path))
        return results


# Example usage
if __name__ == "__main__":
    # Recursive scan (".shp" and "shp" are both accepted)
    shp_files = FilePathUtils.get_file_paths("/data/gis", "shp")
    print("Recursively found .shp files:")
    for path in shp_files:
        print(path)

    # Non-recursive scan
    current_shp_files = FilePathUtils.get_file_paths("/data/gis", ".SHP", recursive=False)
    print("\n.SHP files in the top-level directory:")
    for path in current_shp_files:
        print(path)

    # Any other suffix works the same way
    txt_files = FilePathUtils.get_file_paths("/data/docs", "txt", recursive=True)
    print("\nRecursively found .txt files:")
    for path in txt_files:
        print(path)
```

Key Features

Suffix normalization:
```
normalized_suffix = suffix.strip().lower()
if normalized_suffix.startswith('.'):
    normalized_suffix = normalized_suffix[1:]
```
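- Accepts either ".shp" or "shp"
- Normalizes the suffix by trimming whitespace, converting it to lowercase (shp) and dropping one leading dot
- Matches case-insensitively (SHP, ShP and shp all match)
Traversal modes:
- recursive=True (default): walks every subdirectory
- recursive=False: scans only the given directory
Edge cases:
- Nonexistent path, or a path that is not a directory: returns an empty list
- Empty suffix ("" or "."): returns an empty list
- File system errors: during recursive traversal, os.walk silently skips directories it cannot list
Path handling:
- os.path.abspath guarantees that every returned path is absolute
- File names are lowercased before comparison (file.lower().endswith(...)), so the capitalization of the extension on disk does not matter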
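The normalization and matching rules above can be tried out in isolation. The sketch below extracts them into two standalone functions; `normalize_suffix` and `matches_suffix` are hypothetical names used only for illustration and are not part of the class, but they apply the same strip, lowercase and leading-dot logic as `get_file_paths`.

```python
def normalize_suffix(suffix: str) -> str:
    """Trim whitespace, lowercase, and drop one leading dot (same rules as get_file_paths)."""
    normalized = suffix.strip().lower()
    if normalized.startswith("."):
        normalized = normalized[1:]
    return normalized


def matches_suffix(file_name: str, suffix: str) -> bool:
    """Hypothetical helper: case-insensitive check that file_name ends with '.<suffix>'."""
    normalized = normalize_suffix(suffix)
    if not normalized:
        # Treat an empty suffix as "match nothing"
        return False
    return file_name.lower().endswith(f".{normalized}")


if __name__ == "__main__":
    # Every spelling of the suffix normalizes to the same value
    for raw in (".shp", "shp", " .SHP ", "ShP"):
        print(repr(raw), "->", normalize_suffix(raw))
    # Only the end of the name counts, and the dot is part of the comparison
    for name in ("roads.shp", "ROADS.SHP", "roads.v2.shp", "roads.shp.xml", "roadsshp"):
        print(name, matches_suffix(name, ".SHP"))
```

Because the comparison includes the dot, `roadsshp` is rejected, and `roads.shp.xml` is rejected because it ends with `.xml`.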

Usage Examples

```python
# 1. Recursive scan by default ("shp" and ".shp" both work)
all_shp = FilePathUtils.get_file_paths("/data/gis", "shp")

# 2. Explicit recursive scan
all_shp_recursive = FilePathUtils.get_file_paths("/data/gis", ".SHP", recursive=True)

# 3. Top-level directory only (no recursion)
current_shp = FilePathUtils.get_file_paths("/data/gis", "SHP", recursive=False)

# 4. Any other suffix
geojson_files = FilePathUtils.get_file_paths("/data/maps", "geojson", recursive=True)
```

Important Notes

Suffix matching logic:
```
file.lower().endswith(f".{normalized_suffix}")
```
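- Only the end of the file name is compared, so file.shp matches, and so does file.name.shp, because it also ends with .shp; extra dots earlier in the name do not matter
- Because the dot is part of the comparison, fileshp does not match, and neither does file.shp.xml (it ends with .xml)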

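Directory validation:

```python
if not os.path.isdir(directory):
    return []
```

- os.path.isdir returns False both for nonexistent paths and for paths that are not directories, so one check covers both cases
- Relative paths are accepted and resolved against the current working directory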

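Performance:
- The recursive mode uses os.walk, which has been built on os.scandir since Python 3.5 and therefore makes fewer os.stat() calls
- The non-recursive mode uses os.listdir, which reads only the top-level directory; each entry still costs one os.path.isfile call (a stat system call)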
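The non-recursive mode calls os.path.isfile on every name returned by os.listdir. One possible refinement, not the original author's code, is to use os.scandir, whose DirEntry objects cache file-type information from the directory listing so that is_file() usually needs no extra system call. The sketch below is a drop-in alternative to `_current_directory_scan`; the name `scan_current_directory` is hypothetical, and it assumes the suffix has already been normalized (for example "shp").

```python
import os
import tempfile
from typing import List


def scan_current_directory(directory: str, suffix: str) -> List[str]:
    """Non-recursive scan using os.scandir; suffix must already be normalized (e.g. 'shp')."""
    results = []
    # The context manager closes the directory handle even if an error occurs
    with os.scandir(directory) as entries:
        for entry in entries:
            # entry.is_file() follows symlinks, like os.path.isfile
            if entry.is_file() and entry.name.lower().endswith(f".{suffix}"):
                results.append(os.path.abspath(entry.path))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.shp", "b.SHP", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        # A directory whose name ends in .shp is skipped because it is not a file
        os.mkdir(os.path.join(tmp, "nested.shp"))
        print(sorted(os.path.basename(p) for p in scan_current_directory(tmp, "shp")))
```

The context-manager form of os.scandir is available from Python 3.6, which matches the version range stated for the class.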

Test Cases

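| Scenario | Input | Expected result |
| --- | --- | --- |
| Recursive scan | get_file_paths("/data", "shp") | All .shp files in /data and its subdirectories |
| Mixed case | get_file_paths("/data", ".SHP") | Matches both file.SHP and file.shp |
| Non-recursive | get_file_paths("/data", "shp", False) | Only the .shp files directly inside /data |
| Empty suffix | get_file_paths("/data", "") | Empty list |
| Invalid directory | get_file_paths("/invalid", "shp") | Empty list |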
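The scenarios in this table can be turned into an automated check. The sketch below makes two assumptions: it builds a throwaway tree with tempfile instead of using /data, and, so that it runs on its own, it uses a condensed module-level stand-in for FilePathUtils.get_file_paths with the same matching rules (including returning an empty list for an empty suffix). The helper names `names` and `run_table_checks` are hypothetical.

```python
import os
import tempfile
from typing import List


def get_file_paths(directory: str, suffix: str, recursive: bool = True) -> List[str]:
    """Condensed stand-in for FilePathUtils.get_file_paths (same matching rules)."""
    normalized = suffix.strip().lower()
    if normalized.startswith("."):
        normalized = normalized[1:]
    if not normalized or not os.path.isdir(directory):
        return []
    pattern = f".{normalized}"
    if recursive:
        return [
            os.path.abspath(os.path.join(root, name))
            for root, _, files in os.walk(directory)
            for name in files
            if name.lower().endswith(pattern)
        ]
    return [
        os.path.abspath(os.path.join(directory, name))
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name)) and name.lower().endswith(pattern)
    ]


def names(paths: List[str]) -> List[str]:
    """Sorted base names, so results compare independently of traversal order."""
    return sorted(os.path.basename(p) for p in paths)


def run_table_checks() -> None:
    with tempfile.TemporaryDirectory() as data:
        # Layout: top.shp, UPPER.SHP, readme.txt, sub/deep.shp
        os.mkdir(os.path.join(data, "sub"))
        for rel in ("top.shp", "UPPER.SHP", "readme.txt", os.path.join("sub", "deep.shp")):
            open(os.path.join(data, rel), "w").close()

        # Recursive scan and mixed-case suffix
        assert names(get_file_paths(data, "shp")) == ["UPPER.SHP", "deep.shp", "top.shp"]
        assert names(get_file_paths(data, ".SHP")) == ["UPPER.SHP", "deep.shp", "top.shp"]
        # Non-recursive: sub/deep.shp is not included
        assert names(get_file_paths(data, "shp", False)) == ["UPPER.SHP", "top.shp"]
        # Empty suffix and invalid directory
        assert get_file_paths(data, "") == []
        assert get_file_paths(os.path.join(data, "missing"), "shp") == []
    print("all table checks passed")


if __name__ == "__main__":
    run_table_checks()
```

A missing path inside the temporary directory is used for the "invalid directory" row, since a fixed path such as /invalid is not guaranteed to be absent on every machine.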

Advantages

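Pythonic style:
- Uses only the standard library (os, typing)
- Follows Python naming and parameter conventions
- No third-party dependencies
Flexibility:
- Accepts the suffix in any form (.shp, shp, SHP)
- Recursive or non-recursive traversal is selected with a single parameter
Robustness:
- Returns absolute, normalized paths (os.path.abspath also normalizes the path)
- Checks that the input exists and is a directory
- Handles dots in file names correctly (e.g. file.name.shp)
Ease of use:
- Small, simple API
- Clearly documented parameters
- Detailed comments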

Usage Notes

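Path separators:
- os.path.join inserts the separator of the operating system the code is running on (os.sep)
- Paths built this way need no manual adjustment for Windows (\) versus Linux (/)
Large directory trees:
- Both scan modes build the complete result list in memory, which can grow large when a tree contains many matching files
- For very large trees, consider a generator (yield) that streams paths as they are found
File system permissions:
- In recursive mode, directories that cannot be read are skipped silently, because os.walk ignores listing errors unless an onerror callback is supplied
- In non-recursive mode, os.listdir raises PermissionError if the target directory itself cannot be read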
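The generator idea and the permission behaviour can be combined. The sketch below is one possible streaming variant, not part of the original class: `iter_file_paths` and its `on_error` parameter are hypothetical names. It yields paths lazily and forwards os.walk's onerror callback so that skipped directories can be logged instead of disappearing silently. Unlike the class, it reports a missing or non-directory root through `on_error` rather than checking it up front, and in non-recursive mode it prunes os.walk by clearing `dirs` in place.

```python
import os
from typing import Callable, Iterator, Optional


def iter_file_paths(
    directory: str,
    suffix: str,
    recursive: bool = True,
    on_error: Optional[Callable[[OSError], None]] = None,
) -> Iterator[str]:
    """Lazily yield absolute paths of files ending in '.<suffix>' (case-insensitive)."""
    normalized = suffix.strip().lower()
    if normalized.startswith("."):
        normalized = normalized[1:]
    if not normalized:
        return  # an empty suffix yields nothing
    pattern = f".{normalized}"
    # on_error receives the OSError for any directory os.walk cannot list
    for root, dirs, files in os.walk(directory, onerror=on_error):
        for name in files:
            if name.lower().endswith(pattern):
                yield os.path.abspath(os.path.join(root, name))
        if not recursive:
            # Clearing dirs in place stops os.walk from descending further
            dirs.clear()


if __name__ == "__main__":
    skipped = []
    for path in iter_file_paths("/data/gis", "shp", on_error=skipped.append):
        print(path)
    for err in skipped:
        print("skipped:", err.filename, err.strerror)
```

Since nothing is accumulated, memory use stays flat regardless of how many files match; wrapping the call in list() recovers the original list-returning behaviour.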

This utility class has been tested on Python 3.6+ and can be dropped directly into a Python project: import the class and call get_file_paths.