find_filegroups: Find groups of files that differ only by their file extension
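A function that finds files in different directories that belong together (i.e., files that differ only by their file extension) and collects them in a Python dictionary for further processing.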
> from mlxtend.file_io import find_filegroups
Overview
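The function finds related files based on their file names. This is useful for parsing collections of files that are stored in different subdirectories, for example: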
input_dir/
    task01.txt
    task02.txt
    ...
log_dir/
    task01.log
    task02.log
    ...
output_dir/
    task01.dat
    task02.dat
    ...
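The grouping idea can be sketched with the standard library alone. The helper `group_by_stem` below is hypothetical and is not mlxtend's implementation; it follows the documented rule that dictionary keys are built from the first directory, and, as an assumption for this sketch, it skips subdirectories and invisible files. The demo recreates the `input_dir/`, `log_dir/`, `output_dir/` layout in a temporary folder.

```python
import os
import tempfile
from collections import defaultdict


def group_by_stem(dirs):
    """Group files across `dirs` by file name without extension.

    Hypothetical helper for illustration only (not mlxtend's code).
    Keys come from the first directory; files in later directories are
    attached only when their stem already exists as a key.
    """
    groups = defaultdict(list)
    for i, d in enumerate(dirs):
        for name in sorted(os.listdir(d)):
            path = os.path.join(d, name)
            # Skip subdirectories and invisible files such as ".DS_Store".
            if name.startswith(".") or not os.path.isfile(path):
                continue
            stem = os.path.splitext(name)[0]
            if i == 0 or stem in groups:
                groups[stem].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Recreate the input_dir/log_dir/output_dir layout in a temporary folder.
    with tempfile.TemporaryDirectory() as root:
        layout = [("input_dir", ".txt"), ("log_dir", ".log"), ("output_dir", ".dat")]
        dirs = []
        for dirname, ext in layout:
            d = os.path.join(root, dirname)
            os.makedirs(d)
            for task in ("task01", "task02"):
                open(os.path.join(d, task + ext), "w").close()
            dirs.append(d)
        for stem, files in group_by_stem(dirs).items():
            print(stem, [os.path.basename(f) for f in files])
```

Running the script prints one line per task, e.g. `task01 ['task01.txt', 'task01.log', 'task01.dat']`.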
Example 1 - Grouping related files in a dictionary
Given the following directory and file structure
dir_1/
    file_1.log
    file_2.log
    file_3.log
dir_2/
    file_1.csv
    file_2.csv
    file_3.csv
dir_3/
    file_1.txt
    file_2.txt
    file_3.txt
we can use find_filegroups to group related files as items of a dictionary, as shown below:
from mlxtend.file_io import find_filegroups

find_filegroups(paths=['./data_find_filegroups/dir_1',
                       './data_find_filegroups/dir_2',
                       './data_find_filegroups/dir_3'],
                substring='file_')
{'file_1': ['./data_find_filegroups/dir_1/file_1.log',
            './data_find_filegroups/dir_2/file_1.csv',
            './data_find_filegroups/dir_3/file_1.txt'],
 'file_2': ['./data_find_filegroups/dir_1/file_2.log',
            './data_find_filegroups/dir_2/file_2.csv',
            './data_find_filegroups/dir_3/file_2.txt'],
 'file_3': ['./data_find_filegroups/dir_1/file_3.log',
            './data_find_filegroups/dir_2/file_3.csv',
            './data_find_filegroups/dir_3/file_3.txt']}
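To reproduce Example 1, the directory layout can be created with a few lines of standard-library Python. `make_example_tree` is a hypothetical helper, not part of mlxtend; it writes empty files, which should be enough here because only the file names matter for grouping.

```python
import os


def make_example_tree(root="./data_find_filegroups"):
    """Create dir_1/, dir_2/, dir_3/ under `root` with empty file_1..file_3 files.

    Hypothetical helper mirroring the Example 1 layout. Returns the three
    directory paths in the order they would be passed as `paths`.
    """
    layout = {"dir_1": ".log", "dir_2": ".csv", "dir_3": ".txt"}
    dirs = []
    for dirname, ext in layout.items():
        d = os.path.join(root, dirname)
        os.makedirs(d, exist_ok=True)
        for i in range(1, 4):
            # Empty placeholder files; only their names are relevant.
            open(os.path.join(d, f"file_{i}{ext}"), "w").close()
        dirs.append(d)
    return dirs
```

With mlxtend installed, `find_filegroups(paths=make_example_tree(), substring='file_')` should then return a dictionary of the shape shown above (the `/` separators assume a POSIX system).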
API
find_filegroups(paths, substring='', extensions=None, validity_check=True, ignore_invisible=True, rstrip='', ignore_substring=None)
Find and collect files from different directories in a Python dictionary.
Parameters
- paths : list
  Paths of the directories to be searched. Dictionary keys are built from the file names in the first directory.
- substring : str (default: '')
  Substring that all files have to contain to be considered.
- extensions : list (default: None)
  None or a list of allowed file extensions, one for each path. If provided, the number of extensions must match the number of paths.
- validity_check : bool (default: True)
  If True, checks whether all dictionary values contain the same number of file paths. Prints a warning and returns an empty dictionary if the validity check fails.
- ignore_invisible : bool (default: True)
  If True, ignores invisible files (i.e., files whose names start with a period).
- rstrip : str (default: '')
  If provided, strips characters from the right side of the file base names after the extension has been split off. Useful for trimming different file names to a common stem. E.g., "abc_d.txt" and "abc_d_.csv" would share the stem "abc_d" if rstrip is set to "_".
- ignore_substring : str (default: None)
  Ignores files that contain the specified substring.
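The per-file rules described by `substring`, `extensions`, `ignore_invisible`, `ignore_substring` and `rstrip` can be approximated as follows. Both helpers are hypothetical and are not taken from mlxtend. The sketch assumes that the filters apply to the file name only, that extensions are compared including the leading dot (as `os.path.splitext` returns them), and that `rstrip` behaves like Python's `str.rstrip`, which removes any trailing run of the given characters.

```python
import os


def common_stem(filename, rstrip=""):
    """Return (stem, extension) for `filename`.

    Hypothetical helper: splits off the extension first, then strips any
    trailing run of the characters in `rstrip` (assumed str.rstrip semantics).
    """
    base, ext = os.path.splitext(os.path.basename(filename))
    if rstrip:
        base = base.rstrip(rstrip)
    return base, ext


def keep_file(filename, allowed_ext=None, substring="",
              ignore_substring=None, ignore_invisible=True):
    """Return True if `filename` passes the (assumed) per-file filters.

    Hypothetical helper; `allowed_ext` includes the leading dot, e.g. ".log".
    """
    name = os.path.basename(filename)
    # Invisible files are those whose name starts with a period.
    if ignore_invisible and name.startswith("."):
        return False
    # Every considered file must contain `substring` ("" matches everything).
    if substring not in name:
        return False
    if ignore_substring is not None and ignore_substring in name:
        return False
    # allowed_ext=None means any extension is accepted.
    if allowed_ext is not None and os.path.splitext(name)[1] != allowed_ext:
        return False
    return True
```

With `rstrip="_"`, `common_stem("abc_d.txt", "_")` and `common_stem("abc_d_.csv", "_")` both yield the stem `"abc_d"`, matching the example in the `rstrip` description.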
Returns
- groups : dict
  Dictionary of file paths. Keys are the file names found in the first directory listed in paths (without file extension).
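The `validity_check` behavior amounts to comparing the number of paths in each group. In this sketch the helper name is hypothetical, and `warnings.warn` stands in for the printed warning so that callers can filter or capture it; returning an empty dictionary on failure follows the documented behavior.

```python
import warnings


def validity_check(groups):
    """Return `groups` unchanged if every value lists the same number of paths.

    Hypothetical helper sketching the documented check: if group sizes
    differ, emit a warning and return an empty dictionary instead.
    """
    sizes = {len(paths) for paths in groups.values()}
    if len(sizes) > 1:
        warnings.warn("Some groups contain more or fewer files than others.")
        return {}
    return groups
```

A group with a missing partner file (for example, a `file_2.log` without a matching `file_2.csv`) makes the sizes differ, so the whole result is discarded rather than silently returning incomplete groups.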
Examples
For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/file_io/find_filegroups/