Rename multiple file paths inside a file with updated file paths



I have a file named experiments.txt that contains the arguments for a Python script named script.py.

../data/20211015_08-09-50CST_raw_fold_results-mlr.csv --plot True
../data/20211115_08-15-35CST_raw_fold_results-rf.csv --plot True

Assume the folder structure and files are as shown below, and note that the csv files in data/ are NOT the same as the ones listed in experiments.txt.

data
|___20211117_09-10-50CST_raw_fold_results-mlr.csv
|___20211117_09-11-35CST_raw_fold_results-rf.csv
src
|___script.py
|___experiments.txt

I want to replace the first argument (e.g. ../data/20211015_08-09-50CST_raw_fold_results-mlr.csv) on every line of experiments.txt with the updated data file, so that experiments.txt (or a newly created file such as experiments-2.txt) becomes

..\data\20211117_09-10-50CST_raw_fold_results-mlr.csv --plot True
..\data\20211117_09-11-35CST_raw_fold_results-rf.csv --plot True

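I know I can write a convoluted solution in Python, but my solution seems suboptimal at best and very, very poorly designed at worst. How can I perform the desired task in bash? (It looks well suited to the task, but I'm not sure how to do it.)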

# This sample solution is written in `.ipynb` in the `src/` directory
import os
from pathlib import Path
cwd = os.getcwd()  # src
replacement_fnames = [file for file in os.listdir(os.path.join(cwd, '..', 'data'))]
with open('experiments.txt', 'r') as fobj:
    lines = [line.strip() for line in fobj.readlines()]
# The replacement lines for the file `experiments-2.txt` will be
# appended to this empty string
write_str = ''
for line in lines:
    # A line in the file is of the form
    # `path <SPACE> opts`, therefore splitting the line into a
    # list delimited by a space `' '` allows access to the `path`
    # by indexing 0
    space_separated_line = line.split(' ')
    cur_path = Path(space_separated_line[0])
    cur_fname = Path(cur_path).name
    # File names are separated by model name... in this case
    # `mlr` and `rf`... by splitting the file name into a list
    # delimited by `-`, then the last element of that list is the
    # name of the model
    # e.g., cur_fname = 20211015_08-09-50CST_raw_fold_results-mlr.csv
    # cur_fname.split('-') --> ['20211015_08', '09', '50CST_raw_fold_results', 'mlr.csv']
    cur_fname_model_name = cur_fname.split('-')[-1]
    for replacement_fname in replacement_fnames:
        # Extract model name from the replacement fname in the same
        # fashion as done for cur_fname
        replacement_fname_model_name = replacement_fname.split('-')[-1]
        if replacement_fname_model_name == cur_fname_model_name:
            space_separated_line[0] = os.path.join(Path(cur_path).parent, replacement_fname)

    write_str += ' '.join(space_separated_line) + '\n'
print('Original:')
print('\n'.join(lines))
print()
print('Replaced:')
print(write_str)
with open('experiments-2.txt', 'w') as fobj:
    fobj.write(write_str)
## Output
# Original:
# ../data/20211015_08-09-50CST_raw_fold_results-mlr.csv --plot True
# ../data/20211115_08-15-35CST_raw_fold_results-rf.csv --plot True
# Replaced:
# ..\data\20211117_09-10-50CST_raw_fold_results-mlr.csv --plot True
# ..\data\20211117_09-11-35CST_raw_fold_results-rf.csv --plot True

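Assuming a filename such as 20211015_08-09-50CST_raw_fold_results-mlr.csv can be broken down into a variable prefix 20211015_08-09- and a fixed substring 50CST_raw_fold_results-mlr.csv, we can look up the files that exist in the data directory by their fixed substring.
Then would you please try: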

#!/bin/bash
declare -A map                          # associative array to map filenames
for f in ../data/*.csv; do              # find the csv filenames in the ../data dir
    f2="$(sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' <<< "$f")"
                                        # remove the variable prefix (dirname and the date)
    map[$f2]=$f                         # map the fixed substring of the filename to the fullpath
done
while read -r path opts; do             # read line of experiments.txt and break into variables
    f2="$(sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' <<< "$path")"
                                        # remove the variable prefix (dirname and the date)
    f=${map[$f2]}                       # map filename via the fixed substring
    if [[ -n $f ]]; then                # if the variable $f is not empty, the file exists
        echo "${f//\//\\} $opts"        # replace slashes with backslashes and write to "experiments-2.txt"
    fi
done < experiments.txt > experiments-2.txt
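  • In the for f in ../data/*.csv; do loop, assuming f is assigned ../data/20211117_09-10-50CST_raw_fold_results-mlr.csv, the command sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' removes the prefix, so f2 is assigned 50CST_raw_fold_results-mlr.csv.
  • With map[$f2]=$f, the associative array (also known as a dictionary in Python) entry indexed by 50CST_raw_fold_results-mlr.csv is assigned the full pathname ../data/20211117_09-10-50CST_raw_fold_results-mlr.csv.
  • In the while loop that follows, we replace the filename, using the fixed substring as the key to look up the full pathname.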
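
To make the key-extraction step concrete, here is a minimal Python sketch of the same prefix stripping, run on the two mlr file names from the question. The helper name fixed_part is hypothetical and not part of the script above; the regular expression mirrors the sed expression.

```python
import re

# Same pattern as the sed expression: the greedy ".*" consumes the directory
# part, then 8 digits, "_", 2 digits, "-", 2 digits, "-" are matched and removed.
PREFIX = re.compile(r'.*[0-9]{8}_[0-9]{2}-[0-9]{2}-')


def fixed_part(path):
    """Return path without its directory and date prefix (hypothetical helper)."""
    return PREFIX.sub('', path, count=1)


old = '../data/20211015_08-09-50CST_raw_fold_results-mlr.csv'
new = '../data/20211117_09-10-50CST_raw_fold_results-mlr.csv'

# Both paths reduce to the same key, which is what lets the old entry in
# experiments.txt be matched to the new file in ../data.
print(fixed_part(old))
print(fixed_part(new))
print(fixed_part(old) == fixed_part(new))
```

Note that the key still contains the seconds field (50 here), so the lookup only succeeds when the old and new file names share it, as they do in the question's example.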

[Alternative]
If we translate the bash script above into Python, it will look like:

#!/usr/bin/python
import glob
import re
map = {re.sub(r'.*\d{8}_\d{2}-\d{2}-', '', f): f for f in glob.glob('../data/*.csv')}
with open('experiments.txt', 'r') as f, open('experiments-2.txt', 'w') as fw:
    for line in f:
        path, opts = line.strip().split(' ', 1)
        f2 = re.sub(r'.*\d{8}_\d{2}-\d{2}-', '', path)
        if f2 in map:
            fw.write(' '.join([map[f2], opts]).replace('/', '\\') + '\n')

JFYI
