I have a file named experiments.txt that contains the arguments for a Python script named script.py.
../data/20211015_08-09-50CST_raw_fold_results-mlr.csv --plot True
../data/20211115_08-15-35CST_raw_fold_results-rf.csv --plot True
Assume the folder structure and files are as shown below, and note that the csv files in data/ are NOT the same as the ones listed in experiments.txt.
data
|___20211117_09-10-50CST_raw_fold_results-mlr.csv
|___20211117_09-11-35CST_raw_fold_results-rf.csv
src
|___script.py
|___experiments.txt
I want to replace the first argument (e.g. ../data/20211015_08-09-50CST_raw_fold_results-mlr.csv) on every line of experiments.txt with the updated data file, so that experiments.txt (or a newly created file such as experiments-2.txt) becomes:
..\data\20211117_09-10-50CST_raw_fold_results-mlr.csv --plot True
..\data\20211117_09-11-35CST_raw_fold_results-rf.csv --plot True
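I know I can write a convoluted solution in Python, but my solution seems suboptimal at best and very, very poorly designed at worst. How can I perform the desired task in bash? It looks well suited to the job, but I'm not sure how to go about it.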
# This sample solution is written in `.ipynb` in the `src/` directory
import os
from pathlib import Path

cwd = os.getcwd()  # src
replacement_fnames = os.listdir(os.path.join(cwd, '..', 'data'))
with open('experiments.txt', 'r') as fobj:
    lines = [line.strip() for line in fobj.readlines()]

# The replacement lines for the file `experiments-2.txt` will be
# appended to this empty string
write_str = ''
for line in lines:
    # A line in the file is of the form
    # `path <SPACE> opts`, therefore splitting the line into a
    # list delimited by a space `' '` allows access to the `path`
    # by indexing 0
    space_separated_line = line.split(' ')
    cur_path = Path(space_separated_line[0])
    cur_fname = Path(cur_path).name
    # File names are separated by model name... in this case
    # `mlr` and `rf`... by splitting the file name into a list
    # delimited by `-`, then the last element of that list is the
    # name of the model
    # e.g., cur_fname = 20211015_08-09-50CST_raw_fold_results-mlr.csv
    # cur_fname.split('-') --> ['20211015_08', '09', '50CST_raw_fold_results', 'mlr.csv']
    cur_fname_model_name = cur_fname.split('-')[-1]
    for replacement_fname in replacement_fnames:
        # Extract model name from the replacement fname in the same
        # fashion as done for cur_fname
        replacement_fname_model_name = replacement_fname.split('-')[-1]
        if replacement_fname_model_name == cur_fname_model_name:
            space_separated_line[0] = os.path.join(Path(cur_path).parent, replacement_fname)
            write_str += ' '.join(space_separated_line) + '\n'

print('Original:')
print('\n'.join(lines))
print()
print('Replaced:')
print(write_str)
with open('experiments-2.txt', 'w') as fobj:
    fobj.write(write_str)
## Output
# Original:
# ../data/20211015_08-09-50CST_raw_fold_results-mlr.csv --plot True
# ../data/20211115_08-15-35CST_raw_fold_results-rf.csv --plot True
# Replaced:
# ..\data\20211117_09-10-50CST_raw_fold_results-mlr.csv --plot True
# ..\data\20211117_09-11-35CST_raw_fold_results-rf.csv --plot True
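Assuming the file name 20211015_08-09-50CST_raw_fold_results-mlr.csv can be broken down into the variable prefix 20211015_08-09- and the fixed substring 50CST_raw_fold_results-mlr.csv, we can look in the existing data directory for a file that has the same fixed substring.
Then would you please try: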
#!/bin/bash

declare -A map                  # associative array to map filenames
for f in ../data/*.csv; do      # find the csv filenames in the ../data dir
    # remove the variable prefix (dirname and the date)
    f2="$(sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' <<< "$f")"
    map[$f2]=$f                 # map the fixed substring of the filename to the fullpath
done

while read -r path opts; do     # read line of experiments.txt and break into variables
    # remove the variable prefix (dirname and the date)
    f2="$(sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' <<< "$path")"
    f=${map[$f2]}               # map filename via the fixed substring
    if [[ -n $f ]]; then        # if the variable $f is not empty, the file exists
        echo "${f//\//\\} $opts"    # replace slashes with backslashes and write to "experiments-2.txt"
    fi
done < experiments.txt > experiments-2.txt
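- In the for f in ../data/*.csv; do loop, suppose f is assigned ../data/20211117_09-10-50CST_raw_fold_results-mlr.csv. The sed command sed -E 's/.*[0-9]{8}_[0-9]{2}-[0-9]{2}-//' removes the prefix, so f2 is assigned 50CST_raw_fold_results-mlr.csv. Then map[$f2]=$f sets the entry of the associative array (known as a dictionary in Python) indexed by 50CST_raw_fold_results-mlr.csv to its full pathname ../data/20211117_09-10-50CST_raw_fold_results-mlr.csv.
- In the following while loop, we use the fixed substring as the key to look up the full pathname and replace the file name.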
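To make the key derivation concrete, here is a minimal Python sketch (Python because the question's own attempt uses it) that applies the same regular expression as the sed command to an old path and a new path. The helper name fixed_key is hypothetical and exists only for this illustration.

```python
import re

# Same pattern as the sed command: strip everything up to and including
# the 8-digit date and the "HH-MM-" part of the timestamp.
PREFIX = re.compile(r'.*[0-9]{8}_[0-9]{2}-[0-9]{2}-')


def fixed_key(path):
    """Return the fixed substring of a path (hypothetical helper)."""
    return PREFIX.sub('', path)


old = '../data/20211015_08-09-50CST_raw_fold_results-mlr.csv'
new = '../data/20211117_09-10-50CST_raw_fold_results-mlr.csv'

print(fixed_key(old))                    # key derived from the old path
print(fixed_key(new))                    # key derived from the new path
print(fixed_key(old) == fixed_key(new))  # the lookup works only if these agree
```

Note that the key still contains the seconds field (50CST here), so an old and a new file name map to the same key only when their seconds agree, as they do for both files in the question's example.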
[Alternative]
If we translate the bash script above into Python, it will look like:
#!/usr/bin/python
import glob
import re

# map the fixed substring of each csv filename to its full path
map = {re.sub(r'.*\d{8}_\d{2}-\d{2}-', '', f): f for f in glob.glob('../data/*.csv')}
with open('experiments.txt', 'r') as f, open('experiments-2.txt', 'w') as fw:
    for line in f:
        path, opts = line.strip().split(' ', 1)
        # remove the variable prefix (dirname and the date)
        f2 = re.sub(r'.*\d{8}_\d{2}-\d{2}-', '', path)
        if f2 in map:
            # replace slashes with backslashes before writing
            fw.write(' '.join([map[f2], opts]).replace('/', '\\') + '\n')
JFYI
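If you want to check the mapping logic without touching the file system, the same idea can be sketched as a pure function over lists of strings. This is only an illustration under stated assumptions: the names fixed_key and rewrite_lines are hypothetical, and the sample inputs are the paths from the question. As in the scripts above, lines with no matching data file are dropped and slashes are converted to backslashes.

```python
import re

# Variable prefix: directory, 8-digit date, and the "HH-MM-" part of the time.
PREFIX = re.compile(r'.*\d{8}_\d{2}-\d{2}-')


def fixed_key(path):
    """Strip the variable prefix from a path (hypothetical helper)."""
    return PREFIX.sub('', path)


def rewrite_lines(lines, data_paths):
    """Return the rewritten lines; lines without a match are dropped."""
    lookup = {fixed_key(p): p for p in data_paths}
    out = []
    for line in lines:
        # Split into the path and the remaining options at the first space.
        path, _, opts = line.strip().partition(' ')
        new_path = lookup.get(fixed_key(path))
        if new_path is not None:
            out.append(new_path.replace('/', '\\') + ' ' + opts)
    return out


lines = [
    '../data/20211015_08-09-50CST_raw_fold_results-mlr.csv --plot True',
    '../data/20211115_08-15-35CST_raw_fold_results-rf.csv --plot True',
]
data_paths = [
    '../data/20211117_09-10-50CST_raw_fold_results-mlr.csv',
    '../data/20211117_09-11-35CST_raw_fold_results-rf.csv',
]

for row in rewrite_lines(lines, data_paths):
    print(row)
```

Keeping the file reading and writing outside the function makes it straightforward to feed in test inputs before running it against the real experiments.txt.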