I have data in a source file (file.txt), as shown below:
N4*WALTER*WHITE~DMG*D8*19630625~N4*JESSI*PINKMAN*15108~
Input command (N4 = segment identifier, 1 = position, ref.txt = reference file):
N4*1*ref.txt
ref.txt contains the following data:
BILL
LEONARDO
BALE
BRAD
PITT
I have the code below, which prints the data at position x (taken from the input) of each N4 segment:
identifier=N4
position=1
refile=ref.txt
awk -F'[*~]' -v id="$identifier" -v pos="$position" '
    id {
        for (i = 1; i <= NF; i++)
            if ($i == id) {
                if (i + pos <= NF)
                    print $(i + pos)
                else
                    print "invalid position"
            }
    }
' file.txt
WALTER
JESSI
identifier=N4
position=2
refile=ref.txt
awk -F'[*~]' -v id="$identifier" -v pos="$position" '
    id {
        for (i = 1; i <= NF; i++)
            if ($i == id) {
                if (i + pos <= NF)
                    print $(i + pos)
                else
                    print "invalid position"
            }
    }
' file.txt
WHITE
PINKMAN
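Now, how do I integrate ref.txt into the code above so that WALTER and JESSI in file.txt are replaced with random text taken from ref.txt?
I know that the shuf command can return random lines from ref.txt, but I am not sure how to integrate it into the awk command above.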
shuf -n 1 ref.txt
Expected output: file.txt, with the position 1 data of each N4 segment replaced by random text from ref.txt:
N4*BALE*WHITE~DMG*D8*19630625~N4*PITT*PINKMAN*15108~
Well, I can do this in bash, but the while read loop will be slow:
# recreate the input files
cat <<EOF >file.txt
N4*WALTER*WHITE~DMG*D8*19630625~N4*JESSI*PINKMAN*15108~
EOF
cat <<EOF >input
N4*1*ref.txt
EOF
cat <<EOF >ref.txt
BILL
LEONARDO
BALE
BRAD
PITT
EOF
# read the input
IFS='*' read -r segment position reference_file <input
{
    # for each "~"-terminated segment
    while IFS='*' read -r -d '~' -a data; do
        # if the segment id is the requested segment
        if [ "${data[0]}" = "$segment" ]; then
            # update the data at the requested position
            data[$position]=$(shuf -n1 "$reference_file")
        fi
        # and output the data
        ( IFS='*'; printf "%s~" "${data[*]}"; )
    done
    # append a newline on the end
    echo
} < file.txt
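One likely source of the slowness in the loop above is that it starts a new shuf process for every matching segment. As a rough sketch, assuming GNU shuf (for its -r/--repeat option) and bash 4 or newer (for mapfile), the random words could instead be drawn in a single call before the loop. The function name replace_with_one_shuf and its argument order are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and argument order are assumptions):
#   replace_with_one_shuf SEGMENT POSITION REFERENCE_FILE DATA_FILE
replace_with_one_shuf() {
    local segment=$1 position=$2 reference_file=$3 data_file=$4
    local count i=0
    local -a words data
    # Count the matching segments so that shuf is needed only once.
    count=$(tr '~' '\n' <"$data_file" | grep -c "^${segment}\*")
    # GNU shuf -r samples with repetition, like repeated `shuf -n1` calls.
    mapfile -t words < <(shuf -r -n "$count" "$reference_file")
    while IFS='*' read -r -d '~' -a data; do
        if [ "${data[0]}" = "$segment" ]; then
            # Take the next pre-drawn random word.
            data[position]=${words[i++]}
        fi
        ( IFS='*'; printf '%s~' "${data[*]}" )
    done <"$data_file"
    # Append the final newline, as in the loop above.
    echo
}
```

Calling replace_with_one_shuf N4 1 ref.txt file.txt should print the same kind of output as the loop above. The per-segment subshell for printf is kept, so this only removes the repeated shuf calls.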
I wanted to try iterating over the segments with sed, but ended up preprocessing the input for sed. Here it is, with comments:
IFS='*' read -r segment position reference_file <input
# remove the newline from the input
# and substitute each `~` with a newline
# so we can nicely process the file in sed
<file.txt tr -d '\n' | tr '~' '\n' >tmp.txt
# count of the segments in the input that we are interested in
segmentscnt=$(
    grep -c "^${segment}\*" tmp.txt
)
# generate a single line of random words from reference_file,
# with the words separated by `*`
# there should be as many words as there are
# segments of interest in the input file
randoms=$(
    while shuf -n1 "$reference_file"; do :; done |
    head -n"$segmentscnt" |
    tr '\n' '*'
)
sed -n "
    # the first line should be the random words from reference_file
    # load it into hold space
    1{
        h
        d
    }
    # if this is our segment
    /^$segment\*/{
        # append the random words to our pattern space
        G
        # remember as many fields as the position we want to insert at
        # skip one more word, the one being replaced
        # remember the rest of the line
        # remember the first random word that was appended from hold space
        # then just substitute the words in the proper order
        s/^\(\([^*]*\*\)\{$position\}\)[^*]*\([^\n]*\)\n\([^*]*\)\*.*/\1\4\3/
        # remove the first word from hold space
        x
        s/^\([^*]*\)\*//
        x
    }
    p
    # the first input is the random words separated by *
    # the words are on a single line
    # then comes the input file
" - <<<"$randoms" tmp.txt |
# then replace newlines with `~`.
# also append a newline with echo
# as it will be missing
tr '\n' '~'; echo
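For comparison, the same replacement can probably be done in a single awk pass, which is closer to what the question asked for. The sketch below does not call shuf at all: it loads the reference file into an array and picks words with awk's own rand(), so it swaps in a different source of randomness. The function name replace_field and its argument order are hypothetical, and it assumes every segment ends with `~` and the file ends with one newline, as in the sample data:

```shell
# Hypothetical helper (name and argument order are assumptions):
#   replace_field SEGMENT POSITION REFERENCE_FILE DATA_FILE
replace_field() {
    awk -v id="$1" -v pos="$2" -v ref="$3" '
        BEGIN {
            # Load the reference words while RS is still the default newline.
            while ((getline line < ref) > 0)
                if (line != "") words[++n] = line
            close(ref)
            # Seeded from the clock: runs within the same second repeat.
            srand()
            FS = OFS = "*"      # fields within a segment
            RS = ORS = "~"      # segments within the file
        }
        # What follows the last "~" is only the trailing newline: skip it.
        $0 == "\n" || $0 == "" { next }
        # Position p of a segment is awk field p + 1 (field 1 is the id).
        $1 == id && pos < NF && n > 0 {
            $(pos + 1) = words[int(rand() * n) + 1]
        }
        { print }
        END { printf "\n" }
    ' "$4"
}
```

With the sample files, replace_field N4 1 ref.txt file.txt prints file.txt with WALTER and JESSI each replaced by a randomly chosen line of ref.txt. A position past the end of a segment leaves that segment unchanged rather than printing "invalid position".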