Ubuntu 16.04
Bash 4.3.3
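I need a way to add a space after each comma in the 6th column if one isn't already there. I had to comment out the line in my script that did this, because it put a space after every comma in the CSV file.
Wrong: "This is 6th column,Hey guys,Red White & Blue,I know it,Right On"
Perfect: "This is 6th column, Hey guys, Red White & Blue, I know it, Right On"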
I can almost see awk printing out the 6th column and then letting sed do the rest:
awk '{ print $6 }' "$feed" | sed 's/|/,/g; s/,/, /g; s/,\s\+/, /g'
This is what I have so far:
for feed in *; do
    sed -r -i 's/([^,]{0,10})[^,]*/\1/5' "$feed"
    sed -i '
        s/<b>//g; s/\*//g;
        s/\([0-9]\)""/\1inch/g;
        # s/|/,/g; s/,/, /g; s/,\s\+/, /g;
        s/"one","drive"/"onetext","drive"/;
        s/"comments"/"description"/;
        s/"features"/"optiontext"/;
    ' "$feed"
done
s/|/,/g; s/,/, /g; s/,\s\+/, /g;
works, but it applies globally rather than only within the column.
It sounds like all you need is this (using GNU awk for FPAT):
awk 'BEGIN{FPAT="[^,]*|\"[^\"]+\""; OFS=","} {gsub(/, ?/,", ",$6)} 1'
For example:
$ cat file
1,2,3,4,5,"This is 6th column,Hey guys,Red White & Blue,I know it,Right On",7,8
$ awk 'BEGIN{FPAT="[^,]*|\"[^\"]+\""; OFS=","} {gsub(/, ?/,", ",$6)} 1' file
1,2,3,4,5,"This is 6th column, Hey guys, Red White & Blue, I know it, Right On",7,8
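To see why this works, it can help to look at how FPAT splits the line. The following is a minimal sketch, assuming `gawk` is installed under that name; `show_fields` is a hypothetical helper name and the sample line is shortened for illustration.

```shell
#!/usr/bin/env bash
# List the fields GNU awk sees with the FPAT used above.
# FPAT is a GNU awk extension, so this calls gawk explicitly.
show_fields() {
    gawk 'BEGIN { FPAT = "[^,]*|\"[^\"]+\"" }
          { for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
}

# The quoted 6th column contains a comma of its own.
line='1,2,3,4,5,"This is 6th column,Hey guys",7,8'
printf '%s\n' "$line" | show_fields
```

With this FPAT the quoted string should come out as a single field 6, commas included, which is why the `gsub` on `$6` leaves the other columns alone.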
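Actually, it looks like your whole shell script, including the multiple calls to GNU sed, could be done more efficiently with a single call to GNU awk and no surrounding shell loop, e.g. (untested):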
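awk -i inplace '
BEGIN { FPAT = "[^,]*|\"[^\"]+\""; OFS = "," }
{
    $0 = gensub(/([^,]{0,10})[^,]*/, "\\1", 5)    # truncate the 5th field to 10 characters
    gsub(/<b>|\*/, "")                             # strip <b> tags and asterisks
    $0 = gensub(/([0-9])""/, "\\1inch", "g")
    sub(/"one","drive"/, "\"onetext\",\"drive\"")
    sub(/"comments"/, "\"description\"")
    sub(/"features"/, "\"optiontext\"")
    gsub(/, ?/, ", ", $6)                          # one space after each comma in field 6
    print                                          # without this, inplace would empty the file
}
' *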
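This might work for you (GNU sed):
sed -r 's/[^,"]*("[^"]*")*/\n&\n/6;h;s/, ?/, /g;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/' file
Surround the 6th field with newlines. Copy the line to the hold space. Replace every comma followed by an optional space with a comma followed by a space. Append the original line, then use pattern matching to substitute in the modified field, discarding the rest of the modified line.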
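The steps can be tried one at a time. This is a small sketch, assuming GNU sed; `fix_col6` is a hypothetical wrapper name and the sample line is made up for illustration.

```shell
#!/usr/bin/env bash
# Wrap the GNU sed command from the answer so it can be reused on stdin.
fix_col6() {
    sed -r 's/[^,"]*("[^"]*")*/\n&\n/6;h;s/, ?/, /g;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/'
}

line='1,2,3,4,5,"a,b, c",7,8'

# Step 1 only: the 6th field is surrounded by newlines,
# so it is printed on a line of its own.
printf '%s\n' "$line" | sed -r 's/[^,"]*("[^"]*")*/\n&\n/6'

# All steps: only the commas inside field 6 should gain a space.
printf '%s\n' "$line" | fix_col6
```

The final substitution relies on the pattern space holding exactly five newlines at that point (two from the modified copy, one added by `G`, two from the original), so each `.*` is pinned to one segment.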