I have a spec file that looks like this: "PC DELL OptiPlex 3010MT i3 3220/2GB/500GB/DVD-RW/FREE DOS / 5Y NBD
Intel i3 3220 (Dual Core, 3.30GHz, 3MB, w/ HD2500 Graphics), 2GB (1x2GB) DDR3 PC3-1600MHz, 500GB HDD SATA III 7200rpm, DVD+/-RW (16x), FREE DOS, Warranty: 5Yr Basic Warranty NBD on site"
So I need to populate an HTML table and then put it into a .csv file for upload. So far I have managed to "clean up" the file with the following script:
for f in *.csv
do
# join the two lines of each record (read the whole file first, then substitute) and strip the double quotes
sed -i ':a;N;$!ba;s/NBD *\n/NBD,/g;s/"//g' "$f"
# fix the csv and remove unwanted strings
sed -i 's/"PC/PC/g;s/Core,/Core/g;s/3,/3./g;s/3MB,//g;s/6MB,//g;s/6MB//g;s/w\/ //g;s/7,200/7200/g;s/site"/site/g;s/3MB//g;s/3,/3./g;s/w\///g;s/3,/3./g;s/Cache,)/Cache/g;s/ Internal Dell Business Audio Speaker,//g;' "$f"
# don't know how to remove symbols with sed, so using awk
awk 'NR==FNR {a[$1]=$2;next} {for (i in a) gsub(i,a[i])}1' template "$f" >temp.txt
mv temp.txt "$f"
done
Then I populate the HTML table with this script:
#!/bin/bash
for f in *.csv
do
# split the csv into one-line .csv files
split --additional-suffix=.csv -d -l 1 "$f" output/data_
# populate the html and create the .html files
for file in output/*.csv
do
# set IFS only for read so the rest of the script is unaffected
while IFS="," read -r f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
do
# inner double quotes are escaped so they survive in the generated HTML
echo "<table cellspacing=\"0\" cellpadding=\"0\" border=\"0\" width=\"100%\"> "
echo "<tbody>"
echo "<tr> "
echo "<td class=\"specsTitle\">Box</td> "
echo "<td class=\"specsDescript stripeBottom\">$f2</td> "
echo "</tr> "
echo "<tr> "
<snip>
done < "$file" > output/temp.txt
mv output/temp.txt "$file.html"
done
done
# remove the .csv files that are no longer needed
rm output/*.csv
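So at this point I have several .html files in the output folder.
The questions are:
1. How bad is the code above? :-)
2. How can I put the code from the .html files into a .csv file that looks like this: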
col1,col2,col3,HERE SHOULD BE THE HTML CODE FROM FILE1,col5,
col1,col2,col3,HERE SHOULD BE THE HTML CODE FROM FILE2,col5,
I was thinking of using a template file and somehow adding the pieces of .html code to it. Any help? Kind regards
--EDIT-- Here is the raw input:
"PC DELL OptiPlex 3010MT i3 3220/2GB/500GB/DVD-RW/FREE DOS / 5Y NBD
Intel i3 3220 (Dual Core, 3.30GHz, 3MB, w/ HD2500 Graphics), 2GB (1x2GB) DDR3 PC3-1600MHz, 500GB HDD SATA III 7200rpm, DVD+/-RW (16x), FREE DOS, Warranty: 5Yr Basic Warranty NBD on site"
"PC DELL OptiPlex 3010MT i5 3470/2GB/500GB/DVD-RW/FREE DOS / 5Y NBD
Intel i5 3470 (Quad Core, 3.20GHz Turbo,6MB, w/ HD2500 Graphics), 4GB (1x4GB) DDR3, PC3-1600MHz, 750GB HDD SATA III 7200rpm, DVD+/-RW (16x), FREE DOS, Warranty: 5Yr Basic Warranty NBD on site"
CSV template:
price,product code, SPECS,other things,
300.00,CODE 2112334, ,OTHER STRINGS,
500.00,CODE 2222222, ,OTHER STRINGS,
Desired .csv output:
price,product code, SPECS,other things,
300.00,CODE 2112334, <table style="width:300px"><tr><td>Processor</td><td>Intel i3 3220 (Dual Core, 3.30GHz</td></tr><tr><td>Memory</td><td> 2GB (1x2GB) DDR3 PC3-1600MHz</td></tr><tr><td>Hard Disk</td><td>500GB HDD SATA III 7200rpm</td></tr><tr><td>VGA</td><td>HD2500 Graphics</td></tr><tr><td>Warranty</td><td>5Yr Basic Warranty NBD on site</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>,OTHER STRINGS,
500.00,CODE 2222222, <table style="width:300px"><tr><td>Processor</td><td>Intel i5 3470 (Quad Core 3.20GHz)</td></tr><tr><td>Memory</td><td> 4GB (1x4GB) DDR3 PC3-1600MHz</td></tr><tr><td>Hard Disk</td><td>750GB HDD SATA III 7200rpm</td></tr><tr><td>VGA</td><td>HD2500 Graphics</td></tr><tr><td>Warranty</td><td>5Yr Basic Warranty NBD on site</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>,OTHER STRINGS,
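--EDIT--
Here is an approach that does not create any temporary files. It uses the paste utility to combine the lines of the input file with the lines of the template file, and then processes the result in awk. I use sed to do only minimal cleanup of the data, just enough to make this work, but you can of course substitute your full cleanup commands.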
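The two building blocks can be tried in isolation before reading the full script. The sketch below is only an illustration: the function names join_pairs and side_by_side are made up here, and it assumes GNU sed and that every record spans exactly two lines.

```shell
#!/bin/bash
# join_pairs (hypothetical name): merge each pair of input lines into one
# line, replacing the line break (and any spaces before it) with a comma,
# then drop all double quotes. Assumes records of exactly two lines each.
join_pairs() {
    sed -e 'N;s/ *\n/,/' -e 's/"//g'
}

# side_by_side (hypothetical name): glue line N of stdin to line N of the
# file given as $1, separated by a comma.
side_by_side() {
    paste -d ',' - "$1"
}
```

For example, `join_pairs < input.csv | side_by_side template.csv` would put each cleaned spec record next to the matching template row, apart from the header row, which the full script handles with a dummy heading.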
#!/bin/bash
# Dummy header for the input file to match the header of the template file.
# Used only to make sure files have same number of lines.
heading="model,processor,speed,cache,graphics,memory,hd,optical,dos,warranty"
# Create input for awk that has the lines of input and template side by side.
paste -d ',' - template.csv <<< "$(echo "$heading"; sed -e 'N;s/ *\n/,/g' -e 's/"//g' input.csv)" | awk -F ',' '
## awk portion
# First line: print just template.csv header (not dummy header).
NR == 1 { for (i=11; i<NF; ++i) printf("%s,", $i); print(""); next }
# Print each line, starting with the fields from template.csv,
# then the HTML populated with values from input.csv,
# and ending with the last fields from template.csv.
{ print($11 "," $12 "," " <table style=\"width:300px\"><tr><td>Processor</td><td>" $2 $3 ")</td></tr><tr><td>Memory</td><td>" $6 "</td></tr><tr><td>Hard Disk</td><td>" $7 "</td></tr><tr><td>VGA</td><td>" $5 "</td></tr><tr><td>Warranty</td><td>" $10 "</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>," $14 ","); }'
This is effectively already a one-liner:
paste -d ',' - template.csv <<< "$(echo "model,processor,speed,cache,graphics,memory,hd,optical,dos,warranty"; sed -e 'N;s/ *\n/,/g' -e 's/"//g' input.csv)" | awk -F ',' 'NR == 1 { for (i=11; i<NF; ++i) printf("%s,", $i); print(""); next } { print($11 "," $12 "," " <table style=\"width:300px\"><tr><td>Processor</td><td>" $2 $3 ")</td></tr><tr><td>Memory</td><td>" $6 "</td></tr><tr><td>Hard Disk</td><td>" $7 "</td></tr><tr><td>VGA</td><td>" $5 "</td></tr><tr><td>Warranty</td><td>" $10 "</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>," $14 ","); }'
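This solution assumes that you have already aggregated the input into a single file and cleaned it up to match the given format. It also assumes that the aggregated input file has the same number of data lines as the template file. You must also make sure there are no extra or missing commas in the input file, since this awk script uses the comma as its field separator.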
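Because a stray comma shifts every later field without any error, a quick pre-check of the field count can be worthwhile. The following is a rough sketch rather than part of the solution itself: the function name check_fields is made up, and the expected count of 15 used in the note below (10 spec fields plus the 5 comma-separated template fields, counting the empty one after the trailing comma) is an assumption based on the sample data.

```shell
#!/bin/bash
# check_fields (hypothetical name): read comma-separated lines on stdin and
# report every line whose field count differs from the count given as $1.
# Exits with status 1 if any line is off, 0 otherwise.
check_fields() {
    awk -F ',' -v want="$1" '
        NF != want { printf("line %d: %d fields, expected %d\n", NR, NF, want); bad = 1 }
        END { exit bad }
    '
}
```

Piping the same paste output into `check_fields 15` instead of the awk script would list the offending lines. With only the minimal sed cleanup, the second sample record would be reported, since it has a comma after DDR3 that the first record lacks.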