Unix script to split one large file into multiple files, each containing two pairs of tags, following a file naming convention

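I am writing a shell script that splits one large file into multiple files, each containing two pairs of tags. The names of the smaller files must follow a naming convention.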

Example:

Large file name: abcdef123.xml. Contents:

<parent>
<child>
<code1><code1>
<text1><text1>
</child>
<child1>
<code2><code2>
<text2><text2>
</child1>
<child>
<code3><code3>
<text3><text3>
</child>
<child1>
<code4><code4>
<text4><text4>
</child1>
<child>
<code5><code5>
<text5><text5>
</child>
<child1>
<code6><code6>
<text6><text6>
</child1>
<child>
<code7><code7>
<text7><text7>
</child>
<child1>
<code8><code8>
<text8><text8>
</child1>
</parent>

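The Unix shell script should split this large file into multiple files that meet the criteria below (two pairs of <child> and <child1> in each file) and take user input for the file naming convention (the date in milliseconds may stay the same in all file names, but the variable 'j' must change):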

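Criteria:

Add the header '<parent>' and the trailer '</parent>' to every file.
The file name must have the format 'UserinputstringMMDDYYYYHHMMSSMIL_n increment.xml' (where MIL is milliseconds and "n increment" is a counter such as 001, 002, 003, ...).
No two files may have the same file name.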
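The naming rule above can be sketched as a small helper. This is only an illustration: `make_name` and its parameters are hypothetical names, and `%3N` (milliseconds) assumes GNU `date`. The stated format has no separator between the user string and the date, while the example names below contain one, so the sketch simply concatenates and leaves any separator to the prefix.

```shell
#!/bin/bash
# Hypothetical helper: build "<prefix><MMDDYYYYHHMMSSmmm>_<NNN>.xml".
# The counter is zero-padded to three digits (001, 002, ...).
make_name() {
    local prefix=$1 stamp=$2 n=$3
    printf '%s%s_%03d.xml\n' "$prefix" "$stamp" "$n"
}

# Take the timestamp once so it is identical in every name;
# %3N (milliseconds) is a GNU date extension.
stamp=$(date +%m%d%Y%H%M%S%3N)

# Uniqueness then comes from the counter alone.
make_name stack_ "$stamp" 1
make_name stack_ "$stamp" 2
```

Because the timestamp is fixed for the whole run, two names can only collide if the counter repeats, which satisfies the last criterion as long as the counter is incremented for every file.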

Example of the split files:

File 1 = stack_10120120134434789_001.xml

Contents:

<parent>
<child>
<code1><code1>
<text1><text1>
</child>
<child1>
<code2><code2>
<text2><text2>
</child1>
<child>
<code3><code3>
<text3><text3>
</child>
<child1>
<code4><code4>
<text4><text4>
</child1>
</parent>

File 2 = stack_10120120134434791_002.xml

Contents:

<parent>
<child>
<code5><code5>
<text5><text5>
</child>
<child1>
<code6><code6>
<text6><text6>
</child1>
<child>
<code7><code7>
<text7><text7>
</child>
<child1>
<code8><code8>
<text8><text8>
</child1>
</parent>

The script I am trying:

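# Split src.xml before every <child> line into part.00, part.01, ...
# (the pattern is an assumption, the original command had none; '{*}' needs GNU csplit)
csplit -ksf part. src.xml '/<child>/' '{*}'
n=0
# Prompt for the prefix, e.g. the user enters: stack
printf 'Enter beginning of file name: '
read -r userinput
j=$(printf '%03d' "$(( n + 1 ))")
date=$(date +%m%d%Y%H%M%S%3N)   # %3N (milliseconds) needs GNU date
filename="${userinput}${date}_${j}.xml"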

sample.xml:

<?xml version="1.0"?>
<parent>
<child>
<code1>aa</code1>
<text1>aat</text1>
</child>
<child1>
<code2>aa2</code2>
<text2>aat2</text2>
</child1>
<child>
<code3>bb</code3>
<text3>bbt</text3>
</child>
<child1>
<code4>bb2</code4>
<text4>bbt2</text4>
</child1>
<child>
<code5>cc</code5>
<text5>cct</text5>
</child>
<child1>
<code6>cc2</code6>
<text6>cct2</text6>
</child1>
<child>
<code7>dd</code7>
<text7>ddt</text7>
</child>
<child1>
<code8>dd2</code8>
<text8>ddt2</text8>
</child1>
</parent>

parser.sh

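#!/bin/bash
# Print the input in chunks of two <child>/<child1> pairs, each wrapped in <parent>.
PARENT='parent'
CHILD1='child'
CHILD2='child1'
INPUT_FILE='sample.xml'
# Count the <child> elements (assumes at most one opening tag per line).
NUM_OF_CHILDS=$(grep -c "<$CHILD1>" "$INPUT_FILE")
FILE_NUM=1
# i takes the values 1, 3, 5, ...: the index of the first pair in each chunk.
for i in $(seq 1 2 "$NUM_OF_CHILDS"); do
    echo "-----------------------------------------------------"
    echo "FILENAME_$(date +%s%N)_$FILE_NUM.xml"   # %N (nanoseconds) needs GNU date
    echo "-----------------------------------------------------"
    echo '<?xml version="1.0"?>'
    echo "<$PARENT>"
    # Pairs i and i+1; with an odd number of pairs the last two lookups find nothing.
    xmllint --xpath "(//$PARENT/$CHILD1[$i])" "$INPUT_FILE"
    xmllint --xpath "(//$PARENT/$CHILD2[$i])" "$INPUT_FILE"
    xmllint --xpath "(//$PARENT/$CHILD1[$(( i + 1 ))])" "$INPUT_FILE"
    xmllint --xpath "(//$PARENT/$CHILD2[$(( i + 1 ))])" "$INPUT_FILE"
    echo "</$PARENT>"
    FILE_NUM=$(( FILE_NUM + 1 ))
done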

Output:

-----------------------------------------------------
FILENAME_1603633647540475038_1.xml
-----------------------------------------------------

<?xml version="1.0"?>
<parent>
<child>
<code1>aa</code1>
<text1>aat</text1>
</child>
<child1>
<code2>aa2</code2>
<text2>aat2</text2>
</child1>
<child>
<code3>bb</code3>
<text3>bbt</text3>
</child>
<child1>
<code4>bb2</code4>
<text4>bbt2</text4>
</child1>
</parent>

-----------------------------------------------------
FILENAME_1603633647547254647_2.xml
-----------------------------------------------------

<?xml version="1.0"?>
<parent>
<child>
<code5>cc</code5>
<text5>cct</text5>
</child>
<child1>
<code6>cc2</code6>
<text6>cct2</text6>
</child1>
<child>
<code7>dd</code7>
<text7>ddt</text7>
</child>
<child1>
<code8>dd2</code8>
<text8>ddt2</text8>
</child1>
</parent>
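parser.sh prints every chunk to standard output, as shown above. A possible variant that writes each chunk to its own file, named after the convention in the question, might look like the sketch below. It is not the original script: `split_pairs`, its arguments and its local variables are hypothetical, the element names are hard-coded to match sample.xml, and `%3N` assumes GNU `date`.

```shell
#!/bin/bash
# Sketch: same xmllint lookups as parser.sh, but one output file per chunk.
# split_pairs and its arguments are hypothetical names.
split_pairs() {
    local input=$1 prefix=$2
    local stamp count i n=1 out
    stamp=$(date +%m%d%Y%H%M%S%3N)        # taken once; %3N needs GNU date
    count=$(grep -c '<child>' "$input")   # assumes one <child> tag per line
    for (( i = 1; i <= count; i += 2 )); do
        out=$(printf '%s%s_%03d.xml' "$prefix" "$stamp" "$n")
        {
            echo '<?xml version="1.0"?>'
            echo '<parent>'
            # Pairs i and i+1; stderr is discarded so a missing last pair
            # stays silent (this also hides genuine xmllint errors).
            xmllint --xpath "//parent/child[$i]" "$input" 2>/dev/null
            xmllint --xpath "//parent/child1[$i]" "$input" 2>/dev/null
            xmllint --xpath "//parent/child[$(( i + 1 ))]" "$input" 2>/dev/null
            xmllint --xpath "//parent/child1[$(( i + 1 ))]" "$input" 2>/dev/null
            echo '</parent>'
        } > "$out"
        echo "$out"                        # report the file that was written
        n=$(( n + 1 ))
    done
}
```

Calling `split_pairs sample.xml stack_` would then write one file per two pairs and print the names it used.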

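It looks like you plan to use the output files as XML, so indentation and line breaks should not matter. If they do, try xmllint's formatting options, such as --format.
Other details, such as the file naming convention, are easy to change, so those are up to you.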
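As an alternative that does not depend on xmllint, the same split can be done in a single pass with awk. This is a different technique from the answer above, not a variant of it, and it treats the input as lines rather than as XML: it assumes that every `<child>`, `<parent>` and `</parent>` tag sits alone on its own line without surrounding whitespace, as in sample.xml. `split_awk` and its arguments are hypothetical names, and the caller supplies the timestamp.

```shell
#!/bin/bash
# Sketch: line-based split with awk (no XML parsing, see the assumptions above).
# split_awk and its arguments are hypothetical names.
split_awk() {
    local input=$1 prefix=$2 stamp=$3
    awk -v prefix="$prefix" -v stamp="$stamp" '
        # Drop the XML declaration and the original <parent> wrapper lines.
        /^<\?xml/ || /^<\/?parent>$/ { next }
        # Each <child> line starts a pair; open a new file before every second pair.
        /^<child>$/ {
            if (pairs % 2 == 0) {
                if (out != "") { print "</parent>" > out; close(out) }
                out = sprintf("%s%s_%03d.xml", prefix, stamp, ++n)
                print "<parent>" > out
            }
            pairs++
        }
        # Copy every remaining line into the current output file.
        out != "" { print > out }
        # Close the wrapper of the last file.
        END { if (out != "") { print "</parent>" > out; close(out) } }
    ' "$input"
}
```

A call such as `split_awk sample.xml stack_ "$(date +%m%d%Y%H%M%S%3N)"` would write the numbered files into the current directory. An input that is formatted differently, for example with several tags on one line, would need a real XML tool instead.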
