Linux sed / grep -P: replace a string that spans a newline, taking the next line into account



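I have created a file in which I need to replace the last "," (the one right before the closing "]") so that the file is valid JSON. The problem is that I don't know how to do this with sed, or even with grep piped into something else. I'm really stumped. Any help is appreciated.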

test.json

[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"},
]

Of course, grep -P matches what I need to replace:

grep -Pzo '"},\n]' test.json
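
To see what that pattern selects, here is a minimal sketch on a tiny sample file (the file contents and the variable names are made up for illustration). With -z, grep ends each match with a NUL byte, so tr turns it into a newline for display. grep only finds the text; it cannot rewrite it.

```shell
# Hypothetical three-line sample standing in for test.json.
f=$(mktemp)
printf '[\n{"a":1},\n]\n' > "$f"

# -P: PCRE, -z: treat the whole file as one record, -o: print only the match.
out=$(grep -Pzo '\},\n\]' "$f" | tr '\0' '\n')
printf '%s\n' "$out"
```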

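One working solution is to use perl to read the last n bytes of the file, determine where the superfluous comma sits within those bytes (for example with a regex), and then overwrite that comma with a space character: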

perl -e '
  $n = 16;                         # how many bytes to read
  open $fh, "+<", $ARGV[0];        # open file in read & write mode
  seek $fh, -$n, 2;                # go to the end minus some bytes
  $n = read $fh, $str, $n;         # load the end of the file
  if ( $str =~ /,\s*]\s*$/s ) {    # get position of comma
    seek $fh, -($n - $-[0]), 1;    # go to position of comma
    print $fh " ";                 # replace comma with space char
  }
  close $fh;                       # close file
' log.json

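The advantage of this solution is that it reads only a few bytes of the file to make the replacement, so memory consumption is close to zero and the file is never read in full.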
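
The same idea can be sketched with shell tools alone, assuming GNU stat, grep and dd are available. The sample file, the 16-byte window and the variable names are illustrative assumptions, not part of the answer above.

```shell
# Hypothetical sample file standing in for the multi-gigabyte JSON file.
f=$(mktemp)
printf '[\n{"a":1},\n{"b":2},\n]\n' > "$f"

n=16                                  # how many trailing bytes to inspect
size=$(stat -c %s "$f")               # file size in bytes (GNU stat)
[ "$size" -lt "$n" ] && n=$size

# Only proceed if the tail, ignoring whitespace, ends in ",]".
case "$(tail -c "$n" "$f" | tr -d ' \t\r\n')" in
  *',]')
    # Byte offset of the last comma inside the tail (-b with -o reports match offsets).
    off=$(tail -c "$n" "$f" | grep -abo ',' | tail -n 1 | cut -d: -f1)
    # Overwrite that single byte with a space without truncating the file.
    printf ' ' | dd of="$f" bs=1 seek=$((size - n + off)) conv=notrunc status=none
    ;;
esac

cat "$f"
```

As with the perl version, only the last few bytes are read and exactly one byte is written, so the cost does not depend on the file size.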

Using GNU sed:

$ sed -Ez 's/([^]]*),/\1/' test.json
[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"}
]

Remove the last comma in a file with GNU sed:

sed -zE 's/,([^,]*)$/\1/' file

Output to stdout:

[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"}
]

See: man sed and the Stack Overflow Regular Expressions FAQ
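
Because the file contains no NUL bytes, -z makes sed hold the entire file as a single record, which is where the memory problem comes from. A possible streaming alternative is a two-line sliding window built from GNU sed's N, P and D commands. This is only a sketch: it assumes the trailing comma ends the line immediately before a final line that contains nothing but "]", and the sample file is made up.

```shell
# Hypothetical sample standing in for test.json.
f=$(mktemp)
printf '[\n{"a":1},\n{"b":2},\n]\n' > "$f"

# $!N : append the next line (except on the last line), giving a 2-line window
# s   : drop the comma only when the window ends with ",<newline>]"
# P;D : print and delete the first line of the window, then loop without autoprint
out=$(sed -E '$!N; s/,\n\]$/\n]/; P; D' "$f")
printf '%s\n' "$out"
```

Only two lines are held at a time, so memory use depends on the longest line rather than on the file size, although the whole file is still read once.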

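So below is the final solution I used. It is not the prettiest, but it works without memory problems and does what I need. Thanks to Cyrus for the help. I hope this post helps someone.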

find *.json | while read -r file; do
    _FILESIZE=$(stat -c%s "$file")
    if [[ $_FILESIZE -gt 2050000000 ]]; then
        echo "${file} is too large = ${_FILESIZE} bytes. It will be split to work on."
        # Get the name of the file without its extension.
        _FILENAME=$(echo "${file}" | sed -r 's/(.+)(\..+)/\1/')
        # Split the large file: 3-character numeric suffix, 1G pieces, no zero-byte files.
        split -a 3 -e -d -b1G "${file}" "${_FILENAME}_"
        # A pipe runs the loop in a subshell, so feed it via process substitution
        # to keep the variable afterwards.
        _FINAL_FILE_NAME_SPLIT=
        while read -r file_split; do
            _FINAL_FILE_NAME_SPLIT=${file_split}
        done < <(find "${_FILENAME}"_* | sort)
        # The last piece holds the change we need: "null"},\n] becomes "null"}\n]
        sed -i -zE 's/},([^,]*)$/}\1/' "${_FINAL_FILE_NAME_SPLIT}"
        # Rebuild the original file from the pieces.
        cat "${_FILENAME}"_* > "${file}"
        # Remove only this file's pieces.
        rm -f -- "${_FILENAME}"_[0-9][0-9][0-9]
    else
        sed -i -zE 's/},([^,]*)$/}\1/' "${file}"
    fi
    # Check that the file is valid JSON.
    jq '. | length' "${file}"
    # View the change.
    tail -c 50 "${file}"
    echo " "
    echo " "
done
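
The split, edit and reassemble steps can be tried on a small file first. The following is a sketch only: the sample data, the 8-byte piece size and the demo_ prefix are assumptions chosen so that the effect is easy to see.

```shell
# Work in a scratch directory with a hypothetical sample file.
dir=$(mktemp -d)
printf '[\n{"a":1},\n{"b":2},\n]\n' > "$dir/demo.json"

# Cut the file into 8-byte pieces named demo_000, demo_001, ...
split -a 3 -d -b 8 "$dir/demo.json" "$dir/demo_"

# Globs expand in sorted order, so the last iteration is the last piece.
last=
for p in "$dir"/demo_[0-9][0-9][0-9]; do last=$p; done

# Fix the trailing comma in the last piece only, then reassemble.
sed -i -zE 's/},([^,]*)$/}\1/' "$last"
cat "$dir"/demo_[0-9][0-9][0-9] > "$dir/demo.json"
rm -f -- "$dir"/demo_[0-9][0-9][0-9]

cat "$dir/demo.json"
```

One caveat with this approach: if a piece boundary happens to fall between the final "}" and its ",", the pattern will not match in the last piece, so it is worth validating the result (for example with jq, as the script does).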
