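I have created a file, and I need to replace the last "," with "" so that it will be valid JSON. The problem is that I can't figure out how to do this with sed, or even with grep piped into something else. I am really stumped. Any help would be appreciated.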
test.json
[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"},
]
Of course, using grep with -P matches what I need to replace:
grep -Pzo '"},\n]' test.json
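Because grep -z treats the whole 3.5 GB file as a single record, it may be cheaper to inspect only the end of the file. The following is a minimal sketch of that idea, not part of the original question: has_trailing_comma is a hypothetical helper name, the 64-byte window is an arbitrary assumption, and it assumes bash plus the usual tail and tr utilities.

```shell
# Hypothetical helper: succeed (exit status 0) when the last non-whitespace
# characters of the file are ",]", i.e. a trailing comma before the closing bracket.
has_trailing_comma() {
  local file=$1
  local n=${2:-64}          # how many bytes of the tail to inspect (assumed default)
  local tail_text
  # Read only the last n bytes and drop all whitespace (spaces, newlines, tabs).
  tail_text=$(tail -c "$n" "$file" | LC_ALL=C tr -d '[:space:]')
  # Glob match: does the remaining text end with ",]"?
  [[ $tail_text == *',]' ]]
}
```

It could be used as: has_trailing_comma test.json && echo "needs fixing". Only the tail is read, so the size of the file should not matter.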
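A working solution is to use perl to read the last n bytes of the file, locate the superfluous comma within those bytes (for example with a regex), and then overwrite that comma with a space character: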
perl -e '
$n = 16; # how many bytes to read
open $fh, "+<", $ARGV[0]; # open file in read & write mode
seek $fh, -$n, 2; # go to the end minus some bytes
$n = read $fh, $str, $n; # load the end of the file
if ( $str =~ /,\s*\]\s*$/s ) { # get position of comma
seek $fh, -($n - $-[0]), 1; # go to position of comma
print $fh " "; # replace comma with space char
}
close $fh; # close file
' log.json
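The advantage of this solution is that it reads only a few bytes of the file to make the replacement, which keeps memory consumption close to zero and avoids reading the whole file.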
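For readers who would rather stay in the shell, the same read-the-tail-and-patch idea can be sketched with bash and GNU coreutils. This is an illustration under assumptions, not the answer's own method: strip_trailing_comma is a hypothetical name, the 16-byte default mirrors $n above, and it relies on GNU stat (-c%s) and GNU dd (status=none).

```shell
# Hypothetical helper: overwrite a trailing comma before the final "]" with a
# space, in place, touching only the last n bytes of the file.
strip_trailing_comma() {
  local LC_ALL=C            # treat the tail as plain bytes
  local file=$1
  local n=${2:-16}          # how many bytes to read from the end (assumed default)
  local size tail_bytes match offset
  local re=',[[:space:]]*\][[:space:]]*$'   # comma, optional whitespace, "]", optional whitespace

  size=$(stat -c%s "$file")
  (( n > size )) && n=$size

  # Load the end of the file; the sentinel "x" keeps trailing newlines
  # from being stripped by the command substitution.
  tail_bytes=$(tail -c "$n" "$file"; printf x)
  tail_bytes=${tail_bytes%x}

  if [[ $tail_bytes =~ $re ]]; then
    match=${BASH_REMATCH[0]}
    # The match runs to the end of the file, so the comma sits at size - length(match).
    offset=$(( size - ${#match} ))
    # conv=notrunc: overwrite one byte without truncating the rest of the file.
    printf ' ' | dd of="$file" bs=1 seek="$offset" conv=notrunc status=none
  fi
}
```

Calling strip_trailing_comma test.json should leave the file the same size, with the comma replaced by a space, which JSON parsers treat as insignificant whitespace.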
Using GNU sed
$ sed -Ez 's/([^]]*),/\1/' test.json
[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"}
]
Remove the last comma in a file with GNU sed:
sed -zE 's/,([^,]*)$/\1/' file
Output to stdout:
[
{MANY OTHER RECORDS, MAKING FILE 3.5Gig (making sed fail because of memory, so newlines were added)},
{"ID":"57705e4a-158c-4d4e-9e07-94892acd98aa","USERNAME":"jmael","LOGINTIMESTAMP":"2021-11-30"},
{"ID":"b8b67609-50ed-4cdc-bbb4-622c7e6a8cd2","USERNAME":"henrydo","LOGINTIMESTAMP":"2021-12-15"},
{"ID":"a44973d0-0ec1-4252-b9e6-2fd7566c6f7d","USERNAME":"null","LOGINTIMESTAMP":"2021-10-31"}
]
See: man sed and The Stack Overflow Regular Expressions FAQ
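To see why the pattern ,([^,]*)$ can only hit the final comma, it may help to run it on a tiny input. The sample below is made up for illustration (it is not the real test.json), and the sketch assumes GNU sed for the -z option.

```shell
# Made-up miniature of test.json (not the real data).
sample=$(printf '[\n{"a":1},\n{"b":2},\n]')

# -z : treat the whole input as one record, so the regex can span newlines.
# ,([^,]*)$ : a comma followed only by non-comma characters up to the end of
#             the input; no earlier comma qualifies, because a later comma
#             would sit between it and the end.
fixed=$(printf '%s\n' "$sample" | sed -zE 's/,([^,]*)$/\1/')

printf '%s\n' "$fixed"
```

This should print the array with the comma after {"b":2} removed and the comma after {"a":1} left in place. Note that -z still loads the entire input into memory, which is what becomes a problem at 3.5 GB.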
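So below is the final solution I used. It's not the prettiest, but it works without memory issues and does what I need. Thanks to Cyrus for the help. I hope this post helps someone.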
find *.json | while read -r file; do
_FILESIZE=$(stat -c%s "$file")
if [[ $_FILESIZE -gt 2050000000 ]]; then
echo "${file} is too large = ${_FILESIZE} bytes. will be split to work on."
#get the name of the file without extension
_FILENAME=$( echo "${file}" | sed -r 's/(.+)(\..+)/\1/' )
#Split the large file into 1G pieces with a 3-digit numeric suffix, no zero byte files
split -a 3 -e -d -b1G "${file}" "${_FILENAME}_"
#Because a pipe runs the loop in a subshell, use process substitution so the variable survives.
_FINAL_FILE_NAME_SPLIT=
while read -r file_split; do
_FINAL_FILE_NAME_SPLIT=${file_split}
done < <(find "${_FILENAME}"_* | sort)
#The last file has the change we need to make @@ "null"},\n] @@ to @@ "null"}\n] @@
sed -i -zE 's/},([^,]*)$/}\1/' "${_FINAL_FILE_NAME_SPLIT}"
#Rebuild the split files to replace the final file.
cat "${_FILENAME}"_* > "${file}"
#Remove only this file's split pieces (3-digit numeric suffix)
rm -- "${_FILENAME}"_[0-9][0-9][0-9]
else
sed -i -zE 's/},([^,]*)$/}\1/' "${file}"
fi
#Check that the file is a valid json file.
jq 'length' "${file}"
#view the change
tail -c 50 "${file}"
echo " "
echo " "
done