Remove a specific character from a txt file

Imagine I have a txt file in the following format:

'20201': "a" ,
'20202': "e" ,
'20203': "i" ,
'20204': "o" ,
'20205': "u" ,
'20207': "ae" ,
'20209': "ai" ,
'20210': "ao"

I want to delete the fourth digit when it is 0. So the expected output is:

'2021': "a" ,
'2022': "e" ,
'2023': "i" ,
'2024': "o" ,
'2025': "u" ,
'2027': "ae" ,
'2029': "ai" ,
'20210': "ao"

I was thinking of something like this:

awk -i inplace  ' { for ( i = 1; i <= NF; ++i ) {
if ( $i == '0')
r = 1

}
}}
1 ' example.txt

With awk, could you please try the following, written and tested with the shown samples in GNU awk.

Without a field separator, try:

awk 'substr($0,5,1)==0{ $0=substr($0,1,4) substr($0,6) } 1'  Input_file

Or, with a field separator, try the following, which deals specifically with the first field:

awk '
BEGIN{
FS=OFS=":"
}
substr($1,5,1)==0{
$1=substr($1,1,4) substr($1,6)
}
1
'  Input_file

To save the output into Input_file itself, append > temp && mv temp Input_file to the above command once you are happy with its output.
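The same "write to a temporary file, then rename it over the original" pattern can be sketched in Python. This is only an illustration and not part of the answer above: the names rewrite_in_place and drop_fifth_char_if_zero are hypothetical, and the sketch assumes the file is UTF-8 text.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Apply transform() to every line of path, then replace the file.

    Hypothetical helper mirroring `cmd Input_file > temp && mv temp Input_file`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temp file in the same directory so the final rename
    # does not cross filesystems.
    with open(path, "r", encoding="utf-8") as src, tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", dir=directory, delete=False
    ) as tmp:
        for line in src:
            tmp.write(transform(line))
    # Only reached when every line was written without an exception.
    os.replace(tmp.name, path)


def drop_fifth_char_if_zero(line):
    # Same test as substr($0,5,1)==0 in the awk one-liner.
    return line[:4] + line[5:] if line[4:5] == "0" else line
```

Calling rewrite_in_place("example.txt", drop_fifth_char_if_zero) would then update the file only after the whole transformation has succeeded, which is the point of the && in the shell version.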

Explanation: a detailed, line-by-line explanation of the above.

awk '                             ##Starting awk program from here.
BEGIN{                            ##Starting BEGIN section of this program from here.
FS=OFS=":"                      ##Setting FS and OFS as colon here.
}
substr($1,5,1)==0{                ##Checking condition if 5th character is 0 then do following.
$1=substr($1,1,4) substr($1,6)  ##Setting sub string of 1st 4 characters then mentioning characters from 6th character to last of 1st field here.
}
1                                 ##1 will print current line.
' Input_file                      ##Mentioning Input_file name here.
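For comparison, the field-based logic explained above can be sketched in Python. This is a minimal sketch under the assumption that the key always sits before the first colon; fix_first_field is a hypothetical name, not something from the answer.

```python
def fix_first_field(line, sep=":"):
    """Drop the 5th character of the first sep-delimited field when it is '0'."""
    first, found, rest = line.partition(sep)
    if first[4:5] == "0":  # 5th character, counting the leading quote
        first = first[:4] + first[5:]
    return first + found + rest  # found is "" when sep does not occur


lines = ["'20201': \"a\" ,", "'20210': \"ao\""]
for line in lines:
    print(fix_first_field(line))
```

Restricting the check to the first field, as the awk program does with $1, makes it explicit that only the key is ever modified.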

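For a concise GNU sed solution, this works:

sed "s/^\(....\)0/\1/" example.txt

Here we match only the first 5 characters: the first 4 can be any character and the 5th must be a zero. For any match, we replace those 5 characters with just the first 4.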
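The same capture-and-replace idea can be tried out with Python's re module. This is a small sketch for experimenting with the pattern, not part of the sed answer; the names LEADING_ZERO and drop_zero are made up for the example.

```python
import re

# Four arbitrary leading characters (captured as group 1), then a literal zero.
LEADING_ZERO = re.compile(r"^(....)0")


def drop_zero(line):
    """Return line with its 5th character removed when that character is '0'."""
    return LEADING_ZERO.sub(r"\1", line, count=1)


sample = ["'20201': \"a\" ,", "'20207': \"ae\" ,", "'20210': \"ao\""]
for line in sample:
    print(drop_zero(line))
```

The last sample line is left unchanged because its 5th character is 1, not 0.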

If you want to modify the file in place, you can use sed's -i option:

sed "s/^\(....\)0/\1/" -i example.txt

(Note that -i works on many, but not all, systems; see here for a workaround.)

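If a substring is a positive number, delete its fourth digit when that digit is zero:

sed -e 's/\([0-9][0-9][0-9]\)0/\1/g' file

If a word is a positive number, delete its fourth digit when that digit is zero:

sed -e 's/\b\([0-9][0-9][0-9]\)0\([0-9]*\)\b/\1\2/g' file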
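The two patterns behave differently on a key such as '20210', which can be checked quickly with Python's re module (where \b is also a word boundary). This is an illustrative sketch; the names SUBSTRING and WORD are chosen only for this example.

```python
import re

# Any three digits followed by a zero, anywhere in the line.
SUBSTRING = re.compile(r"([0-9][0-9][0-9])0")
# A whole run of digits whose fourth digit is zero.
WORD = re.compile(r"\b([0-9][0-9][0-9])0([0-9]*)\b")

for key in ["'20201'", "'20210'"]:
    print(key, SUBSTRING.sub(r"\1", key), WORD.sub(r"\1\2", key))
# '20201' '2021' '2021'
# '20210' '2021' '20210'
```

The substring pattern also matches 0210 inside 20210, so only the word-anchored pattern leaves '20210' untouched, as the expected output in the question requires.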

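If you want to use python, since the question is tagged with it, consider the pandas.read_csv function together with the str.split and str.replace methods, and then apply str.join when writing the pieces of each derived line back to the original file, e.g.

import pandas as pd

sss = []
with open('myfile.txt', 'r') as f_in:
    # Lines are split on commas, so column 0 holds the whole key/value text.
    data = pd.read_csv(f_in, header=None)
    for line in data[0]:
        s = line.split()
        j = 0
        ss = ""
        # Rebuild the key, skipping its 5th character when it is '0'.
        for i in s[0]:
            j += 1
            if j == 5:  # including the first quote(')
                if i != '0':
                    ss += i
            else:
                ss += i
        sss.append(line.replace(s[0], ss))

j = 0
ss = ""
with open('myfile.txt', 'w') as f_out:
    for line in sss:
        j += 1
        ss = ''.join(str(line))
        # Every line except the last gets its trailing comma back.
        if j == len(sss):
            f_out.write(ss + '\n')
        else:
            f_out.write(ss + ',\n')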

You may use GNU AWK's gensub for this in the following way. Let the content of file.txt be

'20201': "a" ,
'20202': "e" ,
'20203': "i" ,
'20204': "o" ,
'20205': "u" ,
'20207': "ae" ,
'20209': "ai" ,
'20210': "ao"

then

awk '{print gensub(/^(....)0/,"\\1",1)}' file.txt

output

'2021': "a" ,
'2022': "e" ,
'2023': "i" ,
'2024': "o" ,
'2025': "u" ,
'2027': "ae" ,
'2029': "ai" ,
'20210': "ao"

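Explanation: I use gensub's ability to refer to parts of the regular expression in the replacement text, so that (the first 4 characters followed by a zero) is replaced with (the first 4 characters). We need to capture 4 leading characters because of the leading ', which means the 4th digit is the 5th character.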
