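I have a BED-style spreadsheet in which I need to replace some values that match a variable-length, comma-separated pattern of 1s. In its shortest form the pattern is 1,1, but it can be longer, up to 10 repeats (e.g. 1,1,1,1,1,1,1,1,1,1,1,1,1,1...). I want to collapse each such series to a single 1.
Here is a sample of the data frame: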
chr5 141587227 141587466 240 * exon 0 0 1 0 0 0 0 0 0 0 chr5:140966508-140967052 DIAPH1_23361 chr5 141505592 141505799 208 * promoter_flanking_region 0 1 0 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 c(PCDHGA1_17708,PCDHGA2_17627,PCDHGA3_17505,PCDHGB1_17702,PCDHGA4_17576,PCDHGB2_17511,PCDHGA5_17603,PCDHGB3_17600,PCDHGA6_17551,PCDHGA7_17606,PCDHGB4_17598,PCDHGA8_17560,PCDHGB5_17553,PCDHGA9_17757,PCDHGB6_17668,PCDHGA10_17824,PCDHGB7_17671,PCDHGA11_17515,PCDHGA12_17651,PCDHGC3_17713,PCDHGC4_17790,PCDHGC5_17760) 2 2.31 81651
chr5 141587468 141588358 891 * promoter_flanking_region 0 1 1 0 0 0 0 0 0 0 c(chr5:140966508-140967052,chr5:140967690-140967917) DIAPH1_23361 chr5 140944575 140944811 237 * intron 0 0 0 1,1,1,1,1,1,1,1,1,1 0 0 0 0 0 0 0 c(PCDHA1_11483,PCDHA2_56916,PCDHA3_11465,PCDHA4_11655,PCDHA5_11663,PCDHA6_11423,PCDHA7_11585,PCDHA8_11671,PCDHA9_11458,PCDHA10_56912,PCDHA11_11590,PCDHA12_56962,PCDHA13_11369,PCDHAC1_11533) 3 3.1 643220
chr5 141587468 141588358 891 * promoter_flanking_region 0 1 1 0 0 0 0 0 0 0 c(chr5:140966508-140967052,chr5:140967690-140967917) DIAPH1_23361 chr5 141380219 141380588 370 * intron 0 0 0 1,1,1,1,1,1 0 0 0 0 0 0 0 c(PCDHGA1_17708,PCDHGA2_17627,PCDHGA3_17505,PCDHGB1_17702,PCDHGA4_17576,PCDHGB2_17511,PCDHGA5_17603,PCDHGB3_17600,PCDHGA6_17551) 3 3.41 207509
chr5 141587468 141588358 891 * promoter_flanking_region 0 1 1 0 0 0 0 0 0 0 c(chr5:140966508-140967052,chr5:140967690-140967917) DIAPH1_23361 chr5 141381619 141381892 274 * intron 0 0 0 1,1,1,1,1,1 0 0 0 0 0 0 0 c(PCDHGA1_17708,PCDHGA2_17627,PCDHGA3_17505,PCDHGB1_17702,PCDHGA4_17576,PCDHGB2_17511,PCDHGA5_17603,PCDHGB3_17600,PCDHGA6_17551) 3 3.41 206157
My sed attempts:
sed -i -r 's/\b1,\w+/1/g' file.txt
sed -i -r 's/\b1.*/1/g' file.txt
My ideal output looks like this:
chr5 141587227 141587466 240 * exon 0 0 1 0 0 0 0 0 0 0 chr5:140966508-140967052 DIAPH1_23361 chr5 141505592 141505799 208 * promoter_flanking_region 0 1 0 1 c(PCDHGA1_17708,PCDHGA2_17627,PCDHGA3_17505,PCDHGB1_17702,PCDHGA4_17576,PCDHGB2_17511,PCDHGA5_17603,PCDHGB3_17600,PCDHGA6_17551,PCDHGA7_17606,PCDHGB4_17598,PCDHGA8_17560,PCDHGB5_17553,PCDHGA9_17757,PCDHGB6_17668,PCDHGA10_17824,PCDHGB7_17671,PCDHGA11_17515,PCDHGA12_17651,PCDHGC3_17713,PCDHGC4_17790,PCDHGC5_17760) 2 2.31 81651
chr5 141587468 141588358 891 * promoter_flanking_region 0 1 1 0 0 0 0 0 0 0 c(chr5:140966508-140967052,chr5:140967690-140967917) DIAPH1_23361 chr5 140944575 140944811 237 * intron 0 0 0 1 0 0 0 0 0 0 0 c(PCDHA1_11483,PCDHA2_56916,PCDHA3_11465,PCDHA4_11655,PCDHA5_11663,PCDHA6_11423,PCDHA7_11585,PCDHA8_11671,PCDHA9_11458,PCDHA10_56912,PCDHA11_11590,PCDHA12_56962,PCDHA13_11369,PCDHAC1_11533) 3 3.1 643220
chr5 141587468 141588358 891 * promoter_flanking_region 0 1 1 0 0 0 0 0 0 0 c(chr5:140966508-140967052,chr5:140967690-140967917) DIAPH1_23361 chr5 141380219 141380588 370 * intron 0 0 0 1 0 0 0 0 0 0 0 c(PCDHGA1_17708,PCDHGA2_17627,PCDHGA3_17505,PCDHGB1_17702,PCDHGA4_17576,PCDHGB2_17511,PCDHGA5_17603,PCDHGB3_17600,PCDHGA6_17551) 3 3.41 207509
chr5 141587468 141588358 891 * promoter_flanking_region 0 1 1 0 0 0 0 0 0 0 c(chr5:140966508-140967052,chr5:140967690-140967917) DIAPH1_23361 chr5 141381619 141381892 274 * intron 0 0 0 1 0 0 0 0 0 0 0 c(PCDHGA1_17708,PCDHGA2_17627,PCDHGA3_17505,PCDHGB1_17702,PCDHGA4_17576,PCDHGB2_17511,PCDHGA5_17603,PCDHGB3_17600,PCDHGA6_17551) 3 3.41 206157
I think you are looking for something like
sed -r 's/\b1(,1)+\b/1/g'
That is, match a 1 followed by one or more occurrences of ,1.
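As a quick way to check that expression, here is a minimal sketch run on a shortened, made-up line in the style of the sample data. It assumes GNU sed, since \b is a GNU extension (and -E is the same as -r there); the function name collapse_ones is only illustrative.

```shell
#!/bin/sh
# Collapse runs such as 1,1,1 into a single 1.
# Assumes GNU sed: \b (word boundary) is a GNU extension.
collapse_ones() {
    sed -E 's/\b1(,1)+\b/1/g'
}

# Shortened, made-up line in the style of the sample data.
printf '%s\n' 'intron 0 0 0 1,1,1,1,1,1 0 0 c(PCDHGA1_17708,PCDHGA2_17627) 3 3.41' |
    collapse_ones
# prints: intron 0 0 0 1 0 0 c(PCDHGA1_17708,PCDHGA2_17627) 3 3.41
```

The word boundaries keep the pattern from touching a 1 that is part of a longer token, such as the 1 in PCDHGA1_17708 or in 3.41.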
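It looks like you want
awk '$28=1' input-file
but perhaps you want to filter a bit and do
awk '$28 ~ "^1(,1)*$" {$28=1} 1' input-file
The first simply replaces column 28 with 1, whereas the second replaces column 28 with 1 only when it matches the regular expression ^1(,1)*$.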
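The filtered variant can be sketched as a small helper that takes the column number as an argument, so it can be tried on a short line before running it on column 28 of the real file. The name collapse_col and the sample lines are hypothetical, and the input is assumed to be whitespace-separated.

```shell
#!/bin/sh
# Replace column $1 with 1, but only when that column consists solely of
# comma-separated 1s (1, 1,1, 1,1,1, ...). All other lines pass through.
collapse_col() {
    awk -v col="$1" '$col ~ /^1(,1)*$/ { $col = 1 } 1'
}

# Made-up four-column lines; the pattern sits in column 3.
printf '%s\n' 'chr5 intron 1,1,1 c(x,y)' 'chr5 exon 1,2 c(z)' | collapse_col 3
# prints:
# chr5 intron 1 c(x,y)
# chr5 exon 1,2 c(z)
```

Note that assigning to a field makes awk rebuild the line using the output field separator, which is a single space by default. If the file is tab-separated and the tabs must survive, adding -F'\t' -v OFS='\t' to the awk call should preserve them.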