Delete lines if the value in column 3 is in another text file

I have a long text file (haplotypes.txt) that looks like this:

19 rs541392352 55101281 A 0 0 ...
19 rs546022921 55106773 C T 0 ...
19 rs531959574 31298342 T 0 0 ...

And a simple text file (positions.txt) that looks like this:

I would like to delete every line whose third field appears in positions.txt, to get the following output:

19 rs541392352 55101281 A 0 0 ...
19 rs531959574 31298342 T 0 0 ...

I hope someone can help.

Using AWK:

awk 'NR == FNR{a[$0] = 1;next}!a[$3]' positions.txt haplotypes.txt

Breakdown:

NR == FNR {   # True only while reading the first file, positions.txt
  a[$0] = 1   # Store the whole line as a key in the associative array 'a'
  next        # Skip the remaining rules and read the next line
}
!a[$3]        # Print the line if its third column is not a key in 'a'
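A minimal, self-contained sketch of the same idea, run on the sample lines from the question (with the trailing "..." columns dropped). The contents of positions.txt are not shown above, so the single value 55106773 is an assumption (it is the one position the expected output removes), and `drop_listed_positions` is a hypothetical helper name. It differs from the one-liner in three small ways: it keys on `$1` so stray blanks around a position do not matter, it uses `$3 in seen` so lookups do not add empty entries to the array, and it tests `FILENAME == ARGV[1]` instead of `NR == FNR`, because with an empty positions.txt `NR == FNR` would also be true for every haplotype line, which would then be stored as a key and nothing would be printed.

```shell
#!/bin/sh
# Sketch of the awk filter on the sample data from the question.
# Assumption: positions.txt holds one position per line; 55106773 is
# illustrative, since the real file contents are not shown.
set -eu

# Hypothetical helper: print the lines of file $2 whose third field
# is not listed in file $1.
drop_listed_positions() {
    # FILENAME == ARGV[1] is true only for records of the first file,
    # even when that file is empty (NR == FNR would then match file 2).
    # "$3 in seen" tests membership without creating new array entries.
    awk 'FILENAME == ARGV[1] { seen[$1]; next } !($3 in seen)' "$1" "$2"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/haplotypes.txt" <<'EOF'
19 rs541392352 55101281 A 0 0
19 rs546022921 55106773 C T 0
19 rs531959574 31298342 T 0 0
EOF
printf '55106773\n' > "$tmp/positions.txt"

drop_listed_positions "$tmp/positions.txt" "$tmp/haplotypes.txt"
```

Running it prints the first and third sample lines, matching the expected output in the question.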

This should work:

$ grep -vwFf positions.txt haplotypes.txt 
19 rs541392352 55101281 A 0 0 ...
19 rs531959574 31298342 T 0 0 ...

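-f positions.txt: read the patterns from the file
-v: invert the match (print lines that do not match)
-w: match whole words only (avoids substring matches)
-F: match fixed strings (do not interpret the patterns as regular expressions)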
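To see why -w matters, here is a small sketch with a made-up third line whose position, 155106773, contains 55106773 as a substring (that line and the rs_example ID are hypothetical, not from the question's data). Without -w the substring match drops that line too; with -w only the whole-word match is removed.

```shell
#!/bin/sh
# Sketch: effect of -w on grep -vFf. The rs_example line is contrived
# so that 55106773 appears inside 155106773 as a substring.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '55106773\n' > "$tmp/positions.txt"
cat > "$tmp/haplotypes.txt" <<'EOF'
19 rs541392352 55101281 A 0 0
19 rs546022921 55106773 C T 0
19 rs_example 155106773 G 0 0
EOF

echo 'Without -w (the substring match also drops 155106773):'
grep -vFf "$tmp/positions.txt" "$tmp/haplotypes.txt"

echo 'With -w (only whole-word matches are dropped):'
grep -vwFf "$tmp/positions.txt" "$tmp/haplotypes.txt"
```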

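The grep approach expects that only the third column looks like a long number. If a pattern happens to match the exact same word in one of the columns not shown, you may get false positives. To avoid this, you have to use an awk solution that filters by column (see andlrc's answer).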
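The false-positive case can be reproduced with a contrived line whose fifth column happens to equal a listed position (the rs_example line and its values are hypothetical). `grep -vwFf` drops it because the word appears somewhere on the line, while the column-based awk filter keeps it because its third column is not listed.

```shell
#!/bin/sh
# Sketch: grep -w matches a pattern in ANY column, awk only checks $3.
# The rs_example line is contrived: its 5th column equals a listed position.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '55106773\n' > "$tmp/positions.txt"
cat > "$tmp/haplotypes.txt" <<'EOF'
19 rs546022921 55106773 C T 0
19 rs_example 31298342 T 55106773 0
EOF

echo 'grep (drops both lines; the second is a false positive):'
# grep exits with status 1 when it selects no lines, so guard against set -e.
grep -vwFf "$tmp/positions.txt" "$tmp/haplotypes.txt" || true

echo 'awk (drops only the line whose third column is listed):'
awk 'NR == FNR { a[$0] = 1; next } !a[$3]' "$tmp/positions.txt" "$tmp/haplotypes.txt"
```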
