
Remove underscore tags from a big .txt file in Python

Updated: 2023-09-22

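I have a large .txt file (about 600 MB), and I am trying to remove every underscore together with the tag that follows it. The lines look like this: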

xxxxxxxx_NUM 0.20825405 -0.0756654 0.026837101
have_VERB -0.24344832 0.2747727 -0.024150277
two_NUM -0.038767103 0.20430847 0.10068103

I have tried the split method and regex patterns, but without success. For the sample above, the output should be:

xxxxxxxx 0.20825405 -0.0756654 0.026837101
have -0.24344832 0.2747727 -0.024150277
two -0.038767103 0.20430847 0.10068103

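Use the fileinput module with a regex substitution to edit the file in place:

    import fileinput
    import re

    # inplace=True redirects print() into the file being read, so the file
    # is rewritten line by line. The encoding argument needs Python 3.10+.
    with fileinput.input(files='your_filename.txt',
                         encoding='utf-8', inplace=True) as f:
        for line in f:
            # Remove the first "_TAG": an underscore followed by characters
            # that are neither underscores nor whitespace.
            line = re.sub(r'_[^_\s]+', '', line, count=1)
            print(line.strip())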
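
Since the question mentions trying split, here is a minimal sketch of a regex-free variant that writes to a separate output file instead of editing in place. The names strip_tag and strip_tags_in_file are hypothetical, not part of the original answer. It assumes the tagged word is the first space-separated token on each line and that the tag follows the last underscore in that token. For a word that itself contains underscores, this differs from the regex above, which removes the text after the first underscore.

```python
from pathlib import Path


def strip_tag(line: str) -> str:
    """Remove the "_TAG" suffix from the first space-separated token."""
    token, sep, rest = line.rstrip("\n").partition(" ")
    # rpartition splits on the last underscore; a token without one is kept as-is.
    head, underscore, _tag = token.rpartition("_")
    word = head if underscore else token
    return word + sep + rest


def strip_tags_in_file(src: Path, dst: Path) -> None:
    """Stream src line by line into dst so the whole file never sits in memory."""
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(strip_tag(line) + "\n")
```

Calling strip_tags_in_file(Path('your_filename.txt'), Path('your_filename_clean.txt')) leaves the original file untouched, which can be a safer choice for a 600 MB input than an in-place rewrite.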
