Split a text file into multiple files by percentage for testing and training

Updated: 2023-09-22

I have a large text file with 50,000+ lines. How can I randomly split its lines into 70% for training, 20% for testing, and 10% for development?

Expected result: Train.txt, test.txt, dev.txt

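I think this code is much simpler. Note that it uses a 75% / 12.5% / 12.5% split and assigns each line independently at random, so the resulting file sizes only approximate those proportions.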

# Allocate lines to the train, test, and validate datasets
# (roughly 75% / 12.5% / 12.5% of the input).
import random

# Binary mode copies each line byte for byte, whatever its encoding.
with open('unique.txt', 'rb') as fin, \
        open('train.txt', 'wb') as f75out, \
        open('test.txt', 'wb') as f125aout, \
        open('validate.txt', 'wb') as f125bout:
    for line in fin:
        r = random.random()  # uniform float in [0.0, 1.0)
        if r <= 0.75:
            f75out.write(line)
        elif r <= 0.875:
            f125aout.write(line)
        else:
            f125bout.write(line)
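
If you need the 70/20/10 proportions from the question rather than an approximate split, one option is to shuffle all the lines once and then cut the shuffled list at fixed positions. The following is a minimal sketch, assuming the whole file fits in memory (50,000 lines should). The function names `split_lines` and `split_file` and the `seed` parameter are made up for this illustration.

```python
import random


def split_lines(lines, fractions=(0.7, 0.2, 0.1), seed=None):
    """Shuffle the lines and cut them into consecutive parts sized by fractions."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split repeatable
    n = len(shuffled)
    parts, start, cumulative = [], 0, 0.0
    for i, frac in enumerate(fractions):
        cumulative += frac
        # The last part takes whatever remains, so rounding never drops a line.
        end = n if i == len(fractions) - 1 else round(cumulative * n)
        parts.append(shuffled[start:end])
        start = end
    return parts


def split_file(src, outputs=("train.txt", "test.txt", "dev.txt"),
               fractions=(0.7, 0.2, 0.1), seed=None):
    """Read src, split its lines, and write one output file per part."""
    with open(src, "rb") as fin:
        lines = fin.readlines()
    # Make sure the final line ends with a newline, otherwise it would be
    # glued to another line once the order is shuffled.
    if lines and not lines[-1].endswith(b"\n"):
        lines[-1] += b"\n"
    for path, part in zip(outputs, split_lines(lines, fractions, seed)):
        with open(path, "wb") as fout:
            fout.writelines(part)
```

Calling `split_file("unique.txt")` would write train.txt, test.txt, and dev.txt to the current directory. Unlike the per-line random draw, this approach reads the whole input before writing anything, which is the price of getting fixed part sizes.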

Take a look at scikit-learn's train_test_split() function, which splits your data into random subsets.

Then save the resulting variables to files.
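
As a rough sketch of that suggestion: train_test_split() produces only two subsets per call, so a three-way split takes two calls. This assumes scikit-learn is installed. The helper names `split_with_sklearn` and `save_lines`, the seed value, and the UTF-8 encoding are assumptions made for the example, not part of the original answer.

```python
from sklearn.model_selection import train_test_split


def split_with_sklearn(lines, seed=42):
    """Split lines into about 70% train, 20% test, 10% dev with two calls."""
    n = len(lines)
    n_test = round(0.2 * n)
    n_dev = round(0.1 * n)
    # First call: hold out the test and dev lines together.
    # An integer test_size is an absolute number of samples.
    train, rest = train_test_split(
        lines, test_size=n_test + n_dev, random_state=seed)
    # Second call: divide the held-out lines into test and dev.
    test, dev = train_test_split(rest, test_size=n_dev, random_state=seed)
    return train, test, dev


def save_lines(path, lines):
    """Write the lines to path; each line is expected to keep its newline."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)
```

A possible use: read the input with `lines = open("unique.txt", encoding="utf-8").readlines()`, call `train, test, dev = split_with_sklearn(lines)`, and then pass each list to `save_lines` with the file name you want.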
