Splitting folders into a training set and a test set



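I have the Enron email dataset in 5 folders. I want to use enron1, enron3, and enron5 as the training set, and enron2 and enron4 as the test set. I can load the complete dataset and split it randomly, but I can't work out how to split it by folder as described.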

from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split

X, y = [], []  # accumulate documents and labels across all folders
for i in range(1, 6):
    # folder containing the 2 categories of documents in individual folders.
    movie_data = load_files(f"/Users/mehedihasan/Desktop/Study/SEM6/COMP723 Data Mining & Knowledge Engineering/Assignment/email data/enron{i}")
    X.extend(movie_data.data)
    y.extend(movie_data.target)

# random 70/30 split over the combined data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

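Instead of a single loop over range(1, 6), you can use two loops: for i in [1, 3, 5]: to build the training set and for i in [2, 4]: to build the test set.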

for i in [1, 3, 5]:
    # ... code ..
    X_train = ...
    y_train = ...
for i in [2, 4]:
    # ... code ..
    X_test = ...
    y_test = ...
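
The skeleton above could be filled in roughly like this. This is only a sketch: BASE_DIR, load_enron_folder and collect are made-up names, not part of the question's code, and it assumes every enronN folder contains the same category subfolders so that load_files assigns the same label numbers in each one.

```python
from pathlib import Path

# Hypothetical base directory; point it at the folder that holds enron1 ... enron5.
BASE_DIR = Path("email data")


def load_enron_folder(i):
    """Load one enron<i> folder with scikit-learn and return (documents, labels)."""
    # Imported here so that collect() can also be used with a different loader.
    from sklearn.datasets import load_files

    bunch = load_files(str(BASE_DIR / f"enron{i}"))
    return bunch.data, bunch.target


def collect(indices, loader=load_enron_folder):
    """Concatenate documents and labels from the folders listed in `indices`."""
    X, y = [], []
    for i in indices:
        data, target = loader(i)
        X.extend(data)
        y.extend(target)
    return X, y
```

With that in place, X_train, y_train = collect([1, 3, 5]) and X_test, y_test = collect([2, 4]) give the split by folder, and train_test_split is no longer needed.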

By the way:

If you have more folders, you can use

  • range(1, n, 2) to get 1, 3, 5, 7, 9, ...
  • range(2, n, 2) to get 2, 4, 6, 8, 10, ...

Note that the stop value n itself is not included.
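
A small sketch of the two ranges in practice. The helper name odd_even_split is made up for illustration, and it takes the number of folders rather than the exclusive stop value n, so it adds 1 internally.

```python
def odd_even_split(num_folders):
    """Return (odd-numbered, even-numbered) folder indices for 1..num_folders."""
    stop = num_folders + 1  # range() excludes its stop value
    train_ids = list(range(1, stop, 2))
    test_ids = list(range(2, stop, 2))
    return train_ids, test_ids


train_ids, test_ids = odd_even_split(5)
print(train_ids)  # [1, 3, 5]
print(test_ids)   # [2, 4]
```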
