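I have a dataset for an object detection algorithm that consists of images (.jpg) and corresponding .xml files containing the bounding boxes.
I want to write a script that randomly splits the dataset into training and test sets, which means I have to make sure each .jpg ends up in the same directory as its corresponding .xml.
How should I edit the following code to achieve this?
Also, is this the "best" way to do it, or would it be better to split the dataset after the XML-to-CSV conversion, once the CSV has been generated?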
import shutil, os, glob, random
# List all files in a directory using os.listdir
basepath = '/home/bis/hans/bis/workspace/images/Synced_dataset'
filenames = []
for entry in os.listdir(basepath):
    if os.path.isfile(os.path.join(basepath, entry)):
        #print(entry)
        filenames.append(entry)
filenames.sort() # make sure that the filenames have a fixed order before shuffling
random.seed(230)
random.shuffle(filenames) # shuffles the ordering of filenames (deterministic given the chosen seed)
split = int(0.8 * len(filenames))
train_filenames = filenames[:split]
test_filenames = filenames[split:]
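Your best option is to create two file lists in matching order (filenames for the .jpg files and xmlnames for the .xml files), along with an index list indices=[i for i in range(len(filenames))].
Then you can shuffle the index list: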
random.seed(230)
random.shuffle(indices)
Finally, you create the train and test sets for the jpg and xml files:
split = int(0.8 * len(filenames))
file_train = [filenames[idx] for idx in indices[:split]]
file_test = [filenames[idx] for idx in indices[split:]]
xml_train = [xmlnames[idx] for idx in indices[:split]]
xml_test = [xmlnames[idx] for idx in indices[split:]]
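The index approach above only keeps images and annotations together if both lists have the same length and line up position by position. A minimal sketch of a sanity check that could be run before shuffling is shown here; the helper name check_pairs and the sample file names are hypothetical and not part of the original answer.

```python
import os

def check_pairs(filenames, xmlnames):
    """Return True if both lists have equal length and matching base names."""
    if len(filenames) != len(xmlnames):
        return False
    # Compare names without their extensions, position by position.
    return all(
        os.path.splitext(f)[0] == os.path.splitext(x)[0]
        for f, x in zip(filenames, xmlnames)
    )

# Hypothetical sample data standing in for the real directory listings.
filenames = sorted(["b.jpg", "a.jpg", "c.jpg"])
xmlnames = sorted(["c.xml", "a.xml", "b.xml"])

print(check_pairs(filenames, xmlnames))            # True
print(check_pairs(filenames, ["a.xml", "b.xml"]))  # False
```

If the check fails, an image is probably missing its annotation (or the reverse), and splitting by index would pair the wrong files.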
import shutil, os, glob, random
# List all files in a directory using os.listdir
basepath = 'images/'
labelpath='label/'
filenames = []
xmlnames = []
for entry in os.listdir(basepath):
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)
        filenames.append(entry)
for entry in os.listdir(labelpath):
    if os.path.isfile(os.path.join(labelpath, entry)):
        print(entry)
        xmlnames.append(entry)
indices=[i for i in range(len(filenames))]
filenames.sort()
xmlnames.sort() # make sure that the filenames have a fixed order before shuffling
random.seed(230)
random.shuffle(indices) # shuffles the ordering of indices (deterministic given the chosen seed)
split = int(0.8 * len(filenames))
file_train = [filenames[idx] for idx in indices[:split]]
file_test = [filenames[idx] for idx in indices[split:]]
xml_train = [xmlnames[idx] for idx in indices[:split]]
xml_test = [xmlnames[idx] for idx in indices[split:]]
print(file_test)
print(xml_test)
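So I followed the suggestion above (from Joseph) and added the indices. That way, when the train and test variables are built, each image and its matching label end up in the same split. Hope this helps.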
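The script above only builds the lists; the question also asks for the files to land in the same directory. One possible sketch of that last step, using shutil.copy2 from the standard library, is shown here. The helper name copy_pairs and the output paths in the usage comment are assumptions for illustration, not part of either answer.

```python
import os
import shutil

def copy_pairs(image_names, xml_names, image_dir, label_dir, dest_dir):
    """Copy each image and its annotation file side by side into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)  # create the target directory if needed
    for image_name, xml_name in zip(image_names, xml_names):
        shutil.copy2(os.path.join(image_dir, image_name), dest_dir)
        shutil.copy2(os.path.join(label_dir, xml_name), dest_dir)

# Usage with the variables from the script above (output paths are hypothetical):
# copy_pairs(file_train, xml_train, basepath, labelpath, 'output/train')
# copy_pairs(file_test, xml_test, basepath, labelpath, 'output/test')
```

Copying rather than moving leaves the original dataset untouched, so the split can be redone with a different seed or ratio.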