CTF reader throws an error for large files in CNTK



I am using the CTF reader function from the CNTK tutorials on GitHub.

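from cntk.io import (MinibatchSource, CTFDeserializer, StreamDef, StreamDefs,
                     INFINITELY_REPEAT, FULL_DATA_SWEEP)

def create_reader(path, is_training, input_dim, label_dim):
    # Sparse features are read from field 'x', dense labels from field 'y'.
    return MinibatchSource(CTFDeserializer(path, StreamDefs(
        features=StreamDef(field='x', shape=input_dim, is_sparse=True),
        labels=StreamDef(field='y', shape=label_dim, is_sparse=False)
    )), randomize=is_training, epoch_size=INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP)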

This works perfectly fine, except when the input file is larger than some (unknown) size. Then it throws an error like this:

WARNING: Sparse index value (269) at offset 8923303 in the input file (C:localCNTK-2-0-beta6-0-Windows-64bit-CPU-OnlycntkExamplescommondata_pos_train_balanced_ctf.txt) exceeds the maximum expected value (268).
attempt: Reached the maximum number of allowed errors while reading the input file (C:localCNTK-2-0-beta6-0-Windows-64bit-CPU-OnlycntkExamplescommondata_pos_train_balanced_ctf.txt)., retrying 2-th time out of 5...
.
.
.
RuntimeError: Reached the maximum number of allowed errors while reading the input file (C:localCNTK-2-0-beta6-0-Windows-64bit-CPU-OnlycntkExamplescommondata_pos_train_balanced_ctf.txt).

I have determined that this error is thrown in the file textparser.cpp: https://github.com/microsoft/cntk/blob/5633e79febe1dc514714714919190ad1944742328a/source/source/source/cntktektextktextxtxtxtparreater/textparser.cpp

What is the solution for this?

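You need to know the dimensionality of your input, and keep in mind that sparse indices start from 0. The warning reports a maximum expected value of 268, which suggests the reader was created with input_dim = 269 (valid indices 0 through 268). The file contains index 269, so input_dim has to be at least 270. The file size itself is probably not the cause: a larger file is simply more likely to contain a higher index.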
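To find out which dimension a file actually needs, one option is to scan it for the largest sparse index before creating the reader. The following is a minimal sketch in plain Python, not part of CNTK. The helper name max_sparse_index and the two sample lines are made up for illustration, and the sketch assumes the usual CTF layout in which each stream starts with |name and sparse entries are written as index:value.

```python
import io

def max_sparse_index(lines, field):
    """Return the largest sparse index seen for `field`, or -1 if none."""
    largest = -1
    for line in lines:
        # Each stream on a CTF line starts with '|<name>'; the text before
        # the first '|' is an optional sequence id and is skipped.
        for chunk in line.split("|")[1:]:
            tokens = chunk.split()
            if not tokens or tokens[0] != field:
                continue
            for token in tokens[1:]:
                # Sparse entries look like 'index:value'.
                index, sep, _ = token.partition(":")
                if sep:
                    largest = max(largest, int(index))
    return largest

# Hypothetical two-line sample; for a real file, pass open(path) instead.
sample = io.StringIO(
    "|x 0:1 268:1 |y 1 0\n"
    "|x 3:1 269:1 |y 0 1\n"
)
highest = max_sparse_index(sample, "x")
print("highest index:", highest, "-> input_dim must be at least", highest + 1)
```

Because indices are zero-based, the value to pass as input_dim is the highest index plus one.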
