CNTK NVidia RTX 3060 CUBLAS failure 13 when layer size exceeds 512



I have an LSTM network with 2000 neurons built in CNTK 2.7 using EasyCNTK C#. It works fine on the CPU and on a Gigabyte NVidia RTX 2060 6GB, but on a Gigabyte NVidia RTX 3060 12GB I get the error below as soon as I increase the number of neurons beyond 512 (both cards use the same NVidia driver version, 461.72).

Here is my neural network configuration:

using System;
using CNTK;
using EasyCNTK;
using EasyCNTK.ActivationFunctions;
using EasyCNTK.Layers;

int minibatchSize = 8;
int epochCount = 10;
int inputDimension = 10200;
var device = DeviceDescriptor.GPUDevice(0);
// check the current device for running neural networks
Console.WriteLine($"Using device: {device.AsString()}");
var model = new Sequential<double>(device, new[] { inputDimension }, inputName: "Input");
// layers with more than 512 neurons trigger the CUBLAS failure on the RTX 3060
model.Add(new LSTM(2000, isLastLstm: false));
model.Add(new LSTM(500, selfStabilizerLayer: new SelfStabilization<double>()));
model.Add(new Residual2(128, new Tanh()));
model.Add(new Residual2(1, new Tanh()));

Here is the error. I also get it when I choose Dense or other layer types:

Unhandled Exception: System.ApplicationException: CUBLAS failure 13: CUBLAS_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=EVO ; expr=cublasgemmHelper(cuHandle, transA, transB, m, n, k, &alpha, a.Data(), (int) a.m_numRows, b.Data(), (int) b.m_numRows, &beta, c.Data(), (int) c.m_numRows)
[CALL STACK]
> Microsoft::MSR::CNTK::TensorView<half>::  Reshaped
- Microsoft::MSR::CNTK::CudaTimer::  Stop
- Microsoft::MSR::CNTK::GPUMatrix<double>::  MultiplyAndWeightedAdd
- Microsoft::MSR::CNTK::Matrix<double>::  MultiplyAndWeightedAdd
- Microsoft::MSR::CNTK::TensorView<double>::  DoMatrixProductOf
- Microsoft::MSR::CNTK::TensorView<double>::  AssignMatrixProductOf
- std::enable_shared_from_this<Microsoft::MSR::CNTK::MatrixBase>::  shared_from_this (x3)
- CNTK::Internal::  UseSparseGradientAggregationInDataParallelSGD
- CNTK::  CreateTrainer
- CNTK::Trainer::  TotalNumberOfUnitsSeen
- CNTK::Trainer::  TrainMinibatch (x2)
- CSharp_CNTK_Trainer__TrainMinibatch__SWIG_2
- 00007FFF157B7E55 (SymFromAddr() error: The specified module could not be found.)
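
Since the same error appears regardless of layer type, a stand-alone repro that bypasses EasyCNTK might help narrow things down. Below is a minimal, untested sketch using only the plain CNTK C# API: a single Times (matrix product) with an output dimension above 512, evaluated on GPU 0, which should exercise the same cublasgemmHelper path shown in the stack trace. The class name, dimensions, and initializer are placeholders, not taken from my actual project.

using System;
using System.Collections.Generic;
using System.Linq;
using CNTK;

class CublasRepro
{
    static void Main()
    {
        int inputDim = 10200;
        int outputDim = 2000; // an output dimension above 512 matches the failing configuration
        var device = DeviceDescriptor.GPUDevice(0);

        // W (outputDim x inputDim) times x (inputDim) -> a single GEMM on the GPU
        var input = Variable.InputVariable(new int[] { inputDim }, DataType.Double, "Input");
        var weights = new Parameter(new int[] { outputDim, inputDim }, DataType.Double,
                                    CNTKLib.GlorotUniformInitializer(), device);
        var product = CNTKLib.Times(weights, input);

        // evaluate one dummy sample on the GPU
        var sample = Enumerable.Repeat(0.5, inputDim).ToArray();
        var inputValue = Value.CreateBatch(new int[] { inputDim }, sample, device);
        var outputs = new Dictionary<Variable, Value> { { product.Output, null } };
        product.Evaluate(new Dictionary<Variable, Value> { { input, inputValue } }, outputs, device);

        Console.WriteLine($"Output size: {outputs[product.Output].Shape.TotalSize}");
    }
}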

It looks like CNTK does not support CUDA 11, and the RTX 3060 does not support CUDA 10 or earlier.