Dataframes, csv, and CNTK

I've been playing around with CNTK and found that models can only be trained using NumPy arrays. Is that correct?

That makes sense for image recognition and the like.

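How do I turn a tidy dataset (read in as a DataFrame using pandas) into a format that can be used to train a logistic regression? I tried reading it into a NumPy array with

 np.genfromtxt("My.csv", delimiter=',', dtype=float)

and I also tried wrapping the variable with

 np.array.MyVeriable.astype('float32')

but I am not getting a result that I can feed to the model.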
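One possible reason the `np.genfromtxt` attempt misbehaves, assuming My.csv has a header row (the post does not say either way): with `dtype=float`, the column names cannot be parsed and the first row comes back as NaN. The sketch below uses a made-up in-memory CSV, not the real file, to show the effect and the `skip_header` workaround.

```python
import io

import numpy as np

# Hypothetical CSV content standing in for My.csv (assumed to have a header row).
csv_text = "predictor1,predictor2,dependent_variable\n1.0,2.0,0\n3.0,4.0,1\n"

# Without skipping the header, the column names cannot be parsed as floats,
# so the first row of the result is filled with NaN.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=float)

# skip_header=1 drops the header line so only the numeric rows are parsed.
data = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", dtype=np.float32, skip_header=1
)

features = data[:, :-1]  # every column except the last
labels = data[:, -1:]    # last column, kept 2-D with shape (n, 1)
```

If the file has no header row, `skip_header` is unnecessary and the cause lies elsewhere.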

I also couldn't find anything in the tutorials about how to do ML on tabular data in CNTK.

Is it not supported?

CNTK 104 shows how to use pandas DataFrames and NumPy.

https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_104_Finance_Timeseries_Basic_with_Pandas_Numpy.ipynb

CNTK 106B shows how to read data from a CSV file.

https://github.com/Microsoft/CNTK/blob/master/Tutorials/CNTK_106B_LSTM_Timeseries_with_IOT_Data.ipynb

Thanks for the links. This is how I ended up reading in the CSV, but Sayan, please correct as needed:

import os

import numpy as np
import pandas as pd

def generate_data_from_csv():
    # Look for the data file locally and report whether it was found.
    data_path = os.path.join("MyPath")
    csv_file = os.path.join(data_path, "My.csv")
    if not os.path.exists(data_path):
        os.makedirs(data_path)
    if not os.path.exists(csv_file):
        print("file does not exist")
    else:
        print("using local file")
    # Read only the needed columns, all as float32.
    df = pd.read_csv(csv_file, usecols=["predictor1", "predictor2",
        "predictor3", "predictor4", "dependent_variable"], dtype=np.float32)
    return df

I then saved that DataFrame as training_data:

training_data = generate_data_from_csv()

I then turned that DataFrame into NumPy arrays as follows:

training_features = np.asarray(training_data[["predictor1",
    "predictor2", "predictor3", "predictor4"]], dtype="float32")
training_labels = np.asarray(training_data[["dependent_variable"]],
    dtype="float32")
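As a quick sanity check of the conversion, here is a minimal sketch on a made-up three-row table. The CSV text and the names `X` and `y` are illustrative assumptions; only the column names come from the post. It uses `DataFrame.to_numpy`, which should give the same arrays as the `np.asarray` calls, and confirms the shapes and dtype.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for My.csv; only the column names follow the post.
csv_text = (
    "predictor1,predictor2,predictor3,predictor4,dependent_variable\n"
    "0.1,0.2,0.3,0.4,0\n"
    "0.5,0.6,0.7,0.8,1\n"
    "0.9,1.0,1.1,1.2,1\n"
)
df = pd.read_csv(io.StringIO(csv_text), dtype=np.float32)

feature_cols = ["predictor1", "predictor2", "predictor3", "predictor4"]

# Double brackets select a DataFrame (2-D), so both arrays stay 2-D.
X = df[feature_cols].to_numpy(dtype=np.float32)
y = df[["dependent_variable"]].to_numpy(dtype=np.float32)

print(X.shape, X.dtype)
print(y.shape, y.dtype)
```

Keeping the labels 2-D, with one row per sample, means features and labels can be sliced with the same row indices.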

To train the model I used this code:

features, labels = training_features[:,[0,1,2,3]], training_labels
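The post stops before the training loop itself. Training usually consumes such arrays in minibatches, so here is a minimal NumPy-only sketch of that step. The helper `iterate_minibatches` and the demo arrays are hypothetical and not part of CNTK. CNTK's `Trainer.train_minibatch` accepts a dictionary mapping input variables to NumPy arrays, which is roughly where each batch would go; the model and trainer setup are not shown in the post and are not reproduced here.

```python
import numpy as np


def iterate_minibatches(features, labels, batch_size, seed=0):
    """Yield (features, labels) minibatches in a shuffled order."""
    assert len(features) == len(labels)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        # The same row indices are applied to both arrays so pairs stay aligned.
        yield features[idx], labels[idx]


# Placeholder arrays laid out like training_features / training_labels.
demo_features = np.arange(40, dtype=np.float32).reshape(10, 4)
demo_labels = (np.arange(10) % 2).astype(np.float32).reshape(10, 1)

batches = list(iterate_minibatches(demo_features, demo_labels, batch_size=4))
```

With 10 rows and a batch size of 4, the last batch is smaller than the others; whether to keep or drop such a remainder is a design choice left open here.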
