How to parse a string variable containing a CSV dataset



I've seen references on how to read a CSV file, for example:

X = csvread('gs_train.csv');

However, I can't find any reference for the case where the CSV data is already in a variable.

Specifically, I have:

output = 1546405200000,38.7225,39.7125,38.5575,39.48,148158948
1546491600000,35.995,36.43,35.5,35.5475,365248780
1546578000000,36.1325,37.1375,35.95,37.065,234284280
1546837200000,37.175,37.2075,36.475,36.9825,219111056
1546923600000,37.39,37.955,37.13,37.6875,164101256
1547010000000,37.8225,38.6325,37.4075,38.3275,180396324
1547096400000,38.125,38.4925,37.715,38.45,143122680
1547182800000,38.22,38.425,37.8775,38.0725,108082828
1547442000000,37.7125,37.8175,37.305,37.5,129756744
1547528400000,37.5675,38.3475,37.5125,38.2675,114841296
1547614800000,38.27,38.97,38.25,38.735,122278824
1547701200000,38.55,39.415,38.315,38.965,119284640
1547787600000,39.375,39.47,38.9952,39.205,135004092
1548133200000,39.1025,39.1825,38.155,38.325,121575880
1548219600000,38.5375,38.785,37.925,38.48,92522280
1548306000000,38.5275,38.62,37.935,38.175,101766196
1548392400000,38.87,39.5325,38.58,39.44,133635572
1548651600000,38.9475,39.0825,38.415,39.075,104768232
1548738000000,39.0625,39.5325,38.5275,38.67,166348956
1548824400000,40.8125,41.5375,40.0575,41.3125,244337120
1548910800000,41.5275,42.25,41.14,41.61,162958596

I want to end up with:

A = 
1546405200000 38.7225 39.7125 38.5575 39.48 148158948
1546491600000 35.995 36.43 35.5 35.5475 365248780
1546578000000 36.1325 37.1375 35.95 37.065 234284280
1546837200000 37.175 37.2075 36.475 36.9825 219111056
1546923600000 37.39 37.955 37.13 37.6875 164101256
1547010000000 37.8225 38.6325 37.4075 38.3275 180396324
1547096400000 38.125 38.4925 37.715 38.45 143122680
1547182800000 38.22 38.425 37.8775 38.0725 108082828
1547442000000 37.7125 37.8175 37.305 37.5 129756744
1547528400000 37.5675 38.3475 37.5125 38.2675 114841296
1547614800000 38.27 38.97 38.25 38.735 122278824
1547701200000 38.55 39.415 38.315 38.965 119284640
1547787600000 39.375 39.47 38.9952 39.205 135004092
1548133200000 39.1025 39.1825 38.155 38.325 121575880
1548219600000 38.5375 38.785 37.925 38.48 92522280
1548306000000 38.5275 38.62 37.935 38.175 101766196
1548392400000 38.87 39.5325 38.58 39.44 133635572
1548651600000 38.9475 39.0825 38.415 39.075 104768232
1548738000000 39.0625 39.5325 38.5275 38.67 166348956
1548824400000 40.8125 41.5375 40.0575 41.3125 244337120

I'm not sure whether Matlab and Octave share the same solution for this case.

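In addition to rahnema1's one-liner, you can also do this with textscan, but you have to specify the number of columns to parse by hand. textscan returns a cell array, so if your CSV is strictly numeric you can convert the result to a numeric matrix: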

cell2mat( textscan( output, "%f,%f,%f,%f,%f,%f" ) )
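A self-contained variant of the same idea, as a minimal sketch: the three-column sample string is a hypothetical stand-in for the question's `output`, and the 'Delimiter' option is used instead of literal commas in the format, which should behave the same way for purely numeric rows.

```matlab
% Hypothetical sample standing in for the question's `output` variable.
output = sprintf('1546405200000,38.7225,148158948\n1546491600000,35.995,365248780');

% One %f per column; 'Delimiter' tells textscan to split fields on commas.
C = textscan(output, '%f %f %f', 'Delimiter', ',');

% Each cell holds one column vector; concatenate them into a 2-by-3 matrix.
A = cell2mat(C);
```

The format string has to be extended to six %f specifiers for the six-column data in the question.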

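However, the textscan approach is also useful when the CSV contains non-numeric fields that you want to capture; in that case, leave the output as a cell array.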
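As an illustration of the mixed case, here is a minimal sketch. The sample data with a text label in the first column is hypothetical and does not come from the question.

```matlab
% Hypothetical mixed data: a text label followed by a number on each row.
mixed = sprintf('rowA,38.7225\nrowB,35.995');

% %s captures the text field, %f the numeric one.
C = textscan(mixed, '%s %f', 'Delimiter', ',');

labels = C{1};   % cell array of character vectors, one per row
values = C{2};   % numeric column vector
```

Because the columns have different types, cell2mat is not applicable here; each column is read out of the cell array separately.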

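As with most Matlab I/O functions, there is no option to have csvread (or the other read* functions) read from a variable or another in-memory data source. You have to bounce the data to a temporary file and read it back from there.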
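The temporary-file round trip could look roughly like this. It is a sketch only: the sample string is hypothetical and error handling is kept to a minimum.

```matlab
% Hypothetical sample standing in for the question's `output` variable.
output = sprintf('1,2.5,3\n4,5.5,6');

% Write the string to a uniquely named temporary file.
tmpfile = [tempname() '.csv'];
fid = fopen(tmpfile, 'w');
if fid < 0
    error('Could not open temporary file %s', tmpfile);
end
fprintf(fid, '%s', output);
fclose(fid);

% Read it back with the file-based reader, then clean up.
A = csvread(tmpfile);
delete(tmpfile);
```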

That's a bit of a shortcoming in the Matlab standard library, if you ask me.

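If you're desperate to do this in memory, you could use a Java CSV parsing library that supports reading from InputStreams: get the raw bytes of your string, wrap them in a Java ByteArrayInputStream, and parse from that. To make this efficient, though, you would probably need to write some custom Java code so that the result is passed back as an array in one go rather than through many Java method calls. Java method calls from M-code are slow.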

Or, if you're desperate for speed, create a RAM disk and use it for the temporary file.

And yes, the same goes for Octave.

In Octave, I can use str2num to convert the comma-separated string to a matrix:

A = str2num (output);
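A small runnable sketch of this, using a hypothetical sample string in place of the question's data. It relies on newlines in the string acting as row separators. Note that str2num evaluates its input, so it is best kept to trusted data.

```matlab
% Hypothetical sample standing in for the question's `output` variable.
output = sprintf('1,2.5,3\n4,5.5,6');

% Commas separate columns and each newline starts a new row.
A = str2num(output);
```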

I searched the Matlab documentation pages for functions that operate on strings; the "replace" function seems to meet your needs.

In your case, you would need:

A = replace(output, ",", " ");
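Note that replacing the commas yields text, not a numeric matrix, so a conversion step is still needed afterwards. A minimal sketch follows, assuming the number of columns is known in advance. The sample string is hypothetical, and strrep is swapped in for replace here only so the snippet also runs where replace is unavailable.

```matlab
% Hypothetical sample standing in for the question's `output` variable.
output = sprintf('1,2.5,3\n4,5.5,6');
ncols = 3;                            % assumed to be known in advance

% Swap commas for spaces (strrep is used as a stand-in for replace).
spaced = strrep(output, ',', ' ');

% sscanf reads every number in order into one column vector.
vals = sscanf(spaced, '%f');

% Fill column by column, then transpose so each CSV row becomes a matrix row.
A = reshape(vals, ncols, []).';
```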
