Reading double values from a text file



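I am trying to read data from a text file with a C# application. There are multiple lines of data, and each line starts with an integer followed by a bunch of double values. Part of the text file looks like this: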

33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03
34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03
35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03

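Here 33, 34 and 35 are integer values, each followed by 6 double values. These double values are not guaranteed to have a space or any other delimiter between them: if a double is negative, its "-" takes the place of the space. So essentially, all 6 double values may run together.

Now the challenge is: how do I extract them gracefully?

What I tried: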

String.Split(' ');

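This will not work, because a space is not guaranteed between the initial integer value and the double values that follow it.

This can be solved easily in C++ using sscanf.

int n;
double a, b, c, d, e, f;
sscanf(string, "%d %lf%lf%lf%lf%lf%lf", &n, &a, &b, &c, &d, &e, &f);
// here string contains a line of data from the text file.

The text file containing the double values is generated by a third-party tool, and I have no control over its output.

Is there a way to gracefully extract the integer and the double values line by line?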

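If I'm not mistaken, you have a "fixed-width data" format. In that case you can simply parse the lines based on that fact.

For example, assuming the values are in the file d:\temp\doubles.txt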

void Main()
{
    var filename = @"d:\temp\doubles.txt";
    // Cut a line into 7 fields: a 2-character int, then six 19-character doubles.
    Func<string, string[]> split = (s) =>
    {
        string[] res = new string[7];
        res[0] = s.Substring(0, 2);
        for (int i = 0; i < 6; i++)
        {
            res[i + 1] = s.Substring(2 + (i * 19), 19);
        }
        return res;
    };
    var result = from l in File.ReadAllLines(filename)
                 let la = split(l)
                 select new
                 {
                     i = int.Parse(la[0]),
                     d1 = double.Parse(la[1]),
                     d2 = double.Parse(la[2]),
                     d3 = double.Parse(la[3]),
                     d4 = double.Parse(la[4]),
                     d5 = double.Parse(la[5]),
                     d6 = double.Parse(la[6])
                 };
    foreach (var e in result)
    {
        Console.WriteLine($"{e.i}, {e.d1}, {e.d2}, {e.d3}, {e.d4}, {e.d5}, {e.d6}");
    }
}

Output:

33, 0.0573140941467, 0.00011291426239, 0.00255553577735, 4.97192659486E-05, 0.0141869181079, -0.000147813598922
34, 0.0570076593453, 0.000100112550891, 0.00256427138318, -8.68691490164E-06, 0.0142821920093, -0.000346011975369
35, 0.0715507714946, 0.000316132133031, -0.0106581466521, -9.205137369E-05, 0.0138018668842, -0.000212219497066

PS: With your exact data, the int field should be allotted more width.
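To illustrate the PS, here is a minimal sketch that derives the width of the integer field from the line length instead of hard-coding 2. It assumes, as in the sample data, that every line ends with exactly six double fields of 19 characters each and has no other trailing content. The names FixedWidthLine, FieldWidth and FieldCount are made up for this example.

```csharp
using System;
using System.Globalization;

static class FixedWidthLine
{
    // Assumptions taken from the sample data, not from any format specification:
    // six double fields per line, each exactly 19 characters wide
    // (a leading space or sign followed by 18 characters).
    const int FieldWidth = 19;
    const int FieldCount = 6;

    public static (int Index, double[] Values) Parse(string line)
    {
        // Trailing whitespace would shift the computed widths, so drop it first.
        line = line.TrimEnd();

        // Whatever precedes the six double fields is treated as the integer field.
        int intWidth = line.Length - FieldWidth * FieldCount;
        if (intWidth <= 0)
            throw new FormatException("Line is too short for six 19-character fields.");

        int index = int.Parse(line.Substring(0, intWidth),
                              NumberStyles.Integer, CultureInfo.InvariantCulture);

        var values = new double[FieldCount];
        for (int i = 0; i < FieldCount; i++)
        {
            string field = line.Substring(intWidth + i * FieldWidth, FieldWidth);
            values[i] = double.Parse(field, NumberStyles.Float, CultureInfo.InvariantCulture);
        }
        return (index, values);
    }
}
```

Calling FixedWidthLine.Parse(line) on each line returned by File.ReadLines would then give the integer and the six doubles regardless of how many leading spaces or digits the integer field has, as long as the 6 x 19 assumption holds.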

Solve this with a regular expression. My first shot would be:

"[\s+-]\d+\.\d+E[+-]\d\d"

I just tried it like this:

using System;
using System.Globalization;
using System.Text.RegularExpressions;
namespace ConsoleApp1 {
    class Program {
        static void Main(string[] args) {
            var fileContents =
                "33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03"
                + "34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03"
                + "35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03";
            // A leading space or sign, then digits, a dot, digits, and a two-digit exponent.
            var rex = new Regex(@"[\s+-]\d+\.\d+E[+-]\d\d", RegexOptions.Multiline);
            foreach (Match match in rex.Matches(fileContents)) {
                double d = double.Parse(match.Value.TrimStart(), NumberFormatInfo.InvariantInfo);
                Console.WriteLine("found a match: " + match.Value.TrimStart() + " => " + d);
            }
            Console.ReadLine();
        }
    }
}

With this output (German locale, comma as the decimal separator):

found a match: 0.573140941467E-01 => 0,0573140941467
found a match: 0.112914262390E-03 => 0,00011291426239
found a match: 0.255553577735E-02 => 0,00255553577735
found a match: 0.497192659486E-04 => 4,97192659486E-05
found a match: 0.141869181079E-01 => 0,0141869181079
found a match: -0.147813598922E-03 => -0,000147813598922
found a match: 0.570076593453E-01 => 0,0570076593453
found a match: 0.100112550891E-03 => 0,000100112550891
found a match: 0.256427138318E-02 => 0,00256427138318
found a match: -0.868691490164E-05 => -8,68691490164E-06
found a match: 0.142821920093E-01 => 0,0142821920093
found a match: -0.346011975369E-03 => -0,000346011975369
found a match: 0.715507714946E-01 => 0,0715507714946
found a match: 0.316132133031E-03 => 0,000316132133031
found a match: -0.106581466521E-01 => -0,0106581466521
found a match: -0.920513736900E-04 => -9,205137369E-05
found a match: 0.138018668842E-01 => 0,0138018668842
found a match: -0.212219497066E-03 => -0,000212219497066
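The commas in this output come from the machine's locale, and the same locale dependence applies to parsing: double.Parse(text) without a format provider uses the current culture, so on a machine where the comma is the decimal separator a token such as 0.573140941467E-01 may be misread or rejected. Below is a minimal sketch of a culture-independent helper; the names InvariantNumbers, ParseDouble and FormatDouble are made up for this example.

```csharp
using System;
using System.Globalization;

static class InvariantNumbers
{
    // Parse a token such as "0.573140941467E-01" or "-0.868691490164E-05"
    // with "." as the decimal separator, regardless of the current culture.
    public static double ParseDouble(string text) =>
        double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);

    // Format a value with "." as the decimal separator, regardless of the current culture.
    public static string FormatDouble(double value) =>
        value.ToString(CultureInfo.InvariantCulture);
}
```

Any of the snippets in this thread that call double.Parse(item) could call InvariantNumbers.ParseDouble(item) instead if the program may run under a non-English locale.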

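I just went with a non-optimal approach: replace the "E-" strings with some other string, replace every remaining minus sign with a space plus a minus sign (" -"), and then restore all the "E-" values.

After that I was able to use Split to extract the values.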

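private static IEnumerable<double> ExtractValues(string values)
{
    return values
        .Replace("E-", "E*")   // protect exponent signs
        .Replace("-", " -")    // put a space before every remaining minus sign
        .Replace("E*", "E-")   // restore exponent signs
        .Split(' ')
        .Select(v => double.Parse(v));
}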

You could do this:

public void ParseFile(string fileLocation)
{
    string[] lines = File.ReadAllLines(fileLocation);
    foreach(var line in lines)
    {
        // Split before a "-" that is not preceded by "E", otherwise split on a space.
        string[] parts = Regex.Split(line, "(?((?<!E)-)| )");
        if(parts.Any())
        {
            int first = int.Parse(parts[0]);
            double[] others = parts.Skip(1).Select(a => double.Parse(a)).ToArray();
        }
    }
}

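The answers I have seen so far are quite complicated. Here is a simple one, without overthinking it.

Following @Veljko89's comment, I have updated the code to support any number of values.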

List<double> ParseLine(string line)
{
    List<double> ret = new List<double>();
    // The leading integer ends at the first space.
    ret.Add(double.Parse(line.Substring(0, line.IndexOf(' '))));
    line = line.Substring(line.IndexOf(' ') + 1);
    // Each double ends 3 characters after its 'E' (sign plus two exponent digits).
    for (; !string.IsNullOrWhiteSpace(line); line = line.Substring(line.IndexOf('E') + 4))
    {
        ret.Add(double.Parse(line.Substring(0, line.IndexOf('E') + 4)));
    }
    return ret;
}

If we can't use string.Split, we can try splitting by a regular expression with the help of Regex.Split; for a given line

string line = @"  33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03";

we can try

// Split either
//   1. by space
//   2. at a zero-length position which is just after a [0..9] digit and followed by "-" or "+"
var items = Regex
    .Split(line, @" |((?<=[0-9])(?=[+-]))")
    .Where(item => !string.IsNullOrEmpty(item)) // we don't want empty parts
    .Skip(1)                                    // skip the leading 33
    .Select(item => double.Parse(item));        // we want double
Console.WriteLine(string.Join(Environment.NewLine, items));

and get

0.573140941467E-01
0.112914262390E-03
0.255553577735E-02
0.497192659486E-04
0.141869181079E-01
-0.147813598922E-03

In the case of a text file, we should split each line:

Regex regex = new Regex(@" |((?<=[0-9])(?=[+-]))");
var records = File
    .ReadLines(@"c:\MyFile.txt")
    .Select(line => regex
        .Split(line)
        .Where(item => !string.IsNullOrEmpty(item))
        .Skip(1)
        .Select(item => double.Parse(item))
        .ToArray());

Demo:

string[] test = new string[] {
    // your examples
    "  33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03",
    "  34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03",
    " 35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03",
    // Some challenging cases (mine)
    "    36 123+456-789    123e+78 9.9e-95 0.0001",
};
Regex regex = new Regex(@" |((?<=[0-9])(?=[+-]))");
var records = test
    .Select(line => regex
        .Split(line)
        .Where(item => !string.IsNullOrEmpty(item))
        .Skip(1)
        .Select(item => double.Parse(item))
        .ToArray());
string testReport = string.Join(Environment.NewLine, records
    .Select(record => $"[{string.Join(", ", record)}]"));
Console.WriteLine(testReport);

Result:

[0.0573140941467, 0.00011291426239, 0.00255553577735, 4.97192659486E-05, 0.0141869181079, -0.000147813598922]
[0.0570076593453, 0.000100112550891, 0.00256427138318, -8.68691490164E-06, 0.0142821920093, -0.000346011975369]
[0.0715507714946, 0.000316132133031, -0.0106581466521, -9.205137369E-05, 0.0138018668842, -0.000212219497066]
[123, 456, -789, 1.23E+80, 9.9E-95, 0.0001]
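The snippets above skip the leading integer. Since the question asks for both the integer and the doubles, here is a hedged sketch that keeps the same splitting idea (split on whitespace, or at the zero-length position between a digit and a following sign) and returns both parts. The names RecordParser and Parse are made up for this example, and invariant-culture parsing is an added assumption about how the file should be read.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

static class RecordParser
{
    // Split on runs of whitespace, or at the empty position that is
    // right after a digit and right before a "+" or "-".
    static readonly Regex Separator = new Regex(@"\s+|(?<=[0-9])(?=[+-])");

    public static (int Index, double[] Values) Parse(string line)
    {
        // Leading whitespace produces an empty first token, so drop empty tokens.
        string[] tokens = Separator.Split(line).Where(t => t.Length > 0).ToArray();
        if (tokens.Length == 0)
            throw new FormatException("Line contains no values.");

        int index = int.Parse(tokens[0], CultureInfo.InvariantCulture);
        double[] values = tokens
            .Skip(1)
            .Select(t => double.Parse(t, NumberStyles.Float, CultureInfo.InvariantCulture))
            .ToArray();
        return (index, values);
    }
}
```

With a file, File.ReadLines(path).Select(RecordParser.Parse) would yield one (Index, Values) pair per line. Unlike the fixed-width approach, this does not depend on column positions, but it does rely on an exponent sign always being preceded by "E" or "e" rather than a digit.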

Another solution, which processes each line separately and includes the int value:

static void Main(string[] args) {
    string[] fileLines = {
        "33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03",
        "34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03",
        "35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03"
    };
    // Optional sign, digits, then an optional fraction with an optional exponent.
    var rex = new Regex(@"\b([-+]?\d+(?:\.\d+(?:E[+-]\d+)?)?)\b", RegexOptions.Compiled);
    foreach (var line in fileLines) {
        var dblValues = new List<double>();
        foreach (Match match in rex.Matches(line)) {
            string strVal = match.Groups[1].Value;
            double number = Double.Parse(strVal, NumberFormatInfo.InvariantInfo);
            dblValues.Add(number);
        }
        Console.WriteLine(string.Join("; ", dblValues));
    }
    Console.ReadLine();
}

The result/output is:

33; 0,0573140941467; 0,00011291426239; 0,00255553577735; 4,97192659486E-05; 0,0141869181079; -0,000147813598922
34; 0,0570076593453; 0,000100112550891; 0,00256427138318; -8,68691490164E-06; 0,0142821920093; -0,000346011975369
35; 0,0715507714946; 0,000316132133031; -0,0106581466521; -9,205137369E-05; 0,0138018668842; -0,000212219497066
