I have a bunch of files in the following format:
20130201:14:58:47 I search: xx ('ID'= (xxxxxxxx) )
20130201:14:58:56 I request: search | For ID | Search
20130201:14:58:56 I search: xx ('ID'= (xxxxxxx) )
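Is there something in C# similar to restkey in Python? I want to take the first three items (the date-time, the I (call it the action), and search/request) and insert each of them into its own column in a SQL table, then put the rest of the line in a fourth column.
In Python this part was easy, but I couldn't get through all the hoops I had to jump through to insert the result into my SQL table. So I moved to C#, where connecting to SSMS is easier.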
While String.Split() is probably a nice and simple approach, I prefer using Regex for this kind of parsing. In this case, a pattern like this:
(?<DateTime>\d{8}:\d{2}:\d{2}:\d{2})\s(?<Action>\w)\s(?<SearchOrRequest>search|request):\s(?<RestOfTheLine>.*)
gives you everything you need, neatly grouped into the "DateTime", "Action", "SearchOrRequest" and "RestOfTheLine" match groups.
var pattern = @"(?<DateTime>\d{8}:\d{2}:\d{2}:\d{2})\s(?<Action>\w)\s(?<SearchOrRequest>search|request):\s(?<RestOfTheLine>.*)";
var regex = new Regex(pattern);
var match = regex.Match(inputString);
var theDate = match.Groups["DateTime"].Value;
var theAction = match.Groups["Action"].Value;
var theChoice = match.Groups["SearchOrRequest"].Value;
var theRest = match.Groups["RestOfTheLine"].Value;
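One thing the snippet above leaves implicit is what happens when a line does not fit the pattern: `Regex.Match` still returns a `Match`, but its `Success` property is false and the groups are empty. Below is a minimal sketch of wrapping the pattern in a helper that checks for this. `LogLineParser` and `TryParse` are names made up for this illustration, and `\w+` is used for the action on the assumption that it may be longer than a single character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogLineParser
{
    // Same named groups as the pattern above. \w+ instead of \w is an
    // assumption that the action can be more than one character long.
    private static readonly Regex LineRegex = new Regex(
        @"^(?<DateTime>\d{8}:\d{2}:\d{2}:\d{2})\s(?<Action>\w+)\s(?<SearchOrRequest>search|request):\s(?<RestOfTheLine>.*)$");

    // Returns false (and a null array) when the line does not fit the format.
    public static bool TryParse(string line, out string[] fields)
    {
        var match = LineRegex.Match(line);
        if (!match.Success)
        {
            fields = null;
            return false;
        }

        fields = new[]
        {
            match.Groups["DateTime"].Value,
            match.Groups["Action"].Value,
            match.Groups["SearchOrRequest"].Value,
            match.Groups["RestOfTheLine"].Value
        };
        return true;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "20130201:14:58:47 I search: xx ('ID'= (xxxxxxxx) )",
            "20130201:14:58:56 I request: search | For ID | Search",
            "this line does not match"
        };

        foreach (var line in lines)
        {
            string[] fields;
            if (TryParse(line, out fields))
                Console.WriteLine(string.Join(" | ", fields));
            else
                Console.WriteLine("Skipped: " + line);
        }
    }
}
```

Skipping or logging the lines that fail to match is probably safer than inserting four empty strings into the table.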
Using the String.Split method:
string myString = "20130201:14:58:47 I search: xx ('ID'= (xxxxxxxx) )";
string[] strarr = myString.Split(' ');
string theLetterIVariableThing = strarr[1];
string iddate = strarr[0];
StringBuilder sb = new StringBuilder();
// Rebuild the text that follows the timestamp, one space-separated token at a time.
for (int i = 1; i < strarr.Length; i++)
{
    sb.Append(strarr[i]);
    sb.Append(" ");
}
string trailingText = sb.ToString();
// The part before the first ':' is the yyyyMMdd stamp.
string id = iddate.Split(':')[0];
sb.Clear();
// Concatenate the hour, minute and second parts.
for (int i = 1; i < 4; i++)
{
    sb.Append(iddate.Split(':')[i]);
}
string date = sb.ToString();
I think this will work, but it may be the long way around.
You can use the .NET function String.Split() to do this.
Assuming your date string has a fixed length, this should work:
//string inputStr = "20130201:14:58:47 I search: xx ('ID'= (xxxxxxxx) )";
//string inputStr = "20130201:14:58:56 I request: search | For ID | Search";
string inputStr = "20130201:14:58:56 I search: xx ('ID'= (xxxxxxx) )";
string dateStr = inputStr.Substring(0, 17);
string[] splitStr = inputStr.Substring(18).Split(new char[] { ':' });
string actionStr = splitStr[0].Substring(0, splitStr[0].IndexOf(' '));
string userStr = splitStr[0].Substring(2);
string restStr = splitStr[1].TrimStart();
// print out what we parsed
Console.WriteLine(inputStr);
Console.WriteLine(dateStr);
Console.WriteLine(actionStr);
Console.WriteLine(userStr);
Console.WriteLine(restStr);
Output:
20130201:14:58:56 I search: xx ('ID'= (xxxxxxx) )
20130201:14:58:56
I
search
xx ('ID'= (xxxxxxx) )
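Note that the `Split` call above breaks the text at every colon, so a remainder that itself contains a `:` would be cut short. Here is a sketch of one way around that, still assuming the fixed 17-character timestamp: the `String.Split(char[], int)` overload with a count of 2 stops after the first colon. `FixedWidthParser` is a hypothetical name, and the sample line with a colon in its tail is invented for the illustration.

```csharp
using System;

public static class FixedWidthParser
{
    // Splits a log line into timestamp, action, type and remainder.
    // Assumes the timestamp is always 17 characters long.
    public static string[] Parse(string inputStr)
    {
        string dateStr = inputStr.Substring(0, 17);

        // A count of 2 splits only at the first ':' after the timestamp,
        // so any later colons stay inside the remainder.
        string[] splitStr = inputStr.Substring(18).Split(new[] { ':' }, 2);

        int space = splitStr[0].IndexOf(' ');
        string actionStr = splitStr[0].Substring(0, space);
        string typeStr = splitStr[0].Substring(space + 1);
        string restStr = splitStr[1].TrimStart();

        return new[] { dateStr, actionStr, typeStr, restStr };
    }

    public static void Main()
    {
        // Invented sample line whose remainder contains a colon.
        var parts = Parse("20130201:14:58:56 I search: xx ('ID': (xxxxxxx) )");
        foreach (var part in parts)
            Console.WriteLine(part);
    }
}
```

Using the position of the space instead of a hard-coded `Substring(2)` also keeps this working if the action is ever longer than one character.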
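I tried a slightly different approach. I created a console program that converts these files into fully quoted CSV files. You can then import those into SQL quite easily using SSMS.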
static void Main(string[] args)
{
    if (args.Length == 2)
    {
        using (StreamWriter sw = new StreamWriter(args[1]))
        {
            using (StreamReader sr = new StreamReader(args[0]))
            {
                String line;
                while ((line = sr.ReadLine()) != null)
                {
                    int index = 0;
                    int oldIndex = 0;
                    string dateTime = null;
                    string action = null;
                    string task = null;
                    string details = null;
                    // Timestamp: everything up to the first space.
                    index = line.IndexOf(' ', oldIndex);
                    dateTime = line.Substring(oldIndex, index - oldIndex);
                    oldIndex = index + 1;
                    // Action: everything up to the next space.
                    index = line.IndexOf(' ', oldIndex);
                    action = line.Substring(oldIndex, index - oldIndex);
                    oldIndex = index + 1;
                    // Task: everything up to the next colon.
                    index = line.IndexOf(':', oldIndex);
                    task = line.Substring(oldIndex, index - oldIndex);
                    oldIndex = index + 1;
                    // Details: the rest of the line, skipping the space after the colon.
                    details = line.Substring(oldIndex + 1);
                    sw.WriteLine("\"{0}\",\"{1}\",\"{2}\",\"{3}\"", dateTime, action, task, details);
                }
            }
        }
    }
    else
    {
        Console.WriteLine("Usage: program <input> <output>");
    }
}
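The program above wraps each field in double quotes but writes the field text unchanged, so a `"` inside the details would produce a malformed record. A sketch of the common CSV convention of doubling embedded quotes follows; `CsvField`, `Quote` and `ToRecord` are hypothetical names, and whether your import tool expects exactly this convention is an assumption worth checking.

```csharp
using System;

public static class CsvField
{
    // Wraps a value in double quotes and doubles any embedded double quote.
    public static string Quote(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Joins already-parsed fields into one fully quoted CSV record.
    public static string ToRecord(params string[] fields)
    {
        return string.Join(",", Array.ConvertAll<string, string>(fields, Quote));
    }

    public static void Main()
    {
        // Invented details text containing double quotes.
        Console.WriteLine(ToRecord("20130201:14:58:47", "I", "search", "xx (\"ID\"= (xxxxxxxx) )"));
    }
}
```

In the program above, the `sw.WriteLine(...)` call could then be replaced by something like `sw.WriteLine(CsvField.ToRecord(dateTime, action, task, details))`.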
In this case, a regular expression is probably the right tool.
var testVectors = new[]
{
    "20130201:14:58:47 I search: xx ('ID'= (xxxxxxxx) )",
    "20130201:14:58:56 I request: search | For ID | Search",
    "20130201:14:58:56 I search: xx ('ID'= (xxxxxxx) )"
};
var expression = @"^(?<TimeStamp>[0-9]{8}(:[0-9]{2}){3}) (?<Action>[^ ]+) (?<Type>search|request): (?<Rest>.*)$";
var regex = new Regex(expression);
foreach (var testVector in testVectors)
{
    var match = regex.Match(testVector);
    // Group names are case-sensitive, so this must match "TimeStamp" in the expression.
    Console.WriteLine(match.Groups["TimeStamp"]);
    Console.WriteLine(match.Groups["Action"]);
    Console.WriteLine(match.Groups["Type"]);
    Console.WriteLine(match.Groups["Rest"]);
}
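The expression makes a few assumptions: what you call the action is a sequence of characters without any spaces, and search and request are the only valid values for what I call the type. If either assumption does not hold, adapting the expression should be easy.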
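As an illustration of that kind of adaptation, here is a sketch that drops the second assumption: the type becomes any run of characters other than a space or a colon. `RelaxedPattern` is a hypothetical name, and the `purge` line is invented to show a type outside search/request.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RelaxedPattern
{
    // Like the expression above, except that <Type> accepts any token
    // that contains neither a space nor a colon.
    public static readonly Regex Line = new Regex(
        @"^(?<TimeStamp>[0-9]{8}(:[0-9]{2}){3}) (?<Action>[^ ]+) (?<Type>[^ :]+): (?<Rest>.*)$");

    public static void Main()
    {
        // Invented sample line with a type other than search or request.
        var match = Line.Match("20130201:14:59:02 I purge: cache");
        if (match.Success)
        {
            Console.WriteLine(match.Groups["TimeStamp"].Value);
            Console.WriteLine(match.Groups["Action"].Value);
            Console.WriteLine(match.Groups["Type"].Value);
            Console.WriteLine(match.Groups["Rest"].Value);
        }
    }
}
```

The trade-off is that a looser type also lets malformed lines through, so the stricter `search|request` alternation may be preferable if those really are the only values.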