How do I extract certain parts of a string?



I have a string of the form

11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550          1 0 0          0 1 0  8  0  8        7  5  3      355243301884671724    17.0   15.0

and I want to write it to a CSV file in the form

1,1,1663.31578,6.045e-26,0.6292,0.0698,0.304,2724.0415,0.64,-0.00955 

The only way I know how to do this in Python is something of the form

import csv
s = "11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550          1 0 0          0 1 0  8  0  8        7  5  3      355243301884671724    17.0   15.0"
with open(<path to output_csv>, "w") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    for line in data:
        writer.writerow([s[0:2], s[2], ..., s[59:68]])

This works, of course, but it seems like a very naive approach. Is there a better option?

If there is whitespace between every element of the string, the simplest way is:

s = "11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550          1 0 0          0 1 0  8  0  8        7  5  3      355243301884671724    17.0   15.0"
s = [x for x in s.split(" ") if x != ""] 
csv_string = ",".join(s)

This works even when there are several spaces between elements, as in the example.

—EDIT—

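According to the discussion in the comments, the elements have fixed breakpoints, so that information can be used like this: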

s = "11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550          1 0 0          0 1 0  8  0  8        7  5  3      355243301884671724    17.0   15.0"
breakpoints = [1,2,14,24,34,40,44,55,58,65]
breakpoints.insert(0,0) # we need starting zero to make for loop work
elements = []
for i in range(len(breakpoints)-1):
    # slice one field and strip the padding around it
    elements.append(s[breakpoints[i]:breakpoints[i+1]].strip())
csv_string = ",".join(elements)

This method also gets rid of the extra whitespace, because it strips each substring before appending it to the list of elements.
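For what it's worth, here is a minimal sketch of the same fixed-width idea that also converts the fields to numbers and writes them with the csv module. The field widths in `WIDTHS` and the int/float choice in `CONVERTERS` are my assumptions, inferred from the desired CSV row in the question; they differ slightly from the breakpoints above but yield the same numeric values for this sample. `parse_fixed_width` is just a hypothetical helper name.

```python
import csv
import io
from itertools import accumulate

s = "11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550          1 0 0          0 1 0  8  0  8        7  5  3      355243301884671724    17.0   15.0"

# Assumed field widths, inferred from the desired output in the question.
WIDTHS = [1, 1, 12, 10, 10, 5, 5, 10, 4, 8]
# Assumed types: the first two fields are integers, the rest are floats.
CONVERTERS = [int, int] + [float] * 8


def parse_fixed_width(line, widths=WIDTHS, converters=CONVERTERS):
    """Cut `line` into consecutive fields of the given widths and convert each one."""
    ends = list(accumulate(widths))   # end offset of every field
    starts = [0] + ends[:-1]          # start offset of every field
    return [conv(line[a:b]) for conv, a, b in zip(converters, starts, ends)]


row = parse_fixed_width(s)

# Write to an in-memory buffer here; a real file opened with newline="" works the same way.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
print(buffer.getvalue().strip())
```

For the sample line this prints `1,1,1663.31578,6.045e-26,0.6292,0.0698,0.304,2724.0415,0.64,-0.00955`. Anything after the last listed field is simply ignored, so the widths would need extending if the trailing columns matter.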

If this:

11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550

looked like this instead (with a space before the final minus sign):

s =  "11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64 -.009550"

then it would be simple:

print(s.split(" "))

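Otherwise, you will need to do the last split by hand:

s = " 11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550  "
# split() with no argument ignores leading, trailing and repeated whitespace
parts = s.split()
# cut the last chunk at its final "-" and keep the sign on the last value
head, sign, tail = parts.pop().rpartition("-")
parts += [head, sign + tail]
print(parts)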
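Splitting on whitespace still leaves fused chunks such as `6.292E-01.06980.304` in one piece. As a rough heuristic (not a substitute for knowing the real column widths), a regular expression can pull out everything that looks like a number. The pattern `NUMBER` below is my own assumption about what a field looks like, and it is greedy: a chunk like `2724.04150.64` comes out as `2724.04150` and `.64`, which is only numerically right here because the ambiguous digit happens to be a zero.

```python
import re

s = "11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550"

# Optional sign, then either "digits[.digits]" or ".digits", then an optional exponent.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

tokens = NUMBER.findall(s)
print(tokens)
# ['11', '1663.315780', '6.045E-26', '6.292E-01', '.06980', '.304', '2724.04150', '.64', '-.009550']

csv_string = ",".join(tokens)
```

Note that the leading `11` stays a single token, so it would still have to be cut by position if it is really two one-character fields.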

You can use s.split() to split the string on whitespace.

>>> s = " 11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550          1 0 0          0 1 0  8  0  8        7  5  3      355243301884671724    17.0   15.0"
>>> s.split()
['11', '1663.315780', '6.045E-26', '6.292E-01.06980.304', '2724.04150.64-.009550', '1', '0', '0', '0', '1', '0', '8', '0', '8', '7', '5', '3', '355243301884671724', '17.0', '15.0']
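To illustrate why the argument-less form is convenient here, this small sketch compares `split(" ")` with `split()` on a made-up string (`sample` is just an illustrative value, not taken from the data):

```python
sample = "  a  b "

# Splitting on a single space keeps an empty string for every extra space.
with_empties = sample.split(" ")
print(with_empties)   # ['', '', 'a', '', 'b', '']

# Without an argument, runs of whitespace count as one separator
# and leading/trailing whitespace is dropped.
clean = sample.split()
print(clean)          # ['a', 'b']
```

So `s.split()` does the same job as filtering out the empty strings after `s.split(" ")`.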

Here is a possible solution using pandas. Note that it really does need the elements to be separated by spaces.

import pandas as pd
txt = "1 1 1663.315780 6.045E-26 6.292E-01 .06980 .304 2724 .04150 .64 -.009550"
# split and put the list to a dataframe
df = pd.DataFrame({"a": txt.split(" ")})
# convert to numeric 
df["a"] = pd.to_numeric(df["a"])
# save to csv
df.to_csv("file.csv", index=False)
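If the elements are not separated by spaces but do sit in fixed columns, pandas can also read them by width with `pd.read_fwf`. This is only a sketch: the widths below are my assumption, inferred from the desired CSV row in the question, and the line is read from an in-memory buffer rather than a real file.

```python
import io

import pandas as pd

s = "11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550"

# Assumed field widths, inferred from the desired output in the question.
widths = [1, 1, 12, 10, 10, 5, 5, 10, 4, 8]

# header=None because the data has no header row; each width becomes one column.
df = pd.read_fwf(io.StringIO(s), widths=widths, header=None)

# Render the CSV as a string; pass a path instead to write a file.
csv_text = df.to_csv(index=False, header=False)
print(csv_text)
```

With a real data file, the `io.StringIO(s)` argument would be replaced by the file path, and every line is parsed with the same widths.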

You can do the following. Consider the sample:

c = '11 1663.315780 6.045E-26 6.292E-01.06980.304 2724.04150.64-.009550'
d = c.replace(" ", ",")
print(d)

which gives

11,1663.315780,6.045E-26,6.292E-01.06980.304,2724.04150.64-.009550

print(c.split(" "))

gives

['11', '1663.315780', '6.045E-26', '6.292E-01.06980.304', '2724.04150.64-.009550']