How to split a data frame that has multiple values in the same row, using string splitting



I read my pandas data frame like this:

ERA5RS12 = pd.read_csv('F:/ERA5_RS/[12]-PIRATA-2017_20171127_100021/wspd_hl.csv')

ERA5RS12

The output is:

I want it like this:

Longitude, Latitude, Value, Level

33岁的9 20

8.4,33岁的9 10 40

33岁的9 11100

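Make sure the delimiter in your CSV file is a comma. Otherwise, you can pass a custom delimiter through the sep parameter when calling pd.read_csv. See the documentation.

Based on the Excel file you provided, I converted it to CSV format. Below is a code snippet that produces the desired result.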
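As a minimal sketch of the sep parameter, the snippet below parses a small semicolon-delimited sample. The sample text and the name df_sep are hypothetical and only stand in for a real file; the actual delimiter of your file may differ.

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-delimited text standing in for a real file.
sample_text = (
    "Latitude;Longitude;Value;level\n"
    "9.000;321.990;6.6732;20\n"
    "9.000;321.990;6.8282;30\n"
)

# sep tells read_csv which delimiter to split on (the default is a comma).
df_sep = pd.read_csv(io.StringIO(sample_text), sep=";")
print(df_sep)
```

If the columns are separated by runs of whitespace rather than a single character, a regular expression such as sep=r"\s+" is one option to try.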

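import pandas as pd

# Read all lines and strip surrounding whitespace.
with open("wspd_hl.csv", "r") as f:
    lines = [line.strip() for line in f.readlines()]

# The first line holds the column names.
df = pd.DataFrame(columns=lines[0].split(" "))

# Only every other line after the header carries data.
for i, line in enumerate(lines[1:], 1):
    if i % 2 == 1:
        # Drop the stray commas, then split on spaces and discard empty pieces.
        line = line.replace(",", "")
        df.loc[len(df)] = [value for value in line.split(" ") if value.strip() != ""]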

Output:

   Latitude Longitude   Value level
0     9.000   321.990  6.6732    20
1     9.000   321.990  6.8282    30
2     9.000   321.990  6.9814    50
3     9.000   321.990  7.0756    70
4     9.000   321.990  7.1705   100
5     9.000   321.990  7.2190   120
6     9.000   321.990  7.2613   140
7     9.000   321.990  7.2989   160
8     9.000   321.990  7.3352   180
9     9.000   321.990  7.3685   200
10    9.000   321.990  7.4524   250
11    9.000   321.990  7.5457   300
12    9.000   321.990  7.6750   350
13    9.000   321.990  7.9962   400
14    9.000   321.990  8.3183   450
15    9.000   321.990  8.4191   500
16    9.000   321.990  8.3831   550
17    9.000   321.990  8.2993   600
18    9.000   321.990  8.3508   650
19    9.000   321.990  8.4924   700
20    9.000   321.990  8.6865   750
21    9.000   321.990  8.9590   800
22    9.000   321.990  9.2247   850
23    9.000   321.990  9.4725   900
24    9.000   321.990  9.6867   950
25    9.000   321.990  9.8466  1000
26    9.000   321.990 10.1283  1100
27    9.000   321.990 10.4210  1200
28    9.000   321.990 10.7595  1300
29    9.000   321.990 11.1359  1400
30    9.000   321.990 11.5150  1500
31    9.000   321.990 11.8761  1600
32    9.000   321.990 12.5596  1800
33    9.000   321.990 12.8827  1900
34    9.000   321.990 13.2004  2000
35    9.000   321.990 13.5117  2200
36    9.000   321.990 13.3459  2400
37    9.000   321.990 12.7713  2600
38    9.000   321.990 11.9837  2800
39    9.000   321.990 11.1624  3000

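If I were you, I would write my own reader for data like this. With the code below, you get the column names in the header list and the table rows in the lines list.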

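with open('test.csv', 'r') as file:

    header = []
    lines = []

    for i, line in enumerate(file):

        # The first line holds the column names.
        if i == 0:
            header = line.split()
            continue

        # Skip every even-numbered line; the data rows sit on the odd ones.
        if i % 2 == 0:
            continue

        lines.append(line.split())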

Once the data is parsed, you can easily create a pd.DataFrame and clean it up.

df = pd.DataFrame(lines, columns=header)
df['Value'] = df.Value.str.strip(',').astype(float)

The result looks like this:

>> df.head()
Latitude Longitude   Value level
0    9.000   321.990  6.6732    20
1    9.000   321.990  6.8282    30
2    9.000   321.990  6.9814    50
3    9.000   321.990  7.0756    70
4    9.000   321.990  7.1705   100
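Note that the cleanup above converts only Value to float; Latitude, Longitude, and level remain strings, because line.split() produces strings. As a possible follow-up, and assuming all four columns should be numeric, a minimal sketch with pd.to_numeric could look like this. The two rows are copied from the output above, and the name df_num is hypothetical.

```python
import pandas as pd

# Two sample rows in the shape produced by line.split() in the reader above.
header = ["Latitude", "Longitude", "Value", "level"]
rows = [
    ["9.000", "321.990", "6.6732,", "20"],
    ["9.000", "321.990", "6.8282,", "30"],
]

df_num = pd.DataFrame(rows, columns=header)

# Remove the trailing comma first, otherwise the numeric conversion fails.
df_num["Value"] = df_num["Value"].str.strip(",")

# Convert every column from string to a numeric dtype.
df_num = df_num.apply(pd.to_numeric)
print(df_num.dtypes)
```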

Edit

If you want to reuse the code above, you can wrap it in a function.

import pandas as pd


def read_data(csv_file_path: str) -> pd.DataFrame:
    """Read data, process them and return data frame."""
    with open(csv_file_path, 'r') as file:

        header = []
        lines = []

        for i, line in enumerate(file):

            # The first line holds the column names.
            if i == 0:
                header = line.split()
                continue

            # Skip every even-numbered line; the data rows sit on the odd ones.
            if i % 2 == 0:
                continue

            lines.append(line.split())

    df = pd.DataFrame(lines, columns=header)
    # Strip the trailing comma from Value and convert it to float.
    df['Value'] = df.Value.str.strip(',').astype(float)
    return df

The function can then be used like this.

>> df = read_data('/path/to/test.csv')
>> df.head()
Latitude Longitude   Value level
0    9.000   321.990  6.6732    20
1    9.000   321.990  6.8282    30
2    9.000   321.990  6.9814    50
3    9.000   321.990  7.0756    70
4    9.000   321.990  7.1705   100
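To try the same parsing logic without a file on disk, here is a small self-contained sketch that runs it on an in-memory sample. The sample layout is an assumption inferred from the code (header on line 0, data on odd-numbered lines, a filler line on the even ones, and a trailing comma after Value); the real file may look different, and the names sample_text and demo are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the assumed layout of the file.
sample_text = (
    "Latitude Longitude Value level\n"
    "9.000 321.990 6.6732, 20\n"
    "\n"
    "9.000 321.990 6.8282, 30\n"
    "\n"
)

header, rows = [], []
for i, line in enumerate(io.StringIO(sample_text)):
    if i == 0:
        header = line.split()      # column names
    elif i % 2 == 1:
        rows.append(line.split())  # data rows sit at odd indices

demo = pd.DataFrame(rows, columns=header)
# Strip the trailing comma from Value and convert it to float.
demo["Value"] = demo["Value"].str.strip(",").astype(float)
print(demo)
```

Because the line selection relies purely on position, a file whose filler lines are irregular would need a different test, for example skipping lines that are empty after strip().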
