Conditionally dividing Pandas DataFrame rows based on row name

I'm working with a large dataset, but the problem can be summarized with the small dataset below:

import pandas as pd
df = pd.DataFrame({"Filename":["fileName1_uniqueTag1", "fileName2_uniqueTag1", "fileName3_uniqueTag1", "fileName1_uniqueTag2", "fileName2_uniqueTag2", "fileName3_uniqueTag2"], 
"measurement":[1336.564888, 1090.852579, 990.320323, 1202.522612, 1098.045258, 923.600277],})
print(df)
>>>
               Filename  measurement
0  fileName1_uniqueTag1  1336.564888
1  fileName2_uniqueTag1  1090.852579
2  fileName3_uniqueTag1   990.320323
3  fileName1_uniqueTag2  1202.522612
4  fileName2_uniqueTag2  1098.045258
5  fileName3_uniqueTag2   923.600277

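There are three different file names, and each one appears in the "Filename" column with two unique tags. The goal is to compute, for each file, the ratio of the uniqueTag2 measurement to the uniqueTag1 measurement. The result should look like this: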

    Filename  uniqueTag2/uniqueTag1
0  fileName1               0.899711
1  fileName2               1.006593
2  fileName3               0.932627

I can build lists of the three different file names and the two different tags:

nameList = df["Filename"].tolist()
fileNames = []                              #empty list to fill with different base file names
uniqueTags = []                             #empty list to fill with unique tags
for name in nameList:                       #iterate through list of full file names
    subStrings = name.split("_")            #splits each full file name at the underscore
    if subStrings[0] not in fileNames:      #if the base file name isn't already in the file names list...
        fileNames.append(subStrings[0])     #append it
    if subStrings[1] not in uniqueTags:     #if the unique tag isn't already in the unique tags list...
        uniqueTags.append(subStrings[1])    #append it

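I think I could do this by making the file names the index and using df.at to access individual measurements, but that seems very messy, and I'm sure there must be a better way to do it with built-in Pandas functionality. Any suggestions?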

You can use str.split() to do this; it is fairly simple:

df[['Filename','uniquetag']] = df['Filename'].str.split('_', expand=True)
tag1 = df.loc[df['uniquetag'] == 'uniqueTag1'].set_index('Filename')['measurement']
tag2 = df.loc[df['uniquetag'] == 'uniqueTag2'].set_index('Filename')['measurement']
tag2 / tag1
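
The snippet above leaves the ratios in a Series indexed by file name. To get the two-column table shown in the question, one option is to name the Series and reset its index. This is a minimal sketch, assuming every file has exactly one row per tag; the variable name `result` is illustrative and not from the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Filename": ["fileName1_uniqueTag1", "fileName2_uniqueTag1", "fileName3_uniqueTag1",
                 "fileName1_uniqueTag2", "fileName2_uniqueTag2", "fileName3_uniqueTag2"],
    "measurement": [1336.564888, 1090.852579, 990.320323,
                    1202.522612, 1098.045258, 923.600277],
})

# Split "fileName1_uniqueTag1" into a base file name and a tag.
df[["Filename", "uniquetag"]] = df["Filename"].str.split("_", expand=True)

# One Series per tag, indexed by base file name so the division aligns on it.
tag1 = df.loc[df["uniquetag"] == "uniqueTag1"].set_index("Filename")["measurement"]
tag2 = df.loc[df["uniquetag"] == "uniqueTag2"].set_index("Filename")["measurement"]

# Assumption: each file has exactly one row per tag; otherwise the index is not unique.
result = (tag2 / tag1).rename("uniqueTag2/uniqueTag1").reset_index()
print(result)
```

Because both Series are indexed by the base file name, the division pairs rows by name rather than by position, so the order of the rows in the input does not matter.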

Try

df[['one','two']] = df['Filename'].str.split("_",expand=True)

Then use groupby on these two columns.
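
This answer does not show the groupby step itself. One way it might look is sketched below, assuming each (file name, tag) pair occurs exactly once; the names `wide` and `ratio` and the use of first() and unstack() are illustrative choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Filename": ["fileName1_uniqueTag1", "fileName2_uniqueTag1", "fileName3_uniqueTag1",
                 "fileName1_uniqueTag2", "fileName2_uniqueTag2", "fileName3_uniqueTag2"],
    "measurement": [1336.564888, 1090.852579, 990.320323,
                    1202.522612, 1098.045258, 923.600277],
})

# "one" holds the base file name, "two" holds the tag.
df[["one", "two"]] = df["Filename"].str.split("_", expand=True)

# Group on both columns, then move the tag level into the columns.
# Assumption: one row per (file, tag) pair, so first() just picks that row.
wide = df.groupby(["one", "two"])["measurement"].first().unstack("two")

# Each tag is now its own column, so the ratio is a plain column division.
ratio = wide["uniqueTag2"] / wide["uniqueTag1"]
print(ratio)
```

If a pair could occur more than once, first() would silently keep only one measurement, so an aggregation such as mean() may be a better fit in that case.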
