How to perform math operations (e.g. `.diff()`) on multiple columns of a DataFrame and save the results as new columns in the same DataFrame



Suppose I have a DataFrame like this:

```
df =
    p1  v1  p2  v2  p3  v3  p4  v4  p5  v5  p6  v6
0   3   6   5   8   4   4   8   4   9   6   0   0
1   5   0   5   9   0   8   8   5   5   2   2   9
2   6   9   8   6   9   9   9   2   8   4   2   6
3   4   1   8   0   5   9   0   2   1   2   4   8
4   1   4   8   1   3   1   4   9   6   2   6   7
5   5   4   6   5   5   2   3   0   5   5   6   4
6   4   4   9   0   2   1   7   0   1   0   8   8
7   9   1   7   3   5   4   4   4   8   9   3   8
8   1   5   0   5   4   3   6   5   2   3   1   4
9   9   1   7   6   5   3   6   8   8   4   7   5
10  1   6   5   8   2   5   1   5   3   4   5   8
11  8   7   6   6   9   3   5   5   9   7   6   7
```

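`p` and `v` are parameters measured for different samples (1, 2, 3, and so on). I want to multiply all the `p` columns by a constant and apply `diff()` to all the `v` columns.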

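I want to save the results in the same DataFrame, naming each new column after the corresponding sample plus the first letter of the operation: e.g. `Dv1`, `Dv2` for the output of `df['v1'].diff()`, `df['v2'].diff()`, and so on. Similarly, for the `p` columns the names would be `Mp1`, `Mp2`, etc.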
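For a single sample, the two operations look like this. This is a minimal sketch on the first three rows of `p1`/`v1`; the multiplier 2 is an assumption, since the question does not say which constant to use. Note that the column has to be selected before calling `diff()` (`df['v1'].diff()`), because `DataFrame.diff()` takes a number of periods, not a column name.

```python
import pandas as pd

# First three rows of p1/v1 from the example data
df = pd.DataFrame({'p1': [3, 5, 6], 'v1': [6, 0, 9]})

# Mp1: p1 multiplied by a constant (2 is an assumed value)
df['Mp1'] = df['p1'] * 2
# Dv1: row-to-row difference of v1; the first row has no predecessor, so it is NaN
df['Dv1'] = df['v1'].diff()

print(df)
```

Repeating these two lines by hand for every sample is exactly the tedious part the question wants to avoid.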

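I can do the operation manually for each column and save the result, but that is tedious because the number of samples is very high, so I would like to automate it with a for loop or something similar.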
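A plain for loop over the sample numbers is enough to automate this. The sketch below assumes the columns are named `p<N>` and `v<N>` with an integer `N`, that every sample has both a `p` and a `v` column, and an assumed multiplier of 2. It writes the new columns into the same DataFrame and then reorders them.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'p1': [3, 5, 6], 'v1': [6, 0, 9],
    'p2': [5, 5, 8], 'v2': [8, 9, 6],
})

factor = 2  # assumed multiplier; the question leaves it open

# Collect the sample numbers from column names such as "p1" or "v12"
samples = sorted({int(c[1:]) for c in df.columns if re.fullmatch(r'[pv]\d+', c)})

for n in samples:
    df[f'Mp{n}'] = df[f'p{n}'] * factor
    df[f'Dv{n}'] = df[f'v{n}'].diff()

# Interleave the columns: p1, Mp1, v1, Dv1, p2, Mp2, ...
df = df[[f'{name}{n}' for n in samples for name in ('p', 'Mp', 'v', 'Dv')]]
print(df)
```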

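Is there a recommended way to perform math operations (subtraction, multiplication or division) on multiple columns of a pandas DataFrame and save the results as new columns in the same DataFrame, named by combining the column name with the operation name?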
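More generally, the operations can be kept in a dictionary that maps a prefix letter to a function, so supporting subtraction or division is just one more entry. The helper `apply_ops`, the constants, and the prefix letters `S` and `Q` below are hypothetical, chosen only for illustration; `Q` (quotient) is used for division because `D` is already taken by `diff()`.

```python
import pandas as pd


def apply_ops(df, source, ops):
    """Return a copy of df where, for every column named `source` followed by
    digits, each function in `ops` is applied and stored as <prefix><column>."""
    cols = df.filter(regex=rf'^{source}\d+$').columns
    out = df.copy()
    for prefix, func in ops.items():
        for col in cols:
            out[prefix + col] = func(df[col])
    return out


df = pd.DataFrame({'p1': [3, 5, 6], 'v1': [6, 0, 9], 'p2': [5, 5, 8], 'v2': [8, 9, 6]})

df = apply_ops(df, 'p', {
    'M': lambda s: s * 2,  # multiply
    'S': lambda s: s - 1,  # subtract
    'Q': lambda s: s / 2,  # divide
})
df = apply_ops(df, 'v', {'D': lambda s: s.diff()})
print(df.columns.tolist())
```

The regex is anchored with `^`, so the second call matches `v1` and `v2` but not columns created earlier such as `Mp1`.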

The new DataFrame should look like this: p1 Mp1 v1 Dv1 p2 Mp2 v2 Dv2 p3 Mp3 v3 Dv3 ...
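To get this layout without a separate reordering step, `DataFrame.insert(loc, column, value)` can place each new column directly after its source column; it modifies the DataFrame in place. A sketch on a small excerpt, assuming a multiplier of 2:

```python
import pandas as pd

df = pd.DataFrame({'p1': [3, 5, 6], 'v1': [6, 0, 9], 'p2': [5, 5, 8], 'v2': [8, 9, 6]})

# Iterate over a snapshot of the original names, because insert() changes df.columns
for col in list(df.columns):
    loc = df.columns.get_loc(col) + 1  # position right after the source column
    if col.startswith('p'):
        df.insert(loc, 'M' + col, df[col] * 2)  # assumed multiplier
    elif col.startswith('v'):
        df.insert(loc, 'D' + col, df[col].diff())

print(df.columns.tolist())
# ['p1', 'Mp1', 'v1', 'Dv1', 'p2', 'Mp2', 'v2', 'Dv2']
```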

Try this:

```python
import pandas as pd

# Find all columns that start with "p" followed by a number
p = df.columns[df.columns.str.match(r'p\d+$')]
# Find all columns that start with "v" followed by a number
v = df.columns[df.columns.str.match(r'v\d+$')]
# Multiply the p columns by 2
mp = df[p].mul(2).add_prefix('M')
# Take a diff of the v columns
dv = df[v].diff().add_prefix('D')
# The display order of the columns (samples 1 to 6 in this example)
cols = [f'{j}{i}' for i in range(1, 7) for j in ['p', 'Mp', 'v', 'Dv']]
# The final result
final = pd.concat([df, mp, dv], axis=1)[cols]
```
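The `cols` list above hard-codes `range(1, 7)`. With many samples, the order can instead be derived from the column names. This sketch assumes names of the form `p<N>`/`v<N>` and that every sample has both columns; it sorts the sample numbers numerically, so `p10` comes after `p2` rather than after `p1`.

```python
import pandas as pd

df = pd.DataFrame({
    'p1': [3, 5, 6], 'v1': [6, 0, 9],
    'p10': [1, 1, 1], 'v10': [0, 2, 5],
    'p2': [5, 5, 8], 'v2': [8, 9, 6],
})

p = df.filter(regex=r'^p\d+$')
v = df.filter(regex=r'^v\d+$')
mp = p.mul(2).add_prefix('M')  # assumed multiplier of 2
dv = v.diff().add_prefix('D')

# Sample numbers taken from the p column names, sorted as integers
samples = sorted(int(c[1:]) for c in p.columns)
cols = [f'{j}{i}' for i in samples for j in ['p', 'Mp', 'v', 'Dv']]
final = pd.concat([df, mp, dv], axis=1)[cols]
print(final.columns.tolist())
```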

Something like this:

```python
import io

import pandas as pd

str_data = """
p1,v1,p2,v2,p3,v3,p4,v4,p5,v5,p6,v6
3,6,5,8,4,4,8,4,9,6,0,0
5,0,5,9,0,8,8,5,5,2,2,9
6,9,8,6,9,9,9,2,8,4,2,6
4,1,8,0,5,9,0,2,1,2,4,8
1,4,8,1,3,1,4,9,6,2,6,7
5,4,6,5,5,2,3,0,5,5,6,4
4,4,9,0,2,1,7,0,1,0,8,8
9,1,7,3,5,4,4,4,8,9,3,8
1,5,0,5,4,3,6,5,2,3,1,4
9,1,7,6,5,3,6,8,8,4,7,5
1,6,5,8,2,5,1,5,3,4,5,8
8,7,6,6,9,3,5,5,9,7,6,7
"""
df = pd.read_csv(io.StringIO(str_data))

# Keep only sample numbers that have both a pN and a vN column,
# so a missing partner column does not raise a KeyError
p_samples = [int(c[1:]) for c in df.columns if c.startswith('p')]
v_samples = [int(c[1:]) for c in df.columns if c.startswith('v')]
samples = sorted(set(p_samples).intersection(v_samples))

data = {}
mult_num = 7  # not sure what you want to multiply by
for sample in samples:
    p_col = 'p{}'.format(sample)
    v_col = 'v{}'.format(sample)

    Mp_col = 'Mp{}'.format(sample)
    Dv_col = 'Dv{}'.format(sample)

    # Insertion order of the dict gives p, Mp, v, Dv for each sample
    data[p_col] = df[p_col]
    data[Mp_col] = mult_num * df[p_col]
    data[v_col] = df[v_col]
    data[Dv_col] = df[v_col].diff()

new_df = pd.DataFrame(data)
print(new_df)
```
