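I want to subtract one specific element of each DataFrame column from every element in that column. I currently do this by converting each column to a NumPy array, which is not ideal.
For example: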
import numpy as np
import pandas as pd

# Existing dataframe
data = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
df = pd.DataFrame(data, columns=['column1', 'column2'])
# Row positions of the reference elements: 2 is for column1, 4 is for column2.
a = np.array([2, 4])
# In column1, take the element at index 2 and subtract it from every element in column1.
# Similarly, in column2, take the element at index 4 and subtract it from every element in column2.

# Required output dataframe
data2 = [[-2, -40], [-1, -30], [0, -20], [1, -10], [2, 0]]
df2 = pd.DataFrame(data2, columns=['column1', 'column2'])
Output:
Existing data frame:
   column1  column2
0        1       10
1        2       20
2        3       30
3        4       40
4        5       50
Required output data frame:
   column1  column2
0       -2      -40
1       -1      -30
2        0      -20
3        1      -10
4        2        0
We can use NumPy indexing to select the values from the DataFrame after converting it with DataFrame.to_numpy, then subtract:
output = df - df.to_numpy()[a, np.arange(df.columns.size)]
Or with DataFrame.sub:
output = df.sub(df.to_numpy()[a, np.arange(df.columns.size)], axis='columns')
output:
   column1  column2
0       -2      -40
1       -1      -30
2        0      -20
3        1      -10
4        2        0
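For convenience, the approach above can be put together as one runnable script. This is a minimal sketch built on the question's sample data; the name baseline is introduced here for illustration only and does not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]],
    columns=["column1", "column2"],
)
a = np.array([2, 4])  # row position of the reference element for each column

# Pair each row position in `a` with its column position (0, 1, ...)
col_index = np.arange(df.columns.size)
baseline = df.to_numpy()[a, col_index]  # one reference value per column

# Subtract each column's reference value from every element in that column
output = df.sub(baseline, axis="columns")
print(output)
```

Note that this assumes a holds positional row indexes (as with the default RangeIndex in the question), not index labels.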
The values are selected using the row indexes in a:
a = np.array([2, 4])
# [2, 4]
Use np.arange with Index.size to create an array of column positions, one per column:
col_index = np.arange(df.columns.size)
# [0 1]
Together, these indexes select one value per column from the DataFrame:
df.to_numpy()[a, col_index]
# [ 3 50]
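To see why this works: when two integer arrays are passed as indexes, NumPy pairs them element by element, so the result holds the values at (2, 0) and (4, 1). The sketch below checks this against an explicit loop using DataFrame.iat; the names picked and explicit are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]],
    columns=["column1", "column2"],
)
a = np.array([2, 4])
col_index = np.arange(df.columns.size)

# Paired integer-array indexing: picks (row 2, col 0) and (row 4, col 1)
picked = df.to_numpy()[a, col_index]

# The same selection written out one (row, column) pair at a time
explicit = [int(df.iat[int(row), int(col)]) for row, col in zip(a, col_index)]

print(picked.tolist(), explicit)
```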