Slicing a pandas DataFrame column by start and end marker values



For example, I have a DataFrame that looks like this:

0       -- end
1       QQQQ
2       GEO
3       DEF
4       ABC
5       -- start
6       -- end
7       apple
8       -- start

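Is it possible to slice the column dynamically using "-- end" and "-- start"? That is, I want to process the data between each "-- end" and the following "-- start" independently. I tried: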

start_end = df[df.col.str.contains('-- end')+1:df.col.str.contains('-- start')]

It didn't work. Maybe this isn't possible in pandas, but I'd welcome any input.
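The attempt above fails because `df.col.str.contains('-- end')` returns a boolean Series (one True/False per row), not a row position, so it can't serve as a slice bound. A minimal sketch of the same idea that does work is to collect the index labels of the marker rows first and then slice between each pair with `.loc`. It assumes a default integer index, exact marker strings, and that the markers always alternate ("-- end" first, then "-- start"); the column name `col` is taken from the attempt.

```python
import pandas as pd

# Same sample data as in the question; column name "col" matches the attempt above.
df = pd.DataFrame({"col": ["-- end", "QQQQ", "GEO", "DEF", "ABC",
                           "-- start", "-- end", "apple", "-- start"]})

# Index labels of every marker row (assumes exact matches, no extra whitespace).
ends = df.index[df["col"].eq("-- end")]
starts = df.index[df["col"].eq("-- start")]

# Assumption: markers alternate, so the i-th "-- end" pairs with the i-th "-- start".
# .loc slicing is label-based and inclusive, so step one row inside each marker.
slices = [df.loc[e + 1 : s - 1] for e, s in zip(ends, starts)]

for block in slices:
    print(block["col"].tolist())
# ['QQQQ', 'GEO', 'DEF', 'ABC']
# ['apple']
```

If the markers can be unbalanced (for example a trailing "-- end" with no matching "-- start"), `zip` silently drops the unmatched marker, so the pairing would need an explicit check.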

Thanks, everyone.

You can try the following:

import pandas as pd

data = {'column': {0: '-- end',
                   1: 'QQQQ',
                   2: 'GEO',
                   3: 'DEF',
                   4: 'ABC',
                   5: '-- start',
                   6: '-- end',
                   7: 'apple',
                   8: '-- start'}}
df = pd.DataFrame(data)
exclude_lst = ['-- start', '-- end']
# False for marker rows (members of exclude_lst), True for the rest
bools = ~df.column.isin(exclude_lst)
# label each run of identical values in bools: [1, 2, 2, 2, 2, 3, 3, 4, 5]
sequences = (bools != bools.shift()).cumsum()
# keep only rows where bools is True, so only runs 2 and 4 remain
groups = df[bools].groupby(sequences)
# now you can loop through each slice and perform some operation on it
for gr in groups:
    print(gr)

# or put them in a list and go from there:
gr_lst = list(groups)
print(gr_lst[0])
# (2,   column
# 1   QQQQ
# 2    GEO
# 3    DEF
# 4    ABC)
# so each element is a (key, slice) tuple. Here gr_lst[0][0] == 2, the run label shared by the first slice's rows ([2, 2, 2, 2] in sequences)
# use gr_lst[i][1] to access an actual slice, e.g.:
print(gr_lst[1][1])
#   column
# 7  apple
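One thing to be aware of: the `isin`/`cumsum` approach treats every run of non-marker rows as a slice, including rows that might sit between a "-- start" and the next "-- end". If such rows can occur and should be ignored, a sketch that tracks whether each row is inside an "-- end" … "-- start" block by forward-filling the most recent marker could look like this. The extra "stray" row is hypothetical test data, and numbering blocks with `is_end.cumsum()` is an illustrative choice, not part of the answer above.

```python
import pandas as pd

# Hypothetical data: "stray" sits between a "-- start" and the next "-- end".
df = pd.DataFrame({"column": ["-- end", "QQQQ", "GEO", "-- start",
                              "stray", "-- end", "apple", "-- start"]})

is_end = df["column"].eq("-- end")
is_start = df["column"].eq("-- start")

# 1.0 on "-- end" rows, 0.0 on "-- start" rows, NaN everywhere else.
marker_state = is_end.astype(float).where(is_end | is_start)
# Carry the last marker forward: True after "-- end", False after "-- start".
# Rows before the first marker count as outside (fillna(0)).
state = marker_state.ffill().fillna(0).astype(bool)
# Inside a block, excluding the "-- end" marker rows themselves.
inside = state & ~is_end

# Each "-- end" opens a new block, so a running count of them numbers the blocks.
block_id = is_end.cumsum()

for key, grp in df[inside].groupby(block_id[inside]):
    print(grp["column"].tolist())
# ['QQQQ', 'GEO']
# ['apple']
```

The forward-fill acts as a small state machine over the column, so rows outside any block are dropped instead of forming their own slice.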
