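I'm having some trouble looping in Python. Below is an example single-column DataFrame. Most of the pandas examples I've found either process the entire DataFrame at once or search for a word and append it to the previous row.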
What I am trying to do (forgive me for sounding it out step by step):
1-Start at (0,Test) in the series.
2-Check whether the element at (0,Test) has a number in the first position (0). If True, hold (store) it as pre_num_line.
3-Go to the next line down.
4-Check whether the element at (1,Test) has a number in the first position (0). If False, check the first position for a letter.
5-If the first character is a letter, concatenate the current line onto the end of pre_num_line (the (0,Test) line in this case).
6-Delete the current row and shift the rows up. (Alternatively, set the string to NaN and drop all NaN rows at the end of the code. Not sure which is easier.)
7-Analyze the next row down at (2,Test) and repeat the process starting at step 2.
8-End the loop when all rows with a letter in the first position have been appended to pre_num_line.
9-The next row down should start with numbers. This becomes the new pre_num_line.
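The checks above only look at the beginning of each string; the rest of the string can contain both numbers and letters. The first position of every line is always a number or a letter (case-insensitive). Each line that starts with a letter must be appended to the end of the numbered line above it. At the end of processing, only the numbered lines should remain.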
import pandas as pd
from pandas import DataFrame, Series

dat = {'Test': ['123456ab', 'coff-4', 'eat 8', 'bagle6', '345678-edh', 'wine', 'bread',
                '567890 tfs', 'grape']}
df = pd.DataFrame(dat)
letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
numbers = '0123456789'
#-------------------
pre_num_lin = None
for line in df.Test:
    if line[0] in numbers:
        pre_num_lin = df['Test']
    if line[0] in letters:
        pre_num_lin = pre_num_lin + ' ' + line
#------------------
print(df)
What it should look like at end:
Test
0 123456ab coff-4 eat 8 bagle6
1 345678-edh wine bread
2 567890 tfs grape
I appreciate everyone's time and knowledge. Let me know if you have any questions.
Try this:
df.groupby(df['Test'].str[0].str.isnumeric().cumsum())['Test'].agg(' '.join)
Output:
Test
1 123456ab coff-4 eat 8 bagle6
2 345678-edh wine bread
3 567890 tfs grape
Name: Test, dtype: object
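For reference, here is a self-contained sketch of the same one-liner that can be pasted and run as is. The names group_id and result are my own, and the reset_index/to_frame step is an assumption about the desired shape: it turns the grouped Series back into a one-column DataFrame indexed 0, 1, 2, as in the question's expected output.

```python
import pandas as pd

df = pd.DataFrame({'Test': ['123456ab', 'coff-4', 'eat 8', 'bagle6', '345678-edh',
                            'wine', 'bread', '567890 tfs', 'grape']})

# True where a row starts with a digit; the running sum gives one id per block of rows.
group_id = df['Test'].str[0].str.isnumeric().cumsum()

# Join the rows of each block with a space, then discard the group ids
# so the result is indexed 0..n-1 (drop=True avoids a clash with the 'Test' name).
result = (
    df.groupby(group_id)['Test']
      .agg(' '.join)
      .reset_index(drop=True)
      .to_frame()
)
print(result)
```

The drop=True matters here: the grouping Series and the column are both called Test, so a plain reset_index() would try to insert a second column with that name.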
Details:
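Use the string accessor with an indexer of zero, df['Test'].str[0], to get the first character of each string. It is equivalent to df['Test'].str.get(0), just with less typing.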
Next, use the string accessor's isnumeric method to check whether that character is a digit. This returns a boolean Series.
df['Test'].str[0].str.isnumeric()
0 True
1 False
2 False
3 False
4 True
5 False
6 False
7 True
8 False
Name: Test, dtype: bool
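A small sketch of these two steps on a shortened, made-up Series (the variable names are mine, not part of the answer). It checks that .str[0] and .str.get(0) give the same result and then builds the boolean mask.

```python
import pandas as pd

# Shortened sample data, for illustration only.
s = pd.Series(['123456ab', 'coff-4', '345678-edh', 'wine'], name='Test')

first_char = s.str[0]                    # first character of every string
same = first_char.equals(s.str.get(0))   # both spellings return the same Series
starts_with_digit = first_char.str.isnumeric()

print(same)                        # True
print(starts_with_digit.tolist())  # [True, False, True, False]
```

One caveat: isnumeric also returns True for non-ASCII numeric characters such as '½'. For data that only uses 0-9, as in the question, str.isdigit() would behave the same.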
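Now we can use cumsum to create the row groupings. True counts as 1 and False as 0, so the running total goes up by one at every row that starts with a digit: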
df['Test'].str[0].str.isnumeric().cumsum()
0 1
1 1
2 1
3 1
4 2
5 2
6 2
7 3
8 3
Name: Test, dtype: int32
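To see why the running sum works as a group label, here is a minimal sketch using the boolean values from the output above directly. The second Series is a made-up edge case, not from the question: if the data began with a letter row, that row would land in a group numbered 0 of its own.

```python
import pandas as pd

# The mask from the previous step, written out by hand.
starts_with_digit = pd.Series([True, False, False, False, True, False, False, True, False])
group_id = starts_with_digit.cumsum()
print(group_id.tolist())   # [1, 1, 1, 1, 2, 2, 2, 3, 3]

# Hypothetical edge case: the first row starts with a letter.
leading_letter = pd.Series([False, True, False, True])
edge_ids = leading_letter.cumsum()
print(edge_ids.tolist())   # [0, 1, 1, 2]
```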
Finally, we can group by that Series of group numbers and aggregate each group with the string join:
df.groupby(df['Test'].str[0].str.isnumeric().cumsum())['Test'].agg(' '.join)
Test
1 123456ab coff-4 eat 8 bagle6
2 345678-edh wine bread
3 567890 tfs grape
Name: Test, dtype: object
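If an explicit loop is preferred, the steps listed in the question can also be sketched directly. This is only one possible reading of those steps, not the approach used in the answer: instead of deleting rows in place, it collects the merged lines in a plain list (merged and result_loop are names I made up) and builds a new DataFrame at the end. It also assumes that a leading letter row with no numbered line above it should be kept on its own.

```python
import pandas as pd

df = pd.DataFrame({'Test': ['123456ab', 'coff-4', 'eat 8', 'bagle6', '345678-edh',
                            'wine', 'bread', '567890 tfs', 'grape']})

merged = []  # one entry per numbered line, with its letter lines appended
for line in df['Test']:
    if line[0].isdigit() or not merged:
        # A line starting with a digit opens a new entry.
        # (Assumption: a letter line with nothing above it is also kept as its own entry.)
        merged.append(line)
    else:
        # A line starting with a letter is appended to the most recent entry.
        merged[-1] = merged[-1] + ' ' + line

result_loop = pd.DataFrame({'Test': merged})
print(result_loop)
```

Building a new list avoids modifying the DataFrame while iterating over it, which is what steps 6 and 7 of the question would otherwise require.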