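Python beginner here. I have a pandas DataFrame (for lack of a better term) that I would like to convert so that each item in the second column gets its own row. My current DataFrame looks like this: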
SECTION VALUES
0 A1 ['RC47']
1 A2 ['AC_100''AC_101''AC_102''AC_103''AC_104''AC105''AC_106''AC_107']
2 A3 ['RT1''RT2''RT3']
3 A4 ['800''801''802''803''804''805''1000''1001']
4 B3T2 ['RC50''RC50''RC50''RC50']
5 BR40 ['RC44']
6 T1 ['941.2''943.4''945.3''960.2']
What I want is something like this:
SECTION VALUES
A1 13_200
A1 23_10
A1 200_300
A1 200_10
A2 AC_100
A2 AC_101
A2 AC_103
A3 AC_104
etc ...
This is the part of my code where I'm having trouble getting the syntax right and producing the output DataFrame:
#print(percentages)
list_of_keys = []
list_of_values = []
for key, val in percentages.items():
    list_of_keys.append(key)
    list_of_values.append(val)
#print(list_of_keys)
#print(list_of_values)
head =['SECTION','VALUES']
newdf = pd.DataFrame(list(zip(list_of_keys, list_of_values)), columns=head)
print(newdf)
out = (newdf.assign(VALUES=newdf["VALUES"]
                    .replace(",", "','", regex=True).apply(literal_eval))
       .explode("VALUES"))  # .reset_index(drop=True) uncomment to make a unique index
The error I get is:
ValueError: malformed node or string: array(['RC47']
The error is raised on this line:
.replace(",", "','", regex=True).apply(literal_eval))
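I'm struggling with the syntax for handling the second column, 'VALUES'. The list of data has no separators other than the single quotes around each value. I know this kind of problem is usually handled with DataFrame.transpose(), but the quote formatting is giving me trouble. Any suggestions?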
One option is literal_eval/explode:
from ast import literal_eval
out = (df.assign(VALUES=df["VALUES"]
                 .replace("''", "','", regex=True).apply(literal_eval))
       .explode("VALUES"))  # .reset_index(drop=True) uncomment to make a unique index
Output:
print(out)
SECTION VALUES
0 A1 13_200
0 A1 23_10
0 A1 200_300
.. ... ...
3 A4 805
3 A4 1000
3 A4 1001
[23 rows x 2 columns]
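A note on the error in the question: the message "malformed node or string: array(['RC47']" suggests that literal_eval was handed a NumPy array rather than a string, which would mean the VALUES cells are already list-like. If that is the case, the string replacement and literal_eval step may not be needed at all, and explode can be applied directly. The sketch below is only an illustration under that assumption. The helper name explode_values and the two small sample frames are made up here and are not from the original post.

```python
from ast import literal_eval

import numpy as np
import pandas as pd


def explode_values(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per item of VALUES, for string or list-like cells."""

    def to_list(cell):
        # Strings such as "['RT1''RT2']" need separators before parsing;
        # otherwise Python concatenates the adjacent string literals.
        if isinstance(cell, str):
            return literal_eval(cell.replace("''", "','"))
        # Lists, tuples and NumPy arrays are already list-like.
        return list(cell)

    return (
        df.assign(VALUES=df["VALUES"].apply(to_list))
        .explode("VALUES")
        .reset_index(drop=True)
    )


# Hypothetical sample data: the same content stored two different ways.
df_str = pd.DataFrame(
    {"SECTION": ["A1", "A3"], "VALUES": ["['RC47']", "['RT1''RT2''RT3']"]}
)
df_arr = pd.DataFrame(
    {
        "SECTION": ["A1", "A3"],
        "VALUES": [np.array(["RC47"]), np.array(["RT1", "RT2", "RT3"])],
    }
)

out_str = explode_values(df_str)
out_arr = explode_values(df_arr)
print(out_str)
print(out_arr)
```

Both calls should give one row per value with SECTION repeated alongside it. Checking type(df["VALUES"].iloc[0]) on the real data is a quick way to tell which of the two cases applies.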