How to make each row of an indexed DataFrame hold only one item



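I want to create a seaborn boxplot with 4 variables ("Draughts", "Heating it sufficiently is too expensive", "Heating system in inadequate", "Poor building fabric"), with temperature on the y-axis. The problem is that many respondents selected more than one option in the survey. How should I split the options so that each one sits on its own row, while still keeping all of the data? Here is some of the data: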

CausesCold                                                               
Draughts                                                             15.0
Draughts                                                             19.0
Heating it sufficiently is too expensive                              0.0
Draughts                                                             10.0
Draughts                                                             15.0
Draughts                                                             20.0
Heating it sufficiently is too expensive,Heatin...                    5.0
Heating it sufficiently is too expensive,Heatin...                   18.0
Heating system in inadequate,Draughts                                15.0
Heating system in inadequate,Poor building fabric                    15.0
Heating it sufficiently is too expensive,Heatin...                   21.0
Heating system in inadequate                                         21.0
Heating system in inadequate                                         21.0
Heating it sufficiently is too expensive                             10.0
Draughts                                                              0.0
Heating it sufficiently is too expensive,Poor b...                   18.0
Heating system in inadequate                                         18.0
Poor building fabric,Draughts                                        19.0
Heating system in inadequate,Poor building fabr...                   19.0
Heating system in inadequate                                         18.0
Heating system in inadequate                                         17.0
Heating it sufficiently is too expensive,Poor b...                   18.0
Heating it sufficiently is too expensive,Heatin...                   15.0
Heating it sufficiently is too expensive,Heatin...                   15.0
Heating it sufficiently is too expensive,Poor b...                   20.0
Heating it sufficiently is too expensive                             17.0
Heating it sufficiently is too expensive                             17.0
Heating system in inadequate                                          0.0
Heating it sufficiently is too expensive                             10.0
Heating it sufficiently is too expensive,Heatin...                    0.0

I want it to look like this:

CurrentThermostatTemp
CausesCold                                 
Poor building fabric                   20.0
Poor building fabric                   17.0
Poor building fabric                   20.0
Poor building fabric                   19.0
Poor building fabric                   20.0
Poor building fabric                   17.0
Poor building fabric                   18.0
Poor building fabric                   22.0
Poor building fabric                   25.0
Poor building fabric                   20.0
Poor building fabric                   15.0
Poor building fabric                   19.0
Poor building fabric                   20.0
Poor building fabric                   20.0
Poor building fabric                   20.0
Poor building fabric                   21.0
Poor building fabric                   19.0
Poor building fabric                   20.0
Poor building fabric                   18.0
Poor building fabric                   20.0
Poor building fabric                   17.0
Poor building fabric                   25.0
Poor building fabric                   18.0
Poor building fabric                   20.0
Poor building fabric                   16.0
Poor building fabric                   15.0
Poor building fabric                   21.0
Poor building fabric                   25.0
Poor building fabric                   23.0
Poor building fabric                   30.0
...                                     ...
Draughts                               20.0
Draughts                               20.0
Draughts                               17.0
Draughts                               16.0
Draughts                               25.0
Draughts                               21.0
Draughts                               21.0
Draughts                               18.0
Draughts                               20.0
Draughts                               20.0
Draughts                               18.0

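I'm not clear on how the data here is formatted. Are the thermostat readings already in their own column?

In any case, you will probably want to use pandas.Series.str.split

Something like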
temp = data['CausesCold'].str.split(',', n = 1, expand = True) 

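This creates a new DataFrame with two numbered columns (0 and 1). Because n = 1 splits only at the first comma, any third or later reason stays together in column 1.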
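To see what the split produces, here is a minimal sketch on a few made-up rows. The sample values and the "thermostat" column name are assumptions for illustration, not the asker's actual data.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data; column names are assumed.
data = pd.DataFrame({
    "CausesCold": [
        "Draughts",
        "Heating system in inadequate,Draughts",
        "Heating it sufficiently is too expensive,Heating system in inadequate,Draughts",
    ],
    "thermostat": [15.0, 19.0, 5.0],
})

# n=1 splits only at the first comma, so the result has at most two columns.
temp = data["CausesCold"].str.split(",", n=1, expand=True)

# The new columns are labelled with the integers 0 and 1.
print(temp.columns.tolist())
print(temp)
```

Rows with a single reason get a missing value in column 1, and the row with three reasons keeps its last two reasons joined in column 1. Dropping the n argument would give one column per reason instead.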

Assuming the thermostat values are already in a separate column, I would then add them to this "temp" DataFrame. For example:

temp['thermostat']=data['thermostat']

Your temp df will look like this:

|********************************|
|0         |1        |thermostat |
|Reason 1. |Reason 2 |Number     |
|Reason 1. |Reason 2 |Number     |
|Reason 1. |null     |Number     |
|********************************|

You want columns 0 and 1 stacked on top of each other, each with its corresponding thermostat value.

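Split the df

# expand=True labels the new columns with the integers 0 and 1, not the strings '0' and '1'
df=temp[[0,'thermostat']]
df1=temp[[1,'thermostat']]

and then append them. Some people may have given only one answer (i.e. column 1 is null), so handle that as well. DataFrame.append was removed in pandas 2.0, so use pd.concat (with pandas imported as pd), and rename column 1 to 0 so the two pieces line up:

df=pd.concat([df, df1.dropna(subset=[1]).rename(columns={1: 0})], ignore_index=True)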
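Put together, the split-and-stack steps might look like the sketch below. The sample rows and the "thermostat" column name are assumptions, and both pieces are renamed to "CausesCold" so they share one column.

```python
import pandas as pd

# Hypothetical sample; the real column names may differ.
data = pd.DataFrame({
    "CausesCold": [
        "Draughts",
        "Heating system in inadequate,Draughts",
        "Poor building fabric,Draughts",
    ],
    "thermostat": [15.0, 19.0, 20.0],
})

# Split at the first comma and carry the thermostat reading along.
temp = data["CausesCold"].str.split(",", n=1, expand=True)
temp["thermostat"] = data["thermostat"]

# First reason of every row.
first = temp[[0, "thermostat"]].rename(columns={0: "CausesCold"})

# Second reason, only for rows that actually have one.
second = (
    temp[[1, "thermostat"]]
    .dropna(subset=[1])
    .rename(columns={1: "CausesCold"})
)

# Stack the two pieces into one long frame.
stacked = pd.concat([first, second], ignore_index=True)
print(stacked)
```

Each thermostat reading now appears once per reason the respondent selected, so no data is lost.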

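If you are in the unfortunate position of having a raw data source where the reasons and the thermostat value are in the same single string, I would probably, as a first step, use a regular expression to extract any number from that string and store it in a new column called "thermostat" or something similar.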
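A rough sketch of that regex step, assuming a hypothetical "response" column in which the reading trails the reasons (the actual raw layout is not shown in the question):

```python
import pandas as pd

# Hypothetical raw strings with reasons and reading mixed together.
raw = pd.DataFrame({
    "response": [
        "Draughts 15.0",
        "Heating system in inadequate,Draughts 19",
        "Poor building fabric",
    ]
})

# Capture the first integer or decimal number; rows without one become NaN.
raw["thermostat"] = (
    raw["response"].str.extract(r"(\d+(?:\.\d+)?)", expand=False).astype(float)
)

# Remove the number so only the reasons remain.
raw["CausesCold"] = (
    raw["response"].str.replace(r"\s*\d+(?:\.\d+)?", "", regex=True).str.strip()
)
print(raw[["CausesCold", "thermostat"]])
```

The pattern would need adjusting if the reasons themselves can contain digits.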

Anyway, this should get you heading in the right direction. It is not necessarily the most efficient way to get there, but it will get you there.
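As one possibly shorter alternative to the approach above (a different technique, not the one described in this answer), pandas 0.25 and later provide DataFrame.explode, which handles any number of reasons per row. The sample data and column names below are assumptions.

```python
import pandas as pd

# Hypothetical sample; one respondent picked three reasons.
data = pd.DataFrame({
    "CausesCold": [
        "Draughts",
        "Heating it sufficiently is too expensive,Heating system in inadequate,Draughts",
    ],
    "thermostat": [15.0, 5.0],
})

# Turn each comma-separated string into a list, then give every list item its own row.
long_df = (
    data.assign(CausesCold=data["CausesCold"].str.split(","))
    .explode("CausesCold")
    .reset_index(drop=True)
)
print(long_df)
```

The resulting long-format frame can be passed to seaborn, for example seaborn.boxplot(data=long_df, x="CausesCold", y="thermostat").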
