Pandas: Converting a column of lists containing None values, retrieved via an API call, from list to string (to enable a pivot table)



I query an internal database, retrieve the data, and create a Pandas DataFrame df similar to the following:

import pandas as pd
df = pd.DataFrame({'issue_key':['MED-187', 'MED-188', 'MED-190', 'MED-191'],
'creator': ['Smith, J', 'Williams, S', 'Wilson, T', 'Smith, J'],
'manufacturer': ['Mercedes', 'Audi', 'Mercedes', 'BMW'],
'department': [['Sales'], ['Finance'], ['Operations'], ['Sales']],
'expense': [29181, 12809, 837, 2817]})
issue_key    creator        manufacturer      department        expense
MED-187      Smith, J       Mercedes          [Sales]           29181
MED-188      Williams, S    Audi              [Finance]         12809
MED-190      Wilson, T      Mercedes          [Operations]      837
MED-191      Smith, J       BMW               [Sales]           2817

I then try to pivot the data using the following:

import numpy as np
pivoted_data = pd.pivot_table(data=df, index='department', values='expense', aggfunc=np.mean)

However, I receive this error:

TypeError: unhashable type: 'list'
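
For context, grouping operations such as a pivot need hashable values to use as group keys, and a Python list is not hashable. The short sketch below (the variable names are illustrative only) shows that the same data becomes hashable once it is turned into a string or a tuple:

```python
# Lists are mutable, so Python refuses to hash them;
# values used as pivot/group keys must be hashable.
department = ['Sales']

try:
    hash(department)
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# A string or tuple built from the same data can be hashed.
string_is_hashable = isinstance(hash(','.join(department)), int)
tuple_is_hashable = isinstance(hash(tuple(department)), int)

print(list_is_hashable, string_is_hashable, tuple_is_hashable)
# False True True
```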

So I converted the department column from what I believed to be a list (more on that later) to a string using the following:

df['new_department'] = [','.join(map(str, l)) for l in df['department']]

As expected, this results in:

df = pd.DataFrame({'issue_key':['MED-187', 'MED-188', 'MED-190', 'MED-191'],
'creator': ['Smith, J', 'Williams, S', 'Wilson, T', 'Smith, J'],
'manufacturer': ['Mercedes', 'Audi', 'Mercedes', 'BMW'],
'department': [['Sales'], ['Finance'], ['Operations'], ['Sales']],
'expense': [29181, 12809, 837, 2817],
'new_department': ['Sales', 'Finance', 'Operations', 'Sales']
})
issue_key    creator        manufacturer      department        expense     new_department
MED-187      Smith, J       Mercedes          [Sales]           29181       Sales
MED-188      Williams, S    Audi              [Finance]         12809       Finance
MED-190      Wilson, T      Mercedes          [Operations]      837         Operations
MED-191      Smith, J       BMW               [Sales]           2817        Sales

And now I am able to pivot the data successfully.

However, on production data, I receive the following error:

df['new_department'] = [','.join(map(str, l)) for l in df['department']]
TypeError: 'NoneType' object is not iterable

While investigating how the data is returned from the API, I called the following:

`df['activity'].unique()`

This resulted in:

array(["[<JIRA CustomFieldOption:  value='Sales', id='174'>]",
"[<JIRA CustomFieldOption:  value='Finance', id='179'>]",
"[<JIRA CustomFieldOption:  value='Operations', id='102'>]",
'None'], dtype=object)

The data returned from the API contains None, so the error message makes complete sense.

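Any idea how to overcome this? Is it possible to somehow convert the None type so that it can be iterated over like a list? Or perhaps to access the values in the array directly?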

I have been investigating this for hours but have not been able to solve it.

Thanks in advance for your help!

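If the problem is rogue None values in the "department" column, then you can handle them with a conditional expression inside the list comprehension: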

df['new_department'] = [','.join(map(str, l)) if l is not None else 'NA' for l in df['department']]
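
As a rough end-to-end check, the conditional join can be tried on a small frame and then pivoted on the new string column. The extra row MED-192 with a None department and an expense of 500 is invented here purely for illustration; it is not part of the original data.

```python
import pandas as pd

# Sample mirroring the question, plus one hypothetical row with no department.
df = pd.DataFrame({
    'issue_key': ['MED-187', 'MED-188', 'MED-190', 'MED-191', 'MED-192'],
    'department': [['Sales'], ['Finance'], ['Operations'], ['Sales'], None],
    'expense': [29181, 12809, 837, 2817, 500],
})

# Join each list into a string; rows holding None get the placeholder 'NA'.
df['new_department'] = [
    ','.join(map(str, l)) if l is not None else 'NA'
    for l in df['department']
]

# Pivot on the string column, which is hashable.
pivoted_data = pd.pivot_table(
    data=df, index='new_department', values='expense', aggfunc='mean'
)
print(pivoted_data)
```

The rows without a department end up grouped under 'NA' rather than raising an error; whether that is preferable to dropping them depends on the report being built.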

Or, more generally:

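# np.NaN was removed in NumPy 2.0, and `l in [None, np.nan]` matches NaN only by object identity,
# so check for list-like values directly and map everything else (None, NaN) to 'NA'.
df['new_department'] = [','.join(map(str, l)) if isinstance(l, (list, tuple)) else 'NA' for l in df['department']]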
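
If the lists hold option objects rather than plain strings, as the repr in the question's output hints, the same idea could be wrapped in a small helper. This is only a sketch: `department_to_str` is a hypothetical name, the `value` attribute is an assumption based on that repr, and `SimpleNamespace` merely stands in for whatever object the API really returns.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

def department_to_str(value, placeholder='NA'):
    """Join list-like values into one string; map None/NaN to a placeholder."""
    if not isinstance(value, (list, tuple)):
        return placeholder
    # Assumption: option objects expose a `value` attribute. Plain strings
    # do not, so getattr falls back to the item itself.
    return ','.join(str(getattr(item, 'value', item)) for item in value)

# SimpleNamespace is only a stand-in for the objects returned by the API.
option = SimpleNamespace(value='Operations', id='102')
departments = pd.Series([['Sales'], None, np.nan, [option]], dtype=object)

result = departments.apply(department_to_str).tolist()
print(result)
# ['Sales', 'NA', 'NA', 'Operations']
```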
