Get the index of a column containing key:value pair strings



I have a DataFrame df:

Name   Student_info                                                          School
Rajat  {"FirstName":"Rajat", "LastName":"Sinha", "birthDate":"1999-05-01"}   XYZ
Vivek  {"FirstName":"Vivek", "LastName":"Vishwa", "birthDate":"1999-07-09"}  ABC
Ram    {"FirstName":"", "LastName":"Ram", "birthDate":"1999-05-09"}          ABC
John   {"FirstName":"", "LastName":"Mac", "birthDate":"1999-08-03"}          ABC

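I want to get the index of FirstName and LastName (the index of each key and its corresponding value). How can I get this? For example: the index of CCD_ 4 within the CCD_ 5 column.

I tried: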

index = [i for i in df.columns if isinstance(data[i][0], dict)]

But this gives an empty index (because the value is not a dictionary but a string of "key":"value" pairs).

I want to check whether "FirstName":"" is empty:

ValueToCheck = ""
ValuesInDataframe = ?  # get the index of empty string of key FirstName
if (ValueToCheck == ValuesInDataframe):
    return True

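What is the best way to get the index so that I can compare the strings?

Your question is still a bit unclear. Does this solve your problem?

If the Student_info column contains strings rather than dictionaries, first do the following: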

df.Student_info = df.Student_info.map(eval)
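Note that eval executes whatever expression the string contains, so it is only reasonable for data you trust. Since the cells shown in the question look like valid JSON, json.loads may be a safer way to parse them. The following is a minimal sketch under that assumption; the helper names parse_student_info and first_name_is_empty are hypothetical and do not come from the question:

```python
import json


def parse_student_info(text):
    """Parse one Student_info cell (assumed to be a JSON object string) into a dict."""
    return json.loads(text)


def first_name_is_empty(text):
    """Return True when the FirstName key holds an empty string."""
    return parse_student_info(text).get("FirstName") == ""


# Two sample cells copied from the question's data.
cells = [
    '{"FirstName":"Rajat", "LastName":"Sinha", "birthDate":"1999-05-01"}',
    '{"FirstName":"", "LastName":"Ram", "birthDate":"1999-05-09"}',
]

flags = [first_name_is_empty(cell) for cell in cells]
print(flags)  # [False, True]
```

If the assumption holds for your data, parse_student_info could be passed to df.Student_info.map(...) in place of eval.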

Now you can use .str.get to extract the index of the rows that meet the requirement:

indices = df[df.Student_info.str.get('FirstName').eq('')].index

Result:

Int64Index([2, 3], dtype='int64')

So

df.loc[indices]

gives:

   Name                                       Student_info School
2   Ram  {'FirstName': '', 'LastName': 'Ram', 'birthDat...'   ABC
3  John  {'FirstName': '', 'LastName': 'Mac', 'birthDat...'   ABC
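For reference, the steps of this answer can be put together into one self-contained sketch. It rebuilds the question's data by hand (an assumption about how df is laid out) and uses json.loads rather than eval to parse the strings, so it is a variation on the answer rather than a copy of it:

```python
import json

import pandas as pd

# Rebuild the question's DataFrame; Student_info holds JSON-like strings.
df = pd.DataFrame({
    "Name": ["Rajat", "Vivek", "Ram", "John"],
    "Student_info": [
        '{"FirstName":"Rajat", "LastName":"Sinha", "birthDate":"1999-05-01"}',
        '{"FirstName":"Vivek", "LastName":"Vishwa", "birthDate":"1999-07-09"}',
        '{"FirstName":"", "LastName":"Ram", "birthDate":"1999-05-09"}',
        '{"FirstName":"", "LastName":"Mac", "birthDate":"1999-08-03"}',
    ],
    "School": ["XYZ", "ABC", "ABC", "ABC"],
})

# Convert every string to a dict (json.loads stands in for eval here).
info = df["Student_info"].map(json.loads)

# .str.get looks up the key in each dict; .eq("") flags empty first names.
mask = info.str.get("FirstName").eq("")

# Index labels of the rows whose FirstName is an empty string.
indices = df[mask].index.tolist()
print(indices)  # [2, 3]
```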

I assume your df was read from a csv (or similar) file, roughly like this:

from io import StringIO
import pandas as pd

data = StringIO('''
Name     Student_info                                                       School
Rajat    {"FirstName":"Rajat","LastName":"Sinha","birthDate":"1999-05-01"}     XYZ
Vivek    {"FirstName":"Vivek","LastName":"Vishwa","birthDate":"1999-07-09"}    ABC
Ram      {"FirstName":"","LastName":"Ram","birthDate":"1999-05-09"}            ABC
John     {"FirstName":"","LastName":"Mac","birthDate":"1999-08-03"}            ABC
''')
# Columns are separated by runs of whitespace
df = pd.read_csv(data, sep=r'\s+')

In this case, "Student_info" holds strings that "look like" dicts.

We can convert them into actual dicts with apply(eval) and then expand them into separate columns:

df_si = df['Student_info'].apply(eval).apply(pd.Series)

df_si looks like this:


    FirstName   LastName    birthDate
0   Rajat       Sinha       1999-05-01
1   Vivek       Vishwa      1999-07-09
2               Ram         1999-05-09
3               Mac         1999-08-03

You can combine it with the rest of df using join:

df = df[['Name', 'School']].join(df_si)

Now df looks like this:

    Name    School    FirstName    LastName    birthDate
--  ------  --------  -----------  ----------  -----------
 0  Rajat   XYZ       Rajat        Sinha       1999-05-01
 1  Vivek   ABC       Vivek        Vishwa      1999-07-09
 2  Ram     ABC                    Ram         1999-05-09
 3  John    ABC                    Mac         1999-08-03

You can now extract the rows where FirstName is an empty string:

df[df['FirstName']=='']

Output:

    Name    School    FirstName    LastName    birthDate
--  ------  --------  -----------  ----------  -----------
 2  Ram     ABC                    Ram         1999-05-09
 3  John    ABC                    Mac         1999-08-03
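Since the question asks for the index rather than the rows, the expand-and-filter steps above can also be sketched end to end as follows. This version swaps in json.loads and the pd.DataFrame constructor for eval and apply(pd.Series), on the assumption that every cell is valid JSON with the same keys; the variable names expanded and empty_first are illustrative only:

```python
import json

import pandas as pd

# Hand-built copy of the question's data (strings, not dicts, in Student_info).
df = pd.DataFrame({
    "Name": ["Rajat", "Vivek", "Ram", "John"],
    "Student_info": [
        '{"FirstName":"Rajat","LastName":"Sinha","birthDate":"1999-05-01"}',
        '{"FirstName":"Vivek","LastName":"Vishwa","birthDate":"1999-07-09"}',
        '{"FirstName":"","LastName":"Ram","birthDate":"1999-05-09"}',
        '{"FirstName":"","LastName":"Mac","birthDate":"1999-08-03"}',
    ],
    "School": ["XYZ", "ABC", "ABC", "ABC"],
})

# Parse each cell, then build one column per key while keeping the row index.
parsed = df["Student_info"].map(json.loads)
df_si = pd.DataFrame(parsed.tolist(), index=df.index)

# Join the expanded columns back onto the remaining columns.
expanded = df[["Name", "School"]].join(df_si)

# Index labels of the rows whose FirstName is an empty string.
empty_first = expanded[expanded["FirstName"] == ""].index.tolist()
print(empty_first)  # [2, 3]
```

Building the frame from a list of dicts in one call tends to be faster than apply(pd.Series) on larger data, though for a handful of rows either approach is fine.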
