I have a DataFrame df:
Name Student_info School
Rajat {"FirstName":"Rajat", "LastName":"Sinha", "birthDate":"1999-05-01"} XYZ
Vivek {"FirstName":"Vivek", "LastName":"Vishwa", "birthDate":"1999-07-09"} ABC
Ram {"FirstName":"", "LastName":"Ram", "birthDate":"1999-05-09"} ABC
John {"FirstName":"", "LastName":"Mac", "birthDate":"1999-08-03"} ABC
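I want to get the index of FirstName and LastName (that is, the index of the key and of its corresponding value). How can I get this? For example, the index of a given key/value pair within a column.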
I tried:
index = [i for i in df.columns if isinstance(data[i][0], dict)]
But this leaves index empty, because the values are not dicts but strings of "key":"value" pairs.
I want to check whether "FirstName":"" is empty:
ValueToCheck = ""
ValuesInDataframe = ? # get the index of empty string of key FirstName
if (ValueToCheck == ValuesInDataframe):
return true
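What is the best way to get the index so that I can compare the string?
Your question is still a bit unclear. Does this solve your problem?
If the Student_info column contains strings rather than dicts, first do the following: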
df.Student_info = df.Student_info.map(eval)
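Because eval executes whatever code the string contains, a more cautious variant is to parse the strings as JSON. The sketch below assumes every Student_info value is a valid JSON object, as in the sample data; the two-row frame is rebuilt by hand only for illustration.

```python
import json

import pandas as pd

# Illustrative frame rebuilt by hand; Student_info holds JSON strings.
df = pd.DataFrame({
    "Name": ["Rajat", "Ram"],
    "Student_info": [
        '{"FirstName":"Rajat", "LastName":"Sinha", "birthDate":"1999-05-01"}',
        '{"FirstName":"", "LastName":"Ram", "birthDate":"1999-05-09"}',
    ],
    "School": ["XYZ", "ABC"],
})

# json.loads only parses data; unlike eval it never executes code.
df["Student_info"] = df["Student_info"].map(json.loads)

print(type(df.loc[0, "Student_info"]))
```

If the strings use single quotes (Python dict syntax rather than JSON), ast.literal_eval from the standard library is the comparable safe option.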
Now you can use .str.get to extract the index of the rows that meet the requirement:
indices = df[df.Student_info.str.get('FirstName').eq('')].index
Result:
Int64Index([2, 3], dtype='int64')
So
df.loc[indices]
results in:
Name Student_info School
2 Ram {'FirstName': '', 'LastName': 'Ram', 'birthDat...' ABC
3 John {'FirstName': '', 'LastName': 'Mac', 'birthDat...' ABC
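Put together, the steps above can be sketched as one self-contained script. It assumes Student_info has already been parsed into dicts, and the names first_names and is_empty are introduced here purely for illustration.

```python
import pandas as pd

# Four rows mirroring the question's data, with Student_info already as dicts.
df = pd.DataFrame({
    "Name": ["Rajat", "Vivek", "Ram", "John"],
    "Student_info": [
        {"FirstName": "Rajat", "LastName": "Sinha", "birthDate": "1999-05-01"},
        {"FirstName": "Vivek", "LastName": "Vishwa", "birthDate": "1999-07-09"},
        {"FirstName": "", "LastName": "Ram", "birthDate": "1999-05-09"},
        {"FirstName": "", "LastName": "Mac", "birthDate": "1999-08-03"},
    ],
    "School": ["XYZ", "ABC", "ABC", "ABC"],
})

# .str.get looks up the key in each dict; a missing key would yield NaN.
first_names = df["Student_info"].str.get("FirstName")

# Boolean mask: True where FirstName is the empty string.
is_empty = first_names.eq("")

# Row labels of the matching rows, as a plain list.
indices = df[is_empty].index.tolist()
print(indices)

# Single yes/no answer to "is any FirstName empty?"
print(bool(is_empty.any()))
```

On pandas 2.0 and later the index itself prints as Index([2, 3], dtype='int64') rather than Int64Index, since the Int64Index class was removed; the contents are the same.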
I assume your df was read from a csv (or similar) file, roughly as follows:
import pandas as pd
from io import StringIO

data = StringIO('''
Name Student_info School
Rajat {"FirstName":"Rajat","LastName":"Sinha","birthDate":"1999-05-01"} XYZ
Vivek {"FirstName":"Vivek","LastName":"Vishwa","birthDate":"1999-07-09"} ABC
Ram {"FirstName":"","LastName":"Ram","birthDate":"1999-05-09"} ABC
John {"FirstName":"","LastName":"Mac","birthDate":"1999-08-03"} ABC
''')
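df = pd.read_csv(data, sep=r'\s+')  # split columns on runs of whitespace
In this case, "Student_info" holds strings that "look like" dicts.
We can convert them into actual dicts with apply and eval, then expand them into separate columns: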
df_si = df['Student_info'].apply(eval).apply(pd.Series)
df_si
looks like this:
FirstName LastName birthDate
0 Rajat Sinha 1999-05-01
1 Vivek Vishwa 1999-07-09
2 Ram 1999-05-09
3 Mac 1999-08-03
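As an alternative to apply(pd.Series), the parsed dicts can be handed to the DataFrame constructor in one go, which usually avoids building one Series per row. This is a sketch under the assumption that the strings are valid JSON; the frame below is rebuilt by hand rather than read from a file.

```python
import json

import pandas as pd

# Illustrative frame; Student_info holds JSON strings as in the CSV case.
df = pd.DataFrame({
    "Name": ["Rajat", "Vivek", "Ram", "John"],
    "Student_info": [
        '{"FirstName":"Rajat","LastName":"Sinha","birthDate":"1999-05-01"}',
        '{"FirstName":"Vivek","LastName":"Vishwa","birthDate":"1999-07-09"}',
        '{"FirstName":"","LastName":"Ram","birthDate":"1999-05-09"}',
        '{"FirstName":"","LastName":"Mac","birthDate":"1999-08-03"}',
    ],
    "School": ["XYZ", "ABC", "ABC", "ABC"],
})

# Parse each string into a dict, then build the expanded frame directly.
parsed = df["Student_info"].map(json.loads)
df_si = pd.DataFrame(parsed.tolist(), index=df.index)

print(df_si)
```

Passing index=df.index keeps the row labels aligned, so the later join with the remaining columns still matches row for row.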
You can use join to combine it with the rest of df:
df = df[['Name', 'School']].join(df_si)
Now df looks like this:
Name School FirstName LastName birthDate
-- ------ -------- ----------- ---------- -----------
0 Rajat XYZ Rajat Sinha 1999-05-01
1 Vivek ABC Vivek Vishwa 1999-07-09
2 Ram ABC Ram 1999-05-09
3 John ABC Mac 1999-08-03
You can now extract the rows where FirstName is an empty string:
df[df['FirstName']=='']
Output:
Name School FirstName LastName birthDate
-- ------ -------- ----------- ---------- -----------
2 Ram ABC Ram 1999-05-09
3 John ABC Mac 1999-08-03
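Since the question asks for the index for both FirstName and LastName, the same comparison can be run over both columns at once. The following is a minimal sketch on an already expanded frame; empty and empty_rows are names made up for this example.

```python
import pandas as pd

# Expanded frame as produced by the join step, rebuilt by hand here.
df = pd.DataFrame({
    "Name": ["Rajat", "Vivek", "Ram", "John"],
    "School": ["XYZ", "ABC", "ABC", "ABC"],
    "FirstName": ["Rajat", "Vivek", "", ""],
    "LastName": ["Sinha", "Vishwa", "Ram", "Mac"],
})

# Boolean frame: True wherever the cell is an empty string.
empty = df[["FirstName", "LastName"]].eq("")

# For each column, collect the row labels whose value is empty.
empty_rows = {
    col: empty.index[empty[col].to_numpy()].tolist()
    for col in empty.columns
}

print(empty_rows)
```

A column with no empty values simply maps to an empty list, so the result can be checked per key without special cases.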