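I have a long list of keys (main keys) in a nested dictionary. I want to turn the value of one of the subkeys into a list. Here is one record from my nested dictionary; all the records are structured similarly.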
{'C4QY10_e':
    {'protein accession': 'C4QY10_e',
     'sequence length': [1879],
     'analysis': 'Pfam',
     'signature accession': 'PF18314, PF02801, PF18325, PF00109, PF01648',
     'signature description': "Fatty acid synthase type I helical domain, Beta-ketoacyl synthase, Fatty acid synthase subunit alpha Acyl carrier domain, 4'-phosphopantetheinyl transferase superfamily",
     'start location': [328, 139, 1761],
     'stop location': [528, 300, 1861],
     'e-value': [4.7e-73, 1.3e-72, 1.4e-18],
     'interpro accession': 'IPR041550, IPR040899, IPR008278',
     'interpro description': "Fatty acid synthase type I, Fatty acid synthase subunit alpha, 4'-phosphopantetheinyl transferase domain",
     'nunique': [1]
    }
}
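The subkey whose value I want to convert into a list is 'interpro description'. I want the string split on ','. So item [0] of the list would be 'Fatty acid synthase type I' and item [1] would be 'Fatty acid synthase subunit alpha'. It is very important that the values keep their input order.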
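The ordering requirement is met by `str.split` itself, because it returns the pieces from left to right. A minimal sketch, assuming the separator is always a comma that may or may not be followed by spaces (the sample string is copied from the record above):

```python
# Sample string copied from the 'interpro description' value in the record above
description = "Fatty acid synthase type I, Fatty acid synthase subunit alpha, 4'-phosphopantetheinyl transferase domain"

# str.split returns the pieces left to right, so the original order is kept;
# strip() removes the space after each comma (and any stray surrounding spaces)
parts = [part.strip() for part in description.split(",")]

print(parts[0])  # Fatty acid synthase type I
print(parts[1])  # Fatty acid synthase subunit alpha
```

Using `strip()` on each piece rather than splitting on `', '` keeps the result clean even if some records have no space, or extra spaces, after a comma.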
Use split():
yourdict = {'C4QY10_e':
    {'protein accession': 'C4QY10_e',
     'sequence length': [1879],
     'analysis': 'Pfam',
     'signature accession': 'PF18314, PF02801, PF18325, PF00109, PF01648',
     'signature description': "Fatty acid synthase type I helical domain, Beta-ketoacyl synthase, Fatty acid synthase subunit alpha Acyl carrier domain, 4'-phosphopantetheinyl transferase superfamily",
     'start location': [328, 139, 1761],
     'stop location': [528, 300, 1861],
     'e-value': [4.7e-73, 1.3e-72, 1.4e-18],
     'interpro accession': 'IPR041550, IPR040899, IPR008278',
     'interpro description': "Fatty acid synthase type I, Fatty acid synthase subunit alpha, 4'-phosphopantetheinyl transferase domain",
     'nunique': [1]
    }}
yourdict['C4QY10_e']['interpro description'] = yourdict['C4QY10_e']['interpro description'].split(', ')
print(yourdict)

Output (wrapped here for readability):
{'C4QY10_e': {'protein accession': 'C4QY10_e',
'sequence length': [1879],
'analysis': 'Pfam',
'signature accession': 'PF18314, PF02801, PF18325, PF00109, PF01648',
'signature description': "Fatty acid synthase type I helical domain, Beta-ketoacyl synthase, Fatty acid synthase subunit alpha Acyl carrier domain, 4'-phosphopantetheinyl transferase superfamily",
'start location': [328, 139, 1761],
'stop location': [528, 300, 1861],
'e-value': [4.7e-73, 1.3e-72, 1.4e-18],
'interpro accession': 'IPR041550, IPR040899, IPR008278',
'interpro description': ['Fatty acid synthase type I',
'Fatty acid synthase subunit alpha',
"4'-phosphopantetheinyl transferase domain"],
'nunique': [1]}}
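The line above converts only the 'C4QY10_e' record. Since the question mentions a long list of main keys, the same split can be applied to every record with a loop. A minimal sketch, assuming every record is a flat dictionary at the second level; the function name `split_descriptions` and the second record `EXAMPLE_2` are hypothetical, for illustration only:

```python
def split_descriptions(records: dict, field: str = "interpro description") -> None:
    """Split `field` into a list in every record of `records`, in place."""
    for record in records.values():
        value = record.get(field)
        # Only strings are converted; records that are already lists,
        # or that lack the field entirely, are left untouched
        if isinstance(value, str):
            record[field] = [part.strip() for part in value.split(",")]

# Shortened sample records (EXAMPLE_2 is hypothetical)
records = {
    "C4QY10_e": {"protein accession": "C4QY10_e",
                 "interpro description": "Fatty acid synthase type I, Fatty acid synthase subunit alpha, 4'-phosphopantetheinyl transferase domain"},
    "EXAMPLE_2": {"protein accession": "EXAMPLE_2",
                  "interpro description": "Domain A, Domain B"},
}
split_descriptions(records)
print(records["EXAMPLE_2"]["interpro description"])  # ['Domain A', 'Domain B']
```

Because already-converted lists are skipped, running the loop twice does no harm.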
Here's another solution:
def to_list(_dict: dict, _key: str = 'interpro description') -> dict:
    """Convert a string to a list of strings.

    Parameters
    ----------
    _dict : dict
        A dictionary to convert `_key` into a list.
    _key : str
        A key in the dictionary.

    Returns
    -------
    dict
        The original dictionary, with `_key` modified into a list of strings.

    Notes
    -----
    Function accepts dictionaries with multiple levels. The input dictionary
    is modified in place and also returned.
    """
    for key, value in _dict.items():
        # Recurse into nested dictionaries so `_key` is found at any depth
        if isinstance(value, dict):
            _dict[key] = to_list(value, _key)
        # Split matching string values on commas and drop the leading space of each part
        if key == _key and isinstance(value, str):
            _dict[key] = [part.lstrip(" ") for part in value.split(',')]
    return _dict
# == Example ==========
my_dict = {
    'C4QY10_e': {
        'protein accession': 'C4QY10_e',
        'sequence length': [1879],
        'analysis': 'Pfam',
        'signature accession': 'PF18314, PF02801, PF18325, PF00109, PF01648',
        'signature description': "Fatty acid synthase type I helical domain, Beta-ketoacyl synthase, Fatty acid synthase subunit alpha Acyl carrier domain, 4'-phosphopantetheinyl transferase superfamily",
        'start location': [328, 139, 1761],
        'stop location': [528, 300, 1861],
        'e-value': [4.7e-73, 1.3e-72, 1.4e-18],
        'interpro accession': 'IPR041550, IPR040899, IPR008278',
        'interpro description': "Fatty acid synthase type I, Fatty acid synthase subunit alpha, 4'-phosphopantetheinyl transferase domain",
        'nunique': [1],
    }
}
_my_dict = to_list(my_dict)
print(_my_dict['C4QY10_e']['interpro description'])
# Prints:
# ['Fatty acid synthase type I', 'Fatty acid synthase subunit alpha', "4'-phosphopantetheinyl transferase domain"]
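Note that `to_list` modifies `my_dict` in place and returns the same object, so `_my_dict` and `my_dict` are one dictionary. If the original strings need to be kept, a non-mutating variant could build new dictionaries instead. A minimal sketch, assuming only nested dicts need copying; the name `to_list_copy` is hypothetical, and it uses `strip()` where `to_list` uses `lstrip(" ")`:

```python
from typing import Any

def to_list_copy(obj: Any, key: str = "interpro description") -> Any:
    """Return a new structure with `key` split into lists; `obj` is left unchanged."""
    if isinstance(obj, dict):
        result = {}
        for k, v in obj.items():
            if k == key and isinstance(v, str):
                result[k] = [part.strip() for part in v.split(",")]
            else:
                # Recurse so nested dictionaries at any depth get new copies
                result[k] = to_list_copy(v, key)
        return result
    # Non-dict values (lists, numbers, strings) are reused, not deep-copied
    return obj

original = {"C4QY10_e": {"interpro description": "Fatty acid synthase type I, Fatty acid synthase subunit alpha"}}
converted = to_list_copy(original)
print(converted["C4QY10_e"]["interpro description"])  # ['Fatty acid synthase type I', 'Fatty acid synthase subunit alpha']
print(original["C4QY10_e"]["interpro description"])   # Fatty acid synthase type I, Fatty acid synthase subunit alpha
```

Lists such as 'start location' are shared between the input and the result; use `copy.deepcopy` first if those will be modified later.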