Below is the function I am trying to define, without success. I may only use the simplest Python built-in functions for this.
def clean_data(data: List[List[str]]) -> None:
    """Modify data so that the applicable string values are converted to their
    appropriate type.

    The indexes of the string values that are converted, and their
    corresponding type, are:
    - COLUMN_RIDING is an int
    - COLUMN_VOTER is an int
    - COLUMN_RANK is a list of strings
    - COLUMN_RANGE is a list of integers
    - COLUMN_APPROVAL is a list of booleans

    >>> row = ['0', '1', 'NDP;Liberal;Green;CPC', '1;4;2;3', 'NO;YES;NO;NO']
    >>> clean_data([row])
    >>> row == SAMPLE_DATA_1[0]
    True
    >>> row = ['117', '12', 'Liberal;CPC;NDP;Green', '4;0;5;0', 'YES;NO;YES;NO']
    >>> clean_data([row])
    >>> row == [117, 12, ['CPC', 'GREEN', 'LIBERAL', 'NDP'], [4, 0, 5, 0], [True, False, True, False]]
    True
    """
    clean = []
    for i in range(len(data)):
        if COLUMN_RIDING[i] == str:
            clean.append(
    return clean
Based on your example:
>>> row = ['117', '12', 'Liberal;CPC;NDP;Green', '4;0;5;0', 'YES;NO;YES;NO']
>>> clean_data([row])
>>> row == [117, 12, ['CPC', 'GREEN', 'LIBERAL', 'NDP'], [4, 0, 5, 0], [True, False, True, False]]
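The idea of looping over the data is correct.
It looks like COLUMN_RANK, COLUMN_RANGE, and COLUMN_APPROVAL are lists separated by ";". So if we find a ";" in a string, we should split the string and iterate over the pieces. For each piece, we check what it is: a string of digits becomes an integer, YES/NO becomes a boolean, and anything else stays a string.
For COLUMN_RIDING or COLUMN_VOTER, just add the value as an integer.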
def clean_data(data):
    # As written, this expects a single row (a list of strings)
    # and returns a new list instead of modifying the row in place.
    clean = []
    for column in data:
        if ";" in column:
            tmp = []
            for value in column.split(";"):
                if value.isdigit():  # e.g. '4;0;5;0'
                    tmp.append(int(value))
                elif value == "YES":
                    tmp.append(True)
                elif value == "NO":
                    tmp.append(False)
                else:  # e.g. ['CPC', 'GREEN', 'LIBERAL', 'NDP']
                    tmp.append(value.upper())
            clean.append(tmp)
        else:
            clean.append(int(column))
    return clean
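Since the docstring says the function returns None and modifies data in place, here is a possible variant that does that and picks the conversion by column index rather than by inspecting each value. It is only a sketch: the values of the COLUMN_* constants are assumptions inferred from the order of the example row (the real ones live in your module), and SEPARATOR is a hypothetical name. The rank names are left exactly as they appear in the input, because the two docstring examples do not make it clear whether they should be upper-cased or sorted.

```python
from typing import List

# Assumed column indexes, inferred from the order of the example row.
# The real constants come from the asker's module and may differ.
COLUMN_RIDING = 0
COLUMN_VOTER = 1
COLUMN_RANK = 2
COLUMN_RANGE = 3
COLUMN_APPROVAL = 4

SEPARATOR = ";"  # hypothetical name for the list delimiter


def clean_data(data: List[list]) -> None:
    """Convert every row of data in place, choosing the type by column index."""
    for row in data:
        # Plain integer columns.
        row[COLUMN_RIDING] = int(row[COLUMN_RIDING])
        row[COLUMN_VOTER] = int(row[COLUMN_VOTER])
        # List of strings: split only, names kept as written in the input.
        row[COLUMN_RANK] = row[COLUMN_RANK].split(SEPARATOR)
        # List of integers.
        row[COLUMN_RANGE] = [int(v) for v in row[COLUMN_RANGE].split(SEPARATOR)]
        # List of booleans: 'YES' maps to True, anything else to False.
        row[COLUMN_APPROVAL] = [
            v == "YES" for v in row[COLUMN_APPROVAL].split(SEPARATOR)
        ]


row = ['117', '12', 'Liberal;CPC;NDP;Green', '4;0;5;0', 'YES;NO;YES;NO']
clean_data([row])
print(row)
```

Choosing the type by column means a column that happens to contain a single item (no ";") is still turned into a list, and the rank column is never mistaken for numbers or booleans.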