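I have arrays like the following: "['book', 'read']" "['cup', 'drink']"
and so on, and I want to convert them into lists that MultiLabelBinarizer can be applied to.
Currently it either gives me single characters or outputs only zeros.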
Y = train_labels.iloc[:, 0].values
values = np.array(Y)
mlb = MultiLabelBinarizer(classes=("drink","cup","book", "read"))
output = mlb.fit_transform(values)
print(output)
Expected output:
[0 0 1 1]
[1 1 0 0]
Actual output:
[0 0 0 0]
[0 0 0 0]
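I suspect the problem is the input format that MultiLabelBinarizer expects: each of your samples is a single string rather than a list of labels. The documentation describes the input as: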
y : iterable of iterables
A set of labels (any orderable and hashable object) for each sample.
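To see why string samples end up as all zeros, here is a minimal sketch in plain Python. The `binarize` function is a simplified stand-in written for illustration, not scikit-learn's actual implementation; it only mimics the idea of looking up each label of a sample among the known classes.

```python
# Simplified stand-in for the label lookup; NOT scikit-learn's actual code.
def binarize(sample, classes):
    """Return one 0/1 flag per class, treating `sample` as an iterable of labels."""
    labels = set(sample)  # iterating over a str yields its individual characters
    return [1 if c in labels else 0 for c in classes]

classes = ("drink", "cup", "book", "read")

as_string = "['book', 'read']"  # one string: its "labels" are '[', "'", 'b', 'o', ...
as_list = ['book', 'read']      # a real list containing two labels

print(binarize(as_string, classes))  # [0, 0, 0, 0]
print(binarize(as_list, classes))    # [0, 0, 1, 1]
```

When the sample is a string, every "label" is a single character, so none of them matches a class name. This is also why fitting without a fixed `classes` argument produces single characters as classes.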
Demonstration:
from sklearn.preprocessing import MultiLabelBinarizer
txt = [['book', 'read'],['cup', 'drink']]
mlb = MultiLabelBinarizer(classes=("drink","cup","book", "read"))
mlb.fit_transform(txt)
array([[0, 0, 1, 1],
[1, 1, 0, 0]])
Please let us know whether this solves your problem.
A note on the data format
If your data really is in the array form you described in your post:
arr = ["['book', 'read']","['cup', 'drink']"]
the following snippet converts it to the correct format:
import re
[["".join(re.findall(r"\w", f)) for f in lst] for lst in [s.split(",") for s in arr]]
[['book', 'read'], ['cup', 'drink']]
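One limitation of keeping only word characters is that a label containing a space or punctuation would be altered. As a possible variation, and assuming every label is wrapped in single quotes and contains no single quote itself, the quoted parts can be captured directly. The third entry below is made up for illustration and does not come from the question.

```python
import re

# The last entry is a hypothetical label with a space, added for illustration.
arr = ["['book', 'read']", "['cup', 'drink']", "['ice cream', 'eat']"]

# Capture everything between each pair of single quotes.
parsed = [re.findall(r"'([^']*)'", s) for s in arr]
print(parsed)  # [['book', 'read'], ['cup', 'drink'], ['ice cream', 'eat']]
```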
Another approach:
You can use ast.literal_eval() to convert the array:
arr = ["['book', 'read']","['cup', 'drink']","['book', 'read']","['book', 'read']"]
import ast
X = [ast.literal_eval(i) for i in arr]
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer(classes=("drink","cup","book", "read"))
output = mlb.fit_transform(X)
print(output)
Output:
[[0 0 1 1]
[1 1 0 0]
[0 0 1 1]
[0 0 1 1]]
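If some entries might already be lists, or might not parse at all, the conversion could be wrapped in a small helper. The sketch below is only one way to do it: `to_label_lists` is a hypothetical name, and the choice to raise on bad input rather than skip it is an assumption.

```python
import ast

def to_label_lists(cells):
    """Hypothetical helper: turn each cell into a list of label strings."""
    result = []
    for cell in cells:
        if isinstance(cell, str):
            try:
                # Safely evaluate the string as a Python literal.
                cell = ast.literal_eval(cell)
            except (ValueError, SyntaxError) as exc:
                raise ValueError(f"cannot parse label cell: {cell!r}") from exc
        # Reject anything that is not a collection of labels, e.g. a bare string.
        if not isinstance(cell, (list, tuple, set)):
            raise ValueError(f"expected a list of labels, got: {cell!r}")
        result.append(list(cell))
    return result

# Mixed input: one stringified list and one list that is already parsed.
mixed = ["['book', 'read']", ['cup', 'drink']]
label_lists = to_label_lists(mixed)
print(label_lists)  # [['book', 'read'], ['cup', 'drink']]
```

The result can then be passed to `mlb.fit_transform` in the same way as `X` above.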