I have been cleaning up a table in OpenRefine. It currently looks like this:
REF Handle Size Price
2002, 2003 t-shirt1 M, L 23
3001, 3002, 3003 t-shirt2 S, M, L 24
I need to split the multi-valued cells in REF and Size so that I get:
REF Handle Size Price
2002 t-shirt1 M 23
2003 t-shirt1 L 23
3001 t-shirt2 S 24
3002 t-shirt2 M 24
3003 t-shirt2 L 24
Is it possible to do this in OpenRefine? The "Split multi-valued cells…" command only handles one column. Thank you! Annarita
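Yes, it is possible:
- Split the first column using ", " (comma plus space) as the separator
- Move the second column to the first position
- Display the project as records (instead of rows)
- Split column 3 using ", " as the separator
- Fill down columns 4 and 2
- Reorder the columns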
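The result these steps aim for (the i-th REF paired with the i-th Size, with Handle and Price repeated on every new row) can also be sketched outside OpenRefine in Python. This is only an illustration of the transformation, not part of the recipe: the function name `split_paired` is hypothetical, and padding uneven lists with empty strings is an assumption about how mismatched cells should be handled.

```python
from itertools import zip_longest


def split_paired(rows, multi_cols, sep=","):
    """Expand each row so paired multi-valued cells land on separate rows."""
    out = []
    for row in rows:
        # Split every multi-valued cell and trim surrounding whitespace.
        parts = [[v.strip() for v in row[c].split(sep)] for c in multi_cols]
        # Pair the i-th value of each column; shorter lists are padded with "".
        for values in zip_longest(*parts, fillvalue=""):
            new_row = dict(row)  # single-valued columns are copied, like a fill down
            new_row.update(zip(multi_cols, values))
            out.append(new_row)
    return out


# Sample data taken from the question.
rows = [
    {"REF": "2002, 2003", "Handle": "t-shirt1", "Size": "M, L", "Price": "23"},
    {"REF": "3001, 3002, 3003", "Handle": "t-shirt2", "Size": "S, M, L", "Price": "24"},
]

for r in split_paired(rows, ["REF", "Size"]):
    print(r["REF"], r["Handle"], r["Size"], r["Price"])
```

With the two sample rows from the question, this prints the five rows of the desired table.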
Here is my recipe (the operation history exported as JSON):
[
  {
    "op": "core/row-removal",
    "description": "Remove rows",
    "engineConfig": {
      "facets": [
        {
          "invert": false,
          "expression": "row.starred",
          "selectError": false,
          "omitError": false,
          "selectBlank": false,
          "name": "Starred Rows",
          "omitBlank": false,
          "columnName": "",
          "type": "list",
          "selection": [
            {
              "v": {
                "v": true,
                "l": "true"
              }
            }
          ]
        }
      ],
      "mode": "row-based"
    }
  },
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column Column 1",
    "columnName": "Column 1",
    "keyColumnName": "Column 1",
    "separator": ", ",
    "mode": "plain"
  },
  {
    "op": "core/column-move",
    "description": "Move column Column 2 to position 0",
    "columnName": "Column 2",
    "index": 0
  },
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column Column 3",
    "columnName": "Column 3",
    "keyColumnName": "Column 2",
    "separator": ", ",
    "mode": "plain"
  },
  {
    "op": "core/fill-down",
    "description": "Fill down cells in column Column 4",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Column 4"
  },
  {
    "op": "core/fill-down",
    "description": "Fill down cells in column Column 2",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Column 2"
  },
  {
    "op": "core/column-reorder",
    "description": "Reorder columns",
    "columnNames": [
      "Column 1",
      "Column 2",
      "Column 3",
      "Column 4"
    ]
  }
]
Herve
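I just found a nice, free OpenRefine plugin that provides an "Unpaired pivot" operation: the VIB-Bits plugin.
From their documentation:
3.2.1 Unpaired pivot… An unpaired pivot transforms data organized in rows into a representation of that data in separate columns. A simple example is converting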
[Table with columns Category and Value; the row contents are garbled in the source]