Delete a row with a specific value from a table [Accumulo]

I have a table that is currently set up as follows:

rowId : colFam: colQual -> value
in001 : user : name -> erp
in001 : user : age -> 23
in001 : group : name -> employee
in001 : group : name -> developer

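I can't figure out a way to delete one of the group entries, or to change it for that matter. Say I want to delete the row with employee, because I'm now a manager. Adding an entry is obvious, but I can't work out how to address employee specifically, since both group entries have the same colFam and colQual.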

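I know about mutation.putDelete(colFam, colQual), but that doesn't apply here, because it would delete both entries. Alternatively, I could scan each row and get the key-value pairs, like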

for(Entry<Key,Value> e: scanner){
    String value = e.getValue().toString(); // at least I can access it here
}

But even then, how would it know which entry to delete? Is this just a flaw in how I designed my table?

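While Accumulo's Key-Value schema allows you to do this, it is problematic, as you've found. The original intent of the value is that it can change over time, with each version of the value uniquely identified by the timestamp portion of the key (assuming all other parts of the key are the same). By turning off the VersioningIterator, you can keep a history of a Key's values.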

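The most common way to solve this is to store all the "group names" in a single value, using some serialized data structure. A simple approach is a CSV value "employee,developer", and your update would then be "employee,developer,manager". You can use tools such as Hadoop Writable, Google Protocol Buffers or Apache Thrift (or many others) to get a more compact representation, easier programmatic access and backward compatibility.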
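A minimal sketch of the CSV approach, assuming group names never contain a comma. GroupCsv and its methods are hypothetical helper names, not part of Accumulo; the resulting string would be written back as the single value of group : name with an ordinary put.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Accumulo): keeps all group names in one CSV value.
final class GroupCsv {
  private GroupCsv() {}

  // Splits "employee,developer" into a mutable list; a null or blank value gives an empty list.
  static List<String> parse(String csv) {
    List<String> groups = new ArrayList<>();
    if (csv == null || csv.trim().isEmpty()) {
      return groups;
    }
    for (String part : csv.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        groups.add(name);
      }
    }
    return groups;
  }

  // Returns the CSV value with the group appended, unless it is already present.
  static String add(String csv, String group) {
    List<String> groups = parse(csv);
    if (!groups.contains(group)) {
      groups.add(group);
    }
    return String.join(",", groups);
  }

  // Returns the CSV value with the first occurrence of the group removed.
  static String remove(String csv, String group) {
    List<String> groups = parse(csv);
    groups.remove(group);
    return String.join(",", groups);
  }

  public static void main(String[] args) {
    String value = "employee,developer";
    value = add(value, "manager");
    value = remove(value, "employee");
    System.out.println(value); // prints "developer,manager"
  }
}
```

Because the whole list lives in one value, replacing employee with manager becomes a single overwrite of that value rather than a delete of one of two entries that share a key.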

You can directly delete

in001 : group : name -> employee

by running a compaction with a custom filter that excludes exactly that value from the compacted output. (Not tested, but it should work.) Use:

IteratorSetting config = new IteratorSetting(10, "excludeTermFilter", ExcludeTermFilter.class);
ExcludeTermFilter.setTermToExclude(config, "group", "name", "employee");
List<IteratorSetting> filterList = new ArrayList<IteratorSetting>();
filterList.add(config);
connector.tableOperations().compact(tableName, startRow, endRow, filterList, true, false);

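with the corresponding values, and this custom filter (based on GrepIterator). Note that, like GrepIterator, it matches by substring, so the term "employee" would also match a value such as "employees":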

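import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.IOException;
import java.util.Arrays;
import java.util.Map;

import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

public class ExcludeTermFilter extends Filter {
  private byte[] termToExclude;
  private byte[] columnFamily;
  private byte[] columnQualifier;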
  @Override
  public boolean accept(Key k, Value v) {
    return !(match(v.get(),termToExclude) &&
             match(k.getColumnFamilyData(),columnFamily) &&
             match(k.getColumnQualifierData(),columnQualifier) 
            );
  }
  private boolean match(ByteSequence bs, byte[] term) {
    return indexOf(bs.getBackingArray(), bs.offset(), bs.length(), term) >= 0;
  }
  private boolean match(byte[] ba, byte[] term) {
    return indexOf(ba, 0, ba.length, term) >= 0;
  }
  // copied code below from java string and modified    
  private static int indexOf(byte[] source, int sourceOffset, int sourceCount, byte[] target) {
    byte first = target[0];
    int targetCount = target.length;
    int max = sourceOffset + (sourceCount - targetCount);
    for (int i = sourceOffset; i <= max; i++) {
      /* Look for first character. */
      if (source[i] != first) {
        while (++i <= max && source[i] != first)
          continue;
      }
      /* Found first character, now look at the rest of the target */
      if (i <= max) {
        int j = i + 1;
        int end = j + targetCount - 1;
        for (int k = 1; j < end && source[j] == target[k]; j++, k++)
          continue;
        if (j == end) {
          /* Found whole string. */
          return i - sourceOffset;
        }
      }
    }
    return -1;
  }
  @Override
  public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
    ExcludeTermFilter copy = (ExcludeTermFilter) super.deepCopy(env);
    copy.termToExclude = Arrays.copyOf(termToExclude, termToExclude.length);
    copy.columnFamily = Arrays.copyOf(columnFamily, columnFamily.length);
    copy.columnQualifier = Arrays.copyOf(columnQualifier, columnQualifier.length);
    return copy;
  }
  @Override
  public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options, IteratorEnvironment env) throws IOException {
    super.init(source, options, env);
    termToExclude = options.get("etf.term").getBytes(UTF_8);
    columnFamily = options.get("etf.family").getBytes(UTF_8);
    columnQualifier = options.get("etf.qualifier").getBytes(UTF_8);
  }
  /**
   * Encode the family, qualifier and termToExclude as an option for a ScanIterator
   */
  public static void setTermToExclude(IteratorSetting cfg, String family, String qualifier, String termToExclude) {
    cfg.addOption("etf.family", family);
    cfg.addOption("etf.qualifier", qualifier);
    cfg.addOption("etf.term", termToExclude);
  }
}

Alternatively, you can use a different schema

rowId : colFam: colQual -> value
in001 : user : name -> erp 
in001 : user : age -> 23
in001 : group/0 : name -> employee
in001 : group/1 : name -> developer

or

rowId : colFam: colQual -> value
in001 : user : name -> erp 
in001 : user : age -> 23
in001 : group : 0/name -> employee
in001 : group : 1/name -> developer

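That is, for a "has-many" relationship, you introduce a key for each relation (in the colFamily or the colQualifier), which lets you manipulate each relation independently.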
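With the second layout (group : 0/name, group : 1/name), deleting one group comes down to finding the qualifier that holds the value. The sketch below assumes the group columns of a row have already been read into a map of qualifier to value, for example from the scanner loop in the question. GroupColumns and qualifierFor are hypothetical names, not an Accumulo API.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: given the "group" columns of one row (qualifier -> value),
// find the qualifier whose value equals the group to be removed.
final class GroupColumns {
  private GroupColumns() {}

  static Optional<String> qualifierFor(Map<String, String> groupColumns, String value) {
    for (Map.Entry<String, String> entry : groupColumns.entrySet()) {
      if (entry.getValue().equals(value)) {
        return Optional.of(entry.getKey());
      }
    }
    // No column in this row holds the requested value.
    return Optional.empty();
  }
}
```

For the example row, looking up employee yields the qualifier 0/name, which can then be passed to mutation.putDelete("group", "0/name") without touching the developer entry.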
