I am trying to check the consistency of a file after copying it to HDFS using the Hadoop API, DFSClient.getFileChecksum(). The code below produces this output:
Null
HDFS : null
Local : null
Can anyone point out the mistake? Here is the code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class fileCheckSum {

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem hadoopFS = FileSystem.get(conf);
        // Path hdfsPath = new Path("/derby.log");
        LocalFileSystem localFS = LocalFileSystem.getLocal(conf);
        // Path localPath = new Path("file:///home/ubuntu/derby.log");
        // System.out.println("HDFS PATH : " + hdfsPath.getName());
        // System.out.println("Local PATH : " + localPath.getName());

        FileChecksum hdfsChecksum = hadoopFS.getFileChecksum(new Path("/derby.log"));
        FileChecksum localChecksum = localFS.getFileChecksum(new Path("file:///home/ubuntu/derby.log"));

        // && (not ||) so neither checksum is dereferenced while null
        if (null != hdfsChecksum && null != localChecksum) {
            System.out.println("HDFS Checksum : " + hdfsChecksum.toString() + "\t" + hdfsChecksum.getLength());
            System.out.println("Local Checksum : " + localChecksum.toString() + "\t" + localChecksum.getLength());
            if (hdfsChecksum.toString().equals(localChecksum.toString())) {
                System.out.println("Equal");
            } else {
                System.out.println("UnEqual");
            }
        } else {
            System.out.println("Null");
            System.out.println("HDFS : " + hdfsChecksum);
            System.out.println("Local : " + localChecksum);
        }
    }
}
You haven't set a remote address on the conf, so both lookups use the same default configuration and hadoopFS and localFS both point to an instance of LocalFileSystem.
getFileChecksum is not implemented for LocalFileSystem and returns null. It should work on DistributedFileSystem, though: if your conf points to a distributed cluster, FileSystem.get(conf) should return an instance of DistributedFileSystem, which returns a checksum that is an MD5 of MD5s of CRC32 checksums of chunks of size bytes.per.checksum. This value depends on the block size and on the cluster-wide configuration parameter bytes.per.checksum. That is why these two parameters are also encoded in the return value of the distributed checksum, as the name of the algorithm: MD5-of-xxxMD5-of-yyyCRC32, where xxx is the number of CRC checksums per block and yyy is the bytes.per.checksum parameter.
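As a minimal sketch of that setup (the NameNode address below is a placeholder, not from the original post), pointing fs.defaultFS at the cluster in core-site.xml is what makes FileSystem.get(conf) return a DistributedFileSystem instead of a LocalFileSystem:

```xml
<!-- core-site.xml: hdfs://namenode-host:8020 is a placeholder;
     replace it with your NameNode's host and port -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
```

Equivalently, conf.set("fs.defaultFS", "hdfs://namenode-host:8020") before the FileSystem.get(conf) call has the same effect in code.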
So getFileChecksum is not designed for comparisons across filesystems. While it is possible to simulate the distributed checksum locally, or to hand-craft a map-reduce job that computes the equivalent of the local hash, I suggest relying on Hadoop's own integrity checks, which happen whenever a file is written to or read from Hadoop.
Try this. Here I compute the MD5 of both the local and the HDFS file, then compare the two values; for identical files they are equal. Hope this helps.
public static void compareChecksumForLocalAndHdfsFile(String sourceHdfsFilePath,
        String sourceLocalFilepath, Map<String, String> hdfsConfigMap) throws Exception {
    System.setProperty("HADOOP_USER_NAME", hdfsConfigMap.get(Constants.USERNAME));
    System.setProperty("hadoop.home.dir", "/tmp");
    Configuration hdfsConfig = new Configuration();
    hdfsConfig.set(Constants.USERNAME, hdfsConfigMap.get(Constants.USERNAME));
    hdfsConfig.set("fsURI", hdfsConfigMap.get("fsURI"));
    FileSystem hdfs = FileSystem.get(new URI(hdfsConfigMap.get("fsURI")), hdfsConfig);
    Path inputPath = new Path(hdfsConfigMap.get("fsURI") + "/" + sourceHdfsFilePath);
    InputStream is = hdfs.open(inputPath);
    String localChecksum = getMD5Checksum(new FileInputStream(sourceLocalFilepath));
    String hdfsChecksum = getMD5Checksum(is);
    if (null != hdfsChecksum && null != localChecksum) {
        System.out.println("HDFS Checksum : " + hdfsChecksum + "\t" + hdfsChecksum.length());
        System.out.println("Local Checksum : " + localChecksum + "\t" + localChecksum.length());
        if (hdfsChecksum.equals(localChecksum)) {
            System.out.println("Equal");
        } else {
            System.out.println("UnEqual");
        }
    } else {
        System.out.println("Null");
        System.out.println("HDFS : " + hdfsChecksum);
        System.out.println("Local : " + localChecksum);
    }
}

// Takes an InputStream (not a filename) so it works for both the local
// FileInputStream and the stream returned by hdfs.open()
public static byte[] createChecksum(InputStream fis) throws Exception {
    byte[] buffer = new byte[1024];
    MessageDigest complete = MessageDigest.getInstance("MD5");
    int numRead;
    do {
        numRead = fis.read(buffer);
        if (numRead > 0) {
            complete.update(buffer, 0, numRead);
        }
    } while (numRead != -1);
    fis.close();
    return complete.digest();
}

// see this How-to for a faster way to convert
// a byte array to a HEX string
public static String getMD5Checksum(InputStream in) throws Exception {
    byte[] b = createChecksum(in);
    StringBuilder result = new StringBuilder();
    for (int i = 0; i < b.length; i++) {
        result.append(Integer.toString((b[i] & 0xff) + 0x100, 16).substring(1));
    }
    return result.toString();
}
Output:
HDFS Checksum : d99513cc4f1d9c51679a125702bd27b0 32
Local Checksum : d99513cc4f1d9c51679a125702bd27b0 32
Equal
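As a self-contained sketch of the same hashing step (plain JDK, no Hadoop needed; the class and method names here are my own, not from the answer above), java.security.DigestInputStream can replace the manual read-and-update loop, and String.format("%02x", b) gives the same zero-padded hex conversion. It can be sanity-checked against the well-known RFC 1321 MD5 test vector for "abc":

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Hex {

    // Hash any InputStream (a local FileInputStream or the stream from
    // hdfs.open() would both work) and return the digest as lowercase hex.
    public static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buffer = new byte[8192];
            // DigestInputStream updates the digest as a side effect of reading
            while (dis.read(buffer) != -1) {
                // nothing to do per chunk; reading drives the digest
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b)); // zero-padded, unlike BigInteger.toString(16)
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // RFC 1321 test vector: MD5("abc") = 900150983cd24fb0d6963f7d28e17f72
        System.out.println(md5Hex(new ByteArrayInputStream("abc".getBytes("UTF-8"))));
    }
}
```

Comparing two streams then reduces to md5Hex(localStream).equals(md5Hex(hdfsStream)), exactly as in the answer above.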