In Java, is it possible to parse the contents of archive library (.a) and shared object (.so) files to retrieve human-readable text?

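As an example, there is a Java Windows PE (Portable Executable) parser that can parse Windows .exe and .dll files to retrieve the product name, version information, and copyright information: https://github.com/kichik/pecoff4j. Basically, you pass it a file such as notepad.exe and it returns the following: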

CompanyName = Microsoft Corporation
FileDescription = Notepad
FileVersion = 6.1.7600.16385 (win7_rtm.090713-1255)
InternalName = Notepad
LegalCopyright = © Microsoft Corporation. All rights reserved.
OriginalFilename = NOTEPAD.EXE
ProductName = Microsoft® Windows® Operating System
ProductVersion = 6.1.7600.16385

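This tool basically uses a few Java InputStream classes to access certain bytes in the file and returns the appropriate ASCII representation of the raw data it reads.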
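To illustrate the general idea of reading specific bytes and interpreting them, here is a minimal sketch that inspects the first few bytes of a file to tell whether it looks like an ELF shared object or a Unix ar archive. The class name `MagicSniffer` and the method `detect` are made up for this example and are not part of pecoff4j; the sketch assumes Java 11 or later (for `InputStream.readNBytes(int)` and `Path.of`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class MagicSniffer {
    // ELF files (including .so) begin with the bytes 0x7F 'E' 'L' 'F'.
    private static final byte[] ELF_MAGIC = {0x7F, 'E', 'L', 'F'};
    // Unix ar archives (.a) begin with the 8-byte ASCII string "!<arch>\n".
    private static final byte[] AR_MAGIC =
            "!<arch>\n".getBytes(StandardCharsets.US_ASCII);

    /** Reads up to 8 leading bytes and reports which known signature they match. */
    public static String detect(InputStream in) throws IOException {
        byte[] head = in.readNBytes(8); // may return fewer bytes for short input
        if (startsWith(head, ELF_MAGIC)) {
            return "ELF";
        }
        if (startsWith(head, AR_MAGIC)) {
            return "ar archive";
        }
        return "unknown";
    }

    // True if data is at least as long as prefix and begins with it.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            System.out.println(detect(in));
        }
    }
}
```

This only identifies the container format. Extracting structured fields the way the PE parser does would require following the format's own header layout.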

For my problem, I have tried the following approach, which returns unreadable text even when I try to normalize it:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public static void readContent(String file) {
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader buff = new BufferedReader(
            new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
        String line;
        while ((line = buff.readLine()) != null) {
            // Decompose accented characters, then strip the combining marks
            line = Normalizer.normalize(line, Normalizer.Form.NFD)
                    .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
            System.out.println(line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}

I would appreciate it if someone could point me in the right direction, or tell me whether what I want is possible at all.

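I found a solution to my problem, in case anyone else is wondering about the answer. For confidentiality reasons I won't post all of the code here, but it is simple: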

// fis and bis are declared in the part of the code that is not shown.
// PeekableInputStream and isStringChar are not JDK APIs; they also come from the unposted code.
fis = new FileInputStream(file);
bis = new BufferedInputStream(fis);
PeekableInputStream pis = new PeekableInputStream(bis);
Set<String> lines = new HashSet<String>();
int currentByte;
StringBuilder currentLine = null;
while (((currentByte = pis.read()) != -1) && lines.size() < 10000) {
    char ch = (char) currentByte;
    if (isStringChar(ch)) {
        if (currentLine == null) {
            currentLine = new StringBuilder(8);
        }
        // found a char, add it to the current line
        currentLine.append(ch);
    }
}

Now, if you look at the currentLine string, you will find information such as license and copyright notices, provided the original author included them.
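Since the snippet above leaves out several pieces, here is a self-contained sketch of the same idea: scan the file byte by byte and keep runs of printable characters, similar in spirit to the Unix `strings` utility. This is not the original code. The class name `StringsSketch`, the `extract` method, the minimum run length of 4, and this particular definition of `isStringChar` (printable ASCII plus tab) are all assumptions made for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class StringsSketch {
    /** Assumed stand-in for the answer's isStringChar: printable ASCII or tab. */
    static boolean isStringChar(int b) {
        return (b >= 0x20 && b <= 0x7E) || b == '\t';
    }

    /** Collects every run of at least minLength printable bytes, in file order. */
    public static List<String> extract(InputStream in, int minLength) throws IOException {
        List<String> found = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (isStringChar(b)) {
                // still inside a printable run
                current.append((char) b);
            } else {
                // a non-printable byte ends the run; keep it only if long enough
                if (current.length() >= minLength) {
                    found.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // the file may end in the middle of a run
        if (current.length() >= minLength) {
            found.add(current.toString());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(
                Files.newInputStream(Path.of(args[0])))) {
            extract(in, 4).forEach(System.out::println);
        }
    }
}
```

Running it as `java StringsSketch libfoo.so` (the file name is only an example) prints each printable run on its own line. The minimum length filters out short runs that are usually just coincidental byte values.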
