How to read a file quickly to check for a signature/magic number



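I'm a student and new to both C++ and security. I was given an assignment on checking for signatures/magic numbers in files, and I've run into a small problem with speeding up the read time.

My idea is to read the file in binary mode with ifstream, store its data in a vector, and then convert it to a hex string. Finally, I check whether the given signature exists in the hex string.

In theory everything works, but the whole process of allocating the vector, reading the file, and converting its data takes a long time. The read alone takes 44 ms.

How can I improve this? Here is my code: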

UINT CheckForSignature(CString source, CString dest_path) {
    // source is the hex string to find in the file; dest_path is the path of the file
    ifstream file(dest_path, ios::binary);
    if (file.is_open()) {
        // check the size of the file
        file.seekg(0, ios::end);
        int iFileSize = file.tellg();
        // if the file size exceeds 50MB, skip it
        if (iFileSize > 50000000) {
            // return -1: the file exceeds 50MB and does not need to be checked
            return -1;
        }
        // read the file and store its data as a hex string
        file.seekg(0, ios::beg);
        vector<char> memblock(iFileSize);
        file.read(((char*)memblock.data()), iFileSize); // 18ms alloc memory
        ostringstream ostrData; // 44ms read file
        // adds up to a total of 62ms
        // if you also count the time needed to translate the whole memblock
        // then this takes far too long
        // need to improve this
        for (int i = 0; i < memblock.size(); i++) {
            int z = memblock[i] & 0xff;
            ostrData << hex << setfill('0') << setw(2) << z;
        }
        string strDataHex = ostrData.str();
        string strHexSource = (CT2A)source;
        if (strDataHex.find(strHexSource) != string::npos) {
            // return 1: the signature exists in the file
            return 1;
        }
        else {
            // return 0: the signature is not in the file
            return 0;
        }
    }
}

I'm open to any help and suggestions on solutions and code improvements. Thank you very much!

There are more performant ways to read and check the file contents.
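One such direction, as a minimal sketch and not the benchmarked code further down: convert the short hex signature to raw bytes once, then search the file's bytes directly, so the whole file never has to be turned into a hex string. The helper names `hexToBytes` and `containsBytes` are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <cassert>   // only needed if you check the helpers with assert
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode a hex string such as "30303030" into raw bytes.
// Returns false (and leaves `out` untouched) on odd length or a non-hex character.
bool hexToBytes(const std::string& hex, std::vector<unsigned char>& out) {
    if (hex.size() % 2 != 0) return false;
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };
    std::vector<unsigned char> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int hi = nibble(hex[i]);
        const int lo = nibble(hex[i + 1]);
        if (hi < 0 || lo < 0) return false;
        bytes.push_back(static_cast<unsigned char>(hi * 16 + lo));
    }
    out.swap(bytes);
    return true;
}

// Hypothetical helper: true if `needle` occurs anywhere in `haystack`.
// An empty needle is treated as "not found".
bool containsBytes(const std::vector<unsigned char>& haystack,
                   const std::vector<unsigned char>& needle) {
    if (needle.empty()) return false;
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end()) != haystack.end();
}
```

Searching bytes also avoids a false positive that a hex-string search can produce: the bytes 0x13 0x03 print as "1303", which contains "30" at an odd position even though no 0x30 byte is present.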

Here I show a naive/simple approach (just an example).

I created a 51M file with "0000" at the end (I removed the size limit):

~/projects$ l data.bin 
-rw-r--r-- 1 manuel manuel 51M jul 27 02:51 data.bin

(Showing the last two lines.)

~/projects$ tail data.bin | hexdump
0000b80 11b9 dddd 8fe9 bab1 134d 5645 eb74 81ce
0000b90 3030 3030 000a                         
0000b95

Running your code (20 runs):

~/projects$ ./runtest.sh 131072 20
0 2360 1 2333 2 2355 3 2360 4 2349 5 2350 6 2353 7 2346 8 2342 9 2381 10 2378 11 2394 12 2338 13 2363 14 2392 15 2374 16 2365 17 2433 18 2426 19 2397 
Average: 2369

Running my example (20 runs):

~/projects$ ./runtest.sh 131072 20 mio
0 105 1 103 2 104 3 104 4 104 5 105 6 104 7 104 8 104 9 102 10 102 11 104 12 104 13 103 14 102 15 103 16 103 17 105 18 104 19 104 
Average: 103

With a 5M file.

Yours:

~/projects$ ./runtest.sh 131072 20
0 238 1 243 2 244 3 242 4 243 5 244 6 239 7 245 8 243 9 246 10 239 11 246 12 243 13 242 14 240 15 243 16 242 17 245 18 240 19 243 
Average: 242

The example:

~/projects$ ./runtest.sh 131072 20 mio
0 10 1 10 2 10 3 11 4 10 5 10 6 10 7 10 8 10 9 10 10 11 11 10 12 10 13 10 14 10 15 10 16 10 17 10 18 10 19 10 
Average: 10

The script to compile and run (for my example you can try several buffer sizes):

#!/bin/bash
# usage: ./runtest.sh [block_size] [runs] [mio]
n=10
mio=""
bs=1024
if [ "$1" != "" ]
then
    bs=$1
fi
if [ "$2" == "" ]
then
    echo "Ups. Repeating? Will try with 10"
else
    n=$2
fi
if [ "$3" != "" ]
then
    mio="-DMIO"
fi
rm -f main
g++ -Wall -Wextra -g main.cc -o main -Wpedantic -std=c++2a -DBLOCK_SIZE=$bs $mio
tot=0
run=0
while [ "$run" != "$n" ]
do
    text=$(./main)
    # the elapsed time is the 4th word of the program's output
    mic=$(echo $text | cut -d' ' -f 4)
    echo -n "$run $mic "
    tot=$(($tot + $mic))
    run=$(($run + 1))
done
echo
tot=$(($tot / $run))
echo "Average: $tot"

The code (main.cc):

#include <chrono>
#include <cstring>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

#ifndef BLOCK_SIZE
#define BLOCK_SIZE 1024 // same default as the script
#endif

int main()
{
    string dest_path{"data.bin"};
    ifstream file(dest_path, ios::binary);
    if (!file.is_open()) {
        cout << "Cannot open " << dest_path << endl;
        return 1;
    }
    bool found = false;
    string detail; // extra position info, filled in by the block-wise search only
    const auto init = std::chrono::high_resolution_clock::now();
#ifdef MIO
    // What to look for: ASCII "0000". The trailing 0x00 is only a terminator
    // and is not compared.
    const unsigned char signature[] = {0x30, 0x30, 0x30, 0x30, 0x00};
    const size_t siglen = sizeof(signature) - 1;
    unsigned char memblock[BLOCK_SIZE];
    size_t block_start = 0; // file offset of memblock[0]
    size_t posf = 0;        // offset of the match inside the current block
    int numblocks = 0;

    for (;;) {
        file.read((char *)memblock, BLOCK_SIZE);
        const size_t got = static_cast<size_t>(file.gcount());
        ++numblocks;
        // Only test positions where the whole signature fits in the bytes just read.
        for (size_t i = 0; i + siglen <= got; ++i) {
            if (memblock[i] == signature[0] && std::memcmp(&memblock[i], signature, siglen) == 0) {
                found = true;
                posf = i;
                break;
            }
        }
        if (found || file.eof() || got < siglen) {
            break;
        }
        // Step back so that a signature split across two blocks is not missed.
        file.seekg(-static_cast<std::streamoff>(siglen - 1), ios::cur);
        block_start += got - (siglen - 1);
    }
    detail = " in block " + to_string(numblocks)
           + " byte " + to_string(posf)
           + ", file offset " + to_string(block_start + posf);
#else
    // The approach from the question: read everything, convert it to hex, search the string.
    file.seekg(0, ios::end);
    const auto iFileSize = static_cast<std::streamsize>(file.tellg());
    file.seekg(0, ios::beg);
    if (iFileSize < 0) {
        cout << "Cannot determine the size of " << dest_path << endl;
        return 1;
    }
    vector<char> memblock(static_cast<size_t>(iFileSize));
    file.read(memblock.data(), iFileSize); // 18ms alloc memory
    ostringstream ostrData; // 44ms read file
    // adds up to a total of 62ms
    // if you also count the time needed to translate the whole memblock
    // then this takes far too long
    // need to improve this
    for (size_t i = 0; i < memblock.size(); i++) {
        int z = memblock[i] & 0xff;
        ostrData << std::hex << setfill('0') << setw(2) << z;
    }
    string strDataHex = ostrData.str();
    string strHexSource = "30303030"; // hex encoding of ASCII "0000"
    found = strDataHex.find(strHexSource) != string::npos;
#endif
    const auto finish = std::chrono::high_resolution_clock::now();
    const auto res = std::chrono::duration_cast<std::chrono::milliseconds>(finish - init).count();
    if (found) {
        // The script reads the elapsed time from the 4th word of this line.
        cout << "Yep! Found! Milliseconds: " << res << detail << endl;
    } else {
        cout << "Hmm... not found" << endl;
    }
    return 0;
}
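
The block-wise idea can also be written without seeking backwards, by carrying the last few bytes of each block over to the next read. The following is only a sketch under assumptions: `findInStream` is a hypothetical name, it takes any `std::istream` so it can be tried on an in-memory stream, and it returns `std::string::npos` when nothing is found, mirroring `std::string::find`.

```cpp
#include <algorithm>
#include <cassert>   // only needed if you check the helper with assert
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: scan `in` in blocks of `blockSize` bytes and return the
// absolute offset of the first occurrence of `needle`, or std::string::npos.
std::size_t findInStream(std::istream& in, const std::string& needle,
                         std::size_t blockSize = 131072) {
    if (needle.empty() || blockSize == 0) return std::string::npos;
    const std::size_t keep = needle.size() - 1; // bytes carried between blocks
    std::vector<char> buf(keep + blockSize);
    std::size_t carried = 0; // valid bytes at the front of buf from the last block
    std::size_t base = 0;    // absolute stream offset of buf[0]
    for (;;) {
        in.read(buf.data() + carried, static_cast<std::streamsize>(blockSize));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) return std::string::npos; // end of stream or read error
        const std::size_t valid = carried + got;
        const auto first = buf.begin();
        const auto last = first + static_cast<std::ptrdiff_t>(valid);
        const auto hit = std::search(first, last, needle.begin(), needle.end());
        if (hit != last) return base + static_cast<std::size_t>(hit - first);
        // Keep the tail so a needle split across two reads is still seen.
        const std::size_t tail = std::min(keep, valid);
        std::memmove(buf.data(), buf.data() + (valid - tail), tail);
        base += valid - tail;
        carried = tail;
    }
}
```

For a real file you would pass an `std::ifstream` opened with `std::ios::binary`. The block size is a tuning knob, just like the buffer sizes tried with the script above.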