How to implement a stream that can be split on newlines



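The following code works, but it is roughly twice as slow as feeding decompressed data to a (modified) version of the program through a (Linux) pipe. I need a steady stream inside the program that I can keep splitting on \n. Is there a way to do this with a (string?) stream or any other trick?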

int main(int argc, char *argv[]) {
static const int unzipBufferSize = 8192;
long long int counter = 0;
int i = 0, p = 0, n = 0;
int offset = 0;
char *end = NULL;
char *begin = NULL;
unsigned char unzipBuffer[unzipBufferSize];
unsigned int unzippedBytes;
char * inFileName = argv[1];
char buffer[200];
buffer[0] = '\0';
bool breaker = false;
char pch[4][200];
Read *aRead = new Read;
gzFile inFileZ;
inFileZ = gzopen(inFileName, "rb");
while (true) {
    unzippedBytes = gzread(inFileZ, unzipBuffer, unzipBufferSize - 1); // leave room for the terminating 0-char
    if (unzippedBytes > 0) {
        unzipBuffer[unzippedBytes] = '\0'; //put a 0-char after the total buffer
        begin = (char*) &unzipBuffer[0]; // point to the address of the first char
        do {
            end = strchr(begin,(int)'\n'); //find the end of line
            if (end != NULL) *(end) = '\0'; // put 0-char to use it as a c-string
            pch[p][0] = '\0'; // put a 0-char to be able to strcat
            if (strlen(buffer) > 0) { // if buffer from previous iteration contains something
                strcat(pch[p], buffer); // cat it to the p-th pch
                buffer[0] = '\0'; // set buffer to null-string or ""
            }
            strcat(pch[p], begin); // put begin (or rest of line in case there was a buffer into p-th pch
            if (end != NULL) { // see if it already points to something
                begin = end+1; // if so, advance begin to old end+1
                p++;
            }
            if(p>3) { // a 'read' contains 4 lines, so if p>3
                strcat(aRead->bases,pch[1]); // we use line 2 and 4 as
                strcat(aRead->scores,pch[3]); // bases and scores
                //do things with the reads
                aRead->bases[0] = '\0'; //put them back to 0-char
                aRead->scores[0] = '\0';
                p = 0; // start counting next 4 lines
            }
        } 
        while (end != NULL );
        strcat(buffer,pch[p]); //move the left-over of unzipBuffer to buffer
    }
    else {
        break; // when no unzippedBytes, exit the loop
    }
}

Your main problem is probably the standard C string library.

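By using the strxxx() functions, each iteration walks over the complete buffer several times: first in strchr(), then in strlen(), then again in every strcat() call. Using the standard library is usually a good thing, but here it is simply inefficient.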

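See whether you can come up with something simpler that touches each character only once (the code only shows the principle; don't expect it to work as is):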

do
{
    do
    {
       *tp++ = *sp++;
    } while (sp < buffer_end && *sp != '\n');
    /* new line, do whatever it requires */
    ...
    /* reset tp to beginning of buffer */
} while (sp < buffer_end);
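
To make the single-pass idea concrete, here is a minimal sketch rather than a drop-in replacement. The helper name `splitChunk` and the `carry` and `lines` parameters are assumptions made for illustration and do not appear in the original program. Each byte of a chunk is inspected once, and a line that is cut off at the end of one chunk is completed when the next chunk arrives.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original program): scans one chunk of
// decompressed data exactly once and appends every complete line to `lines`.
// `carry` holds the unfinished tail of the previous chunk between calls.
void splitChunk(const char *data, std::size_t size,
                std::string &carry, std::vector<std::string> &lines) {
    const char *lineStart = data;
    const char *const dataEnd = data + size;
    for (const char *p = data; p < dataEnd; ++p) {
        if (*p == '\n') {
            carry.append(lineStart, p);  // bytes since the previous newline
            lines.push_back(carry);      // one complete line, without '\n'
            carry.clear();
            lineStart = p + 1;           // skip the newline itself
        }
    }
    carry.append(lineStart, dataEnd);    // keep the incomplete last line
}
```

Inside the read loop this would presumably be called once per `gzread()`, for example as `splitChunk(reinterpret_cast<const char *>(unzipBuffer), unzippedBytes, carry, lines)`. Because it works from an explicit size, no terminating 0-char is needed. Every fourth collected line then completes one read.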

I am trying to get this to work, but all it does is give a segmentation fault at runtime:

do {
    unzippedBytes = gzread(inFileZ, unzipBuffer, unzipBufferSize);
    if (unzippedBytes > 0) {
        while (*unzipBuffer < unzippedBytes) {
            *pch = *unzipBuffer++;
            cout << pch;
            i++;
        }
        i=0;
    }
    else break;
} while (true);

What am I doing wrong here?
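
The declarations used in this attempt are not shown, so any diagnosis is tentative. Two things stand out regardless. First, `*unzipBuffer < unzippedBytes` compares the value of the current byte with the byte count, not a position with the end of the data, so the loop can run past the buffer. Second, `cout << pch` prints until it finds a 0-char that nothing has written. A minimal sketch of a bounded walk over one chunk follows; the helper name `echoChunk` is hypothetical.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical helper: writes exactly `count` bytes of a chunk to `out`,
// one character at a time, and returns how many bytes were written.
// The loop is bounded by an end pointer, so no terminating '\0' is required.
std::size_t echoChunk(const unsigned char *chunk, std::size_t count,
                      std::ostream &out) {
    const unsigned char *sp = chunk;
    const unsigned char *const chunkEnd = chunk + count;
    std::size_t written = 0;
    while (sp < chunkEnd) {               // compare positions, not byte values
        out.put(static_cast<char>(*sp++));
        ++written;
    }
    return written;
}
```

With the buffer from the question, the call inside the loop would be `echoChunk(unzipBuffer, unzippedBytes, std::cout)`.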
