Is there a way to get "\n" from a stream?



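I'm trying to take a file and turn it into a data structure (Text is an "array" of paragraphs, Paragraph is an "array" of sentences, and Sentence is an "array" of words, which are char*).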

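To make things easy for myself I used streams (ifstream, to be exact), but one problem I ran into is detecting the end of a paragraph (two '\n' in a row mark the end of a paragraph). The simple approach is to go through the text character by character and check whether each one is a space or '\n', but that's long and somewhat painful.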

The code looks like this:

    std::ifstream fd(filename);
    char buffer[128];
    while(fd >> buffer)
    {
        /* Some code in here that does things with buffer */
    }

And, well, it works, but it completely ignores the paragraphs. fd.get(buffer, 128, '\n') doesn't do what I need either: after the first read it cuts everything off.
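A minimal sketch of what happens with get(). std::istream::get(char*, std::streamsize, char) stops before the delimiter and leaves the '\n' in the stream. A second call therefore extracts nothing, stores an empty string and sets failbit, and every later read fails too. std::istream::getline() extracts and discards the delimiter instead. The two helper functions are hypothetical names made up for this illustration.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, used in the usage note below

// Returns true if a second get(buffer, n, '\n') on `in` fails.
// get() stops *before* the delimiter and leaves '\n' in the stream, so the
// second call extracts nothing, stores "" in buffer and sets failbit.
bool secondGetFails(std::istream& in) {
    char buffer[128];
    in.get(buffer, sizeof buffer, '\n');  // first line; '\n' is still unread
    in.get(buffer, sizeof buffer, '\n');  // next char is '\n': nothing extracted
    return in.fail();
}

// getline() extracts and discards the '\n', so consecutive calls advance
// line by line. (After get(), calling in.ignore(1) would skip it as well.)
bool readTwoLines(std::istream& in, char* first, char* second, std::streamsize size) {
    return in.getline(first, size) && in.getline(second, size);
}
```

For example, on a std::istringstream holding "first line\nsecond line\n", secondGetFails() returns true. readTwoLines() fills its buffers with "first line" and "second line".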

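So, is there an easier way to do this than reading one character at a time? I can't use std::getline(), because the assignment forbids us from using vectors or strings.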
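One way to avoid writing the character loop everywhere is to hide it in a single helper. The sketch below assumes a hypothetical readWord() function, which is not a standard API. It behaves roughly like fd >> buffer with a size limit, and it also reports whether the whitespace it skipped contained two or more '\n', i.e. whether the word starts a new paragraph. Spaces and '\r' count as ordinary whitespace, so Windows line endings and blank lines containing spaces are also treated as paragraph breaks.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, used in the usage note below

// Hypothetical helper: like `in >> buffer`, but never writes more than `size`
// bytes (size must be >= 1), and sets newParagraph when the whitespace skipped
// before the word contained two or more '\n' characters.
// Returns false when no word is left. Longer words are split into pieces.
bool readWord(std::istream& in, char* buffer, std::size_t size, bool& newParagraph) {
    const int eof = std::istream::traits_type::eof();
    newParagraph = false;
    int newlines = 0;
    int c;
    // Skip whitespace (' ', '\t', '\r', '\n', ...), counting line breaks.
    while ((c = in.peek()) != eof && std::isspace(c)) {
        if (c == '\n' && ++newlines >= 2)
            newParagraph = true;
        in.get();
    }
    // Copy the word, leaving room for the terminating '\0'.
    std::size_t n = 0;
    while (n + 1 < size && (c = in.peek()) != eof && !std::isspace(c))
        buffer[n++] = static_cast<char>(in.get());
    buffer[n] = '\0';
    return n > 0;
}
```

Reading "One two.\n\nThree" with it yields "One" and "two." with newParagraph == false, then "Three" with newParagraph == true. In the question's loop this would become while (readWord(fd, buffer, sizeof buffer, isNew)) { ... }, with isNew a bool.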

UPDATE

It looks like std::istream::getline might help me, but it still doesn't do what I expected. It reads, well, the first line, and then something weird happens.

The code looks like this:

    std::ifstream fd(fl);
    char buffer[128];
    fd.getline(buffer, 128);
    std::cout << "555 - [" << buffer << "]" << std::endl;
    std::cout << fd.gcount() << std::endl;
    fd.getline(buffer, 128);
    std::cout << "777 - [" << buffer << "]" << std::endl;
    std::cout << fd.gcount() << std::endl;

The output looks like this:

    ]55 - [text from file
    23
    ]77 - [
    2

Yeah, I guess I don't understand what's going on.
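The output is consistent with a file that has Windows line endings ("\r\n"). istream::getline() stops at '\n' but keeps the '\r', so the buffer ends with a carriage return. When it is printed, the terminal jumps back to column 0 and the closing "]" overwrites the first digit, turning "555 - [" into "]55 - [". gcount() also counts the extracted '\n', so a gcount() of 2 for the second line means it held only "\r". That is an empty line, i.e. the paragraph break. Below is a sketch, assuming lines fit in 128 characters, that strips the '\r' and treats empty lines as paragraph separators. readLine() and countParagraphs() are hypothetical helpers.

```cpp
#include <cstring>
#include <istream>
#include <sstream>  // std::istringstream, used in the usage note below

// Reads one line with istream::getline and removes a trailing '\r'
// left behind by Windows ("\r\n") line endings.
// Returns false at end of input. A line longer than size - 1 characters
// makes getline set failbit; this sketch simply treats that as the end.
bool readLine(std::istream& in, char* buffer, std::streamsize size) {
    if (!in.getline(buffer, size))
        return false;
    std::size_t len = std::strlen(buffer);
    if (len > 0 && buffer[len - 1] == '\r')
        buffer[len - 1] = '\0';
    return true;
}

// Counts paragraphs, treating each run of empty lines as one separator.
int countParagraphs(std::istream& in) {
    char line[128];
    int paragraphs = 0;
    bool inParagraph = false;
    while (readLine(in, line, sizeof line)) {
        if (line[0] == '\0') {
            inParagraph = false;   // blank line: the current paragraph ended
        } else if (!inParagraph) {
            inParagraph = true;    // first line of a new paragraph
            ++paragraphs;
        }
    }
    return paragraphs;
}
```

Lines that contain only spaces are not treated as blank here; that would need an extra check.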

As far as I understand, you are not allowed to use any std containers.

So I think you could:

  1. Read the entire file into a buffer
  2. Split the buffer into paragraphs
  3. Split each paragraph into sentences
  4. Split each sentence into words
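As an illustration of step 2, paragraphs can be located without copying anything. Each paragraph is just a pointer into the buffer plus a length, found with strstr(). The callback interface below is an assumption made for this sketch, not part of the original code. It is written in the same C style as the rest of the answer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical callback type: receives one paragraph as (start, length).
typedef void (*ParagraphFn)(const char* start, std::size_t len, void* user);

// Walks a '\0'-terminated buffer and reports each paragraph, i.e. each run
// of text separated by "\n\n". Nothing is copied or allocated.
// Returns the number of paragraphs reported.
std::size_t forEachParagraph(const char* text, ParagraphFn fn, void* user) {
    std::size_t count = 0;
    while (*text != '\0') {
        const char* end = std::strstr(text, "\n\n");
        std::size_t len = end ? static_cast<std::size_t>(end - text)
                              : std::strlen(text);
        if (len > 0) {           // skip empty runs such as "\n\n\n\n"
            fn(text, len, user);
            ++count;
        }
        if (!end)
            break;
        text = end + 2;          // continue after the separator
    }
    return count;
}
```

For "ab\n\ncde" it calls the callback twice, with lengths 2 and 3. Steps 3 and 4 could follow the same pattern, using strpbrk() with a set of delimiters.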

For the first part, you can use:

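    #include <cstddef>
    #include <fstream>

    //! Reads a whole file into a '\0'-terminated buffer.
    //! Returns NULL on failure; the caller must delete[] the result.
    char* readFile(const char *filename) {
      std::ifstream ifs(filename, std::ifstream::binary);
      if (!ifs.good())                   // could not open the file
        return NULL;
      ifs.seekg(0, ifs.end);
      std::streamoff len = ifs.tellg();  // file size in bytes (-1 on error)
      ifs.seekg(0, ifs.beg);
      if (len < 0)
        return NULL;
      // new[] throws std::bad_alloc on failure rather than returning NULL.
      // One extra byte for the terminator that strstr()/strlen() rely on later.
      char* buffer = new char[static_cast<std::size_t>(len) + 1];
      ifs.read(buffer, len);
      if (ifs.gcount() != len) {         // check that the entire file was read
        delete[] buffer;
        return NULL;
      }
      buffer[len] = '\0';
      return buffer;                     // ifs is closed by its destructor
    }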
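The "\n\n" search used when splitting paragraphs will not find breaks in a file with Windows line endings ("\r\n\r\n"). The gcount() values in the question suggest such a file, and the binary-mode read keeps every '\r'. One hedged fix is to normalize the buffer right after reading it. normalizeNewlines() is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>

// Hypothetical helper: rewrites "\r\n" and lone '\r' as '\n', in place, so a
// later strstr(buffer, "\n\n") also finds blank lines in Windows or old Mac
// files. `text` must be '\0'-terminated; the result is never longer than the
// input. Returns the new length.
std::size_t normalizeNewlines(char* text) {
    char* out = text;
    for (const char* in = text; *in != '\0'; ++in) {
        if (*in == '\r') {
            *out++ = '\n';
            if (in[1] == '\n')
                ++in;              // skip the '\n' of a "\r\n" pair
        } else {
            *out++ = *in;
        }
    }
    *out = '\0';
    return static_cast<std::size_t>(out - text);
}
```

Calling normalizeNewlines(buffer) between readFile() and createText() would let the paragraph split work on such files.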

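With this function in place, all that's left is to call it and tokenize the string. For that, we have to define our types (based on linked lists, written in C style):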

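    //! One word, stored as a '\0'-terminated string
    struct Word {
      char *contents;
      Word *next;          // next word in the same sentence, or NULL
    };
    //! One sentence: a list of words
    struct Sentence {
      Word *first;
      Sentence *next;      // next sentence in the same paragraph, or NULL
    };
    //! One paragraph: a list of sentences
    struct Paragraph {
      Sentence *first;
      Paragraph *next;     // next paragraph in the text, or NULL
    };
    //! The whole text: a list of paragraphs
    struct Text {
      Paragraph *first;
    };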
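The example main() in this answer never frees what createText() allocates. Below is a possible cleanup function, sketched under two assumptions. First, every Word::contents was allocated with new[] (a string from strdup()/_strdup() would need free() instead). Second, every first/next pointer is either valid or NULL. freeText() and countWords() are hypothetical helpers, and the structs are repeated so the block compiles on its own.

```cpp
#include <cstddef>

// Same linked-list types as above.
struct Word { char *contents; Word *next; };
struct Sentence { Word *first; Sentence *next; };
struct Paragraph { Sentence *first; Paragraph *next; };
struct Text { Paragraph *first; };

// Counts the words in the whole text (handy as a quick sanity check).
std::size_t countWords(const Text *text) {
    std::size_t n = 0;
    for (Paragraph *p = text ? text->first : NULL; p; p = p->next)
        for (Sentence *s = p->first; s; s = s->next)
            for (Word *w = s->first; w; w = w->next)
                ++n;
    return n;
}

// Releases every word string (new[]), every node (new) and the Text itself.
void freeText(Text *text) {
    if (!text) return;
    Paragraph *p = text->first;
    while (p) {
        Sentence *s = p->first;
        while (s) {
            Word *w = s->first;
            while (w) {
                Word *nextWord = w->next;
                delete[] w->contents;
                delete w;
                w = nextWord;
            }
            Sentence *nextSentence = s->next;
            delete s;
            s = nextSentence;
        }
        Paragraph *nextParagraph = p->next;
        delete p;
        p = nextParagraph;
    }
    delete text;
}
```

It would be called once, after the printing loop: freeText(text);.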

With the types defined, we can now start parsing our text:

    #include <cstring>

    //! Splits a sentence into as many Word elements as possible
    void readSentence(char *buffer, size_t len, Word **target) {
        if (!buffer || *buffer == '\0' || len == 0) return;
        buffer += strspn(buffer, " \t\r\n");  // skip leading whitespace (avoids empty words)
        if (*buffer == '\0') return;
        *target = new Word;
        (*target)->next = NULL;
        char *end = strpbrk(buffer, " \t\r\n");
        if (end != NULL) {
            (*target)->contents = new char[end - buffer + 1];
            strncpy((*target)->contents, buffer, end - buffer);
            (*target)->contents[end - buffer] = '\0';
            readSentence(end + 1, strlen(end + 1), &(*target)->next);
        }
        else {
            // new[] copy: portable (strdup/_strdup are not standard C++) and freed with delete[]
            (*target)->contents = new char[strlen(buffer) + 1];
            strcpy((*target)->contents, buffer);
        }
    }
    //! Splits a paragraph into as many Sentence elements as possible
    void readParagraph(char *buffer, size_t len, Sentence **target) {
        if (!buffer || *buffer == '\0' || len == 0) return;
        *target = new Sentence;
        (*target)->first = NULL;  // stays NULL if the sentence has no words
        (*target)->next = NULL;
        char *end = strpbrk(buffer, ".;:?!");
        if (end != NULL) {
            char *t = new char[end - buffer + 2];
            strncpy(t, buffer, end - buffer + 1);
            t[end - buffer + 1] = '\0';
            readSentence(t, (size_t)(end - buffer + 1), &(*target)->first);
            delete[] t;
            readParagraph(end + 1, len - (end - buffer + 1), &(*target)->next);
        }
        else {
            readSentence(buffer, len, &(*target)->first);
        }
    }
    //! Splits a '\0'-terminated text buffer into as many Paragraph elements as possible
    void readText(char *buffer, Paragraph **target) {
        if (!buffer || *buffer == '\0') return;
        *target = new Paragraph;
        (*target)->first = NULL;  // stays NULL for an empty paragraph
        (*target)->next = NULL;
        char *end = strstr(buffer, "\n\n"); // a blank line ends this paragraph; pass it to the sentence parser
        if (end != NULL) {
            char *t = new char[end - buffer + 1];
            strncpy(t, buffer, end - buffer);
            t[end - buffer] = '\0';
            readParagraph(t, (size_t)(end - buffer), &(*target)->first);
            delete[] t;
            readText(end + 2, &(*target)->next);
        }
        else
            readParagraph(buffer, strlen(buffer), &(*target)->first);
    }
    Text* createText(char *contents) {
        Text *text = new Text;
        text->first = NULL;  // stays NULL for an empty file
        readText(contents, &text->first);
        return text;
    }
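readSentence() recurses once per word. An iterative variant is sketched below, assuming the same Word struct; splitWords() is a hypothetical name. It uses strspn()/strcspn() to skip runs of whitespace and measure each word, so no empty words are produced. It keeps a pointer to the last next field instead of recursing.

```cpp
#include <cstddef>
#include <cstring>

struct Word { char *contents; Word *next; };

// Splits a '\0'-terminated sentence on spaces, tabs and line breaks and
// returns a linked list with one Word per token (NULL if there are none).
// Each contents string is a new[] copy, '\0'-terminated.
Word *splitWords(const char *sentence) {
    const char *delims = " \t\r\n";
    Word *head = NULL;
    Word **tail = &head;                        // where the next Word is linked
    const char *p = sentence;
    for (;;) {
        p += std::strspn(p, delims);            // skip a run of whitespace
        if (*p == '\0')
            break;
        std::size_t len = std::strcspn(p, delims);  // length of this word
        Word *w = new Word;
        w->contents = new char[len + 1];
        std::memcpy(w->contents, p, len);
        w->contents[len] = '\0';
        w->next = NULL;
        *tail = w;
        tail = &w->next;
        p += len;
    }
    return head;
}
```

Inside readSentence(), the body could then become *target = splitWords(buffer);, which leaves *target NULL for a sentence without words.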

For example, you can use it like this:

    #include <iostream>

    int main(int argc, char **argv) {
        char *buffer = readFile("mytext.txt");
        if (buffer == NULL) {              // file missing or unreadable
            std::cerr << "Could not read mytext.txt" << std::endl;
            return 1;
        }
        Text *text = createText(buffer);
        delete[] buffer;                   // every Word holds its own copy
        for (Paragraph* p = text->first; p != NULL; p = p->next) {
            for (Sentence* s = p->first; s != NULL; s = s->next) {
                for (Word* w = s->first; w != NULL; w = w->next) {
                    std::cout << w->contents << " ";
                }
            }
            std::cout << std::endl << std::endl;
        }
        return 0;
    }

Keep in mind that this code may or may not work, since I haven't tested it.

Sources:

  • http://www.cplusplus.com/reference/
