Mapping books with multiple levels in Elasticsearch: nested vs. parent/child relationships


book: {
  properties: {
    isbn:     { type: 'string' },   //- ISBN of the book, e.g. 9783791535661
    title:    { type: 'string' },   //- Title of the book, e.g. Alice in Wonderland
    author:   { type: 'string' },   //- Author of the book (maybe should be an array), e.g. Lewis Carroll
    category: { type: 'string' },   //- Category of the book (maybe should be an array), e.g. Fantasy
    toc: {                          //- Array of the chapters in the book
      type: 'nested',
      properties: {
        html:          { type: 'string' },   //- HTML content of a chapter, e.g. <!DOCTYPE html><html>...</html>
        title:         { type: 'string' },   //- Title of the chapter, e.g. Down the Rabbit Hole
        fileName:      { type: 'string' },   //- File name of this chapter, e.g. chapter_1.html
        firstPage:     { type: 'integer' },  //- The first page of this chapter, e.g. 3
        numberOfPages: { type: 'integer' },  //- How many pages are in this chapter, e.g. 27
        sections: {                          //- Array of all of the sections within a chapter
          type: 'nested',
          properties: {
            html:  { type: 'string' },  //- HTML content of a section, e.g. <section>...</section>
            title: { type: 'string' },  //- Title of a section, e.g. section number 2 or something
            figures: {                  //- Array of the figures within a section
              type: 'nested',
              properties: {
                html:    { type: 'string' },  //- HTML content of a figure, e.g. <figure>...</figure>
                caption: { type: 'string' },  //- The name of a figure, e.g. Figure 1
                id:      { type: 'string' }   //- Id of a figure, e.g. figure4
              }
            },
            paragraphs: {               //- Array of the paragraphs within a section
              type: 'nested',
              properties: {
                html: { type: 'string' },  //- HTML content of a paragraph, e.g. <p>...</p>
                id:   { type: 'string' }   //- Id of a paragraph, e.g. paragraph3
              }
            }
          }
        }
      }
    }
  }
}

The HTML of an entire book is about 250 kB. I would like to run queries such as:

- the best matching paragraph, including its nearest paragraphs on either side
- the best matching section from a single book, including any child sections
- the best figure, given that it is inside a section with a matching title
- etc.
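
To make the third query more concrete, here is a rough sketch of how it might look against the nested mapping above. The function name `figureInMatchingSectionQuery` and its parameters are made up for illustration. The body uses the `nested` query with one level per path; whether `inner_hits` is available at this depth depends on the Elasticsearch version, so treat that key as an assumption to verify.

```javascript
// Sketch only: builds a search body for "a figure inside a section whose
// title matches", assuming the nested mapping toc -> sections -> figures.
function figureInMatchingSectionQuery(sectionTitle, figureCaption) {
  return {
    // Skip the large html fields; the hit is still the whole book document.
    _source: ['isbn', 'title'],
    query: {
      nested: {
        path: 'toc',
        query: {
          nested: {
            path: 'toc.sections',
            query: {
              bool: {
                must: [
                  // The section itself must have a matching title...
                  { match: { 'toc.sections.title': sectionTitle } },
                  // ...and contain at least one matching figure.
                  {
                    nested: {
                      path: 'toc.sections.figures',
                      query: {
                        match: { 'toc.sections.figures.caption': figureCaption }
                      },
                      // Assumption: asks for the matching figures themselves.
                      inner_hits: {}
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  };
}

// Example: print the request body that would be sent to the _search endpoint.
console.log(JSON.stringify(figureInMatchingSectionQuery('Rabbit Hole', 'Figure 1'), null, 2));
```

Because both conditions sit inside the same `toc.sections` nested query, the title and the figure have to belong to the same section rather than to any two sections of the book.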


If you use the nested type, everything is stored in the same _source document, which can be quite a mouthful for big books.

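For example, if you try to find a specific section of a chapter, the query will return the correct document, but that document is the whole book. I suppose this is probably not what you are looking for, so a parent/child relationship would be the appropriate way to go.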
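
As a minimal sketch of what that could look like, assuming the older `_parent` mapping style that goes with the `string` type used in the question (newer Elasticsearch versions handle parent/child differently): the type names `chapter` and `section`, the id scheme, and the helpers `toParentChildDocs` and `sectionsOfBookQuery` are all hypothetical, not something from the question.

```javascript
// Hypothetical mappings: one type per level, linked through _parent.
const parentChildMappings = {
  book:    { properties: { isbn: { type: 'string' }, title: { type: 'string' } } },
  chapter: { _parent: { type: 'book' },
             properties: { title: { type: 'string' } } },
  section: { _parent: { type: 'chapter' },
             properties: { title: { type: 'string' }, html: { type: 'string' } } }
};

// Splits one nested book object into separate, smaller documents.
// Every descendant is routed by the book's ISBN so that a whole family
// ends up on the same shard, which parent/child lookups rely on.
function toParentChildDocs(book) {
  const docs = [{
    type: 'book', id: book.isbn,
    body: { isbn: book.isbn, title: book.title }
  }];
  (book.toc || []).forEach((chapter, c) => {
    const chapterId = `${book.isbn}-c${c}`; // made-up id scheme
    docs.push({
      type: 'chapter', id: chapterId, parent: book.isbn, routing: book.isbn,
      body: { title: chapter.title }
    });
    (chapter.sections || []).forEach((section, s) => {
      docs.push({
        type: 'section', id: `${chapterId}-s${s}`, parent: chapterId, routing: book.isbn,
        body: { title: section.title, html: section.html }
      });
    });
  });
  return docs;
}

// Search body meant for the section type: each hit is a single section,
// restricted to one book by walking up section -> chapter -> book.
function sectionsOfBookQuery(text, isbn) {
  return {
    query: {
      bool: {
        must: [
          { match: { html: text } },
          { has_parent: {
              parent_type: 'chapter',
              query: { has_parent: {
                parent_type: 'book',
                query: { match: { isbn: isbn } }
              } }
          } }
        ]
      }
    }
  };
}
```

With this layout a search on `section` returns section-sized hits instead of the 250 kB book, at the cost of indexing more documents and keeping the routing consistent.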

