Importing blocks of data from a TXT file into an SQLite3 database



I want to create an SQLite3 database to use with Python. I am working with the ArnetMiner dataset, in which each entity is stored as a "block" of data. The block format is described as follows:

    #* --- paperTitle
    #@ --- Authors
    #year ---- Year
    #conf --- publication venue
    #citation --- citation number (both -1 and 0 means none)
    #index ---- index id of this paper
    #arnetid ---- pid in arnetminer database
    #% ---- the id of references of this paper (there are multiple lines, with each indicating a reference)
    #! --- Abstract

Here is an example:

    #*Spatial Data Structures.
    #@Hanan Samet,Wei Lee Chang,Jose Fernandez
    #year1995
    #confModern Database Systems
    #citation2743
    #index25
    #arnetid27
    #%165
    #%356
    #%786754
    #%3243
    #!An overview is presented of the use of spatial data structures in spatial databases. The focus is on hierarchical data structures, including a number of variants of quadtrees, which sort the data with respect to the space occupied by it. Such techniques are known as spatial indexing methods. Hierarchical data structures are based on the principle of recursive decomposition. 

Here is my question:

How do I import this into the sqlite3 tables I have created?

Normally the datasets I work with are simply tab-delimited, so after creating the table I only need to run the following:

.separator "\t"
.import Data.txt table_name

I created the tables as follows:

CREATE TABLE publications (
    PaperTitle varchar(150),
    Year int,
    Conference varchar(150),
    Citations int,
    ID int primary key,
    arnetId int,
    Abstract text
);
CREATE TABLE authors (
    ID int primary key,
    Name varchar(100)
);
CREATE TABLE authors_publications (
    PaperID int,
    AuthorID int
);
CREATE TABLE publications_citations (
    PaperID int,
    CitationID int
);

Basically, I guess I am asking whether there is a quick way to import the dataset into the database tables I created, or whether I have to write a Python script that parses the file and inserts each block one at a time.

The best approach is to parse the data yourself and rewrite it as CSV files, then import those directly into your database tables.
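Whichever route you take (CSV files or direct inserts), the parsing step is the same. Here is a minimal sketch of a block parser plus direct insertion with Python's built-in `sqlite3` module, assuming the field prefixes and table layout from the question; the file paths, function names, and the synthetic author IDs are my own assumptions, not part of the dataset:

```python
import sqlite3

def parse_blocks(path):
    """Yield one dict per paper block in the ArnetMiner data file."""
    record = {"authors": [], "refs": []}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                         # blank line ends a block
                if record.get("index") is not None:
                    yield record
                record = {"authors": [], "refs": []}
            elif line.startswith("#*"):
                record["title"] = line[2:]
            elif line.startswith("#@"):
                record["authors"] = line[2:].split(",")
            elif line.startswith("#year"):
                record["year"] = int(line[5:])
            elif line.startswith("#conf"):
                record["conf"] = line[5:]
            elif line.startswith("#citation"):
                record["citations"] = int(line[9:])
            elif line.startswith("#index"):
                record["index"] = int(line[6:])
            elif line.startswith("#arnetid"):
                record["arnetid"] = int(line[8:])
            elif line.startswith("#%"):
                record["refs"].append(int(line[2:]))
            elif line.startswith("#!"):
                record["abstract"] = line[2:]
    if record.get("index") is not None:          # file may lack a trailing blank line
        yield record

def load(db_path, data_path):
    """Insert every parsed block into the four tables from the question."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    author_ids = {}                              # author name -> generated id
    for rec in parse_blocks(data_path):
        cur.execute(
            "INSERT INTO publications VALUES (?, ?, ?, ?, ?, ?, ?)",
            (rec.get("title"), rec.get("year"), rec.get("conf"),
             rec.get("citations"), rec["index"], rec.get("arnetid"),
             rec.get("abstract")))
        for name in rec["authors"]:
            if name not in author_ids:           # assign ids on first sight
                author_ids[name] = len(author_ids) + 1
                cur.execute("INSERT INTO authors VALUES (?, ?)",
                            (author_ids[name], name))
            cur.execute("INSERT INTO authors_publications VALUES (?, ?)",
                        (rec["index"], author_ids[name]))
        for ref in rec["refs"]:
            cur.execute("INSERT INTO publications_citations VALUES (?, ?)",
                        (rec["index"], ref))
    conn.commit()
    conn.close()
```

Note the prefix checks must distinguish `#conf` from `#citation` and `#index` from `#!`, which the explicit `startswith` tests above do; if you prefer the CSV route, write each dict out with `csv.writer` instead of calling `cur.execute`.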

Latest update