使用Scriptella转换ETL的一部分



我正在试用Scriptella,看看它是否能满足我的需求。到目前为止,它似乎是一个很好的工具。我花了几个小时研究示例脚本,搜索论坛,并试图掌握嵌套查询/脚本的窍门。

这是我的ETL文件的一个示例,为了简洁起见,对其进行了一些清理。添加了以#开头的行,而不是实际ETL文件的一部分。我正在尝试插入/检索ID,然后将它们传递给稍后的脚本块。最有前途的方法似乎是使用全局变量,但在尝试检索值时,我会得到null。稍后,我将在脚本块中添加代码,这些代码在将字段添加到DB之前解析并显著转换字段。

没有错误。我只是没有得到我期望的操作系统ID和类别ID。提前谢谢。

<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <connection id="in" driver="csv" url="mycsvfile.csv"/>
    <connection id="dest" url="jdbc:mysql://localhost:3306/pvm3" user="user" password="password"/>
    <connection id="js" driver="script"/>
    <query connection-id="in">
        <!-- all columns are selected, notably: OPERATINGSYSTEM, CATEGORY, QID, TITLE -->
        <query connection-id="dest">
            #Check to see if the OS already exists, and get the ID if it does
            select max(os_id) as os_id, count(*) as os_cnt from etl_os where os = ?OPERATINGSYSTEM;
            #If it doesnt exist then add it and get the auto_increment value
            <script if="os_cnt==0">
                insert into etl_os(os) values(?OPERATINGSYSTEM);
                <query connection-id="dest">
                    select last_insert_id() as os_id;
                    #Store in global so it can be accessed in later script blocks
                    <script connection-id="js">
                        etl.globals.put('os_id', os_id);
                    </script>
                </query>
            </script>
            #Same style select/insert as above for category_id (excluded for brevity)
            #See if KB record exists by qid, if not then add it with the OS ID and category ID we got earlier
            <query connection-id="dest">
                select max(qid) as existing_qid, count(*) as kb_cnt from etl_qids where qid = ?QID
                <script if="kb_cnt==0">
                    insert into etl_qids(qid, category_id, os_id) values (?QID, ?{etl.globals.get('category_id')}, ?{etl.globals.get('os_id')});
                </script>
            </query>
        </query>
    </query>
</etl>

找到了如何做到这一点。本质上,在将数据传递给脚本之前,只需嵌套查询即可修改数据。下面是解决方案的快速类型。一开始我不明白,在传递行进行处理之前,可以立即嵌套查询来转换行。我的印象也是只有脚本才能操纵数据。

(查询)原始数据->(查询)操作数据->(脚本)写入新数据。

.. in is a CSV file ..
.. js is a driver="script" block ..
<query connection-id="in">
    <query connection-id="js">
        //transform data as needed here
        if (BASE_TYPE == '-') BASE_TYPE = '0';
        if (SECONDARY_TYPE == '-') SECONDARY_TYPE = '0';
        SIZES = SIZES.toLowerCase();
        query.next(); //call nested scripts
        <script connection-id="db">
            INSERT IGNORE INTO sizes(size) VALUES (?SIZE);
            INSERT IGNORE INTO test(base_type,secondary_type) VALUES (?BASE_TYPE, ?SECONDARY_TYPE);
        </script>
    </query>
</query>

最新更新