Why do I need a temporary file with pg_dump?



I keep getting errors from pgdumplib, and I have narrowed the problem down to how I redirect the output of pg_dump.

This is what I want to do, but it always fails with RuntimeError: Unsupported data format:

F=/tmp/test_Fc_format.dump
ssh codimd "sudo -u codimd bash -c 'cd /; pg_dump -d codimd -Fc'" >$F
python -c "import pgdumplib; dump = pgdumplib.load('$F')"

I can work around the problem by redirecting to a file on the remote machine first. This sequence always works:

F=/tmp/test_Fc_format.dump
ssh codimd "sudo -u codimd bash -c 'cd /; pg_dump -d codimd -Fc >/tmp/934354 && cat /tmp/934354'" >$F
python -c "import pgdumplib; dump = pgdumplib.load('$F')"

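Note that the only difference is that the second sequence adds >/tmp/934354 && cat /tmp/934354, i.e. it first redirects the output of pg_dump to a file on the remote machine and only then sends that file to stdout. In both cases the resulting files are the same size (though not byte-identical, because the database is online).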

Both the local and the remote machine run Ubuntu 20.04.

Why is this extra step necessary, and is there a better way to solve this?

Update 1: This also gives the error:

F=/tmp/test_Fc_format.dump
ssh codimd "sudo -u codimd bash -c 'cd /; pg_dump -d codimd -Fc |tee /tmp/934354 >/dev/null && cat /tmp/934354'" >$F
python -c "import pgdumplib; dump = pgdumplib.load('$F')"

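In other words, for this to work, pg_dump seems to need its output either redirected straight to a file on the machine where it runs or written with the -f option.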
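A minimal Python sketch of the -f variant, assuming pg_dump is on the PATH of the machine where the script runs. The helper names build_pg_dump_command and dump_to_file are hypothetical and only serve this illustration. The flags -d, -Fc and -f are the ones already used in this thread.

```python
import subprocess


def build_pg_dump_command(dbname, outfile):
    """Build a pg_dump call that writes a custom-format dump via -f.

    With -f, pg_dump opens the output file itself instead of writing
    to stdout, so the output is a regular file rather than a pipe.
    """
    return ["pg_dump", "-d", dbname, "-Fc", "-f", outfile]


def dump_to_file(dbname, outfile):
    """Run pg_dump and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_pg_dump_command(dbname, outfile), check=True)


if __name__ == "__main__":
    # Only print the command here; call dump_to_file() to actually run it.
    print(" ".join(build_pg_dump_command("codimd", "/tmp/test_Fc_format.dump")))
```

The resulting file can then be copied or sent over ssh with cat, as in the working sequence above, because the dump is already complete on disk by that point.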

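Update 2: Here is the complete list of differences between the bad and the good version of the dump after running hd on each (note that pg_dump never produces the same output twice):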

2c2
< 00000010  00 11 00 00 00 00 19 00  00 00 00 16 00 00 00 00  |................|
---
> 00000010  00 2e 00 00 00 00 19 00  00 00 00 16 00 00 00 00  |................|
349c349
< 000015c0  31 38 01 01 00 00 00 01  00 00 00 00 00 00 00 00  |18..............|
---
> 000015c0  31 38 01 01 00 00 00 02  b0 30 00 00 00 00 00 00  |18.......0......|
370c370
< 00001710  31 34 01 01 00 00 00 01  00 00 00 00 00 00 00 00  |14..............|
---
> 00001710  31 34 01 01 00 00 00 02  1e 5f 00 00 00 00 00 00  |14......._......|
387c387
< 00001820  00 00 01 00 00 00 00 00  00 00 00 00 b7 0b 00 00  |................|
---
> 00001820  00 00 02 21 d9 60 00 00  00 00 00 00 b7 0b 00 00  |...!.`..........|
399c399
< 000018e0  01 00 00 00 00 00 00 00  00 00 bf 0b 00 00 00 01  |................|
---
> 000018e0  02 91 0a 4c 01 00 00 00  00 00 bf 0b 00 00 00 01  |...L............|
412,413c412,413
< 000019b0  03 00 00 00 32 32 30 01  01 00 00 00 01 00 00 00  |....220.........|
< 000019c0  00 00 00 00 00 00 ba 0b  00 00 00 01 00 00 00 00  |................|
---
> 000019b0  03 00 00 00 32 32 30 01  01 00 00 00 02 ce 0b 4c  |....220........L|
> 000019c0  01 00 00 00 00 00 ba 0b  00 00 00 01 00 00 00 00  |................|
425c425
< 00001a80  35 01 01 00 00 00 01 00  00 00 00 00 00 00 00 00  |5...............|
---
> 00001a80  35 01 01 00 00 00 02 0b  ee 4c 01 00 00 00 00 00  |5........L......|
438c438
< 00001b50  00 00 01 00 00 00 00 00  00 00 00 00 b8 0b 00 00  |................|
---
> 00001b50  00 00 02 28 ee 4c 01 00  00 00 00 00 b8 0b 00 00  |...(.L..........|
456c456
< 00001c70  01 00 00 00 01 00 00 00  00 00 00 00 00 00 c7 0b  |................|
---
> 00001c70  01 00 00 00 02 45 ee 4c  01 00 00 00 00 00 c7 0b  |.....E.L........|
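The differing bytes follow a pattern: the bad file has 01 followed by eight zero bytes where the good file has 02 followed by a non-zero value. The sketch below decodes two of those byte runs on the assumption that each is a one-byte flag followed by an 8-byte little-endian file offset. The flag meanings (1 = position not set, 2 = position set, 3 = no data) are my reading of pg_dump's custom archive format and are not confirmed anywhere in this thread, so treat them as an assumption.

```python
# Assumed flag values for a data offset in a custom-format dump
# (not confirmed by the thread; an assumption for illustration).
FLAG_NAMES = {
    1: "position not set",
    2: "position set",
    3: "no data",
}


def decode_offset(raw, off_size=8):
    """Decode one flag byte followed by off_size little-endian offset bytes."""
    if len(raw) < 1 + off_size:
        raise ValueError("need a flag byte plus %d offset bytes" % off_size)
    flag = raw[0]
    offset = int.from_bytes(raw[1:1 + off_size], "little")
    return FLAG_NAMES.get(flag, "unknown"), offset


# Byte runs copied from the hd diff around 0x15c7 (bad file, then good file).
bad = bytes.fromhex("01 00 00 00 00 00 00 00 00")
good = bytes.fromhex("02 b0 30 00 00 00 00 00 00")

print(decode_offset(bad))   # ('position not set', 0)
print(decode_offset(good))  # ('position set', 12464)
```

If that reading is right, the piped dump is not truncated or corrupted. It only lacks the data offsets that would have been filled in afterwards, which would explain why both files have the same size.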

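Update 3: It turns out this has nothing to do with ssh. I think pg_dump needs a seekable file as its output. Below I demonstrate that inserting |cat before the redirection to the output file produces a corrupted file. If that is true, is it a bug in pg_dump?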

$ pg_dump -d codimd -Fc >/tmp/good
$ python3 -c "import pgdumplib; dump = pgdumplib.load('/tmp/good')"
$ # no error
$ pg_dump -d codimd -Fc |cat >/tmp/bad
$ python3 -c "import pgdumplib; dump = pgdumplib.load('/tmp/bad')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/pgdumplib/__init__.py", line 24, in load
return dump.Dump(converter=converter).load(filepath)
File "/usr/local/lib/python3.8/dist-packages/pgdumplib/dump.py", line 254, in load
raise RuntimeError('Unsupported data format')
RuntimeError: Unsupported data format
$ 
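To illustrate what "seekable" means here, the following sketch checks whether lseek() succeeds on a regular file and on a pipe. It says nothing about how pg_dump performs its own check, which is not shown in this thread. It only shows that the two kinds of output behave differently at the OS level. The helper name is_seekable is hypothetical.

```python
import os
import tempfile


def is_seekable(fd):
    """Return True if lseek() works on fd, False if the OS rejects it."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError:
        # On Linux a pipe fails here with ESPIPE ("Illegal seek").
        return False


# A regular file, like the target of `pg_dump ... >/tmp/good`.
with tempfile.TemporaryFile() as regular_file:
    file_seekable = is_seekable(regular_file.fileno())

# A pipe, like stdout in `pg_dump ... | cat >/tmp/bad` or under ssh.
read_end, write_end = os.pipe()
pipe_seekable = is_seekable(write_end)
os.close(read_end)
os.close(write_end)

print("regular file seekable:", file_seekable)
print("pipe seekable:", pipe_seekable)
```

A writer that cannot seek cannot go back and patch values near the start of the file once the rest has been written, which is consistent with the differences seen in Update 2.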

Some tests on my end:

F=test_dmp.out
pg_dump -d test -U postgres -Fc > $F | ls -al test_dmp.out 
-rw-r--r-- 1 aklaver users 0 Sep 30 14:06 test_dmp.out
pg_dump -d test -U postgres -Fc > temp_file.out && cat temp_file.out > $F | ls -al test_dmp.out 
-rw-r--r-- 1 aklaver users 97488 Sep 30 14:08 test_dmp.out
pg_dump -d test -U postgres -Fc -f $F | ls -al test_dmp.out 
-rw-r--r-- 1 aklaver users 97488 Sep 30 14:08 test_dmp.out

I know this does not answer why > does not work, but it does offer an alternative. For some reason, > created an empty file. The other options did not.

Update

This seems to be related to the transfer over SSH:

F=test_dump.out
#Using local only.
pg_dump -d test -U postgres -Fc > $F
python -c "import pgdumplib; dump = pgdumplib.load('$F'); print('Database: {}'.format(dump.dbname))"
Database: test
#Using SSH, different database
ssh arkansas "sudo -u aklaver bash -c 'cd /; pg_dump -d redmine -U postgres -Fc'" >$F
python -c "import pgdumplib; dump = pgdumplib.load('$F'); print('Database: {}'.format(dump.dbname))"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/aklaver/py_virt/py37/lib/python3.7/site-packages/pgdumplib/__init__.py", line 24, in load
return dump.Dump(converter=converter).load(filepath)
File "/home/aklaver/py_virt/py37/lib/python3.7/site-packages/pgdumplib/dump.py", line 254, in load
raise RuntimeError('Unsupported data format')
RuntimeError: Unsupported data format
#Though the file itself ends up being ok. The below does not error out.
#There seems to be something asynchronous going on. In other words 
#pgdumplib is reading the file before it is complete.
pg_restore  -f test.sql test_dump.out
