How to extract all nested tar.gz and zip archives into one directory in PHP



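I need to extract a tar.gz file in PHP. The file contains many JSON files, tar.gz archives, zip archives, and subdirectories. I only need to move the JSON files into the directory /dataset/processing, and keep extracting the nested tar.gz and zip archives to get all the JSON files out of them. The archives can also contain nested folders/directories.

The structure looks like this: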

origin.tar.gz
├───sub1.tar.gz
│   ├───sub2.tar.gz
│   ├───├───a.json
│   ├───├───├───├───├───├───...(unknown depth)
│   ├───b.json
│   ├───c.json
├───sub3.zip
│   ├───sub4.tar.gz
│   ├───├───d.json
│   ├───├───├───├───├───├───...(unknown depth)
│   ├───e.json
│   ├───f.json
├───subdirectory
│   ├───g.json
├───h.json
├───i.json
|   ..........
|   ..........
|   ..........
|   many of them

Once everything is extracted, /dataset should look like this:

Dataset/processing
├───a.json
├───b.json
├───c.json
├───d.json
├───e.json
├───f.json
├───g.json
├───h.json
├───i.json
|   ..........
|   ..........
|   ..........
|   many of them

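I know how to extract a tar.gz in PHP using PharData, but that only works for a single level of depth. I was wondering whether some kind of recursion could make this approach work for multiple levels.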

$phar = new PharData('origin.tar.gz');
$phar->extractTo('/full/path'); // extract all files in the tar.gz

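I have made some improvements to my code and tried it. It works for multiple levels of nesting, but it fails when there is a directory (a folder or nested folders) that also contains JSON files. Can someone help me extract those as well?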

<?php
$path = './';
// Extraction of compressed file
function fun($path) {
    $array = scandir($path);
    for ($i = 0; $i < count($array); $i++) {
        if ($i == 0 OR $i == 1) { continue; }
        else {
            $item = $array[$i];
            $fileExt = explode('.', $item);
            // Getting the extension of the file
            $fileActualExt = strtolower(end($fileExt));
            if (($fileActualExt == 'gz') or ($fileActualExt == 'zip')) {
                $pathnew = $path.$item; // Dataset ./data1.tar.gz
                $phar = new PharData($pathnew);
                // Moving the files
                $phar->extractTo($path);
                // Del the files
                unlink($pathnew);
                $i = 0;
            }
        }
        $array = scandir($path);

    }
}
fun($path);
// Move only the json to ./dataset (I will add it later)
?>

Thanks in advance.

As a first step, extract your tar.gz file as you mentioned:

$phar = new PharData('origin.tar.gz');
$phar->extractTo('/full/path'); // extract all files in the tar.gz
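
PharData reports problems by throwing exceptions, so one unreadable archive can stop the whole script. A minimal sketch of guarding this step, assuming a hypothetical helper named `extractArchive` (the name and the return convention are illustrative, not part of this answer):

```php
<?php
// Hypothetical helper: extract one archive and report success as a boolean
// instead of letting an exception end the script.
function extractArchive(string $archive, string $target): bool
{
    if (!is_file($archive)) {
        return false;                            // nothing to extract
    }
    try {
        $phar = new PharData($archive);
        $phar->extractTo($target, null, true);   // null = all entries, true = overwrite
        return true;
    } catch (Exception $e) {
        // The exceptions thrown by the phar extension all extend Exception
        error_log("Could not extract $archive: " . $e->getMessage());
        return false;
    }
}
```

A caller could then skip a failed archive and carry on with the rest, for example `if (!extractArchive('origin.tar.gz', '/full/path')) { /* log and continue */ }`.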

Then read the directory recursively and copy every JSON file into the destination directory. Here is my commented code:

$dirPath = './';       // the root path of your very first extraction of your tar.gz
recursion_readdir($dirPath, 1);

function recursion_readdir($dirPath, $Deep = 0) {
    // scandir() returns a snapshot, so folders created below are not visited twice
    foreach (scandir($dirPath) as $basename) {
        if ($basename == '.' || $basename == '..') {
            continue;
        }
        // current file path
        $path = rtrim($dirPath, '/') . '/' . $basename;
        if (is_dir($path)) {
            // it is a directory, then go deeper (depth + 1),
            // but skip the destination folder itself
            if (realpath($path) != realpath('./dest')) {
                recursion_readdir($path, $Deep + 1);
            }
        } else if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) == 'json') {
            // it is a JSON file: copy it to your destination path
            copy($path, './dest/' . $basename);
        } else if (preg_match('/\.(tar|tar\.gz|zip)$/i', $basename)) {
            // it is a tar, tar.gz or zip file: extract it into its own folder,
            // using the full path (the bare file name only resolves in the current directory)
            $target = $path . '_extracted';
            $phar = new PharData($path);
            $phar->extractTo($target, null, true);
            // scan what was just extracted
            recursion_readdir($target, $Deep + 1);
        }
    }
}

// helper from the original answer; recursion_readdir() does not use it
function forChar($char = '-', $times = 0) {
    $result = '';
    for ($i = 0; $i < $times; $i++) {
        $result .= $char;
    }
    return $result;
}
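
A substring test such as `strstr($basename, 'json')` also matches names like `json_notes.txt`, so it can help to classify entries by their suffix instead. A small sketch, assuming a hypothetical helper `classifyEntry` and assuming that `.tgz` should count as an archive as well (both are illustrative choices, not part of the answer above):

```php
<?php
// Hypothetical helper: decide what to do with a directory entry from its name alone.
// Returns 'archive', 'json' or 'other'.
function classifyEntry(string $basename): string
{
    $name = strtolower($basename);
    // Check archives first so that "data.json.tar.gz" counts as an archive
    if (preg_match('/\.(tar\.gz|tgz|tar|zip)$/', $name) === 1) {
        return 'archive';
    }
    if (pathinfo($name, PATHINFO_EXTENSION) === 'json') {
        return 'json';
    }
    return 'other';
}
```

This only looks at the name. A file that is merely called `.zip` but is not an archive would still make PharData throw when it is opened.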

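I solved the problem after doing some research.

There are 3 functions:

  • recursiveScanProtected(): extracts all the compressed files
  • scanJSON(): scans for JSON files and moves them into the processing folder
  • delete_files(): deletes everything except the processing folder, which holds the JSON files, and index.php in the root directory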

<?php
// Root directory
$path = './';
// Directory where I want to extract the JSON files
$path_json = $path.'processing/';

// Function to extract all the compressed files
function recursiveScanProtected($dir) {
    if ($dir != '') {
        $dir = rtrim($dir, '/');
        $tree = glob($dir . '/*');
        if (is_array($tree)) {
            for ($i = 0; $i < count($tree); $i++) {
                $file = $tree[$i];
                if (is_dir($file)) {
                    recursiveScanProtected($file); // Recursive call if directory
                } elseif (is_file($file)) {
                    $fileExt = explode('.', $file);
                    // Getting the extension of the file
                    $fileActualExt = strtolower(end($fileExt));
                    // Check if the file is a zip or a tar.gz
                    if (($fileActualExt == 'gz') or ($fileActualExt == 'zip')) {
                        // Open the archive ($phar was never created in the original listing)
                        $phar = new PharData($file);
                        // Extract into a numbered subdirectory - Overwriting true
                        $target = $dir . '/' . $i;
                        $phar->extractTo($target, null, true);
                        // Release the handle, then del the compressed file
                        unset($phar);
                        unlink($file);
                        recursiveScanProtected($target); // Recursive call
                    }
                }
            }
        }
    }
}
recursiveScanProtected($path);

// Move the JSON files to processing
function scanJSON($dir, $path_json) {
    if ($dir != '') {
        $tree = glob(rtrim($dir, '/') . '/*');
        if (is_array($tree)) {
            foreach ($tree as $file) {
                if (is_dir($file)) {
                    // Do not scan processing recursively, but all other directories should be scanned
                    if ($file != './processing') {
                        scanJSON($file, $path_json);
                    }
                } elseif (is_file($file)) {
                    $ext = pathinfo($file);
                    // 'extension' is not set for files without a dot in their name
                    if (isset($ext['extension']) && strtolower($ext['extension']) == 'json') {
                        // Move the JSON files to processing
                        rename($file, $path_json.$ext['basename']);
                    }
                }
            }
        }
    }
}
// Make sure the destination exists before moving files into it
if (!is_dir($path_json)) {
    mkdir($path_json, 0777, true);
}
scanJSON($path, $path_json);
/*
 * php delete function that deals with directories recursively
 * It deletes everything except ./dataset/processing and index.php
 */
function delete_files($target) {
    if (is_dir($target)) {
        $files = glob($target . '*', GLOB_MARK); // GLOB_MARK adds a slash to directories returned
        foreach ($files as $file) {
            if ($file == './processing/' || $file == './index.php') {
                continue;
            } else {
                delete_files($file);
            }
        }
        if ($target != './') {
            rmdir($target);
        }
    } elseif (is_file($target)) {
        unlink($target);
    }
}
delete_files($path);
?>
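
Because every JSON file ends up in one flat folder, two archives that each contain, say, `a.json` would overwrite one another when moved. A sketch of the collection step that keeps both copies, assuming a hypothetical function `collectJson` and a numeric-suffix naming scheme (neither is part of the code above):

```php
<?php
// Hypothetical helper: move every *.json below $root into $dest (flat),
// renaming on collision (a.json, a_1.json, ...). Returns the number of files moved.
function collectJson(string $root, string $dest): int
{
    $dest = rtrim($dest, '/');
    if (!is_dir($dest)) {
        mkdir($dest, 0777, true);
    }
    $destReal = realpath($dest);

    // Gather the paths first so that moving files does not disturb the iteration
    $found = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        // Skip anything that already lives inside the destination folder
        $inDest = strpos((string) $file->getRealPath(), $destReal . DIRECTORY_SEPARATOR) === 0;
        if ($file->isFile() && !$inDest && strtolower($file->getExtension()) === 'json') {
            $found[] = $file->getPathname();
        }
    }

    $moved = 0;
    foreach ($found as $src) {
        $info = pathinfo($src);
        $target = $dest . '/' . $info['basename'];
        // Try a_1.json, a_2.json, ... until the name is free
        for ($n = 1; file_exists($target); $n++) {
            $target = $dest . '/' . $info['filename'] . '_' . $n . '.json';
        }
        if (rename($src, $target)) {
            $moved++;
        }
    }
    return $moved;
}
```

Called as `collectJson('./', './processing')`, it could stand in for the move step once all archives have been extracted. Which of two colliding files receives the suffix depends on the order in which the filesystem lists them.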
