Reading WebVTT files in PHP

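Does anyone have experience reading WebVTT (.vtt) files with PHP?

I'm developing an application in CakePHP where I need to read a bunch of .vtt files and get the start time and the associated text of each cue.

So, taking this file as an example: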

    00:00.999 --> 00:04.999
    sentence one

    00:04.999 --> 00:07.999
    sentence two

    00:07.999 --> 00:10.999
    sentence three
    with a line break

    00:10.999 --> 00:14.999
    sentence four
    on
    three lines

I need to be able to extract something like this:

    00:00.999 sentence one
    00:04.999 sentence two
    00:07.999 sentence three with a line break
    00:10.999 sentence four on three lines

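Note that the caption text can contain line breaks, so there is no fixed number of lines between one timestamp and the next.

My plan is to search for "-->", since that string appears on every timestamp line. Does anyone know the best way to implement this?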

To parse the file, you can use a library like this:

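// Assumes the library is installed via Composer and autoloaded
use Done\Subtitles\Subtitles;

$subtitles = Subtitles::loadFromFile('subtitles.vtt');
$blocks = $subtitles->getInternalFormat(); // array
foreach ($blocks as $block) {
    // Print the start time, then every text line of the cue on one row
    echo $block['start'];
    echo ' ';
    foreach ($block['lines'] as $line) {
        echo $line . ' ';
    }
    echo "\n";
}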

It will also extract the text from files that contain styling or other minor errors.

https://github.com/mantas-done/subtitles

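This seems to do what I need, i.e. output the start time followed by any lines of text that come after it. The files I'm working with are fairly small, so reading everything into an array with PHP's file() function seems fine; I'm not sure whether that would hold up for large files.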

    $file = 'test.vtt';
    // Read the whole file into an array, dropping line endings and blank lines
    $file_as_array = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    foreach ($file_as_array as $f) {
        $start_time = false;
        // Timestamp lines look like "00:00.999 --> 00:04.999"
        if (preg_match('/^(\d{2}:[\d.]+) --> \d{2}:[\d.]+$/', $f, $match)) {
            $start_time = $match[1];
            echo '<br>';
            echo $start_time;
        }
        // Any other line is caption text, except the 'WEBVTT' file header.
        // strpos() returns 0 for a match at the start of the line, so compare with === false.
        if (!$start_time && strpos($f, 'WEBVTT') === false) {
            echo ' ' . $f . ' ';
        }
    }
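On the large-file worry: here is a minimal sketch, not a tested production parser, that reads one line at a time with fgets() and hands cues back through a generator, so the whole file never has to sit in memory. The function name vtt_cues is made up for illustration, and it assumes the same simple "mm:ss.ttt --> mm:ss.ttt" timestamp lines as the example file.

```php
<?php
// Hypothetical helper (name is an assumption): yields [start, text] pairs,
// one cue at a time, without loading the whole file.
function vtt_cues(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $start = null; // start time of the cue being collected
    $text = [];    // caption lines of that cue
    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line);
            if (preg_match('/^(\d{2}:[\d.]+) --> \d{2}:[\d.]+$/', $line, $m)) {
                // A new timestamp line closes the previous cue
                if ($start !== null) {
                    yield [$start, implode(' ', $text)];
                }
                $start = $m[1];
                $text = [];
            } elseif ($start !== null && $line !== '') {
                // Lines before the first cue (e.g. the WEBVTT header) are skipped
                $text[] = $line;
            }
        }
        if ($start !== null) {
            yield [$start, implode(' ', $text)];
        }
    } finally {
        fclose($handle);
    }
}
```

It could be used as `foreach (vtt_cues('test.vtt') as [$start, $text]) { echo "$start $text\n"; }`. For small files like mine the file() version is probably just as good; the generator only matters once memory becomes a concern.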

You could do it like this:

<?php
function send_reformatted($vtt_file) {
    // Add these headers to ease saving the output as a text file
    header("Content-type: text/plain");
    header('Content-Disposition: inline; filename="' . $vtt_file . '.txt"');
    $f = fopen($vtt_file, "r");
    $line_new = "";
    while (($line = fgets($f)) !== false) {
        // Trim first so "\r\n" line endings do not break the anchored pattern
        $line = trim($line);
        if (preg_match('/^(\d{2}:[\d.]+) --> \d{2}:[\d.]+$/', $line, $match)) {
            // A new cue starts: flush the previous one
            if ($line_new) echo $line_new . "\n";
            $line_new = $match[1];
        } elseif ($line !== "" && $line_new !== "") {
            // Caption text; lines before the first cue (the WEBVTT header) are skipped
            $line_new .= " $line";
        }
    }
    echo $line_new . "\n";
    fclose($f);
}

send_reformatted("test.vtt");
?>
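One caveat with the pattern used above: it only matches the short "mm:ss.ttt" form from the example. The WebVTT format also allows an hours field and cue settings after the end time, so files from other tools may not match. A hedged sketch of a more tolerant check follows; parse_timing_line is a hypothetical helper name, and it does not validate ranges such as minutes below 60.

```php
<?php
// Hypothetical helper (name is an assumption): returns ['start' => ..., 'end' => ...]
// for a cue timing line, or null for any other line.
function parse_timing_line(string $line): ?array
{
    // One timestamp: optional hours, then mm:ss.ttt
    $ts = '(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}';
    // Two timestamps around "-->", optionally followed by cue settings
    $pattern = '/^(' . $ts . ')[ \t]+-->[ \t]+(' . $ts . ')(?:[ \t]+.*)?$/';
    if (preg_match($pattern, trim($line), $m) !== 1) {
        return null;
    }
    return ['start' => $m[1], 'end' => $m[2]];
}
```

Swapping this in for the preg_match() call would keep the rest of the loop unchanged: a non-null result marks the start of a new cue, and its 'start' value replaces $match[1].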
