I approached it in the following way:

    $parser = new Parser();
    $pdf = $parser->parseFile($_FILES['pdf']['tmp_name']);
    $text = $pdf->getText();

    // Normalize NBSP → plain space
    $text = preg_replace('/\x{00A0}/u', ' ', $text);
    $text = preg_replace('/\R+(?=\s*Weekly\s+Totals)/iu', ' ', $text);

    // Split into lines
    $lines = preg_split('/\R/u', $text);
    $results = [];
    $currentType = null;

    foreach ($lines as $line) {
        $line = trim($line);

Inside the loop, it detects names with:

        // Only lines with "Weekly Totals" matter now
        if (stripos($line, 'Weekly Totals') === false) {
            continue;
        }

        // Match: Name Weekly Totals <left> $<cost> <actual>
        if (preg_match('/^(.+?)\s+Weekly\s+Totals\s+\d+\.\d{2}\s+\$[\d,]+\.\d{2}\s+(\d+\.\d{2})$/i', $line, $m)) {
            $name = ucwords(strtolower($m[1]));

But in my PDF some names are long and get split across two lines within the same column. For example, "Josh Brook Silvester Damien Junior Weekly Totals" appears in the PDF as:

Josh Brook Silvester Damien 34.90 $259. 32.00
Junior Weekly Totals
As a result, this person is not detected in the results. The person's whole record is ignored and the next person in the PDF is retrieved (left 34.90, cost 259, actual 32).
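
Here is a minimal reproduction of the failure, assuming the extracted text contains exactly the two lines shown above. The `$text` literal is only a stand-in for the output of `getText()`; the pattern and the loop are the same as in my code, reduced to collecting names.

```php
<?php
// Stand-in for $pdf->getText(): the two lines as they appear in the PDF text.
$text = <<<'TXT'
Josh Brook Silvester Damien 34.90 $259. 32.00
Junior Weekly Totals
TXT;

$pattern = '/^(.+?)\s+Weekly\s+Totals\s+\d+\.\d{2}\s+\$[\d,]+\.\d{2}\s+(\d+\.\d{2})$/i';

$results = [];
foreach (preg_split('/\R/u', $text) as $line) {
    $line = trim($line);

    // The first line holds the numbers but not "Weekly Totals", so it is skipped here.
    if (stripos($line, 'Weekly Totals') === false) {
        continue;
    }

    // The second line holds "Weekly Totals" but no numbers, so the pattern fails.
    if (preg_match($pattern, $line, $m)) {
        $results[] = ucwords(strtolower($m[1]));
    }
}

var_dump($results); // prints an empty array
```

Neither line matches on its own: one has the figures without the "Weekly Totals" marker, the other has the marker without the figures.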
How can I handle this situation? I tried changing the regular expression in several ways, but none of them worked.
The issue occurs only with names that are split across two lines. All other names, which fall on a single line together with 'Weekly', are detected.