Byte Order Mark Madness
Last modified 2024/12/07 11:48We had an issue today where a CSV file was being read via. SplFileInfo
.
The array keys were being inferred from the headers, and the values transformed based on rules per column.
The issue was that the column was not found:
// data read from CSV
$data = [
'SALUTATION' => '1',
];
var_dump($data['SALUTATION']); // UNDEFINED INDEX ERROR
Very confusing!
It turns out that the CSV file was prefixed with a byte order mark, which was
being incorrectly parsed into the first header (SALUTATION
).
As we found out using:
var_dump(array_map(function (string $key) {
return array_map(function (string $char) {
return ord($char);
}, str_split($key));
}, array_keys($result)));
Resulting in:
0 => array:13 [
0 => 239
1 => 187
2 => 191
3 => 83
4 => 65
5 => 76
6 => 85
7 => 84
8 => 65
9 => 84
10 => 73
11 => 79
12 => 78
]
Where 83 .. 78
are the chars for SALUTATION
and 187
, 191
and 83
are the extra chars.
We googled these numbers and discovered that they are a byte-order-mark, and removing them fixes the issue.
Not sure why this BOM causes an issue (other files also have BOM but are seemingly correctly parsed).