Removing Byte-Order-Mark using sed

When working with UTF-8-encoded PHP files in web applications, a common and hard-to-track-down error is “Headers already sent” or “Cannot modify header information”. It usually occurs during a call to the function header(), which modifies the HTTP response headers.

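One reason for this is that the UTF-8 file starts with an invisible(!) byte order mark (BOM) consisting of the three bytes 0xEF, 0xBB, 0xBF. Because these bytes precede the opening PHP tag, PHP sends them as output before any code runs, and the headers can no longer be changed afterwards. The BOM can be removed by opening the file in a suitable text editor and unticking the “Add Byte Order Mark (BOM)” option (or similar).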

A more convenient way using sed is the following:
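The command is sketched here under some assumptions: it relies on GNU sed (both -i without a backup suffix and the \xHH byte escapes are GNU extensions), and strip_bom is a hypothetical helper name introduced only for this illustration.

```shell
# strip_bom: hypothetical helper that removes a leading UTF-8 BOM in place.
# Assumes GNU sed; BSD/macOS sed handles -i and \xHH differently.
strip_bom() {
    # Address 1 limits the substitution to the first line of each file;
    # ^ anchors the three BOM bytes at the start of that line.
    sed -i '1 s/^\xef\xbb\xbf//' "$@"
}

# Usage (file names are examples):
#   strip_bom index.php
#   strip_bom *.php
```

Files without a BOM are left unchanged, since the pattern simply does not match.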

(-i makes sed edit the file in place; 1 restricts the substitution to the first line of the file; ^ anchors the match at the start of that line)

Example

Let’s consider a file consisting of two lines (‘A’, ‘B’) stored with the BOM:
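Such a file can be created with printf, for instance. This is only a sketch: the file name bom-test.txt is an assumption made for this example, and the BOM is written with octal escapes (\357\273\277 = 0xEF 0xBB 0xBF) because octal is the escape form that printf implementations support portably.

```shell
# bom-test.txt is a hypothetical file name chosen for this example.
# \357\273\277 are the three BOM bytes, followed by the lines "A" and "B".
printf '\357\273\277A\nB\n' > bom-test.txt

# The file is 7 bytes long: 3 BOM bytes + "A\n" + "B\n".
wc -c < bom-test.txt
```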

Inspecting this file with the dump tool od, we obtain the following output:
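One way to produce such a dump is sketched below. It assumes the hypothetical file name bom-test.txt and uses the standard od options -An (suppress the offset column) and -t x1 (print each byte as two hex digits). The printf line only recreates the test file so that the snippet runs on its own.

```shell
# Recreate the hypothetical test file: BOM, then the lines "A" and "B".
printf '\357\273\277A\nB\n' > bom-test.txt

# Dump the file byte by byte in hexadecimal, without the offset column.
od -An -t x1 bom-test.txt
```

This prints the seven bytes ef bb bf 41 0a 42 0a; the exact spacing between them can differ between od implementations.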

The three BOM bytes are clearly visible.

After running the sed command on the file, the output looks as follows, proving that the BOM is gone:
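The whole round trip can be reproduced with the following sketch, again assuming GNU sed and the hypothetical file name bom-test.txt:

```shell
# Recreate the hypothetical test file with a leading BOM.
printf '\357\273\277A\nB\n' > bom-test.txt

# Remove the BOM from the first line, editing the file in place (GNU sed).
sed -i '1 s/^\xef\xbb\xbf//' bom-test.txt

# Dump the result in hexadecimal, without the offset column.
od -An -t x1 bom-test.txt
```

Only the four bytes 41 0a 42 0a remain, that is, the lines “A” and “B” without the leading ef bb bf.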
