What Character Encoding does SagePay Use?

You can look through the SagePay website, the forums, the documentation, but try as you might, there is no statement on what character encoding SagePay uses when posting billing addresses, cart contents etc.

I’ll tell you what it is: ISO-8859-1

Yes, it is an 8-bit extended ASCII encoding, from the days of DOS.

If your website uses UTF-8, then do ensure you recode any strings you send to SagePay, otherwise you will end up with ŝŏmë réãllý wëírd characters (or “Å?Å?më réãllý wëírd” when that is submitted to SagePay through the WooCommerce SagePay Form payment gateway, at version 1.1.1).
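To see where those weird characters come from, here is a minimal sketch in Python (used here purely for illustration; the helper name misread_as_latin1 is made up for this example). It assumes the receiving system simply interprets the incoming UTF-8 bytes as ISO-8859-1, which is consistent with the garbling above but is not something SagePay documents.

```python
def misread_as_latin1(text: str) -> str:
    """Return what a reader assuming ISO-8859-1 sees when sent UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    utf8_bytes = text.encode("utf-8")       # what a UTF-8 website sends
    return utf8_bytes.decode("iso-8859-1")  # every byte becomes one character


original = "ŝŏmë réãllý wëírd"
garbled = misread_as_latin1(original)

# Each accented character was sent as more than one byte,
# so the garbled string is longer than the original.
print(len(original), len(garbled))

# Nothing is lost yet: reversing the two steps restores the text.
print(garbled.encode("iso-8859-1").decode("utf-8") == original)
```

The damage only becomes permanent once the receiving system stores or replaces the bytes it could not display.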

Web developers: get to grips with character encoding, HTTP transfer methods and what it all means when remote systems need to interact.

PHP will recode UTF-8 to ISO-8859-1 using the utf8_decode() function (deprecated as of PHP 8.2). However, do be careful when using that function, as it does not report errors: bytes that are not valid UTF-8, and characters that have no ISO-8859-1 equivalent, are silently replaced with “?”. There are many ways in which the byte stream can be invalid, or corrupt, depending on where the string came from.

There are also other conversion functions you could use, such as mb_convert_encoding() and iconv(), but again, be careful you know exactly where your string came from and how valid it is.
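The idea of validating the input before recoding it can be sketched as follows. This is an illustrative Python sketch rather than PHP, and the function name utf8_to_latin1 is an assumption made for this example. It rejects invalid UTF-8 outright and substitutes “?” for any character that ISO-8859-1 cannot represent.

```python
def utf8_to_latin1(raw: bytes) -> bytes:
    """Convert UTF-8 bytes to ISO-8859-1 bytes.

    Hypothetical helper for illustration only.
    Raises UnicodeDecodeError if `raw` is not valid UTF-8.
    Characters outside ISO-8859-1 are replaced with b"?".
    """
    text = raw.decode("utf-8")  # strict by default: invalid bytes raise
    return text.encode("iso-8859-1", errors="replace")


# Characters that exist in ISO-8859-1 survive as single bytes.
print(utf8_to_latin1("réãllý".encode("utf-8")))

# Characters that do not exist in ISO-8859-1 become "?".
print(utf8_to_latin1("ŝŏmë".encode("utf-8")))

# A corrupt byte stream is reported instead of being passed on.
try:
    utf8_to_latin1(b"caf\xe9")  # a lone 0xE9 is ISO-8859-1, not UTF-8
except UnicodeDecodeError as error:
    print("invalid UTF-8:", error.reason)
```

Failing loudly on invalid input is a design choice: it surfaces strings that were already damaged before they reach the payment gateway.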

If there is a way to send UTF-8 encoded data to SagePay, then I would love to know. However, the fact that the documentation is totally silent on the issue means I do not have high hopes.

Update January 2015

Just got a tweet from @SagePaySupport informing me that SagePay supports “the first 250 characters of UTF-8”. That seemed a bit of a strange statement. It may be technically correct, though perhaps a little misleading.

UTF-8 encodes Unicode characters as a series of bytes. The first 256 Unicode code points (U+0000 to U+00FF) are the same characters as the ISO-8859-1 character set, which is presumably what “the first 250 characters” refers to. What differs is the encoding. While ISO-8859-1 encodes each character as a single byte, UTF-8 encodes the first 128 of those characters (plain ASCII) as one byte and the remaining 128 as two bytes.
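The difference between the shared character set and the two encodings can be checked directly. The following is a small Python sketch for illustration (the function name encoded_lengths is made up for this example). It compares how many bytes each encoding needs for a given code point.

```python
def encoded_lengths(code_point: int) -> tuple[int, int]:
    """Return (ISO-8859-1 byte count, UTF-8 byte count) for one code point.

    Hypothetical helper for illustration; only valid for code points below 256.
    """
    character = chr(code_point)
    return (
        len(character.encode("iso-8859-1")),
        len(character.encode("utf-8")),
    )


# Sample code points from either side of the ASCII boundary.
for code_point in (0x41, 0x7F, 0x80, 0xE9, 0xFF):
    latin1_len, utf8_len = encoded_lengths(code_point)
    print(f"U+{code_point:04X}: ISO-8859-1 = {latin1_len}, UTF-8 = {utf8_len}")

# In ISO-8859-1 the byte value is simply the code point itself.
print(all(chr(i).encode("iso-8859-1") == bytes([i]) for i in range(256)))
```

So a system that only accepts one byte per character cannot take UTF-8 for anything beyond plain ASCII, even though the characters themselves are shared.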

So maybe SagePay will accept some UTF-8 characters (though I suspect not), but their internal storage seems to be well and truly stuck with the extended ASCII of ISO-8859-1. That may well be dictated to some extent by the systems that SagePay needs to connect to, but that is really no excuse for handling only single-byte extended ASCII between the websites and the back-end reporting and monitoring tools.

Now, there is a glimmer of hope. I have read a post that says UTF-16BE works with SagePay. I’ll give that a try and see what happens. Not holding my breath though 🙁

3 Responses to What Character Encoding does SagePay Use?

  1. arooj 2013-01-16 at 12:17

    Thanks. It’s been really useful. After searching for hours, I at least found the solution 🙂

  2. Andrew 2013-12-17 at 21:25

    I’ve been struggling with this too! It really is the dark ages.

    • Jason Judge 2013-12-17 at 22:17

      Cool, glad it has been of help.

      What are you using for your shop payment gateway? I’ve put together a SagePay library that tries to handle all the mysteries of the SagePay documentation. You can find it here: https://github.com/academe/SagePay

      There is also OmniPay – https://github.com/omnipay – which is getting very popular, especially now its monolithic structure has been split up into separate gateway repositories.
