The Payment Card Industry Data Security Standard requires protection of stored cardholder data (Primary Account Number, or PAN) using any of the following approaches (Requirement 3.4):

  • One-way hashes based on strong cryptography (hash must include the entire PAN);
  • Truncation (hashing cannot be used to replace the truncated segment of the PAN);
  • Index tokens and pads (pads must be securely stored);
  • Strong cryptography with associated key-management processes and procedures.

Let's look at the first option in more detail.

What is hashing?

Hashing is a one-way transformation (often loosely described as one-way encryption), whereby a data element of any size is transformed into a fixed-size data element (hash value or hash), with no practical way to compute the original data element from the hash value.

For example, if we apply the SHA-256 hash algorithm to a test PAN, such as:

4111 1111 1111 1111

we get a hash value of:

9bbef19476623ca56c17da75fd57734dbf82530686043a6e491c6d71befe8f6e

We also know that any other PAN, when hashed, will for all practical purposes result in a different value of the same size, and there is no way to reverse the hash function to get the PAN if we only know the hash value.
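The properties described above can be tried out with a minimal Python sketch using the standard `hashlib` module. The helper name `hash_pan`, the choice to strip spaces before hashing, and the second PAN are assumptions made for illustration; the exact digest depends on the exact bytes that are hashed (spaces, encoding), so it may or may not match the value quoted above.

```python
import hashlib

def hash_pan(pan: str) -> str:
    """Return the SHA-256 hex digest of a PAN, ignoring spaces (an assumption)."""
    digits = pan.replace(" ", "")
    return hashlib.sha256(digits.encode("ascii")).hexdigest()

test_pan = "4111 1111 1111 1111"
other_pan = "4111 1111 1111 1112"  # hypothetical PAN that differs by one digit

digest = hash_pan(test_pan)
print(digest)
print(len(digest))                    # 64 hex characters, i.e. 256 bits
print(digest == hash_pan(test_pan))   # same input always gives the same hash
print(digest == hash_pan(other_pan))  # a different input gives a different hash
```

Whatever the input, the output always has the same size, which is what makes the algorithm easy to guess from a stored value.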

Why and when do we need hashing?

Since PAN data cannot be reverted to its original format from hash values, it may only be used in certain scenarios with specific objectives in mind.

For example, if a cardholder buys a rail ticket online and comes to an unattended terminal to pick up the ticket, the merchant may ask the cardholder to insert the card that was used to buy the ticket into the terminal to make sure that the cardholder has a physical card (card-not-present transaction, card-present verification at the time of pick-up). To avoid storage of PAN data, the merchant may choose to store only hash values. At the time of the ticket pick-up, the PAN of the inserted card can be hashed and the hash value can be compared to the one of the card used to purchase the ticket online (these should match).
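The pick-up check could look roughly like the sketch below. The in-memory `bookings` store, the booking reference and the function names are hypothetical; a plain SHA-256 hash is used only to keep the example short, and the rest of this article explains why such a hash needs additional protection.

```python
import hashlib
import hmac

def hash_pan(pan: str) -> str:
    """SHA-256 hex digest of the PAN digits (spaces ignored)."""
    return hashlib.sha256(pan.replace(" ", "").encode("ascii")).hexdigest()

# Hypothetical booking store: booking reference -> hash of the PAN used to pay.
bookings = {}

def record_purchase(booking_ref: str, pan: str) -> None:
    """At purchase time, keep only the hash; the PAN itself is not stored."""
    bookings[booking_ref] = hash_pan(pan)

def verify_pickup(booking_ref: str, inserted_pan: str) -> bool:
    """At the terminal, hash the inserted card's PAN and compare the hashes."""
    stored = bookings.get(booking_ref)
    if stored is None:
        return False
    return hmac.compare_digest(stored, hash_pan(inserted_pan))

record_purchase("TICKET-001", "4111 1111 1111 1111")
print(verify_pickup("TICKET-001", "4111111111111111"))  # True
print(verify_pickup("TICKET-001", "5500000000000004"))  # False
```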

Another example is a Fraud Management system that stores hash values of PAN data in order to be able to identify all transactions made with the same card (the hash value of the same PAN will always be the same). By storing only hash values of PAN data within the Fraud Management system you are not exposing real PAN data even to Fraud Analysts.
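As a rough illustration of the fraud management case, the sketch below groups transactions by the hash of the PAN, so that an analyst sees which transactions belong to the same card without seeing the card number. The transaction records, merchant names and amounts are invented for the example.

```python
import hashlib
from collections import defaultdict

def hash_pan(pan: str) -> str:
    """SHA-256 hex digest of the PAN digits (spaces ignored)."""
    return hashlib.sha256(pan.replace(" ", "").encode("ascii")).hexdigest()

# Hypothetical incoming transactions: (PAN, merchant, amount).
transactions = [
    ("4111111111111111", "Shop A", 25.00),
    ("5500000000000004", "Shop B", 310.00),
    ("4111111111111111", "Shop C", 990.00),
]

# Only the hash is kept as the grouping key; the PAN is discarded.
by_card = defaultdict(list)
for pan, merchant, amount in transactions:
    by_card[hash_pan(pan)].append((merchant, amount))

for card_hash, items in by_card.items():
    print(card_hash[:12], len(items), items)
```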

It is advised, however, not to use hashing unless you have one of these specific objectives as part of your business requirements. We have seen scenarios where hash values of PAN data are stored without proper business justification, usually as part of a legacy system. Storage of hashed PAN data can lead to all sorts of problems with PCI Compliance, so this should only be done if needed.

Implications for PCI DSS Compliance

With early versions of the PCI Data Security Standard it was still quite common to deem hashed PAN data out of scope, i.e. no longer cardholder data, as there is no way to retrieve the PAN by just knowing the hash.

PCI DSS 2.0 (which was released in October 2010) contained a new note within requirement 3.4 stating that it is a relatively trivial effort for a malicious individual to reconstruct original PAN data if they have access to both the truncated and hashed version of a PAN. In fact, we believe that it is possible for anyone to reconstruct original PAN data if they have access only to the hashed version of a PAN.
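A rough way to see why the hash alone may be enough is to count how many PANs an attacker would actually have to try. The sketch below assumes a 16-digit PAN, a 6-digit issuer prefix (BIN) that the attacker can guess or enumerate, and a last digit that is the standard Luhn check digit. These are illustrative assumptions, not taken from PCI DSS, and the function name is hypothetical.

```python
PAN_LENGTH = 16  # assumed PAN length
BIN_LENGTH = 6   # assumed length of the issuer prefix

def luhn_check_digit(partial: str) -> str:
    """Compute the Luhn check digit for a PAN that is missing its last digit."""
    total = 0
    # Walk right to left; the rightmost digit of the partial PAN is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

# The check digit is fully determined by the other 15 digits.
print(luhn_check_digit("411111111111111"))  # 1

# With the BIN fixed and the check digit computed, only 9 digits are free.
free_digits = PAN_LENGTH - BIN_LENGTH - 1
candidates_per_bin = 10 ** free_digits
print(candidates_per_bin)  # 1000000000
```

Under these assumptions there are one billion candidate PANs per BIN, which is a finite and fairly small search space for a fast hash function.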

Let's analyse the first scenario

If you store the hash value of the test PAN 4111 1111 1111 1111 (9bbef19476623ca56c17da75fd57734dbf82530686043a6e491c6d71befe8f6e) and a truncated version of the PAN (411111******1111) in the same database or even in the same table, you open the door for an easy data compromise. If a malicious individual has access to these data elements and knows the hashing algorithm (which is easy to guess based on the fixed size of the hash value), all they have to do is calculate SHA-256 hash values for all possible options from 411111(000000)1111, 411111(000001)1111, 411111(000002)1111 to 411111(999999)1111 and find the one which matches the value of 9bbef19476623ca56c17da75fd57734dbf82530686043a6e491c6d71befe8f6e (Pic 1).

This kind of attack is also known as a 'brute force' attack. In this scenario a malicious individual has to calculate at most one million hash values, which is not difficult when you consider the computing power of a simple laptop.
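The attack described above fits in a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the "stolen" hash is computed inside the script from the public test PAN (digits only, no spaces) rather than copied from a real system.

```python
import hashlib

def sha256_hex(pan: str) -> str:
    """SHA-256 hex digest of a PAN given as a string of digits."""
    return hashlib.sha256(pan.encode("ascii")).hexdigest()

def recover_pan(first_six: str, last_four: str, target_hash: str):
    """Try every 6-digit middle segment until the hash matches."""
    for middle in range(1_000_000):
        candidate = f"{first_six}{middle:06d}{last_four}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None

# Data the attacker is assumed to have found side by side in the database.
stolen_hash = sha256_hex("4111111111111111")
stolen_truncated = "411111******1111"

recovered = recover_pan(stolen_truncated[:6], stolen_truncated[-4:], stolen_hash)
print(recovered)  # 4111111111111111
```

The loop never needs more than one million iterations, because the truncated PAN already reveals ten of the sixteen digits.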

Pic 1: Brute-force attack on hashed cardholder data

Note: To avoid long processing times for brute force attacks, an attacker can use a pre-calculated table of hash values of all possible PANs. In this example, however, calculating one million hash values will not be a problem for an attacker.

This is the main reason why requirement 3.4 of PCI DSS 3.0 (as well as the upcoming PCI DSS version 3.1) states that:

"Where hashed and truncated versions of the same PAN are present in an entity’s environment, additional controls should be in place to ensure that the hashed and truncated versions cannot be correlated to reconstruct the original PAN"

How do we protect ourselves from 'brute-force' attacks and meet Requirement 3.4?

One of the most common answers we see is using so-called 'salted hashing', whereby a secret value is added to each PAN before applying the hash algorithm.

For example, the hash value of:

4111111111111111$alt

is

2a2e033b311bf13f86b5aa2d9751b59044d5d103edbdfd5090414db1ff0a4220

and it will not match any of the hash values calculated in the brute-force attack shown in Pic 1. In other words, an attacker must now know the salt value to be able to perform a successful brute-force attack. This is a very effective measure as long as the salt value (the best practice is to use a random value at least 32 bits long) is kept secret, i.e. protected. You could argue that you must now protect the salt value in the same way as you would protect an encryption key for encrypted PAN data. This can prove to be quite cumbersome and depends a lot on the environment.
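A minimal sketch of this idea in Python follows. The function name, the 16-byte salt length and the way the salt is appended to the PAN are assumptions for illustration; in a real environment the salt would be generated once and stored apart from the hash values, under controls similar to those used for encryption keys.

```python
import hashlib
import secrets

def salted_hash(pan: str, salt: bytes) -> str:
    """SHA-256 over the PAN digits followed by a secret salt."""
    return hashlib.sha256(pan.encode("ascii") + salt).hexdigest()

# Assumption: a single random salt, kept secret and stored separately.
secret_salt = secrets.token_bytes(16)  # 128 random bits

pan = "4111111111111111"
stored = salted_hash(pan, secret_salt)

# An attacker who guesses the right PAN but does not know the salt
# computes a plain hash, which does not match the stored value.
attacker_guess = hashlib.sha256(pan.encode("ascii")).hexdigest()
print(stored == attacker_guess)                 # False

# The legitimate system, which knows the salt, can still match cards.
print(stored == salted_hash(pan, secret_salt))  # True
```

The stored value remains usable for matching the same card, but only by a system that has access to the salt.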

Other examples of the 'additional controls' detailed in PCI DSS requirement 3.4 are:

  • Use of separate storage systems, one for hashed and one for truncated PAN data, that are isolated from each other using segmentation and/or separate access controls;
  • Configuring file/database systems to prevent the existence of any cross-references or links between a hash value and truncated PAN data;
  • Use of real-time monitoring and dynamic response to detect and prevent requests to access correlating PAN values.

Important note about this post

After taking into consideration some pertinent comments from our LinkedIn page followers, we have decided to revise this article accordingly. We want to remind our readers that this content describes a few particular scenarios of specific business use cases within the payment card industry and some of the problems arising from those scenarios. It is not intended to go in depth into the realm of hashing, nor to propose a generalized approach.

Advantio is a Qualified Security Assessor (QSA). Feel free to contact our team of experts to get more information or advice applicable to your particular environment.

Written by Irmantas Brazaitis

PCI QSA and Information Security professional with vast experience within the payment card industry. I have sound experience in ATM security, having worked for a global payment service provider alongside the Fraud team, where I was involved in the end-to-end fraud prevention process (from monitoring of suspicious transactions to seizure of criminals).

Certified as a Qualified Security Assessor (QSA) by the PCI Security Standards Council.