ECC RAM – the error-correcting code memory

Data security is of paramount importance in many industries. Companies that handle important business processes rely on data stability, for example from a hosting service that stores their customers’ information. A serious memory error doesn’t just cause a financial loss; in the worst case, it can seriously weaken a company’s position in the market. The more data that is held in memory, the more likely it is that errors will occur. This is why comprehensive data protection matters so much in workstation and server environments that require high data integrity. One such measure is to use ECC RAM in place of ordinary memory so that single-bit errors can be corrected.

ECC RAM: background and definition

Random access memory (RAM) is the working memory of a computer system. It’s also known as the main memory and holds running programs along with the data they work on. The volatile contents of the main memory are stored as binary code, which consists solely of zeros and ones and is therefore easy for the computer to process. A single binary digit is called a 'bit'. Various causes, such as

  • voltage variations,
  • overclocking,
  • defective or aging memory modules,
  • or high-energy radiation

can lead to a bit error, in which a stored value is changed: a bit assumes the wrong value, i.e. '1' instead of '0' or vice versa. In many applications this is hardly noticeable. If a bit error occurs while working with an image-editing program, for example, one pixel might receive a slightly different color, which the human eye won’t notice. It is quite different in complex databases or calculation applications, where a single bit error can have serious consequences. A bit error can also cause a system crash if it occurs in a part of the memory used by the operating system.
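
To make the effect of a single flipped bit concrete, here is a small Python sketch. The function name flip_bit and the sample values (a color channel and a stored amount) are illustrative assumptions and are not taken from any particular system.

```python
def flip_bit(value: int, position: int) -> int:
    """Return value with the bit at the given position inverted (0 = least significant bit)."""
    return value ^ (1 << position)


# An 8-bit color channel: flipping the lowest bit shifts the shade by a single step.
pixel_red = 200
print(flip_bit(pixel_red, 0))   # 201

# A stored amount: flipping a high-order bit changes the value drastically.
balance = 1_000
print(flip_bit(balance, 20))    # 1049576
```

The same kind of error is harmless in the first case and severe in the second, which is why the impact of a bit error depends so strongly on where it occurs.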

The solution to this problem is an error-correcting code (ECC): a coding scheme that can detect and correct single-bit errors. In addition, ECC can detect the rarer two-bit errors. To make use of this error correction method, ordinary RAM modules are extended by an additional ECC memory chip, which is where ECC RAM comes into play.

How the error correction process works

The error correction process for single-bit errors that is used for RAM modules was developed in 1950 by the mathematician Richard Hamming, which is why the code is called the Hamming code. Its special feature is that it uses several parity bits. These are also known as control bits and form different validation groups together with the actual useful bits. In the classic example of a Hamming code for single-bit error correction, a seven-digit binary code is used, consisting of three parity bits (P), four useful bits (N), and three validation groups. The parity bits are placed at the code word positions whose number is a power of 2, in this example 1, 2, and 4:

  Position:  1   2   3   4   5   6   7
  Bit:       P1  P2  N1  P3  N2  N3  N4

When a bit sequence is read, each validation group is checked. The parity bits are chosen when the data is stored so that every validation group contains an even number of bits with the value 1. A group therefore indicates an error whenever its total number of bits with the value 1 is odd.

Applied to the example bit sequence 0001001, the Hamming code determines the error as follows:

  • The validation group of parity bit 1 (positions 1, 3, 5, 7) contains one bit with the value 1 and is therefore incorrect.
  • The validation group of parity bit 2 (positions 2, 3, 6, 7) contains one bit with the value 1 and is therefore incorrect.
  • The validation group of parity bit 3 (positions 4, 5, 6, 7) contains two bits with the value 1 and is therefore correct.

Code word position 3 is the only position that belongs to both incorrect validation groups but not to the correct one, so this is where the error is. Equivalently, adding the positions of the failing parity bits (1 + 2) gives the error position 3. The correct bit sequence is 0011001.
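
The check described above can be sketched in a few lines of Python. This is a minimal illustration of the seven-digit Hamming code with even parity, not the circuitry of an actual ECC module. The names GROUPS, encode, error_position, and correct are assumptions made for this example, and the sketch uses the standard grouping in which parity bit 2 covers positions 2, 3, 6, and 7.

```python
# Validation groups, keyed by the code word position of their parity bit (1-based).
GROUPS = {
    1: (1, 3, 5, 7),
    2: (2, 3, 6, 7),
    4: (4, 5, 6, 7),
}


def encode(data: str) -> str:
    """Place four useful bits at positions 3, 5, 6, 7 and add even-parity bits at 1, 2, 4."""
    if len(data) != 4 or set(data) - {"0", "1"}:
        raise ValueError("expected exactly four binary digits")
    bits = [0] * 8  # index 0 is unused so that indices match code word positions
    for pos, ch in zip((3, 5, 6, 7), data):
        bits[pos] = int(ch)
    for parity_pos, group in GROUPS.items():
        # The parity bit itself is still 0 here, so the sum covers only the useful bits.
        bits[parity_pos] = sum(bits[p] for p in group) % 2
    return "".join(str(b) for b in bits[1:])


def error_position(word: str) -> int:
    """Return the position of a single-bit error, or 0 if every group has even parity."""
    bits = [0] + [int(ch) for ch in word]
    return sum(
        parity_pos
        for parity_pos, group in GROUPS.items()
        if sum(bits[p] for p in group) % 2 == 1
    )


def correct(word: str) -> str:
    """Flip the bit at the detected error position, if there is one."""
    pos = error_position(word)
    if pos == 0:
        return word
    bits = list(word)
    bits[pos - 1] = "1" if bits[pos - 1] == "0" else "0"
    return "".join(bits)


print(encode("1001"))             # 0011001
print(error_position("0001001"))  # 3
print(correct("0001001"))         # 0011001
```

Summing the positions of the failing parity bits yields the error position directly, which is the reason the parity bits sit at positions that are powers of 2.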

ECC RAM – also suitable for personal use?

ECC fully protects the main memory against single-bit errors and thereby prevents a large portion of possible memory errors. This goes hand in hand with a reduction in system crashes, which is particularly important for services or applications that guarantee high availability and have to serve a large number of users. These advantages mean that ECC modules are in particular demand as server RAM and are standard equipment in high-performance data centers.

ECC RAM has minor disadvantages compared to non-ECC RAM, however. The error-correcting memory modules are somewhat more expensive than ordinary memory modules, and the error detection process reduces system performance by around 2% on average. ECC RAM is also not supported by all mainboards, so if you plan to use it on a standard board, you should first check compatibility and weigh up the benefits. ECC RAM and non-ECC RAM cannot be combined. By default, a personal computer or server typically comes with ordinary memory modules without error correction.
