Benchmarking a BadRAM patched system

Terms: I shall coin the term hole for a faulty byte in RAM, and refer to RAM modules with such faults as BadRAM. In contrast, classical (hole-free) RAM will be referred to here as OK RAM. Many BadRAMs contain holes all over, spread in a regular pattern, but I have developed a patch that makes Linux run smoothly on such RAMs.

Description of available hardware

My computer used to run with 128 MB of flawless RAM, with a CAS timing of 2. It has a TLB and cache, like any Pentium-II system. In the new situation, I added two RAM modules of 32 MB each, each with holes, and each with a CAS timing of 3.

The first BadRAM has 512 holes, spread through the 8MB-16MB range of its 32MB. The second BadRAM has 256 holes, spread through the 0MB-8MB range of its 32MB.

The two interesting cases to compare would be:

  1. The OK RAM only,
  2. The BadRAM only.

Correction of Influential Factors

Factors of influence on this measurement are:

  1. The memory size influences buffering and so on,
  2. The different CAS timing for the OK RAM and the BadRAM,
  3. The pages sacrificed because they contain a hole reduce the size of available RAM,
  4. Networking, daemons, the weather and quantum-mechanical non-determinism.

These factors are dealt with as follows:

  1. The memory size in use will be made equal between the two tests (using the LILO boot option mem=...),
  2. The BIOS will be instructed to assume a CAS timing of 3 in all cases; leaving a BadRAM module installed behind the used region of RAM will help to convince the BIOS that this is the right value,
  3. The amount of flawless memory offered to Linux will be reduced to the amount actually available in the BadRAM case (64MB minus 3MB of sacrificed pages in this case on i386),
  4. The tests are performed by root in single user mode (no networking), and the reported values are the averages of 5 independent measurements.
I hope and expect this accounts for all possible problems.

Note: Why reduce the flawless memory by the pages that are sacrificed from the BadRAMs? Well, the point I intend to demonstrate is that BadRAM performs as well as normal RAM after bad pages have been taken out. So, I should compare the 61MB of BadRAM with 61MB out of the flawless RAM.
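The 3MB figure can be checked with a little arithmetic. The sketch below assumes that every hole lies in a page of its own (which is plausible for holes spread in a regular pattern, but not verified here) and that the i386 page size is 4 KB; the function name is made up for this illustration.

```python
PAGE_KIB = 4  # i386 page size in KB (assumption stated above)

def sacrificed_kib(holes, page_kib=PAGE_KIB):
    """Memory lost when each hole costs one whole page.

    Assumes no two holes share a page, so this is an upper bound.
    """
    return holes * page_kib

holes = 512 + 256              # holes in the first and second BadRAM module
installed_kib = 2 * 32 * 1024  # two modules of 32MB each
lost_kib = sacrificed_kib(holes)
usable_kib = installed_kib - lost_kib

print(lost_kib, "KB sacrificed")
print(usable_kib // 1024, "MB usable")
```

Under these assumptions the two modules lose 3072 KB, leaving 61MB, which is the amount of flawless RAM used for the comparison.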

Software: The measurements are performed with lmbench-2alpha10.

Measurements

These are the available results for these measurements:

                   [A] 61MB of OK RAM  [B] 64MB of BadRAM, 3MB wrong
-----------------  ------------------  -----------------------------
Measurements       Resultset #1        Resultset #1
                   Resultset #2        Resultset #2
                   Resultset #3        Resultset #3
                   Resultset #4        Resultset #4
                   Resultset #5        Resultset #5
Linux information  dmesg               dmesg
                   /proc/meminfo       /proc/meminfo
                   /proc/cmdline       /proc/cmdline
LMbench make       see                 see
                   stats               stats
The following subsections deal with the latency tables in the make see results. Bandwidths are not discussed, as they are more likely to be influenced by the fact that they address different RAMs than by the BadRAM algorithms (which take no part in them).

The dmesg values reported for memory are different:

Memory: 60244k/62464k available (940k kernel code, 416k reserved, 804k data, 60k init, 0k badram)
Memory: 60212k/65536k available (940k kernel code, 416k reserved, 836k data, 60k init, 3072k badram)
The first line shows that no pages received a `BadRAM' treatment, and therefore, that no influence of BadRAM routines on runtime performance is possible. Note the difference in data segment for the kernel; no doubt, this is because bad pages are stored in the page tables, even though the memory is never made available.
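To make the comparison of the two boot lines explicit, here is a minimal sketch that parses them and prints the differences per field. The parsing helper is my own illustration and only handles lines of exactly the form quoted above.

```python
LINE_A = ("Memory: 60244k/62464k available (940k kernel code, 416k reserved, "
          "804k data, 60k init, 0k badram)")
LINE_B = ("Memory: 60212k/65536k available (940k kernel code, 416k reserved, "
          "836k data, 60k init, 3072k badram)")

def parse_memory_line(line):
    """Return the kilobyte figures of one 'Memory:' boot line as a dict."""
    head, _, rest = line.partition(" available (")
    available, total = head.split(": ")[1].split("/")
    fields = {"available": int(available.rstrip("k")),
              "total": int(total.rstrip("k"))}
    # The parenthesised part is a comma-separated list of "<n>k <name>" items.
    for item in rest.rstrip(")").split(", "):
        value, name = item.split("k ", 1)
        fields[name] = int(value)
    return fields

a = parse_memory_line(LINE_A)
b = parse_memory_line(LINE_B)
diff = {name: b[name] - a[name] for name in a}
for name, delta in diff.items():
    print(f"{name:12s} {delta:+d}k")
```

The extra total memory in the BadRAM case equals the badram figure, and the 32k drop in available memory equals the 32k growth of the kernel data segment.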

Processor, Processes

From the make see results, one table of interest is the process(or) table, which is almost identical for both measurements. These tables are:

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh
                             call  I/O stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
i686-linu  Linux 2.2.14  351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   8K
i686-linu  Linux 2.2.14  351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   8K
i686-linu  Linux 2.2.14  351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   8K
i686-linu  Linux 2.2.14  351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   8K
i686-linu  Linux 2.2.14  351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   8K
for OK RAM, and for BadRAM is:
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh
                             call  I/O stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
i686-linu  Linux 2.2.14  351  0.9  1.2    7    9 0.04K  2.5    3 0.3K   2K   9K
i686-linu  Linux 2.2.14  351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   9K
i686-linu  Linux 2.2.14  351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   9K
i686-linu  Linux 2.2.14  351  0.8  1.2    7    9 0.04K  2.6    3 0.3K   2K   9K
i686-linu  Linux 2.2.14  351  0.9  1.2    7    9 0.04K  2.6    3 0.3K   2K   9K
The difference is mainly in the last column (sh proc), which is 8K for OK RAM, and 9K for BadRAM. What does that mean?

Context Switching

The tables for context switching times are:

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
i686-linu  Linux 2.2.14    1     19     58    19    106      22     192
i686-linu  Linux 2.2.14    1     19     58    19    125      23     192
i686-linu  Linux 2.2.14    1     19     58    19     97      26     192
i686-linu  Linux 2.2.14    1     18     58    19    125      22     192
i686-linu  Linux 2.2.14    1     19     58    19    108      26     192
for OK RAM, and for BadRAM it is:
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
i686-linu  Linux 2.2.14    1     18     58    19    131      23     192
i686-linu  Linux 2.2.14    1     19     58    19    104      24     191
i686-linu  Linux 2.2.14    1     18     58    19     94      22     192
i686-linu  Linux 2.2.14    1     18     58    19     92      24     192
i686-linu  Linux 2.2.14    1     19     58    19    112      24     192
The averages for these columns are:
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Measurement             2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--- -------- ---------- ----- ------ ------ ------ ------ ------- -------
[A] 61MB     OK RAM       1    18.8   58.0   19.0   112    23.8     192
[B] 64MB-3MB BadRAM       1    18.4   58.0   19.0   107    23.4     192
The last column came out slightly lower for BadRAM, but the difference disappears in rounding; I have some difficulty believing in more than 3 significant digits for a measurement of 5 minutes. To my utter surprise, BadRAM seems to cause improvements for the other values! I tend to attribute that to measurement noise.
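The averages can be recomputed from the two tables, and adding the sample standard deviation gives a rough feeling for the spread between runs. This is only a sketch: the numbers are copied by hand from the tables above, and with 5 runs per case the standard deviation is itself a coarse estimate.

```python
from statistics import mean, stdev

COLUMNS = ["2p/0K", "2p/16K", "2p/64K", "8p/16K", "8p/64K", "16p/16K", "16p/64K"]

OK_RAM = [
    [1, 19, 58, 19, 106, 22, 192],
    [1, 19, 58, 19, 125, 23, 192],
    [1, 19, 58, 19, 97, 26, 192],
    [1, 18, 58, 19, 125, 22, 192],
    [1, 19, 58, 19, 108, 26, 192],
]
BADRAM = [
    [1, 18, 58, 19, 131, 23, 192],
    [1, 19, 58, 19, 104, 24, 191],
    [1, 18, 58, 19, 94, 22, 192],
    [1, 18, 58, 19, 92, 24, 192],
    [1, 19, 58, 19, 112, 24, 192],
]

def summarize(rows):
    """Map each column name to (mean, sample standard deviation) over the runs."""
    return {name: (mean(col), stdev(col))
            for name, col in zip(COLUMNS, zip(*rows))}

ok, bad = summarize(OK_RAM), summarize(BADRAM)
for name in COLUMNS:
    print(f"{name:8s} OK {ok[name][0]:6.1f} +/- {ok[name][1]:4.1f}   "
          f"BadRAM {bad[name][0]:6.1f} +/- {bad[name][1]:4.1f}")
```

For the 8p/64K column, where the two averages differ most, the gap between the means is smaller than the run-to-run standard deviation of either set, which fits the reading that the apparent improvement is noise.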

Local Communication Latencies

The latency tables for local communication are, for OK RAM:

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
i686-linu  Linux 2.2.14     1     9   17
i686-linu  Linux 2.2.14     1     9   17
i686-linu  Linux 2.2.14     1     9   17
i686-linu  Linux 2.2.14     1     9   17
i686-linu  Linux 2.2.14     1     9   17
and for BadRAM:
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
i686-linu  Linux 2.2.14     1     9   17
i686-linu  Linux 2.2.14     1     9   17
i686-linu  Linux 2.2.14     1     9   17
i686-linu  Linux 2.2.14     1     9   17
i686-linu  Linux 2.2.14     1     9   17
How boring; any differences fall under the benchmark's threshold :).

Virtual Memory Latencies

The tables for file and VM system latencies are, for OK RAM:

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page
                        Create Delete Create Delete  Latency Fault   Fault
--------- ------------- ------ ------ ------ ------  ------- -----   -----
i686-linu  Linux 2.2.14     19      2     85      3     4624     1    0.8K
i686-linu  Linux 2.2.14     19      2     85      3     4603     1    0.8K
i686-linu  Linux 2.2.14     19      2     82      3     4656     1    0.8K
i686-linu  Linux 2.2.14     19      2     81      3     4690     1    0.8K
i686-linu  Linux 2.2.14     19      2     76      3     4642     1    0.8K
and for BadRAM:
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page
                        Create Delete Create Delete  Latency Fault   Fault
--------- ------------- ------ ------ ------ ------  ------- -----   -----
i686-linu  Linux 2.2.14     19      2     85      3     4647     1    0.7K
i686-linu  Linux 2.2.14     19      2     85      3     4361     1    0.7K
i686-linu  Linux 2.2.14     19      2     85      3     4472     1    0.7K
i686-linu  Linux 2.2.14     19      2     79      3     4444     1    0.7K
i686-linu  Linux 2.2.14     19      2     76      3     4500     1    0.7K
The averages for these columns are:
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Measurement               0K File      10K File      Mmap    Prot    Page
                        Create Delete Create Delete  Latency Fault   Fault
--- -------- ---------- ------ ------ ------ ------  ------- -----   -----
[A] 61MB     OK RAM         19      2   81.8      3     4643     1    0.8K
[B] 64MB-3MB BadRAM         19      2   82.0      3     4485     1    0.7K
Here too, there are no signs of worse performance caused by BadRAM. We are not interested in the question whether BadRAM performs better than OK RAM, just whether there is a performance loss when replacing OK RAM with the same amount of usable memory in BadRAM.
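The question "is there a performance loss?" can be phrased as a simple check on the averaged rows: list every column where BadRAM is slower than OK RAM by more than some tolerance. The sketch below uses a tolerance of 1%, which is an arbitrary value chosen for this illustration, and it reads the K suffix in the page fault column as a factor of 1000, which is also an assumption.

```python
COLUMNS = ["0K create", "0K delete", "10K create", "10K delete",
           "mmap latency", "prot fault", "page fault"]

# Averaged rows [A] and [B]; 0.8K and 0.7K are written as 800 and 700
# (assumes K means a factor of 1000).
OK_RAM = [19, 2, 81.8, 3, 4643, 1, 800]
BADRAM = [19, 2, 82.0, 3, 4485, 1, 700]

def relative_change(baseline, candidate):
    """Fractional change of candidate against baseline; positive means slower."""
    return (candidate - baseline) / baseline

def slower_columns(baseline, candidate, tolerance=0.01):
    """Names of the columns where candidate exceeds baseline by more than tolerance."""
    return [name
            for name, base, cand in zip(COLUMNS, baseline, candidate)
            if relative_change(base, cand) > tolerance]

print(slower_columns(OK_RAM, BADRAM))                 # 1% tolerance
print(slower_columns(OK_RAM, BADRAM, tolerance=0.0))  # any slowdown at all
```

With the 1% tolerance no column is flagged; with a tolerance of zero only the 10K file create column shows up, where the averages are 82.0 against 81.8.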

Memory Latency

The tables for memory latency are, for OK RAM:

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------   ---  ----   ----    --------    -------
i686-linu  Linux 2.2.14   351     8     62         163
i686-linu  Linux 2.2.14   351     8     62         163
i686-linu  Linux 2.2.14   351     8     62         163
i686-linu  Linux 2.2.14   351     8     62         163
i686-linu  Linux 2.2.14   351     8     78         163
and for BadRAM:
Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------   ---  ----   ----    --------    -------
i686-linu  Linux 2.2.14   351     8     78         163
i686-linu  Linux 2.2.14   351     8     62         163
i686-linu  Linux 2.2.14   351     8     62         163
i686-linu  Linux 2.2.14   351     8     62         163
i686-linu  Linux 2.2.14   351     8     62         163
And these results show no distinction between OK RAM and BadRAM performance either.

Note: I am unsure what to do with the `check graphs' message.

Conclusion

BadRAM performs as well as normal RAM after bad pages have been taken out.

This is as expected. There is no influence to be expected, because the BadRAM's bad pages are never supplied to the kernel allocation routines of Linux. Although the regular appearance of holes in a RAM leads to increased fragmentation of page ranges, this is not a major problem because most memory is user space memory, which is allocated page-by-page anyway. In user space, memory page regions are formed from single pages through the MMU.

Reactions

If you are interested in this project, you are welcome to enter a response in my guestbook for this page. It is also a way to propose ideas that others can in turn respond to. And it's far less crowded than the linux-kernel list :-) And of course, you can also choose to mail me at vanrein@zonnet.nl.

My current snail mail address is:

	Rick van Rein
	Geulstraat 96
	7523 TW Enschede
	the Netherlands
This overrules the address in the patch's documentation, which is intended as a longer-lasting address.


Visit the meta index for an overview of web sites by me.