cpuburn-1.4/Design

I wrote these programs to fill a vacuum.  Chris Brady's memtest-86 is
an excellent program for testing memory, but I wanted something that
would do stability testing for CPUs, since I had decided to overclock
my pair of Celeron 366's on an Abit BP-6 motherboard.  No comments from
the peanut gallery.  burnBX was added to test RAM & controller
stability.

Besides much-vilified overclockers, other people may find these
programs useful.  System builders may wish to test their systems and
heatsinks.  PC buyers may wish to test their systems, particularly if
they have doubts about the builder's expertise.  Leaving out thermal
interface material (grease) on the heatsink is a likely flaw.

The usual advice is to run kernel compiles.  This is dangerous since a
crash will certainly corrupt the filesystem with all the files
make -j 4 will have open.  Worse, I doubt that gcc has any significant
FPU code.  Worse still, gcc is compiled with gcc, and I doubted that it
would produce highly optimized code.  Since I couldn't find anything, I
decided to write it.

It's certain that Intel and other CPU manufacturers have devoted
enormous effort to CPU testing.  They have some programs for stability
testing and parts speed rating ("binning").  Some of these
(HIPWR30.EXE) are available to qualified Intel customers under NDA.

I wanted a program that would load the CPU to maximum.  Unintentionally,
code optimization does this.  I chose a base of FPU code (DDOT) since I
believed from 8087 days that the FPU consumes a lot of current, and was
untested by gcc.  Then integer instructions were slipped into their
shadow to try to keep the other P6 ports loaded.  Agner Fog's excellent
article helped quite a bit.  Trial and much error.  I also tried to
choose data (all-bits-lit) that would maximize power consumption.
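As an illustration of the "all-bits-lit" data, the following minimal
Python sketch decodes the two floating-point constants that appear as
`.long` pairs at the end of the burnP6.S source (labels `rt` and `e`).
The helper names are hypothetical and not part of cpuburn; the sketch
only assumes the usual little-endian IEEE-754 double layout.

```python
import struct

def dwords_to_double(low, high):
    """Rebuild a little-endian IEEE-754 double from the two 32-bit words
    of a `.long low, high` directive."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]

def mantissa_bits_set(low, high):
    """Count the set bits in the 52-bit mantissa field."""
    return bin(((high & 0xFFFFF) << 32) | low).count("1")

rt = dwords_to_double(0xFFFFFFFF, 0x3FEFFFFF)   # just under 1.0
e = dwords_to_double(0xFFFFFFFF, 0x3FDFFFFF)    # just under 0.5

if __name__ == "__main__":
    print(rt, e, mantissa_bits_set(0xFFFFFFFF, 0x3FEFFFFF))
```

Both operands have every mantissa bit set, which is presumably what
"all-bits-lit" refers to.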
But I do not claim that my code is the most optimized nor the most
power consuming.  There could always be better.

Once I found lm-sensors, I could measure the results of my efforts.
Subject to thermistor vagaries, here are my results [revised]:

    29'C  at idle (hlt)
    41'   doing idle loop
    46'   mprime95 (as-is or reniced -19)
    47'   make -j 4 on kernel
    47'   2 * burnP5 (estimated)
    47'   2 * burnBX L (default, 4 MB)
    48'   2 * burnMMX L
    48'   2 * burnK6 (estimated)
    50'   2 * burnMMX F (default, 64 kB in L2)
    51'   2 * burnMMX D (16 kB, L1 cache)
    51'   2 * burnP6 on zeroes for data
    52'   2 * burnP6 with FF's for data

All at 2 * 5.5 * 97 MHz (26'C ambient).  Higher and my CPU1 will lock
up under burnP6 in 5-10 min.  Kernel compiles are stable to 99 MHz for
24 h.  But 98 MHz will give `burnBX` errors every 5-8 hours, and 95 MHz
will give burnMMX D errors every ~6 hours, so now I run 94 MHz.  Errors
seem to increase 10x for every 1 MHz.

I got tired of waiting for temperature steady-state so I measure
current instead.  Mostly I use the ATX power harness as a shunt, and
measure current by voltage drop.  Email for details.  This permits
testing many different instruction mix ideas quickly.  As it turns out,
the original burnP6 is close to the best I've found, needing only minor
tweaking for a 2% improvement.  The optimum burnK6 is also fairly
similar, with just minor architectural adjustments for AMD.

I also did some measurements with an inductive ammeter.  They gave 90%
of the estimated maximum datasheet current draw for burnP6.  So I'm
fairly happy with the code.  But suggestions for improvement are most
welcome.  I don't claim this code is perfect, nor that it will catch
all system deficiencies.

BURNBX:  This program has been quite frustrating to develop.  It's hard
to measure the results.  I've finally hit on a reasonable pattern
(walking bit through carry, inverted every quadword except for
cacheline leadoff) that really brings out errors, and occasional
lockups (more on FreeBSD).  The 82443BX only gets to 42'C.
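The fill pattern just described can be modelled in a few lines.  This
is a minimal Python sketch of the buffer-initialisation loop as it
reads in the burnBX.S source (`rcll $1, %eax` with the "qwords
F-F-0-F , F-0-F-0" layout); the function and constant names are
hypothetical, and the model is an interpretation of that loop rather
than the author's own description.

```python
MASK = 0xFFFFFFFF

# Per-quadword layout of one 64-byte chunk: False = pattern, True = inverted.
# Mirrors the "qwords F-F-0-F , F-0-F-0" comment in the fill loop.
LAYOUT = (False, False, True, False, False, True, False, True)

def fill_pattern(n_chunks):
    """Return the dwords written for n_chunks 64-byte chunks."""
    eax, carry = MASK, 0            # xorl/notl leave EAX all ones, CF clear
    dwords = []
    for _ in range(n_chunks):
        edx = eax ^ MASK            # notl %edx
        for inverted in LAYOUT:
            word = edx if inverted else eax
            dwords += [word, word]  # both halves of a quadword are equal
        wide = (eax << 1) | carry   # rcll $1: 33-bit rotate through carry
        carry, eax = wide >> 32, wide & MASK
    return dwords

def qwords_consistent(dwords):
    """Model of the data check: both dwords of every quadword must match."""
    return all(dwords[i] == dwords[i + 1] for i in range(0, len(dwords), 2))
```

Because the zero bit also passes through the carry flag, the pattern
repeats every 33 chunks rather than every 32, so it never lines up
with power-of-two address boundaries.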
Essentially, burnBX is a RAM tester, using whatever pages the OS
allocates to the process.  As such, it cannot test kernel RAM.  But it
is designed to be very intense, using the P6 optimized `rep movsd`
instructions.  Please note that burnBX is _not_ optimal on AMD K6 based
systems because they don't have the optimized `rep movsd` block move.

Beta testers have mostly reported quick error terminations.  Their
impact should not be minimized, because such a data error could occur
in kernel code, causing system crashes.  The errors may be from the
CPU/BX bus, in which case ECC RAM will not help.  The cause is not
perfectly clear, but general case & 440BX cooling helps and so does an
adequate power supply.  300W is suggested.

Errors on my "instrumented" version of burnBX have not been isolated to
one memory cell but have been distributed across many addresses and a
few bits [only one at a time].  It is suspected that there is a bus or
transistor driver problem.  Or there may be undetected transients in
the 3.3 V supply.

REVISED BURNMMX:  I started this project as simply a way for AMD system
owners to check out their systems.  I was very surprised when my own
system started throwing errors with the MMX memory moves, and had to
downclock from 2 * 5.5 * 97 MHz to 94 MHz.  It would seem that the
simple memory moves are more fragile (less robust to interrupts) than
the 2% higher bandwidth string moves.

BURNK7:  I finally bought an AMD Athlon and had to write a tester even
though I don't overclock it.  Writing burnK7 was much trial and error,
but the ammeter gave me immediate feedback on my efforts.  The powerful
K7 core was easy and fun to optimize.  I parallel pathed DDOT to remove
a dependency, and could have gone much further, but current didn't
increase, so I stuffed in integer instructions which did increase
current.  On my 850 Thunderbird, burnK7 draws 9% more power than
burnK6.

Robert Redelmeier
redelm@ev1.net
June 15, 2001

cpuburn-1.4/Makefile

all : burnP5 burnP6 burnK6 burnK7 burnBX burnMMX
.S:
	gcc -s -nostdlib -o $@ $<

cpuburn-1.4/README

N E W   burnK7 for the AMD Athlon/Duron has been released.

These programs are designed to load x86 CPUs as heavily as possible for
the purposes of system testing.  They have been optimized for different
processors.  FPU and ALU instructions are coded in an assembler endless
loop.  They do not test every instruction.  The goal has been to
maximize heat production from the CPU, putting stress on the CPU
itself, cooling system, motherboard (especially voltage regulators) and
power supply (likely cause of burnBX/MMX errors).

burnP5  is optimized for Intel Pentium processors, with or without MMX
    P6  is for Intel PentiumPro, PentiumII&III and Celeron CPUs
    K6  is for AMD K6 processors
    K7  is for AMD Athlon/Duron processors
   MMX  is to test cache/memory interfaces on all CPUs with MMX
    BX  is an alternate cache/memory test for Intel CPUs

TO USE:  root privileges are NOT required.  It has been designed for
ELF Linux, but also tested under FreeBSD and a.out.  Burn testing is
best done from a ramdisk distribution (tomsrtbt) or with filesystems
unmounted or mounted read-only.

untar the source in a convenient directory:
    `tar zxf cpuburn`
compile executables:
    `make`
run desired program in background [ _repeat_ for SMP]:
    `burnP6 || echo $? &`

Monitor progress of cpuburn by `ps`.  When finished, `kill` the burn*
process(es).  If you have temperature probes (fingers) or the
lm-sensors package, you can check your CPU temperature and/or system
voltages.

If an error occurs in calculations, it will be preserved, and the
program will terminate with error code 254 for an integer/memory
error, and error code 255 for a FP/MMX error.
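As an alternative to `|| echo $?`, the exit status can be decoded by a
small wrapper.  This is a minimal Python sketch, assuming only the two
error codes documented above; `describe_exit` and `run_burn` are
hypothetical helper names, and the program path passed in is whatever
burn* binary was built.

```python
import subprocess

# Meanings taken from the README text; anything else is reported as-is.
ERRORS = {254: "integer/memory error", 255: "FP/MMX error"}

def describe_exit(status):
    """Translate a burn* exit status into a short message."""
    if status < 0:
        # subprocess reports death by signal N as a returncode of -N
        return f"killed by signal {-status}"
    return ERRORS.get(status, f"unexpected exit status {status}")

def run_burn(path):
    """Run one burn program in the foreground until it terminates by
    itself (on a detected error) or is killed, then describe why."""
    return describe_exit(subprocess.run([path]).returncode)
```

For example, `run_burn("./burnP6")` blocks until the program stops; a
`kill` from another terminal shows up as "killed by signal 15".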
Error checking happens every 10-40 sec for burnP6/K6/K7 and I haven't
seen any CPU errors in testing [lockups occur first].  burnBX and
burnMMX check for errors every 512 MB (4-10 sec), and error termination
is frequently seen; lockups are rarer.

burnBX and burnMMX are essentially very intense RAM testers.  They can
also take an optional parameter indicating the RAM size to be tested:

    A =  2 kB    E =  32 kB    I = 512 kB    M =  8 MB
    B =  4       F =  64       J =   1 MB    N = 16
    C =  8       G = 128       K =   2       O = 32
    D = 16       H = 256       L =   4       P = 64

`burnBX L` (4 MB) and `burnMMX F` (64 kB) are the default sizes.  A-E
mostly test L1 cache, F-H test L2 cache, and H-P force their way to
RAM.  But even A-E will have some cacheline writeouts to RAM.

In spite of its name, burnBX can be run on any chipset [RAM controller]
and tests a lot more than the RAM controller.  Unfortunately, burnBX is
not optimal on AMD processors.  burnMMX is preferable for any CPU that
has an MMX unit.

burnBX/MMX need about 72 MB of total RAM + swap to start (not
necessarily free), but don't use this much unless you request it.  They
will throw a `Sig 11` if you don't have enough swap.  If you don't want
to add more, you can adjust the .bss section downward as indicated in
the source comments.  I use very simple memory management.  They can
also test swap, and at least on my system, I can run 2*`burnBX P` with
128 MB SDRAM with some use of swap, but no excessive thrashing [seeks].
YMMV.

If sub-spec, your system may lock up after 2-10 minutes.  It shouldn't.
burn* are just unprivileged user processes.  But it probably means your
CPU is undercooled, most likely no thermal grease or other interface
material between CPU & heatsink.  Or some other deficiency.  A power
cycle should reset the system.  But you should fix it.

Robert Redelmeier
redelm@ev1.net

                        *** WARNING ***
This program is designed to heavily load CPU chips.
Undercooled, overclocked or otherwise weak systems may fail causing
data loss (filesystem corruption) and possibly permanent damage to
electronic components.  Nor will it catch all flaws.
                   *** USE AT YOUR OWN RISK ***

cpuburn-1.4/burnBX.S

# cpuburn-1.4:  burnBX  Chipset/DRAM Loading Utility
# Copyright 2000  Robert J. Redelmeier.  All Right Reserved
# Licensed under GNU General Public Licence 2.0.  No warrantee.
#               *** USE AT YOUR OWN RISK ***

.text
#ifdef WINDOWS
.globl _main
_main:
        movl    4(%esp),%eax
        movl    $12, %ecx               # default L = 4 MB
        subl    $1,%eax                 # 1 string -> no parameter
        jz      no_size
        movl    8(%esp),%eax            # address of strings
        movl    4(%eax),%eax            # address of first parameter
        movzb   (%eax),%ecx             # first parameter - a byte
no_size:
        subl    $12, %esp               # stack allocation
#else
.globl _start
_start:
        subl    $12, %esp               # stack space
        movl    20(%esp), %eax
        movl    $12, %ecx               # default L = 4 MB
        testl   %eax, %eax              # is a param given?
        jz      no_size
        movl    (%eax), %ecx
no_size:
#endif
        decl    %ecx
        andl    $15, %ecx
        movl    $256, %eax
        shll    %cl, %eax
        movl    %eax, 4(%esp)           # save blocksize
        movl    $256*1024, %eax
        shrl    %cl, %eax
        movl    %eax, 8(%esp)           # save count blks / 512 MB

        movl    4(%esp), %ecx
        shrl    $4, %ecx
        movl    $buffer, %edi
        xorl    %eax, %eax
        notl    %eax
more:                                   # init fill of 2 cachelines
        movl    %eax, %edx              # qwords F-F-0-F , F-0-F-0
        notl    %edx
        movl    %eax, 0(%edi)
        movl    %eax, 4(%edi)
        movl    %eax, 8(%edi)
        movl    %eax, 12(%edi)
        movl    %edx, 16(%edi)
        movl    %edx, 20(%edi)
        movl    %eax, 24(%edi)
        movl    %eax, 28(%edi)
        movl    %eax, 32(%edi)
        movl    %eax, 36(%edi)
        movl    %edx, 40(%edi)
        movl    %edx, 44(%edi)
        movl    %eax, 48(%edi)
        movl    %eax, 52(%edi)
        movl    %edx, 56(%edi)
        movl    %edx, 60(%edi)
        rcll    $1, %eax                # walking zero, 33 cycle
        leal    64(%edi), %edi          # odd inst to preserve CF
        decl    %ecx
        jnz     more

        cld
thrash:                                 # MAIN LOOP
        movl    8(%esp), %edx
mov_again:
        movl    $buffer, %esi
        movl    $buf2, %edi
        movl    4(%esp), %ecx
        rep                             # move block up
        movsl
        movl    $buffer + 32, %edi
        movl    $buf2, %esi
        movl    4(%esp), %ecx
        subl    $8, %ecx
        rep                             # move block back shifting
        movsl                           # by 1 cacheline
        movl    $buffer, %edi
        movl    $8, %ecx
        rep                             # replace last c line
        movsl
        decl    %edx                    # do again for 512 MB.
        jnz     mov_again

        movl    $buffer, %edi           # DATA CHECK
        xorl    %ecx, %ecx
.align 16, 0x90
test:
        mov     0(%edi,%ecx,4), %eax
        cmp     %eax, 4(%edi,%ecx,4)
        jnz     error
        incl    %ecx
        incl    %ecx
        cmpl    4(%esp), %ecx
        jc      test
        jmp     thrash

error:                                  # error abend
        movl    $1, %eax
#ifdef WINDOWS
        addl    $12, %esp               # deallocate stack
        ret
#else
        movl    $-2, %ebx
        pushl   %ebx                    # *BSD syscall convention
        pushl   %eax
        int     $0x80
#endif
.bss                                    # Data allocation
.align 32
.lcomm  buffer, 32 <<20                 # reduce both to 8 <<20 for only
.lcomm  buf2,   32 <<20                 # 16 MB virtual memory available
#

cpuburn-1.4/burnK6.S

# cpuburn-1.4:  burnK6  CPU Loading Utility
# Copyright 1999  Robert J. Redelmeier.  All Right Reserved
# Licensed under GNU General Public Licence 2.0.  No warrantee.
#               *** USE AT YOUR OWN RISK ***

.text
#ifdef WINDOWS
.globl _main
_main:
#else
.globl _start
_start:
#endif
        finit
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-32, %ebp
        subl    $96, %esp
        fldpi
        fldl    rt
        fstpl   -24(%ebp)
        fldl    e
        fstpl   -32(%ebp)
        movl    half, %edx
        movl    %edx, -8(%ebp)
after_check:
        xorl    %eax, %eax
        movl    %eax, %ebx
        lea     -1(%eax), %esi
        movl    $400000000, %ecx
        movl    %ecx, -4(%ebp)
.align 32, 0x90
crunch:
        fldl    8-24(%ebp,%esi,8)       # CALC BLOCK
        fmull   8-32(%ebp,%esi,8)
        addl    half+9(%esi,%esi,8), %edx
        jnz     . + 2
        faddp
        fldl    8-24(%ebp,%esi,8)
        decl    %ebx
        subl    half+9(%esi,%esi,8), %edx
        jmp     . + 2
        fmull   8-32(%ebp,%esi,8)
        incl    %ebx
        decl    8-4(%ebp,%esi,8)
        fsubp
        jnz     crunch                  # time for testing ?
        test    %ebx, %ebx              # TEST BLOCK
        jnz     int_exit
        cmpl    half, %edx
        jnz     int_exit
        fldpi
        fcomp   %st(1)
        fstsw   %ax
        sahf
        jz      after_check
        decl    %ebx
int_exit:
        decl    %ebx
        addl    $96, %esp
        popl    %ebp
        movl    $1, %eax
#ifdef WINDOWS
        ret
#else
        push    %ebx
        push    %eax
        int     $0x80
#endif
.align 32,0
half:   .long 0x7fffffff,0
e:      .long 0xffffffff,0x3fdfffff
rt:     .long 0xffffffff,0x3fefffff

cpuburn-1.4/burnK7.S

# cpuburn-1.4:  burnK7  CPU Loading Utility
# Copyright 2000  Robert J. Redelmeier.  All Right Reserved
# Licensed under GNU General Public Licence 2.0.  No warrantee.
#               *** USE AT YOUR OWN RISK ***

.text
#ifdef WINDOWS
.globl _main
_main:
#else
.globl _start
_start:
#endif
        finit
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-32, %ebp
        subl    $96, %esp
        fldl    rt
        fstpl   -24(%ebp)
        fldl    e
        fstpl   -32(%ebp)
        fldpi
        fldpi
        xorl    %eax, %eax
        xorl    %ebx, %ebx
        xorl    %ecx, %ecx
        movl    half, %edx
        lea     -1(%eax), %esi
        movl    %eax, -12(%ebp)
        movl    %edx, -8(%ebp)
after_check:
        movl    $850000000, -4(%ebp)
.align 32, 0x90
crunch:
        fxch                            # CALC BLOCK
        fldl    8-24(%ebp,%esi,8)       # 17 instr / 6.0 cycles
        addl    half+9(%esi,%esi,8), %edx
        fmull   8-32(%ebp,%esi,8)
        faddp
        decl    %ecx
        fldl    8-24(%ebp,%esi,8)
        decl    %ebx
        incl    8-12(%ebp,%esi,8)
        subl    half+9(%esi,%esi,8), %edx
        incl    %ecx
        fmull   8-32(%ebp,%esi,8)
        incl    %ebx
        decl    8-4(%ebp,%esi,8)
        jmp     . + 2
        fsubp   %st, %st(2)
        jnz     crunch                  # time for testing ?

        test    %ebx, %ebx              # TEST BLOCK
        jnz     int_exit
        test    %ecx, %ecx
        jnz     int_exit
        cmpl    half, %edx
        jnz     int_exit
        fcom    %st(1)
        fstsw   %ax
        sahf
        jz      after_check
        decl    %ebx
int_exit:
        decl    %ebx
        addl    $96, %esp
        popl    %ebp
        movl    $1, %eax
#ifdef WINDOWS
        ret
#else
        push    %ebx
        push    %eax
        int     $0x80
#endif
.align 32,0
        .fill 64
half:   .long 0x7fffffff,0
e:      .long 0xffffffff,0x3fdfffff
rt:     .long 0xffffffff,0x3fefffff

cpuburn-1.4/burnMMX.S

# cpuburn-1.4:  burnMMX  Chipset/DRAM Loading Utility
# Copyright 2000  Robert J. Redelmeier.
# All Right Reserved
# Licensed under GNU General Public Licence 2.0.  No warrantee.
#               *** USE AT YOUR OWN RISK ***

.text
#ifdef WINDOWS
.globl _main
_main:
        movl    4(%esp),%eax
        movl    $6, %ecx                # default f = 64 kB
        subl    $1, %eax                # is a param given?
        jz      no_size
        movl    8(%esp),%eax            # address of strings
        movl    4(%eax),%eax            # address of first parameter
        movzb   (%eax),%ecx             # first parameter - a byte
no_size:
        subl    $12, %esp               # stack space
#else
.globl _start
_start:
        subl    $12, %esp
        movl    20(%esp), %eax
        movl    $6, %ecx                # default f = 64 kB
        testl   %eax, %eax              # is a param given?
        jz      no_size
        movl    (%eax), %ecx
no_size:
#endif
        emms
        movq    rt, %mm0
        decl    %ecx
        andl    $15, %ecx               # mask off ASCII bits
        movl    $256, %eax
        shll    %cl, %eax
        movl    %eax, 4(%esp)           # save blocksize
        movl    $256*1024, %eax
        shrl    %cl, %eax
        movl    %eax, 8(%esp)           # save count blks / 512 MB

        movl    4(%esp), %ecx           # initial fill of 2 cachelines
        shrl    $4, %ecx
        movl    $buffer, %edi
        xorl    %eax, %eax
        notl    %eax
more:
        movl    %eax, %edx              # qwords F-F-0-F , F-0-F-0
        notl    %edx
        movl    %eax, 0(%edi)
        movl    %eax, 4(%edi)
        movl    %eax, 8(%edi)
        movl    %eax, 12(%edi)
        movl    %edx, 16(%edi)
        movl    %edx, 20(%edi)
        movl    %eax, 24(%edi)
        movl    %eax, 28(%edi)
        movl    %eax, 32(%edi)
        movl    %eax, 36(%edi)
        movl    %edx, 40(%edi)
        movl    %edx, 44(%edi)
        movl    %eax, 48(%edi)
        movl    %eax, 52(%edi)
        movl    %edx, 56(%edi)
        movl    %edx, 60(%edi)
        rcll    $1, %eax                # walking zero, 33 cycle
        leal    64(%edi), %edi          # odd inst to preserve CF
        decl    %ecx
        jnz     more

thrash:                                 # OUTER LOOP
        movl    8(%esp), %edx           # reset count for 512 MB
mov_again:
        movq    %mm0, %mm1
        movq    %mm0, %mm2
        movl    $buffer, %esi
        movl    $buf2, %edi
        movl    4(%esp), %ecx
        shll    $2, %ecx                # move block up
        addl    %ecx, %esi
        addl    %ecx, %edi
        negl    %ecx
.align 16, 0x90
0:                                      # WORKLOOP 7 uops/ 3 clks in L1
        movq    0(%esi,%ecx),%mm7
        pmaddwd %mm0, %mm1
        pmaddwd %mm0, %mm2
        movq    %mm7, 0(%edi,%ecx)
        addl    $8, %ecx
        jnz     0b

        movl    $buffer + 32, %edi      # move block back
        movl    $buf2, %esi             # shifting by
        movl    4(%esp), %ecx           # one cacheline
        subl    $8, %ecx
        shll    $2, %ecx
        addl    %ecx, %esi
        addl    %ecx, %edi
        negl    %ecx
.align 16, 0x90
0:                                      # second workloop
        movq    0(%esi,%ecx),%mm7
        pmaddwd %mm0, %mm1
        pmaddwd %mm0, %mm2
        movq    %mm7, 0(%edi,%ecx)
        addl    $8, %ecx
        jnz     0b

        movl    $buffer, %edi
        movsl                           # replace last c line
        movsl
        movsl
        movsl
        movsl
        movsl
        movsl
        movsl
        decl    %edx                    # do again for 512 MB.
        jnz     mov_again

        xorl    %ebx ,%ebx              # DATA CHECK
        decl    %ebx
        pcmpeqd %mm2, %mm1
        psrlq   $16, %mm1
        movd    %mm1, %eax
        incl    %eax
        jnz     error                   # MMX calcs OK?
        decl    %ebx
        subl    $32, %edi
        xorl    %ecx, %ecx
test:                                   # Check data (NOT optimized)
        mov     0(%edi,%ecx,4), %eax
        cmp     %eax, 4(%edi,%ecx,4)
        jnz     error
        incl    %ecx
        incl    %ecx
        cmpl    4(%esp), %ecx
        jc      test
        jmp     thrash

error:                                  # error abend
        emms
        movl    $1, %eax
#ifdef WINDOWS
        addl    $12, %esp               # deallocate stack
        ret
#else
        push    %ebx
        push    %eax
        int     $0x80
#endif
rt:     .long   0x7fffffff, 0x7fffffff
.bss                                    # Data allocation
.align 32
.lcomm  buffer, 32 <<20                 # reduce both to 8 <<20 for only
.lcomm  buf2,   32 <<20                 # 16 MB virtual memory available
#

cpuburn-1.4/burnP5.S

# cpuburn-1.4:  burnP5  CPU Loading Utility
# Copyright 1999  Robert J. Redelmeier.  All Right Reserved
# Licensed under GNU General Public Licence 2.0.  No warrantee.
#               *** USE AT YOUR OWN RISK ***

.text
#ifdef WINDOWS
.globl _main
_main:
#else
.globl _start
_start:
#endif
        finit
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-32, %ebp
        subl    $96, %esp
        fldl    half
        fstpl   -24(%ebp)
        fldl    one
        fstl    -16(%ebp)
        fld     %st
        fld     %st
after_check:
        xorl    %eax, %eax
        movl    %eax, %ebx
        movl    $200000000, %ecx
.align 32, 0x90
                                        # MAIN LOOP  16 flops / 18 cycles
crunch:
        fmull   -24(%ebp)
        fxch    %st(1)
        faddl   -16(%ebp)
        fxch    %st(2)
        fmull   -24(%ebp)
        fxch    %st(1)
        faddl   -16(%ebp)
        fxch    %st(2)
        fmull   -24(%ebp)
        fxch    %st(1)
        faddl   -16(%ebp)
        fxch    %st(2)
        fmull   -24(%ebp)
        fxch    %st(1)
        faddl   -16(%ebp)
        fxch    %st(2)
        fmull   -24(%ebp)
        fxch    %st(1)
        faddl   -16(%ebp)
        fxch    %st(2)
        fmull   -24(%ebp)
        fxch    %st(1)
        faddl   -16(%ebp)
        fxch    %st(2)
        fmull   -24(%ebp)
        fxch    %st(1)
        faddl   -16(%ebp)
        fxch    %st(2)
        fmull   -24(%ebp)
        fxch    %st(1)
        faddl   -16(%ebp)
        fxch    %st(2)
        decl    %ecx
        jnz     crunch

        jmp     after_check
        addl    $96, %esp               # never reached
        popl    %ebp                    # no checking done
        movl    $1, %eax
#ifdef WINDOWS
        ret
#else
        int     $0x80
#endif
.align 32,0
half:   .long 0xffffffff,0x3fdfffff
one:    .long 0xffffffff,0x3fefffff

cpuburn-1.4/burnP6.S

# cpuburn-1.4:  burnP6  CPU Loading Utility
# Copyright 1999  Robert J. Redelmeier.  All Right Reserved
# Licensed under GNU General Public Licence 2.0.  No warrantee.
#               *** USE AT YOUR OWN RISK ***

.text
#ifdef WINDOWS
.globl _main
_main:
#else
.globl _start
_start:
#endif
        finit
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-32, %ebp
        subl    $96, %esp
        fldpi
        fldl    rt
        fstpl   -24(%ebp)
        fldl    e
        fstpl   -32(%ebp)
        movl    half, %edx
        movl    %edx, -8(%ebp)
after_check:
        xorl    %eax, %eax
        movl    %eax, %ebx
        lea     -1(%eax), %esi
        movl    $539000000, %ecx        # check after this count
        movl    %ecx, -4(%ebp)
.align 32, 0x90
crunch:                                 # MAIN LOOP 21uops / 8.0 clocks
        fldl    8-24(%ebp,%esi,8)
        fmull   8-32(%ebp,%esi,8)
        addl    half, %edx
        jnz     . + 2
        faddp
        fldl    -24(%ebp)
        decl    %ebx
        subl    half+9(%esi,%esi,8), %edx
        jmp     . + 2
        fmull   8-32(%ebp,%esi,8)
        incl    %ebx
        decl    8-4(%ebp,%esi,8)
        fsubp
        jnz     crunch

        test    %ebx, %ebx              # Testing block
        mov     $0, %ebx
        jnz     int_exit
        cmpl    half, %edx
        jnz     int_exit
        fldpi
        fcomp   %st(1)
        fstsw   %ax
        sahf
        jz      after_check             # fp result = pi ?
        decl    %ebx
int_exit:                               # error abort
        decl    %ebx
        addl    $96, %esp
        popl    %ebp
        movl    $1, %eax                # Linux syscall
#ifdef WINDOWS
        ret
#else
        push    %ebx
        push    %eax                    # *BSD syscall
        int     $0x80
#endif
.align 32,0
half:   .long 0x7fffffff,0
e:      .long 0xffffffff,0x3fdfffff
rt:     .long 0xffffffff,0x3fefffff
#
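
The testing block in burnP6.S relies on every pass through the loop
undoing its own work: the same value is added to and subtracted from
EDX, EBX is decremented and incremented, and the same product is added
to and subtracted from the FP accumulator.  The following Python model
is only a sketch of that idea, with hypothetical names; it uses 64-bit
doubles, whereas the x87 unit works in its own extended format, so it
illustrates the invariant and does not reproduce the hardware
arithmetic.

```python
import math

MASK = 0xFFFFFFFF
HALF = 0x7FFFFFFF            # the `half` constant in the source
RT = 1 - 2**-53              # double encoded by the `rt` constant
E = 0.5 - 2**-54             # double encoded by the `e` constant

def crunch(iterations):
    """Model one run of the loop; returns (ebx, edx, fp accumulator)."""
    ebx, edx, acc = 0, HALF, math.pi
    for _ in range(iterations):
        acc = acc + RT * E           # fldl / fmull / faddp
        edx = (edx + HALF) & MASK    # addl half, %edx
        ebx = (ebx - 1) & MASK       # decl %ebx
        edx = (edx - HALF) & MASK    # subl half, %edx
        acc = acc - RT * E           # fldl / fmull / fsubp
        ebx = (ebx + 1) & MASK       # incl %ebx
    return ebx, edx, acc

def check(ebx, edx, acc):
    """Model of the test block: all three values must be back at start."""
    return ebx == 0 and edx == HALF and acc == math.pi
```

A healthy model run satisfies `check(*crunch(n))` for any `n`; a single
flipped bit in any of the three values makes the check fail, which is
the condition the assembly turns into an error exit.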