1 files changed, 53 insertions, 53 deletions
diff --git a/linden/indra/libgcrypt/libgcrypt-1.2.2/mpi/alpha/README b/linden/indra/libgcrypt/libgcrypt-1.2.2/mpi/alpha/README
index 93d433c..55c0a29 100755..100644
--- a/linden/indra/libgcrypt/libgcrypt-1.2.2/mpi/alpha/README
+++ b/linden/indra/libgcrypt/libgcrypt-1.2.2/mpi/alpha/README
@@ -1,53 +1,53 @@
-This directory contains mpn functions optimized for DEC Alpha processors.
+This directory contains mpn functions optimized for DEC Alpha processors.
-RELEVANT OPTIMIZATION ISSUES
+RELEVANT OPTIMIZATION ISSUES
-EV4
+EV4
-1. This chip has very limited store bandwidth.  The on-chip L1 cache is
+1. This chip has very limited store bandwidth.  The on-chip L1 cache is
-write-through, and a cache line is transfered from the store buffer to the
+write-through, and a cache line is transfered from the store buffer to the
-off-chip L2 in as much 15 cycles on most systems.  This delay hurts
+off-chip L2 in as much 15 cycles on most systems.  This delay hurts
-mpn_add_n, mpn_sub_n, mpn_lshift, and mpn_rshift.
+mpn_add_n, mpn_sub_n, mpn_lshift, and mpn_rshift.
-2. Pairing is possible between memory instructions and integer arithmetic
+2. Pairing is possible between memory instructions and integer arithmetic
-instructions.
+instructions.
-3. mulq and umulh is documented to have a latency of 23 cycles, but 2 of
+3. mulq and umulh is documented to have a latency of 23 cycles, but 2 of
-these cycles are pipelined.  Thus, multiply instructions can be issued at a
+these cycles are pipelined.  Thus, multiply instructions can be issued at a
-rate of one each 21nd cycle.
+rate of one each 21nd cycle.
-EV5
+EV5
-1. The memory bandwidth of this chip seems excellent, both for loads and
+1. The memory bandwidth of this chip seems excellent, both for loads and
-stores.  Even when the working set is larger than the on-chip L1 and L2
+stores.  Even when the working set is larger than the on-chip L1 and L2
-caches, the perfromance remain almost unaffected.
+caches, the perfromance remain almost unaffected.
-2. mulq has a measured latency of 13 cycles and an issue rate of 1 each 8th
+2. mulq has a measured latency of 13 cycles and an issue rate of 1 each 8th
-cycle.  umulh has a measured latency of 15 cycles and an issue rate of 1
+cycle.  umulh has a measured latency of 15 cycles and an issue rate of 1
-each 10th cycle.  But the exact timing is somewhat confusing.
+each 10th cycle.  But the exact timing is somewhat confusing.
-3. mpn_add_n.  With 4-fold unrolling, we need 37 instructions, whereof 12
+3. mpn_add_n.  With 4-fold unrolling, we need 37 instructions, whereof 12
-   are memory operations.  This will take at least
+   are memory operations.  This will take at least
-        ceil(37/2) [dual issue] + 1 [taken branch] = 20 cycles
+        ceil(37/2) [dual issue] + 1 [taken branch] = 20 cycles
-   We have 12 memory cycles, plus 4 after-store conflict cycles, or 16 data
+   We have 12 memory cycles, plus 4 after-store conflict cycles, or 16 data
-   cache cycles, which should be completely hidden in the 20 issue cycles.
+   cache cycles, which should be completely hidden in the 20 issue cycles.
-   The computation is inherently serial, with these dependencies:
+   The computation is inherently serial, with these dependencies:
-     addq
+     addq
-     /   \
+     /   \
-   addq  cmpult
+   addq  cmpult
-     |     |
+     |     |
-   cmpult  |
+   cmpult  |
-       \  /
+       \  /
-        or
+        or
-   I.e., there is a 4 cycle path for each limb, making 16 cycles the absolute
+   I.e., there is a 4 cycle path for each limb, making 16 cycles the absolute
-   minimum.  We could replace the `or' with a cmoveq/cmovne, which would save
+   minimum.  We could replace the `or' with a cmoveq/cmovne, which would save
-   a cycle on EV5, but that might waste a cycle on EV4.  Also, cmov takes 2
+   a cycle on EV5, but that might waste a cycle on EV4.  Also, cmov takes 2
-   cycles.
+   cycles.
-     addq
+     addq
-     /   \
+     /   \
-   addq  cmpult
+   addq  cmpult
-     |      \
+     |      \
-   cmpult -> cmovne
+   cmpult -> cmovne
-STATUS
+STATUS
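
Note on the EV5 mpn_add_n analysis in the README text above: the following is
a minimal C sketch of the per-limb carry chain and of the issue-cycle bound.
It is not the Alpha assembly in this directory. The names sketch_add_n and
sketch_issue_cycle_bound are hypothetical, 64-bit limbs are assumed, and the
mapping of C statements to addq/cmpult/or is one reading of the dependency
diagram rather than a statement about the actual instruction schedule.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two n-limb numbers and returns the carry out.
   Each statement is annotated with the Alpha instruction it is assumed to
   correspond to in the README's dependency diagram. */
uint64_t sketch_add_n(uint64_t *rp, const uint64_t *up,
                      const uint64_t *vp, size_t n)
{
    uint64_t cy = 0;                 /* carry into the current limb */
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = up[i] + vp[i]; /* addq:   sum of the two limbs */
        uint64_t c1 = s < vp[i];     /* cmpult: carry out of the first add */
        uint64_t r  = s + cy;        /* addq:   add the incoming carry */
        uint64_t c2 = r < cy;        /* cmpult: carry out of the second add */
        cy = c1 | c2;                /* or:     at most one of c1, c2 is set */
        rp[i] = r;
    }
    return cy;
}

/* Hypothetical helper for the README's lower bound:
   ceil(insns / issue_width) + branch_penalty. */
unsigned sketch_issue_cycle_bound(unsigned insns, unsigned issue_width,
                                  unsigned branch_penalty)
{
    return (insns + issue_width - 1) / issue_width + branch_penalty;
}
```

With the README's figures (37 instructions, dual issue, 1 cycle for the taken
branch), sketch_issue_cycle_bound(37, 2, 1) evaluates to 20, matching the
"20 cycles" line. The cmovne variant in the second diagram would replace the
final `or' with a conditional move of the carry.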
