Abstract: Method, apparatus, and program means for performing a packed multiply high with round and shift operation. The method of one embodiment comprises receiving a first operand having a first set of L data elements. A second operand having a second set of L data elements is received. L pairs of data elements are multiplied together to generate a set of L products. Each of the L pairs includes a first data element from the first set of L data elements and a second data element from a corresponding data element position of the second set of L data elements. Each of the L products is rounded to generate L rounded values. Each of the L rounded values is scaled to generate L scaled values. Each of the L scaled values is truncated for storage at a destination. Each truncated value is to be stored at a data element position corresponding to its pair of data elements.
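The four steps named in the abstract (multiply, round, scale, truncate) can be illustrated with a short Python sketch. It is not the claimed implementation: it assumes signed 16-bit (word) data elements and borrows the specific choices from dependent claims 3 to 5 (add '1' at bit 14, shift left by one bit, keep the sixteen most significant bits). The function names are hypothetical.

```python
def to_signed16(x: int) -> int:
    """Reinterpret the low 16 bits of x as a two's-complement signed word."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mul_round_scale_truncate(a: int, b: int) -> int:
    """One lane, assuming signed 16-bit inputs and the claim 3-5 rounding choices."""
    product = a * b                   # multiply: full signed product
    rounded = product + (1 << 14)     # round: add '1' at bit 14
    scaled = rounded << 1             # scale: shift left by one bit
    return to_signed16(scaled >> 16)  # truncate: keep the 16 most significant bits


def packed_multiply_high_round_shift(src1: list, src2: list) -> list:
    """Apply the lane operation to the L pairs at matching element positions."""
    if len(src1) != len(src2):
        raise ValueError("both operands must hold the same number L of data elements")
    return [mul_round_scale_truncate(a, b) for a, b in zip(src1, src2)]


print(packed_multiply_high_round_shift([16384, -32768, 32767, 100],
                                       [16384, -32768, 32767, -200]))
# [8192, -32768, 32766, -1]
```

Under these assumptions each lane behaves like a rounded Q15 fixed-point multiply: 16384 (0.5) times 16384 (0.5) gives 8192 (0.25). The sketch does not saturate, so -32768 times -32768 wraps back to -32768.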
DESCRIPTION (COMPLETE)
OCR NOT PREPARED DUE TO PRINT PROBLEM
What is claimed is:
1. A method comprising:
receiving a first operand having a first set of L data elements;
receiving a second operand having a second set of L data elements;
multiplying together L pairs of data elements to generate a set of L products, wherein each of said L pairs includes a first data element from said first set of L data elements and a second data element from a corresponding data element position of said second set of L data elements;
rounding each of said L products to generate L rounded values;
scaling each of said L rounded values to generate L scaled values; and
truncating each of said L scaled values for storage at a destination, wherein each truncated value is to be stored at a data element position corresponding to its pair of data elements.
2. The method of claim 1 wherein said rounding further comprises adding a '1' to a designated bit location for each of said L products.
3. The method of claim 2 wherein said designated bit location is bit 14 for each of said L products.
4. The method of claim 3 wherein said scaling further comprises shifting each of said L rounded values left by one bit.
5. The method of claim 4 wherein said truncating further comprises extracting sixteen most significant bits from each of said L scaled values to obtain L truncated values.
6. The method of claim 5 wherein each data element position of said first and second operands is processed in parallel.
7. The method of claim 6 wherein said processing comprises said multiplying, said rounding, said scaling, and said truncating.
8. The method of claim 2 wherein said rounding further comprises:
shifting each of said L products right by fourteen bits to generate a set of L eighteen bit wide shifted values; and
adding a '1' to a least significant bit position of each of said shifted values.
9. The method of claim 8 wherein said scaling further comprises shifting each of said L rounded values right by one bit to generate a set of L scaled values.
10. The method of claim 9 wherein said truncating further comprises selecting sixteen least significant bits from each of said L scaled values to obtain L truncated values.
11. The method of claim 1 wherein said first operand and said second operand are each packed data operands comprised of a plurality of data elements.
12. The method of claim 11 wherein each of said data elements holds a signed integer value.
13. The method of claim 1 wherein said destination is a packed data block.
14. The method of claim 13 wherein each data element is a word in length.
15. The method of claim 14 wherein said first operand, said second operand, and said destination are each 64 bits in length.
16. The method of claim 14 wherein said first operand, said second operand, and said destination are each 128 bits in length.
17. The method of claim 14 wherein said first operand and said second operand reside in SIMD registers.
18. A method comprising:
receiving an instruction to perform a packed multiply high with round and shift operation on two operands, wherein said packed multiply with round and shift operation comprises
multiplying each data element in a first set of packed data elements with a corresponding data element in a second set of packed data elements to generate a set of products,
rounding and shifting each of said set of products to generate a set of results, and
selecting a plurality of bits from each of said results to generate a set of truncated results; said instruction having a format comprising:
a first field to specify an op code to provide information about said packed multiply with round and shift operation;
a second field to specify a first source address for a first operand having said first set of packed data elements; and
a third field to specify a second source address for a second operand having said second set of packed data elements; and executing said instruction to generate said set of truncated results for storage as packed data elements in a destination register.
19. The method of claim 18 wherein said op code is to indicate whether said set of truncated results for said packed multiply high with round and shift operation is comprised of high order bits or low order bits of said set of results.
20. The method of claim 19 wherein said first source address is a first address of a first register to store said first set of packed data elements.
21. The method of claim 20 wherein said first register is also a destination for said set of truncated results for said packed multiply with round and shift operation.
22. The method of claim 21 wherein said second source address is a second address of a second register to store said second set of packed data elements.
23. The method of claim 18 wherein said first field includes a bit to indicate whether said packed multiply with round and shift operation is a signed operation or an unsigned operation.
24. The method of claim 23 wherein said first field further includes at least two bits to indicate whether each plurality of bits selected from said set of results are comprised of high order bits of a particular result or of low order bits of said result.
25. The method of claim 18 wherein said format further comprises a sign field to indicate whether said packed multiply with round and shift operation is a signed or unsigned operation.
26. The method of claim 25 wherein said format further comprises a size field to indicate a length of each of said packed data elements.
27. The method of claim 26 wherein said format further comprises a fourth field to specify a destination address to receive said set of results for said packed multiply with round and shift operation.
28. The method of claim 18 wherein said information for said op code indicates a packed multiply with round and shift of signed integers and to select high order bits of each of said results for a truncated result.
29. The method of claim 18 wherein said rounding comprises adding '1' to bit 14 of each of said products to obtain a set of rounded values and wherein said shifting comprises shifting each of said rounded values left one bit position.
30. The method of claim 29 wherein each plurality of bits from each of said results are sixteen high order bits of that particular result.
31. An apparatus comprising:
an execution unit to execute one or more instructions of an instruction set, said instruction set to include at least one instruction to perform a packed multiply with round and shift operation, wherein said execution unit in response to said at least one instruction to perform said packed multiply with round and shift operation,
multiplies each data element in a first set of packed data elements with a corresponding data element in a second set of packed data elements to generate a set of products,
rounds and shifts each of said set of products to generate a set of results, and
selects a plurality of bits from each of said results to generate a set of truncated results; wherein said at least one instruction has a format comprising:
a first field to specify an op code to provide information about said packed multiply with round and shift operation,
a second field to specify a first source address for a first operand having said first set of packed data elements, and
a third field to specify a second source address for a second operand having said second set of packed data elements.
32. The apparatus of claim 31 wherein said truncated results comprising selected portions of each of said results are stored as packed data elements in a destination register.
33. The apparatus of claim 32 wherein said op code is to indicate whether said set of truncated results for said packed multiply high with round and shift operation is comprised of high order bits or low order bits of said set of results.
34. The apparatus of claim 31 wherein said first source address is a first address of a first register to store said first set of packed data elements and said second source address is a second address of a second register to store said second set of packed data elements.
35. The apparatus of claim 34 wherein said first register is also a destination for said set of truncated results for said packed multiply with round and shift operation.
36. The apparatus of claim 31 wherein said first field includes a bit to indicate whether said packed multiply with round and shift operation is a signed operation or an unsigned operation.
37. The apparatus of claim 36 wherein said first field further includes at least two bits to indicate whether each plurality of bits selected from said set of results are comprised of high order bits of a particular result or of low order bits of said result.
38. The apparatus of claim 31 wherein said format further comprises a sign field to indicate whether said packed multiply with round and shift operation is a signed or unsigned operation.
39. The apparatus of claim 38 wherein said format further comprises a size field to indicate a length of each of said packed data elements.
40. The apparatus of claim 39 wherein said format further comprises a fourth field to specify a destination address to receive said set of results for said packed multiply with round and shift operation.
41. The apparatus of claim 31 wherein said information for said op code indicates a packed multiply with round and shift of signed integers and to select high order bits of each of said results for a truncated result.
42. The apparatus of claim 31 wherein said rounding comprises adding '1' to bit 14 of each of said products to obtain a set of rounded values and wherein said shifting comprises shifting each of said rounded values left one bit position.
43. The apparatus of claim 42 wherein each plurality of bits from each of said results are sixteen high order bits of that particular result.
44. A system comprising:
a memory to store data and instructions;
a processor coupled to said memory on a bus, said processor operable to perform a multiply with round and shift operation in response to a multiply with round and shift instruction, said processor comprising:
a bus unit to receive said multiply with round and shift instruction from said memory; and
an execution unit coupled to said bus unit, said execution unit to execute said multiply with round and shift instruction, said multiply with round and shift instruction to cause said execution unit to:
multiply each data element in a first set of packed data elements with a corresponding data element in a second set of packed data elements to generate a set of products,
round and shift each of said set of products to generate a set of results, and
select a plurality of bits from each of said results to generate a set of truncated results.
45. The system of claim 44 wherein said multiply with round and shift instruction has a format comprising:
a first field to specify an op code to provide information about said packed multiply with round and shift operation,
a second field to specify a first source address for a first operand having said first set of packed data elements, and
a third field to specify a second source address for a second operand having said second set of packed data elements.
46. The system of claim 45 wherein said information of said op code indicates a packed multiply with round and shift of signed integers and to select high order bits of each of said results for a truncated result.
47. The system of claim 46 wherein said rounding comprises adding '1' to bit 14 of each of said products to obtain a set of rounded values and wherein said shifting comprises shifting each of said rounded values left one bit position.
48. The system of claim 47 wherein each plurality of bits from each of said results are sixteen high order bits of that particular result.
49. The system of claim 45 wherein said op code is to indicate whether said set of truncated results for said multiply with round and shift operation is comprised of high order bits or low order bits of said set of results.
50. The system of claim 48 wherein said first field includes a bit to indicate whether said packed multiply with round and shift operation is a signed operation or an unsigned operation.
51. The system of claim 50 wherein said first field further includes at least two bits to indicate whether each plurality of bits selected from said set of results are comprised of high order bits of a particular result or of low order bits of said result.
52. The system of claim 45 wherein said format further comprises:
a sign field to indicate whether said packed multiply with round and shift operation is a signed or unsigned operation; and
a size field to indicate a length of each of said packed data elements.
53. The system of claim 45 wherein said format further comprises a fourth field to specify a destination address to receive said set of results for said packed multiply with round and shift operation.
54. The system of claim 44 wherein said truncated results comprising selected portions of each of said results are stored as packed data elements in a destination register.
55. The system of claim 45 wherein said first source address is a first address of a first register to store said first set of packed data elements and said second source address is a second address of a second register to store said second set of packed data elements.
56. The system of claim 46 wherein said first register is also a destination for said set of truncated results for said packed multiply with round and shift operation.
57. The system of claim 44 further comprising:
a wireless communication device to send and receive digital data over a wireless network, said wireless communication device coupled to said memory to store said digital data and software, wherein said software includes said multiply with round and shift instruction; and
an input output system responsive to said software to interface with said wireless communication device, said input output system to receive data for processing or to send data processed at least in part by said multiply with round and shift instruction.
58. A machine readable medium having embodied thereon a program, said program being executable by a machine to perform a method comprising:
receiving a first operand having a first set of L data elements;
receiving a second operand having a second set of L data elements;
multiplying together L pairs of data elements to generate a set of L products, wherein each of said L pairs includes a first data element from said first set of L data elements and a second data element from a corresponding data element position of said second set of L data elements;
rounding each of said L products to generate L rounded values;
scaling each of said L rounded values to generate L scaled values; and
truncating each of said L scaled values for storage at a destination, wherein each truncated value is to be stored at a data element position corresponding to its pair of data elements.
59. The machine readable medium of claim 58 wherein said rounding further comprises adding a '1' to a designated bit location for each of said L products.
60. The machine readable medium of claim 59 wherein said designated bit location is bit 14 for each of said L products.
61. The machine readable medium of claim 60 wherein said scaling further comprises shifting each of said L rounded values left by one bit.
62. The machine readable medium of claim 61 wherein said truncating further comprises extracting sixteen most significant bits from each of said L scaled values to obtain L truncated values.
63. The machine readable medium of claim 62 wherein each data element position of said first and second operands is processed in parallel, said processing comprising said multiplying, said rounding, said scaling, and said truncating.
64. The machine readable medium of claim 58 wherein:
said rounding further comprises shifting each of said L products right by fourteen bits to generate a set of L eighteen bit wide shifted values and adding a '1' to a least significant bit position of each of said shifted values;
said scaling further comprises shifting each of said L rounded values right by one bit to generate a set of L scaled values; and
said truncating further comprises selecting sixteen least significant bits from each of said L scaled values to obtain L truncated values.
65. The machine readable medium of claim 58 wherein said first operand and said second operand are each packed data operands comprised of a plurality of data elements.
66. The machine readable medium of claim 65 wherein each of said data elements holds a word length signed integer value.
67. The machine readable medium of claim 66 wherein said first operand, said second operand, and said destination are each 64 bits in length.
68. The machine readable medium of claim 66 wherein said first operand, said second operand, and said destination are each 128 bits in length.
69. An apparatus substantially as hereinbefore described with reference to the accompanying drawings.
70. A system substantially as hereinbefore described with reference to the accompanying drawings.
71. A machine readable medium substantially as hereinbefore described with reference to the accompanying drawings.
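Claims 3 to 5 (mirrored in claims 60 to 62) and claims 8 to 10 (mirrored in claim 64) describe two bit-level routes to the same per-element result. The Python sketch below models one 16 x 16 lane for each route and checks, on signed 16-bit edge cases plus random samples, that they agree. The 32-bit product width, the function names, and the test harness are assumptions for illustration; only the 18-bit shifted width comes from the claims.

```python
import random

MASK32 = 0xFFFFFFFF  # assumed width of the intermediate product for word elements


def lane_claims_3_to_5(a: int, b: int) -> int:
    """Add '1' at bit 14, shift left one bit, keep the 16 most significant bits."""
    product = (a * b) & MASK32                # 32-bit two's-complement product
    rounded = (product + (1 << 14)) & MASK32  # add '1' at bit 14
    scaled = (rounded << 1) & MASK32          # shift left by one; bit 31 is discarded
    return scaled >> 16                       # sixteen most significant bits


def lane_claims_8_to_10(a: int, b: int) -> int:
    """Shift right 14 bits, add '1' at the LSB, shift right one, keep 16 LSBs."""
    shifted = ((a * b) >> 14) & 0x3FFFF       # 18-bit shifted value (Python >> is arithmetic)
    rounded = (shifted + 1) & 0x3FFFF         # add '1' at the least significant bit
    scaled = rounded >> 1                     # shift right by one bit
    return scaled & 0xFFFF                    # sixteen least significant bits


def routes_agree(samples: int = 20_000, seed: int = 0) -> bool:
    """Compare both routes on edge cases plus random signed 16-bit operand pairs."""
    rng = random.Random(seed)
    edges = [-32768, -32767, -16384, -1, 0, 1, 16384, 32767]
    pairs = [(a, b) for a in edges for b in edges]
    pairs += [(rng.randint(-32768, 32767), rng.randint(-32768, 32767))
              for _ in range(samples)]
    return all(lane_claims_3_to_5(a, b) == lane_claims_8_to_10(a, b) for a, b in pairs)
```

Both functions return the result as an unsigned 16-bit pattern; for example, both map the pair (100, -200) to 0xFFFF, i.e. -1. The agreement is expected: with q = floor(p / 2^14), both routes compute floor((q + 1) / 2) modulo 2^16.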
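Claims 15, 16, 67 and 68 recite 64-bit and 128-bit operands holding word-length elements, that is, four or eight lanes. The Python sketch below treats each operand as one unsigned integer and applies the per-lane operation at every word position. It assumes element 0 occupies bits 15:0 (a little-endian lane order), which the claims do not specify, and all helper names are hypothetical.

```python
def unpack_words(value: int, width_bits: int) -> list:
    """Split a packed value into signed 16-bit words; element 0 is bits 15:0 (assumed)."""
    words = []
    for i in range(width_bits // 16):
        w = (value >> (16 * i)) & 0xFFFF
        words.append(w - 0x10000 if w & 0x8000 else w)
    return words


def pack_words(words: list) -> int:
    """Inverse of unpack_words: place each 16-bit word at its element position."""
    value = 0
    for i, w in enumerate(words):
        value |= (w & 0xFFFF) << (16 * i)
    return value


def lane_op(a: int, b: int) -> int:
    """Claims 8-10 route: shift right 14, add 1, shift right 1, keep 16 LSBs."""
    return ((((a * b) >> 14) + 1) >> 1) & 0xFFFF


def packed_multiply_high_round_shift(src1: int, src2: int, width_bits: int) -> int:
    """Process every word position of two 64- or 128-bit packed operands."""
    if width_bits not in (64, 128):
        raise ValueError("this sketch models 64-bit or 128-bit operands only")
    for src in (src1, src2):
        if not 0 <= src < (1 << width_bits):
            raise ValueError("operand does not fit in the stated width")
    a_words = unpack_words(src1, width_bits)
    b_words = unpack_words(src2, width_bits)
    return pack_words([lane_op(a, b) for a, b in zip(a_words, b_words)])


print(hex(packed_multiply_high_round_shift(0x0064_7FFF_8000_4000,
                                           0xFF38_7FFF_8000_4000, 64)))
# 0xffff7ffe80002000
```

In the usage line the four lanes are (0x4000, 0x4000), (-32768, -32768), (32767, 32767) and (100, -200); the 128-bit case simply carries eight such lanes.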
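Claims 18 to 27 (and their apparatus and system counterparts) describe the instruction through its fields: an op code, two source addresses, optional sign and size fields, and an optional destination field, with claim 21 allowing the first source register to serve as the destination. The claims give no bit positions or encodings, so the Python sketch below only models a decoded instruction and a toy register file. Every field name, the register-file representation, and the handling of unsigned and low-order selections are hypothetical; only the signed, high-order, word-size case follows the claimed rounding (add '1' at bit 14, shift left one bit, take sixteen high order bits).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PackedMulRoundShift:
    """Hypothetical decoded instruction; field names are illustrative only."""
    opcode: int                 # first field; carried but not interpreted by this sketch
    signed: bool                # sign field or sign bit: signed vs unsigned elements
    select_high: bool           # keep high order or low order bits of each result
    element_bits: int           # size field: length of each packed element
    src1: int                   # second field: register holding the first operand
    src2: int                   # third field: register holding the second operand
    dest: Optional[int] = None  # fourth field; None means src1 is also the destination


def _as_element(bits: int, signed: bool) -> int:
    """Interpret a raw 16-bit pattern as a signed or unsigned value."""
    bits &= 0xFFFF
    return bits - 0x10000 if signed and bits & 0x8000 else bits


def execute(insn: PackedMulRoundShift, regs: Dict[int, List[int]]) -> None:
    """Toy execution over a register file mapping register numbers to word lists."""
    if insn.element_bits != 16:
        raise NotImplementedError("this sketch models 16-bit (word) elements only")
    a_words, b_words = regs[insn.src1], regs[insn.src2]
    if len(a_words) != len(b_words):
        raise ValueError("source registers hold different numbers of elements")
    results = []
    for a_bits, b_bits in zip(a_words, b_words):
        a = _as_element(a_bits, insn.signed)
        b = _as_element(b_bits, insn.signed)
        # Round at bit 14 and shift left one bit inside a 32-bit intermediate.
        scaled = ((a * b + 0x4000) << 1) & 0xFFFFFFFF
        results.append(scaled >> 16 if insn.select_high else scaled & 0xFFFF)
    regs[insn.src1 if insn.dest is None else insn.dest] = results
```

A two-operand form overwrites the first source register, which corresponds to the arrangement of claim 21; supplying `dest` models the separate destination field of claim 27.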
| # | Name | Date |
|---|---|---|
| 1 | 118-DEL-2005-FORM 13 [17-11-2021(online)].pdf | 2021-11-17 |
| 2 | 118-del-2005-gpa.pdf | 2011-08-21 |
| 3 | 118-del-2005-form-5.pdf | 2011-08-21 |
| 4 | 118-del-2005-abstract.pdf | 2011-08-21 |
| 5 | 118-del-2005-form-3.pdf | 2011-08-21 |
| 6 | 118-del-2005-claims.pdf | 2011-08-21 |
| 7 | 118-del-2005-correspondence-others.pdf | 2011-08-21 |
| 8 | 118-del-2005-form-2.pdf | 2011-08-21 |
| 9 | 118-del-2005-form-18.pdf | 2011-08-21 |
| 10 | 118-del-2005-correspondence-po.pdf | 2011-08-21 |
| 11 | 118-del-2005-form-1.pdf | 2011-08-21 |
| 12 | 118-del-2005-description (complete).pdf | 2011-08-21 |
| 13 | 118-del-2005-drawings.pdf | 2011-08-21 |