PSHUFD—Shuffle Packed Doublewords

Opcode/Instruction Op/En 64/32 bit Mode Support CPUID Feature Flag Description

66 0F 70 /r ib

PSHUFD xmm1, xmm2/m128, imm8

RMI V/V SSE2 Shuffle the doublewords in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.

VEX.128.66.0F.WIG 70 /r ib

VPSHUFD xmm1, xmm2/m128, imm8

RMI V/V AVX Shuffle the doublewords in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.

VEX.256.66.0F.WIG 70 /r ib

VPSHUFD ymm1, ymm2/m256, imm8

RMI V/V AVX2 Shuffle the doublewords in ymm2/m256 based on the encoding in imm8 and store the result in ymm1.

Instruction Operand Encoding

Op/En Operand 1 Operand 2 Operand 3 Operand 4
RMI ModRM:reg (w) ModRM:r/m (r) imm8 NA

Description

Copies doublewords from source operand (second operand) and inserts them in the destination operand (first operand) at the locations selected with the order operand (third operand). Figure 4-12 shows the operation of the 256-bit VPSHUFD instruction and the encoding of the order operand. Each 2-bit field in the order operand selects the contents of one doubleword location within a 128-bit lane and copy to the target element in the destination operand. For example, bits 0 and 1 of the order operand targets the first doubleword element in the low and high 128-bit lane of the destination operand for 256-bit VPSHUFD. The encoded value of bits 1:0 of the order operand (see the field encoding in Figure 4-12) determines which doubleword element (from the respective 128-bit lane) of the source operand will be copied to doubleword 0 of the destination operand.

For 128-bit operation, only the low 128-bit lane are operative. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The order operand is an 8-bit immediate. Note that this instruction permits a doubleword in the source operand to be copied to more than one doubleword location in the destination operand.

SRC

X7

X6

X5

X4

X3

X2

X1

X0

DEST

Y7

Y6

Y5

Y4

Y3

Y2

Y1

Y0

00B - X4

Encoding

00B - X0

Encoding

01B - X5

of Fields in

ORDER

01B - X1

of Fields in

10B - X6

ORDER

10B - X2

ORDER

11B - X7

Operand

7

56

4

3

12

0

11B - X3

Operand

Figure 4-12. 256-bit VPSHUFD Instruction Operation

The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The order operand is an 8-bit immediate. Note that this instruction permits a doubleword in the source operand to be copied to more than one doubleword location in the destination operand.

Legacy SSE instructions: In 64-bit mode using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.

VEX.256 encoded version: Bits (255:128) of the destination stores the shuffled results of the upper 16 bytes of the source operand using the immediate byte as the order operand.

Note: VEX.vvvv is reserved and must be 1111b, VEX.L must be 0, otherwise the instruction will #UD.

Operation

PSHUFD (128-bit Legacy SSE version)

DEST[31:0] ← (SRC >> (ORDER[1:0] * 32))[31:0];
DEST[63:32] ← (SRC >> (ORDER[3:2] * 32))[31:0];
DEST[95:64] ← (SRC >> (ORDER[5:4] * 32))[31:0];
DEST[127:96] ← (SRC >> (ORDER[7:6] * 32))[31:0];
DEST[VLMAX-1:128] (Unmodified)

VPSHUFD (VEX.128 encoded version)

DEST[31:0] ← (SRC >> (ORDER[1:0] * 32))[31:0];
DEST[63:32] ← (SRC >> (ORDER[3:2] * 32))[31:0];
DEST[95:64] ← (SRC >> (ORDER[5:4] * 32))[31:0];
DEST[127:96] ← (SRC >> (ORDER[7:6] * 32))[31:0];
DEST[VLMAX-1:128] ← 0

VPSHUFD (VEX.256 encoded version)

DEST[31:0] ← (SRC[127:0] >> (ORDER[1:0] * 32))[31:0];
DEST[63:32] ← (SRC[127:0] >> (ORDER[3:2] * 32))[31:0];
DEST[95:64] ← (SRC[127:0] >> (ORDER[5:4] * 32))[31:0];
DEST[127:96] ← (SRC[127:0] >> (ORDER[7:6] * 32))[31:0];
DEST[159:128] ← (SRC[255:128] >> (ORDER[1:0] * 32))[31:0];
DEST[191:160] ← (SRC[255:128] >> (ORDER[3:2] * 32))[31:0];
DEST[223:192] ← (SRC[255:128] >> (ORDER[5:4] * 32))[31:0];
DEST[255:224] ← (SRC[255:128] >> (ORDER[7:6] * 32))[31:0];

Intel C/C++ Compiler Intrinsic Equivalent

(V)PSHUFD:

__m128i _mm_shuffle_epi32(__m128i a, int n)

VPSHUFD:

__m256i _mm256_shuffle_epi32(__m256i a, const int n)

Flags Affected

None.

SIMD Floating-Point Exceptions

None.

Other Exceptions

See Exceptions Type 4; additionally

#UD

If VEX.L = 1.

If VEX.vvvv ≠ 1111B.