SSE4 (Penryn) 43% faster at encoding than SSE2 in DivX

Toytown (Gawd; joined Jan 13, 2005; 996 messages)
http://www.divx.com/divx/windows/codec/

Experimental support for SSE4 on Intel Penryn CPUs

The latest DivX Codec includes a new feature designed to take advantage of the new instructions in the new Intel Penryn line of processors. Just in case you haven’t managed to get your hands on one of these unreleased prototypes yourself, we’re publishing some of our own test results.

The graphs on the website make it look great for video enthusiasts. It will be nice to see how AMD's chip compares against it.
 
Those numbers do look pretty nice. I think the graph is showing the improvement for a specific routine however. Anyone know enough about DivX to make an educated guess on the impact SSE4 will have on overall encoding times?
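A rough way to frame that question is Amdahl's law: if only one routine gets faster, the overall gain depends on what share of total encode time that routine takes. Below is a minimal Python sketch, assuming "43% faster" means the routine runs 1.43x as fast. The time shares in the loop are hypothetical, since nobody here knows the real share for DivX.

```python
def overall_speedup(fraction_accelerated: float, routine_speedup: float) -> float:
    """Amdahl's law: whole-job speedup when only part of the work gets faster.

    fraction_accelerated: share of total encode time spent in the sped-up routine (0..1)
    routine_speedup: how many times faster that routine becomes (assumed 1.43 below)
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if routine_speedup <= 0:
        raise ValueError("routine_speedup must be positive")
    # Untouched part keeps its time; accelerated part shrinks by routine_speedup.
    remaining_time = (1.0 - fraction_accelerated) + fraction_accelerated / routine_speedup
    return 1.0 / remaining_time


# Hypothetical shares of encode time; the real share for DivX is unknown.
for share in (0.2, 0.5, 0.8):
    print(f"{share:.0%} of time in routine -> {overall_speedup(share, 1.43):.2f}x overall")
```

The takeaway is only qualitative: the smaller the routine's share of encode time, the closer the overall gain stays to 1x, no matter how big the per-routine number on the graph is.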
 
So, Conroe CPUs don't have SSE4, only Penryns do, correct?

Correct, SSE4 will be introduced with Penryn. The current C2Ds support SSE3 and SSSE3 as their latest SSE extensions.

Will Penryn work on current 775 boards?

Isn't Penryn the core for mobile CPUs? The desktop version of Penryn will be Wolfdale. Not sure if Wolfdale will require the next Intel chipset.
 
The Bearlake chipset that's coming out in the next several weeks will support Penryn, right?
 
There is nothing even close to future-proofing with Intel.

There isn't for AMD either anymore is there? It seems like they've had a few new sockets come out over the past year or so.

My computer is future-proofed. ;)
 
Wow, amazing! How do these perform in x264, I wonder...

I believe I read that SSE4 performs so much better because Intel finally added dot-product instructions to SSE. Supposedly, that gives huge performance increases over the current SSE architectures.
 

Dot-product instructions are probably most useful for 3D work, but SSE4 does have a lot of new instructions for video processing, like MPSADBW (multiple sums of absolute differences, or SADs), an instruction that will probably improve H.264 encoding.
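For anyone wondering what MPSADBW actually computes, here is a scalar Python model of its 128-bit form, based on my reading of Intel's description (in C it is exposed as the `_mm_mpsadbw_epu8` intrinsic in smmintrin.h). It slides a 4-byte reference block across a row of bytes and returns eight SADs at once, which is the inner loop of motion estimation. A single-SAD instruction (PSADBW) already existed before SSE4. The helper names are hypothetical and this is a sketch of the semantics, not encoder code.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-length byte sequences."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def mpsadbw_model(dst: bytes, src: bytes, imm: int) -> list:
    """Scalar model of SSE4.1 MPSADBW (128-bit form).

    imm bits 1:0 pick a 4-byte reference block in src at offset 0, 4, 8 or 12.
    imm bit 2 picks a starting offset of 0 or 4 bytes in dst.
    Result: eight SADs, sliding the reference block across dst one byte at a time.
    """
    if len(dst) != 16 or len(src) != 16:
        raise ValueError("operands must be 16 bytes")
    src_off = (imm & 0b11) * 4
    dst_off = ((imm >> 2) & 1) * 4
    ref = src[src_off:src_off + 4]
    return [sad(dst[dst_off + i:dst_off + i + 4], ref) for i in range(8)]


# Made-up data: find where a 4-pixel block best matches within a row.
row = bytes(range(16))                   # candidate pixels
ref = bytes([5, 6, 7, 8]) + bytes(12)    # reference block in the low dword
sads = mpsadbw_model(row, ref, imm=0)
best_offset = sads.index(min(sads))      # position of the closest match
```

An encoder would run this over many rows and candidate offsets, which is why doing eight SADs per instruction matters.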
 
There are other differences between Conroe and Penryn, like lower instruction latencies.

I know, that's why I pointed out the speed increase beyond the clock speed difference. ;)
 
Impressive. If K10 really is the killer AMD says it is, we may be back to the days when AMD is better at gaming, yet Intel is better at video encoding.
 

I wouldn't be too sure.
This SSE4 seems to have more uses than just video encoding.
These new dot-product and rounding instructions could be used for regular geometry and physics processing as well. In other words: gaming.
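A minimal Python sketch of what the dot-product (DPPS) and rounding (ROUNDPS) instructions compute, lane by lane. These are scalar models under stated simplifications, not the real intrinsics (`_mm_dp_ps` and `_mm_round_ps` in C): Python doubles stand in for 32-bit floats, the hardware summation order and signed-zero results are ignored, ROUNDPS's MXCSR and exception-suppression bits are ignored, and the function names are made up for illustration.

```python
import math


def dpps_model(a, b, imm: int) -> list:
    """Scalar model of SSE4.1 DPPS (dot product of packed singles).

    imm bits 7:4 choose which of the four products a[i]*b[i] are summed.
    imm bits 3:0 choose which output lanes receive the sum; other lanes get 0.0.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("operands must have 4 lanes")
    total = 0.0
    for i in range(4):
        if (imm >> (4 + i)) & 1:
            total += a[i] * b[i]
    return [total if (imm >> i) & 1 else 0.0 for i in range(4)]


def roundps_model(values, mode: int) -> list:
    """Scalar model of SSE4.1 ROUNDPS using an explicit rounding mode (imm bits 1:0)."""
    ops = {
        0: round,       # nearest, ties to even (Python 3 round() does this)
        1: math.floor,  # toward negative infinity
        2: math.ceil,   # toward positive infinity
        3: math.trunc,  # toward zero
    }
    return [float(ops[mode & 0b11](v)) for v in values]


# 3D dot product of xyz vectors, ignoring w: sum lanes 0-2, write lane 0 (imm 0x71).
print(dpps_model([1.0, 2.0, 3.0, 9.0], [4.0, 5.0, 6.0, 9.0], 0x71))
```

The mask bits are what make DPPS convenient for geometry: a 3-component dot product on 4-wide registers just leaves the fourth lane out of the sum.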

Personally I don't think K10 is going to be better than Penryn at anything... well anything tangible then... sorta like how Athlons currently easily win any bandwidth-intensive synthetic benchmark, but can't win any actual bandwidth-intensive real-world application benchmarks.

So, the way I see it, K10 may have an advantage over Kentsfield, but this may only last a few months, not long enough for AMD to get any significant number of processors out the door... then Penryn will take over, leaving AMD behind again, possibly until they can release K11 in another 2-3 years.
 
The amusing thing is that Penryn doesn't even have a complete implementation of SSE4.
 
SSE4.2 isn't as big a deal (Penryn has the subset called SSE4.1, with 47 instructions). SSE4.2 adds these 7 instructions:

2.3.1 String and Text Processing Instructions
• PCMPESTRI — Packed compare explicit-length strings, return index in ECX/RCX
• PCMPESTRM — Packed compare explicit-length strings, return mask in XMM0
• PCMPISTRI — Packed compare implicit-length strings, return index in ECX/RCX
• PCMPISTRM — Packed compare implicit-length strings, return mask in XMM0

2.3.2 Packed Comparison SIMD Integer Instruction
SSE4.2 also provides a 128-bit integer SIMD instruction, PCMPGTQ, that performs a logical greater-than compare on packed integer quadwords.

2.3.3 Application-Targeted Accelerator Instructions
• CRC32 — Provides hardware acceleration to calculate cyclic redundancy checks for fast and efficient implementation of data integrity protocols.
• POPCNT — Accelerates software performance in the searching of bit patterns.
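To illustrate the two accelerator instructions: the SSE4.2 CRC32 instruction uses the CRC-32C (Castagnoli) polynomial, not the zlib/Ethernet CRC-32, and POPCNT counts set bits in a register. The Python below is a slow bitwise reference sketch of both. The initial and final inversion with 0xFFFFFFFF is the usual software convention wrapped around the instruction, not something the instruction does by itself, and the function names are hypothetical.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli), the polynomial used by the SSE4.2 CRC32 instruction.

    0x82F63B78 is the bit-reflected form of polynomial 0x1EDC6F41.
    Passing a previous result as `crc` continues the checksum over more data.
    """
    crc ^= 0xFFFFFFFF                 # software convention: start from all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):            # one polynomial step per bit
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF           # software convention: final inversion


def popcount(x: int) -> int:
    """Count set bits, which POPCNT does in a single instruction for one register."""
    if x < 0:
        raise ValueError("expects a non-negative integer")
    count = 0
    while x:
        x &= x - 1                    # clear the lowest set bit
        count += 1
    return count


print(hex(crc32c(b"123456789")))      # standard CRC-32C check input
print(popcount(0b1011))
```

The hardware instruction replaces the inner eight-step loop (it processes 1, 2, 4 or 8 bytes per instruction), which is where the speedup for data-integrity code comes from.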
 
Current Core 2 Duos have SSE4 too. http://www.newegg.com/Product/Product.aspx?Item=N82E16819115004

Just click on the Specifications Tab and scroll to multimedia instructions.

http://www.intel.com/technology/magazine/computing/new-instructions-1006.pdf

For instance, SSE2 instructions gave software developers maximum flexibility in implementing algorithms and providing performance enhancements to software such as MPEG-2 video, MP3, 3D graphics, and more. The launch of the 90nm process–based Pentium 4 processor saw the introduction of SSE3. SSE3 includes 13 additional SIMD instructions over SSE2 that are primarily designed to improve thread synchronization and x87-FP math capabilities.

A further advancement, Supplemental SSE3, is now available in Intel Core microarchitecture. Included in Intel® Xeon® 5100 processors (server and workstation) and the Intel Core 2 Duo processors (notebook and desktop), Supplemental SSE3 adds 32 new opcodes—including align and multiply-add—for yet greater performance.
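The "align" and "multiply-add" opcodes mentioned here are PALIGNR and PMADDUBSW (`_mm_alignr_epi8` and `_mm_maddubs_epi16` in C). Below is a scalar Python sketch of their 128-bit forms, assuming the operand order of those intrinsics (first operand high/unsigned, second operand low/signed). The function names are hypothetical and this models the semantics only.

```python
def palignr_model(high: bytes, low: bytes, shift: int) -> bytes:
    """Scalar model of SSSE3 PALIGNR (128-bit form).

    Concatenates high:low into 32 bytes, shifts right by `shift` bytes,
    and keeps the low 16 bytes (zero-filled if the shift runs past the end).
    """
    if len(high) != 16 or len(low) != 16:
        raise ValueError("operands must be 16 bytes")
    if not 0 <= shift <= 255:
        raise ValueError("shift is an 8-bit immediate")
    combined = low + high                     # byte 0 is the least significant byte
    window = combined[shift:shift + 16]
    return window + bytes(16 - len(window))


def pmaddubsw_model(unsigned_bytes: bytes, signed_bytes: bytes) -> list:
    """Scalar model of SSSE3 PMADDUBSW (128-bit form).

    Multiplies unsigned bytes by signed bytes, adds adjacent pairs,
    and saturates each sum to a signed 16-bit result.
    """
    if len(unsigned_bytes) != 16 or len(signed_bytes) != 16:
        raise ValueError("operands must be 16 bytes")

    def to_signed(b: int) -> int:
        return b - 256 if b > 127 else b

    out = []
    for i in range(0, 16, 2):
        s = (unsigned_bytes[i] * to_signed(signed_bytes[i])
             + unsigned_bytes[i + 1] * to_signed(signed_bytes[i + 1]))
        out.append(max(-32768, min(32767, s)))  # signed saturation to int16
    return out


lo = bytes(range(16))
hi = bytes(range(16, 32))
print(list(palignr_model(hi, lo, 4)))          # bytes 4..19 of the pair
```

PALIGNR is typically used when a 16-byte window straddles two aligned loads, and PMADDUBSW fits filters where 8-bit pixels are multiplied by small signed coefficients.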

Notes
1 Intel has not yet announced launch dates for 45nm products.

2 Most of these instructions will be available in Penryn, and some of the instructions will be available in microprocessors slated for release after Penryn.

Guys, it's only 8 pages long. Also see

http://download.intel.com/technology/architecture/new-instructions-paper.pdf

Also 8 pages. Called "Intel® Advanced Digital Media Boost", it boosts a broad range of applications, including video, speech and image, photo processing, encryption, financial, engineering, and scientific applications.
 