AMD patent reveals complex chip design

Marees

https://www.freepatentsonline.com/y2022/0320042.html

the allocation of GPU processing pipeline components amongst multiple discrete dies to create smaller footprint building blocks (e.g., the various parallel processing stacked die chiplets described herein) that may be subsequently communicably stitched together with an active bridge chip enables manufacture of graphics pipes/chips scalable in a chiplet manner while still able to form a device having similar performance relative to a larger monolithic processor.

This modular 3D graphics concept is scalable, separately updatable, and mitigates the cost of assembly by using small die with high yield aspects, and provides value in not only allowing for increased die yield of production per semiconductor wafer but also increases the amount of good dies per semiconductor wafer.
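The yield claim in that excerpt can be illustrated with a rough sketch. It assumes a simple Poisson defect model (yield = exp(-area × defect density)), which is a common textbook approximation and not something the patent specifies. The die areas and the defect density below are made-up illustrative numbers, not AMD or TSMC figures.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Expected fraction of defect-free dies under a Poisson defect model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


# Hypothetical inputs, chosen only for illustration.
DEFECTS_PER_MM2 = 0.001       # assumed defect density (0.1 defects per cm^2)
MONOLITHIC_AREA_MM2 = 600.0   # assumed area of one large monolithic die
CHIPLET_AREA_MM2 = 200.0      # assumed area of one smaller chiplet die

yield_big = poisson_yield(MONOLITHIC_AREA_MM2, DEFECTS_PER_MM2)
yield_small = poisson_yield(CHIPLET_AREA_MM2, DEFECTS_PER_MM2)

# Ignoring wafer-edge loss, the yield is also the fraction of wafer area
# that ends up as good silicon, so smaller dies waste less of each wafer.
print(f"monolithic die yield: {yield_big:.3f}")
print(f"chiplet die yield:    {yield_small:.3f}")
```

Under this model a defect kills only the small die it lands on, which is the "more good dies per wafer" argument. It says nothing about packaging cost or assembly yield.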

[Attached image: Screenshot_20230815-000410_Word.jpg]

illustrated is a block diagram of a plan view 700 of a graphics processor MCM 702 employing graphics processing stacked die chiplets in accordance with some embodiments. The graphics processor MCM 702 (similar to the parallel processor MCM 202 of FIG. 2) is formed as a single semiconductor chip package including N=3 number of communicably coupled graphics processing stacked die chiplets 602 of FIG. 6. As shown in plan view 700, the graphics processor MCM 702 includes a first graphics processing stacked die chiplet 702a, a second graphics processing stacked die chiplet 702b, and a third graphics processing stacked die chiplet 702c.

As will be appreciated, the increased number of inter-die interconnect structures 608a,608b associated with graphics processing stacked die chiplets 602 allows for a larger number of stacked die chiplets to be communicably coupled together in a single package (e.g., relative to stacked die chiplets 402 which can only be paired, such as illustrated in FIG. 5, due to a single interconnect structure 408 on each stacked die chiplet 402). For example, in various embodiments, the graphics processor MCM 702 includes a first bridge chip 704a that communicably couples the first graphics processing stacked die chiplet 702a to the second graphics processing stacked die chiplet 702b. In particular, the first bridge chip 704a communicably couples the second inter-die interconnect structure 608b of the first graphics processing stacked die chiplet 702a to the first inter-die interconnect structure 608a of the second graphics processing stacked die chiplet 702b. Additionally, the graphics processor MCM 702 includes a second bridge chip 704b that communicably couples the second graphics processing stacked die chiplet 702b to the third graphics processing stacked die chiplet 702c. In particular, the second bridge chip 704b communicably couples the second inter-die interconnect structure 608b of the second graphics processing stacked die chiplet 702b to the first inter-die interconnect structure 608a of the third graphics processing stacked die chiplet 702c.
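The linear chaining described in that paragraph can be sketched as a small data model. This is only an illustration of the topology as the text describes it (bridge i joins interconnect 608b of one chiplet to interconnect 608a of the next). The `Bridge` class and `chain_bridges` function are hypothetical names invented for this sketch, not anything from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    """One bridge chip joining two neighbouring stacked die chiplets."""
    name: str
    left_chiplet: str   # attached via this chiplet's second interconnect (608b)
    right_chiplet: str  # attached via this chiplet's first interconnect (608a)


def chain_bridges(chiplets: list[str]) -> list[Bridge]:
    """Return the bridges needed to connect the chiplets in a straight chain."""
    bridges = []
    for i in range(len(chiplets) - 1):
        suffix = chr(ord("a") + i)  # 704a, 704b, ...
        bridges.append(Bridge(f"704{suffix}", chiplets[i], chiplets[i + 1]))
    return bridges


# The N=3 example from the patent text: two bridges for three chiplets.
mcm = chain_bridges(["702a", "702b", "702c"])
for bridge in mcm:
    print(f"{bridge.name}: {bridge.left_chiplet}.608b <-> {bridge.right_chiplet}.608a")
```

With two interconnect structures per chiplet, N chiplets in a chain need N-1 bridges, and the end chiplets each leave one interconnect unused.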

In various embodiments, the bridge chips 704 are passive or active, in which each bridge chip 704 includes just data/electrical connections or a given bridge chip 704 includes its own logic. For example, in some embodiments, each bridge chip 704 is an active bridge chip having active silicon to operate as a high-bandwidth die-to-die interconnect between the graphics processing stacked die chiplets 602. In other embodiments, the bridge chip 704 is a passive chip.

In some embodiments, an active bridge chip 704 includes one or more cache buffers and therefore extends beachfront edge connectivity, while still providing inter-base-die communications and to route cross die synchronization signals. Caches are naturally an active component (i.e., require electrical power for operations), so the bridge chip 704 is active for holding those cache buffers. Cache sizing is configurable, for example, as a function of the physical size of the active bridge chip 704, for different applications along with different stacked die chiplet configurations, and the stacked die chiplet(s) to which the active bridge chip 704 is communicably coupled do not pay the cost (e.g., costs related to physical space, power constraints, and the like) of this external cache on the bridge chip 704.

In various embodiments, the bridge chip 704 includes a local silicon interconnect (LSI) that provides a small silicon bond in free translation that communicably couples two logic chips together and provides inter-die connectivity between adjacent edges of the two dies with a limited physical scope (e.g., as opposed to mounting the stacked die chiplets 602 to a common interposer substrate and relying entirely on electrical connections provided by the interposer for inter-die communications, such as provided by conventional 2.5D topologies in which the interposer often spans the extent of an entire assembly). In this manner, the intermediary bridge chip 704 communicably couples multiple stacked die chiplets (e.g., the first graphics processing stacked die chiplet 602a and the second graphics processing stacked die chiplet 602b) together. Additionally, in various embodiments, the bridge chip 704 carries a data fabric (not shown) between the two stacked die chiplets to provide a common view of memory.

The coupling of multiple graphics processing stacked die chiplets (e.g., first graphics processing stacked die chiplet 602a to the second graphics processing stacked die chiplet 602b, which is in turn coupled to the third graphics processing stacked die chiplet 602c) together in a single package results in a device that effectively operates as a single large graphics complex die (GCD) but is constructed out of smaller, modular die components. In various embodiments, the graphics processor MCM 702 is communicably coupled to one or more external system memory modules 706 via the memory controller PHYs 614 of the graphics processing stacked die chiplets. Additionally, in some embodiments, the graphics processor MCM 702 also includes input/output (I/O) logic in a multimedia and I/O die (MID) 708 separate from the graphics processing stacked die chiplets 602.
 
That complex chip design looks a lot like how Apple stitches together the GPUs in the Mx Pro and Max chips.

It also looks very similar to Nvidia’s designs, but Nvidia has the connector on multiple edges so they aren’t limited to a straight line.

And Apple’s were connected at right angles, not straight through.
 
It should be noted that with TSMC’s current price structure and backlog on complex packaging, the proposed cost savings don’t exist. The active interposer components cost just as much to produce as the chips above them and need to be included in the mm^2 area for cost calculations. You also need to factor in the time it takes to package them together; it is not fast, and it is also prone to error.
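A back-of-the-envelope sketch of that cost argument, assuming the bridge silicon is priced per mm^2 like the chiplets and that failed assemblies are scrapped. Every number below (areas, price per mm^2, packaging cost, assembly yield) is a hypothetical placeholder, not real TSMC or AMD pricing.

```python
def cost_per_good_package(
    chiplet_areas_mm2: list,
    bridge_areas_mm2: list,
    cost_per_mm2: float,
    packaging_cost: float,
    assembly_yield: float,
) -> float:
    """Silicon plus packaging cost, spread over the packages that assemble correctly."""
    silicon_area = sum(chiplet_areas_mm2) + sum(bridge_areas_mm2)
    cost_per_attempt = silicon_area * cost_per_mm2 + packaging_cost
    return cost_per_attempt / assembly_yield


# Hypothetical inputs for a three-chiplet, two-bridge package.
chiplets = [200.0, 200.0, 200.0]
bridges = [50.0, 50.0]

with_bridges = cost_per_good_package(chiplets, bridges, 0.10, 20.0, 0.90)
without_bridges = cost_per_good_package(chiplets, [], 0.10, 20.0, 0.90)

print(f"counting bridge area: {with_bridges:.2f}")
print(f"ignoring bridge area: {without_bridges:.2f}")
```

The gap between the two figures is the part of the bill that disappears if the bridge area is left out of the calculation, and a lower assembly yield inflates both.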
 
No doubt it will increase the defect rate, but people make big money at AMD to figure out those issues with TSMC.
 
I’m sure they do, and I’m also sure they hit one hell of a snag with it to shelve the launch plans for it.

This sort of packaging design was heavily rumoured for the MI350x as well and it has yet to materialize. It is very late at this stage.
 
They may have found that something they did caused issues somewhere else that just isn't a quick fix. We'll see what happens, but I tend not to worry about unreleased parts; it only matters what they do when they release them. I would rather they hold off a launch than launch a product with issues.
 
Which is fine, but at this rate their MI300 series is going to be up against whatever next gen Nvidia has coming, so they might as well skip the 300 and just go right to the 400.
 
I'd imagine they are more concerned with yield per wafer. At least at this point. Packaging efficiency and effectiveness can be improved. Starting with a high yield of components makes that easier.
 
Maybe we're heading for an age of not needing big-ass expensive standalone GPUs. Maybe in a few years we'll get discrete high-end performance right smack on a regular CPU package.
 
No need to wait! You can get that from Apple right now!! <snicker>
 
Is the claim below correct? 🤔

What's odd is if AMD is already testing it, that would mean it's a TSMC 4nm GPU as opposed to a 3nm part. There's no way AMD would have access to 3nm technology right now, since Apple has bought up the entirety of TSMC's early N3 capacity.

https://www.extremetech.com/gaming/diagram-shows-alleged-radeon-8000-series-monster-chip
Don't know. I can envision a situation where TSMC still allows companies other than Apple to begin low volume testing on 3nm to prepare for future mass production.
 
What's odd is if AMD is already testing it, that would mean it's a TSMC 4nm GPU as opposed to a 3nm part. There's no way AMD would have access to 3nm technology right now, since Apple has bought up the entirety of TSMC's early N3 capacity.
I know nothing, but I can imagine a difference between buying up N3 production capacity and getting prototyping capacity for products planned to be mass-produced only in 2024, not this year. I also imagine you can do a lot of work on some other process before the final one, which is partly why they sometimes get a bit surprised by the final model's performance.

That chiplet system is quite similar to their MI300, which should be on TSMC 5nm, apparently.

Hopper Next launches in 2024; would Nvidia try to "cook" some early version on TSMC 3nm soon?

Moore law did explain a bit about it, talking about soft design and hard design: the soft design stays somewhat node-agnostic until quite late now that simulators are getting good (making it possible to decide to go with Samsung instead of TSMC during the soft design stage), while the hard process of making it work on the final node, debugging, etc. is still a lot of work.
 
Don't know. I can envision a situation where TSMC still allows companies other than Apple to begin low volume testing on 3nm to prepare for future mass production.
You would think that, but TSMC recently announced that Apple has 110% of their 3nm node until 2025.
 