Wednesday, May 10, 2017

GDDR5 memory timing details



In my Advanced Tonga BIOS editing post, I discussed some basic memory timing information, but did not get into the details.  GDDR5 memory is much more complex than the asynchronous DRAM of 20 years ago.  There are many sources of information on SDRAM, while GDDR information is harder to come by.  Although a thorough description of GDDR5 can be found in the spec published by JEDEC, neither NVIDIA nor AMD shares information on how their memory controllers are programmed with memory timing information.  By analyzing the AMD video driver source, and with help from people contributing to a discussion on bitcointalk, I have come to understand most of the workings of AMD BIOS timing straps.

When a modern (R9 series and Rx series) AMD GPU card boots up, memory timing information (the straps) is copied from the BIOS to registers in the memory controller.  Some timing information, such as refresh frequency, is not dependent on the memory speed and therefore is not contained in the memory strap table, but much of the important timing information is.  The memory controller registers are 32 bits wide, so the 48-byte memory straps map to 12 different memory controller registers.  The shift masks in the Linux driver source are therefore non-functional, and can only be taken as hints as to the meaning of the individual bits.  Due to an apparently bureaucratic process for releasing open-source code, AMD engineers are generally reluctant to update such code.

Jumping right to the code, here's a C structure definition for the Rx memory straps:
SEQ_WR_CTL_D1_FORMAT SEQ_WR_CTL_D1;
SEQ_WR_CTL_2_FORMAT SEQ_WR_CTL_2;
SEQ_PMG_TIMING_FORMAT SEQ_PMG_TIMING;
SEQ_RAS_TIMING_FORMAT SEQ_RAS_TIMING;
SEQ_CAS_TIMING_FORMAT SEQ_CAS_TIMING;
SEQ_MISC_TIMING_FORMAT SEQ_MISC_TIMING;
SEQ_MISC_TIMING2_FORMAT SEQ_MISC_TIMING2;
uint32_t SEQ_MISC1;
uint32_t SEQ_MISC3;
uint32_t SEQ_MISC8;
ARB_DRAM_TIMING_FORMAT ARB_DRAM_TIMING;
ARB_DRAM_TIMING2_FORMAT ARB_DRAM_TIMING2;
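
To make the 48-byte to 12-register mapping concrete, here is a minimal sketch that splits a strap written as 96 hex characters into twelve 32-bit words, in the order of the structure above. The helper names (strap_parse, strap_reg_name) are hypothetical, and the little-endian byte order is an assumption, not something stated in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRAP_WORDS 12 /* 48 bytes / 4 bytes per register */

/* Register names in strap order, copied from the structure definition. */
static const char *const strap_reg_names[STRAP_WORDS] = {
    "SEQ_WR_CTL_D1", "SEQ_WR_CTL_2", "SEQ_PMG_TIMING", "SEQ_RAS_TIMING",
    "SEQ_CAS_TIMING", "SEQ_MISC_TIMING", "SEQ_MISC_TIMING2", "SEQ_MISC1",
    "SEQ_MISC3", "SEQ_MISC8", "ARB_DRAM_TIMING", "ARB_DRAM_TIMING2"
};

/* Name of register i, or NULL if i is out of range. */
static const char *strap_reg_name(size_t i)
{
    return i < STRAP_WORDS ? strap_reg_names[i] : NULL;
}

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Parse a 96-character hex strap into 12 register words.
 * ASSUMPTION: each register is stored least-significant byte first.
 * Returns 0 on success, -1 on a bad length or a non-hex character.
 */
static int strap_parse(const char *hex, uint32_t words[STRAP_WORDS])
{
    assert(hex != NULL && words != NULL);
    if (strlen(hex) != STRAP_WORDS * 8)
        return -1;
    for (size_t w = 0; w < STRAP_WORDS; w++) {
        uint32_t value = 0;
        for (size_t b = 0; b < 4; b++) {
            int hi = hex_nibble(hex[w * 8 + b * 2]);
            int lo = hex_nibble(hex[w * 8 + b * 2 + 1]);
            if (hi < 0 || lo < 0)
                return -1;
            value |= (uint32_t)(hi << 4 | lo) << (8 * b);
        }
        words[w] = value;
    }
    return 0;
}
```

Calling strap_parse on a strap string fills the array so that words[3] holds the SEQ_RAS_TIMING register, words[6] holds SEQ_MISC_TIMING2, and so on; strap_reg_name(i) gives the matching label for printing.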

The RAS timing register consists of six fields: RCDW, RCDWA, RCDR, RCDRA, RRD, and RC.  The full field definitions can be found in my fork of Kristy-Leigh's code.  Many of the "pad" fields are likely the high bits of the preceding field that are not currently used.  I have already tested a couple of pad fields (MISC RP_RDA & RP), confirming that the pad bits were actually the high bits of the fields.
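
As an illustration of how such a register breaks down into fields, here is a sketch that decodes a SEQ_RAS_TIMING word with shifts and masks. The field widths (5, 5, 5, 5, 4 and 7 bits from low to high, plus one pad bit) are an assumption and are not given in this post, so check them against the field definitions in the fork mentioned above. The function names are hypothetical. The assumed layout is at least consistent with the strapmod output further down: if the fourth word of the 1750 strap is read little-endian as 0x496C6210, RRD decodes to 6, and setting it to 5 gives 0x495C6210, which matches the modified strap.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded SEQ_RAS_TIMING fields, in memory-clock cycles. */
struct ras_timing {
    unsigned rcdw, rcdwa, rcdr, rcdra, rrd, rc;
};

/* Extract `width` bits of `reg` starting at bit `shift`. */
static unsigned bits(uint32_t reg, unsigned shift, unsigned width)
{
    assert(width > 0 && width < 32 && shift + width <= 32);
    return (unsigned)((reg >> shift) & ((1u << width) - 1u));
}

/*
 * ASSUMED layout, low bit first:
 *   RCDW[4:0] RCDWA[9:5] RCDR[14:10] RCDRA[19:15] RRD[23:20] RC[30:24] pad[31]
 */
static struct ras_timing ras_decode(uint32_t reg)
{
    struct ras_timing t;
    t.rcdw  = bits(reg, 0, 5);
    t.rcdwa = bits(reg, 5, 5);
    t.rcdr  = bits(reg, 10, 5);
    t.rcdra = bits(reg, 15, 5);
    t.rrd   = bits(reg, 20, 4);
    t.rc    = bits(reg, 24, 7);
    return t;
}

/* Return `reg` with the (assumed 4-bit) RRD field replaced by `rrd`. */
static uint32_t ras_set_rrd(uint32_t reg, unsigned rrd)
{
    return (reg & ~((uint32_t)0xFu << 20)) | ((uint32_t)(rrd & 0xFu) << 20);
}
```

Shifts and masks are used instead of C bit-fields because bit-field ordering is implementation-defined, which matters when the layout has to match hardware registers exactly.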


For GDDR5, some timing values have both Long and Short versions that apply for access within a bank group or to different bank groups.  The RRD field of RAS timing is likely RRDL, because the values typically seen for this field are 5 and 6.  If RRDS were 5, at most one page could be opened every five cycles, limiting 32-byte random read performance to 2/5 or 40% of the maximum interface speed.  From my work with Ethereum mining, I know that RRDS can be no more than 4.  In addition, performance tests with RRD timing reduced to 5 from 6 are consistent with it being RRDL.  The actual value of RRDS used by the memory controller does not seem to be contained in the timing strap.  The default 1750 MHz strap for Samsung K4G4 memory has a value of 10 for FAW, which can be no more than 4 * RRDS.  Therefore RRDS is most likely less than 4, and possibly as low as 2.
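
The 2/5 figure above can be reproduced with a few lines of arithmetic. This sketch assumes, as the 2/5 ratio implies, that one 32-byte access keeps the data bus busy for 2 cycles and that every random access needs a new page to be opened; the function name is hypothetical.

```c
#include <assert.h>

/*
 * Upper bound on random-read throughput, as a percentage of the peak
 * interface speed, when a new page can be opened only once every `rrd`
 * cycles and each access occupies the bus for `burst_cycles` cycles.
 * Integer division rounds the result down.
 */
static unsigned random_read_limit_pct(unsigned rrd, unsigned burst_cycles)
{
    assert(rrd > 0);
    unsigned pct = 100u * burst_cycles / rrd;
    return pct > 100u ? 100u : pct; /* cannot exceed the interface peak */
}
```

With burst_cycles set to 2, an activate-to-activate delay of 5 gives 40%, a delay of 4 gives 50%, and a delay of 2 no longer limits throughput at all.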

To simplify the process of modifying memory straps for improved performance, I wrote strapmod.  I also wrote a CGI wrapper for the program, which you can run from my server http://45.62.227.192/cgi-bin/strapmod.  For example, this is the output with the 1750 MHz strap for Samsung K4G4 memory (the last two lines are the original and the modified strap):
Rx strap detected
Old, new RRD: 6 , 5
Old, new FAW: A , 0
Old, new 32AW: 7 , 0
Old, new ACTRD: 19 , 0x10
777000000000000022CC1C0010626C49D0571016B50BD509004AE700140514207A8900A003000000191131399D2C3617
777000000000000022CC1C0010625C49D0571016B50BD50900400700140514207A8900A003000000101131399D2C3617

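To see which registers a modification touched, the two strap strings can be compared eight hex characters (one 32-bit register) at a time. This is only a sketch with a hypothetical function name, not how strapmod itself works. It compares the text directly, so it assumes both strings use the same letter case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRAP_WORDS 12      /* 32-bit registers per strap */
#define HEX_PER_WORD 8      /* hex characters per register */

/*
 * Compare two 96-character hex straps register by register.
 * Returns a bitmask with bit i set when register i differs,
 * or -1 if either string has the wrong length.
 * ASSUMPTION: both strings use the same letter case for A-F.
 */
static int strap_changed_mask(const char *old_hex, const char *new_hex)
{
    assert(old_hex != NULL && new_hex != NULL);
    if (strlen(old_hex) != STRAP_WORDS * HEX_PER_WORD ||
        strlen(new_hex) != STRAP_WORDS * HEX_PER_WORD)
        return -1;

    int mask = 0;
    for (size_t i = 0; i < STRAP_WORDS; i++) {
        const char *a = old_hex + i * HEX_PER_WORD;
        const char *b = new_hex + i * HEX_PER_WORD;
        if (memcmp(a, b, HEX_PER_WORD) != 0)
            mask |= 1 << i;
    }
    return mask;
}
```

For the two straps shown above, bits 3, 6 and 10 are set. In the order of the structure definition those are SEQ_RAS_TIMING, SEQ_MISC_TIMING2 and ARB_DRAM_TIMING, which is where the RRD, FAW/32AW and ACTRD changes reported by strapmod land.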
12 comments:

  1. Good work, Ralph! How much of a boost did you see using the customized timings vs copying the 1500Mhz timings?

    1. The benefit depends on the type and the base strap used (I wasn't using the 1500Mhz strap though). The biggest benefit is with Rx cards running high memory clocks. For R9 (i.e. Tonga) running Hynix memory at 1625 with the 1375 strap, it doesn't need much tuning since it already has tight values for RRD and FAW. Elpida 1375 isn't as tight, so my strapmod utility can help.

  2. Wow! It works great on Samsung memory of MSI Armor RX 470 series. 0.7 to 0.9 Mhash increase with your timings.

  3. Truly amazing: I went from 28.4Mh/s to 31.7Mh/s using it. Great job.

  4. Is there any disadvantage to customizing the straps with your tool? I have not found any, but I always OC first and undervolt with stock straps. I have tried to understand how this affects my testing, with no luck (I am still reading). I was thinking of just customizing my straps with your tool and then testing undervolting and OC. It's incredible how much time I spend running experiments just to understand how it works. I already have very decent results, but I feel the need to keep testing. Thanks in advance.

    1. I suppose using the custom straps could cause stability problems, but I did a lot of testing on Tonga and Polaris to find the tweaks that improve performance without impacting stability. A couple times I came close to bricking a card. To play it safe, always test custom straps above the boot-up strap. So if your BIOS memory clock is 1750, just change the strap after 1750 like 2000, and the strap will only get used when you overclock the memory beyond 1750.

  5. By the way, I am a fan of your blog. It's very refreshing.

  6. nice, good job!
    btw the site is offline :(

    1. My virtualhost provider changed service terms on me and suspended my service. I'm working on getting it back online.

  7. Great job!
    Does it work only with Samsung memory?
    Not with Elpida, Hynix etc?

    1. I've tested strapmod with Hynix and Samsung memory on Rx cards. With R9 cards I've tested it with Hynix and Elpida memory.
