0019-EDAC-sb_edac-Don-t-create-a-second-memory-controller.patch 4.5 KB

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Qiuxu Zhuo <[email protected]>
Date: Wed, 13 Sep 2017 18:42:14 +0800
Subject: [PATCH] EDAC, sb_edac: Don't create a second memory controller if HA1
 is not present
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
  EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.

The document [1] says the 'E5-2603 v3' CPU has 4 memory channels max. Yi
Zhang inserted one DIMM per channel for each CPU, and did random error
address injection test with this patch:

    4024 addresses fell in TOLM hole area
   12715 addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
   12774 addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
   12798 addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
   12913 addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
   12674 addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
   12686 addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
   12882 addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
   12934 addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
  106400 addresses were injected totally.

The test result shows that all the 4 channels belong to IMC0 per CPU, so
the server really only has one IMC per CPU.

In the 1st page of chapter 2 in datasheet [2], it also says 'E5-2600 v3'
implements either one or two IMCs. For CPUs with one IMC, IMC1 is not
used and should be ignored.

Thus, do not create a second memory controller if the key HA1 is absent.

[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Reported-and-tested-by: Yi Zhang <[email protected]>
Signed-off-by: Qiuxu Zhuo <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: linux-edac <[email protected]>
Fixes: e2f747b1f42a ("EDAC, sb_edac: Assign EDAC memory controller per h/w controller")
Link: http://lkml.kernel.org/r/[email protected]
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <[email protected]>
(cherry picked from commit 15cc3ae001873845b5d842e212478a6570c7d938)
Signed-off-by: Fabian Grünbichler <[email protected]>
---
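Note (not part of the commit message): the check added to
sbridge_get_onedevice() only works because HA1 now sits at index 1 of the
descriptor table, so "devno != 1" means "some IMC1 device other than HA1".
The following is a minimal, self-contained C sketch of that decision. All
names in it (enum domain as written here, struct descr, toy_table,
count_controllers) are hypothetical stand-ins for illustration, not the
driver's real definitions, and the real function also handles SOCK devices
and PCI reference counting, which are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the driver's types; not the real definitions. */
enum domain { IMC0 = 0, IMC1, SOCK };

struct descr {
	const char *name;	/* illustrative label only */
	enum domain dom;
};

/* Toy table: index 1 is HA1, mirroring the reordering done by the patch. */
static const struct descr toy_table[] = {
	{ "HA0",      IMC0 },
	{ "HA1",      IMC1 },
	{ "HA0_TA",   IMC0 },
	{ "HA1_TA",   IMC1 },
	{ "HA1_TAD0", IMC1 },
};
#define TOY_TABLE_LEN (sizeof(toy_table) / sizeof(toy_table[0]))

/*
 * Count the memory controllers that would be created when only the
 * devices flagged in present[] (one flag per toy_table entry) exist.
 */
static int count_controllers(const bool present[])
{
	bool created[2] = { false, false };	/* one slot per IMC */
	int n = 0;

	for (size_t devno = 0; devno < TOY_TABLE_LEN; devno++) {
		enum domain dom = toy_table[devno].dom;

		if (!present[devno] || dom == SOCK)
			continue;
		if (created[dom])
			continue;	/* controller already exists */
		/* The idea of the patch: IMC1 without HA1 is skipped. */
		if (dom == IMC1 && devno != 1)
			continue;
		created[dom] = true;
		n++;
	}
	return n;
}
```

In this model, a system exposing HA0, HA0_TA and HA1_TA but no HA1 (the
situation described in the report) yields one controller, while a system
exposing every entry yields two.
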
 drivers/edac/sb_edac.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 80d860cb0746..7a3b201d51df 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -455,6 +455,7 @@ static const struct pci_id_table pci_dev_descr_sbridge_table[] = {
 static const struct pci_id_descr pci_dev_descr_ibridge[] = {
 	/* Processor Home Agent */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0, 0, IMC0) },
+	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1, 1, IMC1) },
 
 	/* Memory controller */
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA, 0, IMC0) },
@@ -465,7 +466,6 @@ static const struct pci_id_descr pci_dev_descr_ibridge[] = {
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TAD3, 0, IMC0) },
 
 	/* Optional, mode 2HA */
-	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1, 1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TA, 1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_RAS, 1, IMC1) },
 	{ PCI_DESCR(PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TAD0, 1, IMC1) },
@@ -2260,6 +2260,13 @@ static int sbridge_get_onedevice(struct pci_dev **prev,
 next_imc:
 	sbridge_dev = get_sbridge_dev(bus, dev_descr->dom, multi_bus, sbridge_dev);
 	if (!sbridge_dev) {
+		/* If the HA1 wasn't found, don't create EDAC second memory controller */
+		if (dev_descr->dom == IMC1 && devno != 1) {
+			edac_dbg(0, "Skip IMC1: %04x:%04x (since HA1 was absent)\n",
+				 PCI_VENDOR_ID_INTEL, dev_descr->dev_id);
+			pci_dev_put(pdev);
+			return 0;
+		}
 
 		if (dev_descr->dom == SOCK)
 			goto out_imc;
-- 
2.14.2