0006-x86-MCE-AMD-Allow-Reserved-types-to-be-overwritten-i.patch 3.5 KB

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Yazen Ghannam <[email protected]>
Date: Thu, 21 Nov 2019 08:15:08 -0600
Subject: [PATCH] x86/MCE/AMD: Allow Reserved types to be overwritten in
 smca_banks[]

Each logical CPU in Scalable MCA systems controls a unique set of MCA
banks in the system. These banks are not shared between CPUs. The bank
types and ordering will be the same across CPUs on currently available
systems.

However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while
other CPUs do not. In this case, the bank seen as Reserved on one CPU is
assumed to be the same type as the bank seen as a known type on another
CPU.

In general, this occurs when the hardware represented by the MCA bank
is disabled, e.g. disabled memory controllers on certain models, etc.
The MCA bank is disabled in the hardware, so there is no possibility of
getting an MCA/MCE from it even if it is assumed to have a known type.

For example:

Full system:
  Bank  |  Type seen on CPU0  |  Type seen on CPU1
  ------------------------------------------------
   0    |         LS          |          LS
   1    |         UMC         |          UMC
   2    |         CS          |          CS

System with hardware disabled:
  Bank  |  Type seen on CPU0  |  Type seen on CPU1
  ------------------------------------------------
   0    |         LS          |          LS
   1    |         UMC         |          RAZ
   2    |         CS          |          CS

For this reason, there is a single, global struct smca_banks[] that is
initialized at boot time. This array is initialized on each CPU as it
comes online. However, the array will not be updated if an entry already
exists.

This works as expected when the first CPU (usually CPU0) has all
possible MCA banks enabled. But if the first CPU has a subset, then it
will save a "Reserved" type in smca_banks[]. Successive CPUs will then
not be able to update smca_banks[] even if they encounter a known bank
type.

This may result in unexpected behavior. Depending on the system
configuration, a user may observe issues enumerating the MCA
thresholding sysfs interface. The issues may be as trivial as sysfs
entries not being available, or as severe as system hangs.

For example:

  Bank  |  Type seen on CPU0  |  Type seen on CPU1
  ------------------------------------------------
   0    |         LS          |          LS
   1    |         RAZ         |          UMC
   2    |         CS          |          CS

Extend the smca_banks[] entry check to return early only if the entry
is a non-Reserved type. Otherwise, continue so that CPUs that encounter
a known bank type can update smca_banks[].
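
To make the difference between the old and the new check concrete, here is
a minimal userspace sketch that runs both against the failing example
(CPU0 sees RAZ in bank 1, CPU1 sees UMC). All model_* and done_* names,
the T_* constants and NR_BANKS are hypothetical; the two predicates mirror
the removed and added conditions in the diff, with Reserved modeled as
hwid_mcatype == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BANKS 3

/* Hypothetical type codes for this sketch; 0 stands for Reserved/RAZ. */
enum { T_RAZ = 0, T_LS = 1, T_UMC = 2, T_CS = 3 };

struct model_hwid { unsigned int hwid_mcatype; };
struct model_bank { const struct model_hwid *hwid; };

/* Lookup table indexed by the T_* code above. */
static const struct model_hwid model_hwids[] = {
	{ T_RAZ }, { T_LS }, { T_UMC }, { T_CS },
};

/* Old check: any existing entry, even a Reserved one, blocks updates. */
static bool done_old(const struct model_bank *b)
{
	return b->hwid != NULL;
}

/* New check: only an existing non-Reserved entry blocks updates. */
static bool done_new(const struct model_bank *b)
{
	return b->hwid && b->hwid->hwid_mcatype != 0;
}

/* Bring CPUs online in order, applying the given "already done" check. */
static void model_boot(struct model_bank banks[NR_BANKS],
		       const unsigned int (*seen)[NR_BANKS],
		       unsigned int nr_cpus,
		       bool (*done)(const struct model_bank *))
{
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
		for (unsigned int bank = 0; bank < NR_BANKS; bank++) {
			assert(seen[cpu][bank] <= T_CS);

			if (done(&banks[bank]))
				continue;

			banks[bank].hwid = &model_hwids[seen[cpu][bank]];
		}
	}
}
```

With the old predicate, bank 1 stays Reserved after both CPUs come online;
with the new predicate, CPU1 overwrites it with UMC. Banks that already
hold a known type are left untouched in both cases.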

Fixes: 68627a697c19 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type")
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Thomas Lamprecht <[email protected]>
---
 arch/x86/kernel/cpu/mce/amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6ea7fdc82f3c..08e09c8c269f 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -266,7 +266,7 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 	smca_set_misc_banks_map(bank, cpu);

 	/* Return early if this bank was already initialized. */
-	if (smca_banks[bank].hwid)
+	if (smca_banks[bank].hwid && smca_banks[bank].hwid->hwid_mcatype != 0)
 		return;

 	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_IPID(bank), &low, &high)) {