700-v5.5-net-core-allow-fast-GRO-for-skbs-with-Ethernet-heade.patch 3.4 KB

From: Alexander Lobakin <[email protected]>
Date: Fri, 15 Nov 2019 12:11:35 +0300
Subject: [PATCH] net: core: allow fast GRO for skbs with Ethernet header in
 head

Commit 78d3fd0b7de8 ("gro: Only use skb_gro_header for completely
non-linear packets") back in May'09 (v2.6.31-rc1) changed the
original condition '!skb_headlen(skb)' to
'skb->mac_header == skb->tail' in skb_gro_reset_offset(), saying: "Since
the drivers that need this optimisation all provide completely
non-linear packets" (note that this condition later became the current
'skb_mac_header(skb) == skb_tail_pointer(skb)' with commit
ced14f6804a9 ("net: Correct comparisons and calculations using
skb->tail and skb->transport_header"), without any functional changes).
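
To make the difference between the two conditions concrete, here is a minimal userspace sketch. It is not kernel code: struct toy_skb and all toy_*/cond_* helpers are hypothetical stand-ins that track only the offsets and lengths the two checks look at, and the PageHighMem() part of the real condition is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct sk_buff (hypothetical, not the kernel layout).
 * Offsets are relative to the start of the linear buffer (skb->head). */
struct toy_skb {
	unsigned int mac_header; /* where the MAC header starts */
	unsigned int data;       /* current skb->data */
	unsigned int tail;       /* end of bytes written to the linear buffer */
	unsigned int len;        /* linear + paged bytes */
	unsigned int data_len;   /* paged bytes only */
	unsigned int nr_frags;   /* number of page fragments */
};

/* Mirrors skb_headlen(): bytes left in the linear area. */
static unsigned int toy_headlen(const struct toy_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Mirrors skb_put(): append n bytes to the linear buffer. */
static void toy_put(struct toy_skb *skb, unsigned int n)
{
	skb->tail += n;
	skb->len += n;
}

/* Mirrors skb_pull(): consume n bytes from the front; tail is untouched. */
static void toy_pull(struct toy_skb *skb, unsigned int n)
{
	assert(n <= toy_headlen(skb));
	skb->data += n;
	skb->len -= n;
}

/* Attach one page fragment holding n bytes. */
static void toy_add_frag(struct toy_skb *skb, unsigned int n)
{
	skb->nr_frags++;
	skb->len += n;
	skb->data_len += n;
}

/* Condition in use since v2.6.31: MAC header sits exactly at the tail. */
static bool cond_mac_eq_tail(const struct toy_skb *skb)
{
	return skb->mac_header == skb->tail && skb->nr_frags;
}

/* Original condition: nothing left in the linear area. */
static bool cond_no_headlen(const struct toy_skb *skb)
{
	return !toy_headlen(skb) && skb->nr_frags;
}
```

For a zero-initialized toy_skb that gets toy_put(14), toy_add_frag(1000) and then toy_pull(14), the head length is back to zero but tail stays at 14, so cond_mac_eq_tail() is false while cond_no_headlen() is true. For an skb that only ever had fragments attached, both conditions hold.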

For now, we have the following rough statistics on the users of each
receive method in v5.4-rc7:

1) napi_gro_frags: 14
2) napi_gro_receive with skb->head containing (most of) payload: 83
3) napi_gro_receive with skb->head containing all the headers: 20
4) napi_gro_receive with skb->head containing only Ethernet header: 2

With the current condition, fast GRO via NAPI_GRO_CB(skb)->frag0 is
available only in case [1].
Packets pushed by [2] and [3] go through the 'slow' path, but
it's not a problem for them as they already contain all the needed
headers in skb->head, so pskb_may_pull() only moves skb->data.

The layout of skbs in case [4] at the moment of
dev_gro_receive() is identical to that of skbs that have come through [1],
as napi_frags_skb() pulls the Ethernet header to skb->head. The only
difference is that the mentioned condition is always false for them,
because skb_put() and friends irreversibly alter the tail pointer.
They also go through the 'slow' path, but now every single
pskb_may_pull() in every single .gro_receive() will call the *really*
slow __pskb_pull_tail() to pull headers to head. This significantly
decreases the overall performance for no visible reason.
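
The cost described above can be illustrated with a rough userspace model of how a .gro_receive() callback reaches a header: through frag0 when it covers the requested bytes, otherwise by pulling. This is a sketch under stated assumptions, not the kernel implementation: struct toy_gro_skb and the toy_* functions are hypothetical, the two header sizes (20-byte IPv4, 20-byte TCP) are picked for illustration, and a counter stands in for the copy done by __pskb_pull_tail().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the GRO-relevant state of one skb. */
struct toy_gro_skb {
	unsigned int frag0_len;  /* bytes reachable through frag0, 0 if unset */
	unsigned int headlen;    /* bytes currently in the linear area */
	unsigned int len;        /* total packet length */
	unsigned int tail_pulls; /* how often the expensive copy ran */
};

/* Models pskb_may_pull(): free if the bytes are already linear,
 * otherwise headers have to be copied from the frags into the head. */
static bool toy_may_pull(struct toy_gro_skb *skb, unsigned int hlen)
{
	assert(skb->headlen <= skb->len);
	if (hlen <= skb->headlen)
		return true;
	if (hlen > skb->len)
		return false;
	skb->tail_pulls++;      /* stands in for __pskb_pull_tail() */
	skb->headlen = hlen;
	return true;
}

/* Models header access in a .gro_receive() callback:
 * use frag0 when it covers hlen bytes, else fall back to pulling. */
static bool toy_gro_header(struct toy_gro_skb *skb, unsigned int hlen)
{
	if (skb->frag0_len >= hlen)
		return true;    /* fast path, no copy */
	if (!toy_may_pull(skb, hlen))
		return false;
	skb->frag0_len = 0;     /* frag0 is not used after a pull */
	return true;
}

/* Walk two header layers and report how many expensive pulls ran. */
static unsigned int toy_walk_headers(struct toy_gro_skb *skb)
{
	toy_gro_header(skb, 20);        /* assumed IPv4 header */
	toy_gro_header(skb, 20 + 20);   /* assumed TCP header behind it */
	return skb->tail_pulls;
}
```

In this model an skb with an empty linear area and no frag0 needs one expensive pull per layer, while the same skb with frag0 set needs none, and an skb that already has all headers in the linear area also needs none even without frag0.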

The only two users of method [4] are:

* drivers/staging/qlge
* drivers/net/wireless/iwlwifi (all three variants: dvm, mvm, mvm-mq)

Note that in the case of wireless drivers we can't use [1]
(napi_gro_frags()), at least for now, and the mac80211 stack always
performs pushes and pulls anyway, so the performance hit is unavoidable.

At the time of v2.6.31 the mentioned change was necessary (that's
why I don't add the "Fixes:" tag), but it became obsolete when
skb_gro_mac_header() was removed in commit a50e233c50db ("net-gro:
restore frag0 optimization"), so we can simply revert the condition
in skb_gro_reset_offset() to let skbs from [4] go through the 'fast'
path just like in case [1].

This was tested on a 600 MHz MIPS CPU with a custom driver, and this
patch gave a boost of up to 40 Mbps to method [4] in both directions
compared to net-next, which made overall performance relatively
close to [1] (without it, [4] is the slowest).

v2:
- Add more references and explanations to commit message
- Fix some typos ibid
- No functional changes

Signed-off-by: Alexander Lobakin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
---
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5404,8 +5404,7 @@ static void skb_gro_reset_offset(struct
 	NAPI_GRO_CB(skb)->frag0 = NULL;
 	NAPI_GRO_CB(skb)->frag0_len = 0;
-	if (skb_mac_header(skb) == skb_tail_pointer(skb) &&
-	    pinfo->nr_frags &&
+	if (!skb_headlen(skb) && pinfo->nr_frags &&
 	    !PageHighMem(skb_frag_page(frag0))) {
 		NAPI_GRO_CB(skb)->frag0 = skb_frag_address(frag0);
 		NAPI_GRO_CB(skb)->frag0_len = min_t(unsigned int,