1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586 |
- From: Emmanuel Grumbach <[email protected]>
- Date: Fri, 31 Aug 2018 11:31:06 +0300
- Subject: [PATCH] mac80211: fix a race between restart and CSA flows
- We hit a problem with iwlwifi that was caused by a bug in
- mac80211. A bug in iwlwifi caused the firwmare to crash in
- certain cases in channel switch. Because of that bug,
- drv_pre_channel_switch would fail and trigger the restart
- flow.
- Now we had the hw restart worker which runs on the system's
- workqueue and the csa_connection_drop_work worker that runs
- on mac80211's workqueue that can run together. This is
- obviously problematic since the restart work wants to
- reconfigure the connection, while the csa_connection_drop_work
- worker does the exact opposite: it tries to disconnect.
- Fix this by cancelling the csa_connection_drop_work worker
- in the restart worker.
- Note that this can sound racy: we could have:
- driver iface_work CSA_work restart_work
- +++++++++++++++++++++++++++++++++++++++++++++
- |
- <--drv_cs ---|
- <FW CRASH!>
- -CS FAILED-->
- | |
- | cancel_work(CSA)
- schedule |
- CSA work |
- | |
- Race between those 2
- But this is not possible because we flush the workqueue
- in the restart worker before we cancel the CSA worker.
- That would be bullet proof if we could guarantee that
- we schedule the CSA worker only from the iface_work
- which runs on the workqueue (and not on the system's
- workqueue), but unfortunately we do have an instance
- in which we schedule the CSA work outside the context
- of the workqueue (ieee80211_chswitch_done).
- Note also that we should probably cancel other workers
- like beacon_connection_loss_work and possibly others
- for different types of interfaces, at the very least,
- IBSS should suffer from the exact same problem, but for
- now, do the minimum to fix the actual bug that was actually
- experienced and reproduced.
- Signed-off-by: Emmanuel Grumbach <[email protected]>
- Signed-off-by: Luca Coelho <[email protected]>
- Signed-off-by: Johannes Berg <[email protected]>
- ---
- --- a/net/mac80211/main.c
- +++ b/net/mac80211/main.c
- @@ -255,8 +255,27 @@ static void ieee80211_restart_work(struc
-
- flush_work(&local->radar_detected_work);
- rtnl_lock();
- - list_for_each_entry(sdata, &local->interfaces, list)
- + list_for_each_entry(sdata, &local->interfaces, list) {
- + /*
- + * XXX: there may be more work for other vif types and even
- + * for station mode: a good thing would be to run most of
- + * the iface type's dependent _stop (ieee80211_mg_stop,
- + * ieee80211_ibss_stop) etc...
- + * For now, fix only the specific bug that was seen: race
- + * between csa_connection_drop_work and us.
- + */
- + if (sdata->vif.type == NL80211_IFTYPE_STATION) {
- + /*
- + * This worker is scheduled from the iface worker that
- + * runs on mac80211's workqueue, so we can't be
- + * scheduling this worker after the cancel right here.
- + * The exception is ieee80211_chswitch_done.
- + * Then we can have a race...
- + */
- + cancel_work_sync(&sdata->u.mgd.csa_connection_drop_work);
- + }
- flush_delayed_work(&sdata->dec_tailroom_needed_wk);
- + }
- ieee80211_scan_cancel(local);
-
- /* make sure any new ROC will consider local->in_reconfig */
|