
Help: Add documentation for CTest hardware allocation

Kyle Edwards 6 years ago
commit e9500271a3

+ 6 - 0
Help/command/ctest_test.rst

@@ -17,6 +17,7 @@ Perform the :ref:`CTest Test Step` as a :ref:`Dashboard Client`.
              [EXCLUDE_FIXTURE_SETUP <regex>]
              [EXCLUDE_FIXTURE_CLEANUP <regex>]
              [PARALLEL_LEVEL <level>]
+             [HARDWARE_SPEC_FILE <file>]
              [TEST_LOAD <threshold>]
              [SCHEDULE_RANDOM <ON|OFF>]
              [STOP_TIME <time-of-day>]
@@ -82,6 +83,11 @@ The options are:
   Specify a positive number representing the number of tests to
   be run in parallel.
 
+``HARDWARE_SPEC_FILE <file>``
+  Specify a
+  :ref:`hardware specification file <ctest-hardware-specification-file>`. See
+  :ref:`ctest-hardware-allocation` for more information.
+
 ``TEST_LOAD <threshold>``
   While running tests in parallel, try not to start tests when they
   may cause the CPU load to pass above a given threshold.  If not

+ 1 - 0
Help/manual/cmake-properties.7.rst

@@ -414,6 +414,7 @@ Properties on Tests
    /prop_test/LABELS
    /prop_test/MEASUREMENT
    /prop_test/PASS_REGULAR_EXPRESSION
+   /prop_test/PROCESSES
    /prop_test/PROCESSOR_AFFINITY
    /prop_test/PROCESSORS
    /prop_test/REQUIRED_FILES

+ 229 - 0
Help/manual/ctest.1.rst

@@ -90,6 +90,15 @@ Options
 
 
  See `Label and Subproject Summary`_.
 
+``--hardware-spec-file <file>``
+ Run CTest with :ref:`hardware allocation <ctest-hardware-allocation>` enabled,
+ using the
+ :ref:`hardware specification file <ctest-hardware-specification-file>`
+ specified in ``<file>``.
+
+ When ``ctest`` is run as a `Dashboard Client`_ this sets the
+ ``HardwareSpecFile`` option of the `CTest Test Step`_.
+
 ``--test-load <level>``
  While running tests in parallel (e.g. with ``-j``), try not to start
  tests when they may cause the CPU load to pass above a given threshold.
@@ -958,6 +967,11 @@ Arguments to the command may specify some of the step settings.
 
 
 Configuration settings include:
 
+``HardwareSpecFile``
+  Specify a
+  :ref:`hardware specification file <ctest-hardware-specification-file>`. See
+  :ref:`ctest-hardware-allocation` for more information.
+
 ``LabelsForSubprojects``
   Specify a semicolon-separated list of labels that will be treated as
   subprojects. This mapping will be passed on to CDash when configure, test or
@@ -1267,6 +1281,221 @@ model is defined as follows:
     Test properties.
     Can contain keys for each of the supported test properties.
 
+.. _`ctest-hardware-allocation`:
+
+Hardware Allocation
+===================
+
+CTest provides a mechanism for tests to specify the hardware that they need and
+how much of it they need, and for users to specify the hardware available on
+the running machine. This allows CTest to internally keep track of which
+hardware is in use and which is free, scheduling tests in a way that prevents
+them from trying to claim hardware that is not available.
+
+A common use case for this feature is for tests that require the use of a GPU.
+Multiple tests can simultaneously allocate memory from a GPU, but if too many
+tests try to do this at once, some of them will fail to allocate, resulting in
+a failed test, even though the test would have succeeded if it had the memory
+it needed. By using the hardware allocation feature, each test can specify how
+much memory it requires from a GPU, allowing CTest to schedule tests in a way
+that running several of these tests at once does not exhaust the GPU's memory
+pool.
+
+Please note that CTest has no concept of what a GPU is or how much memory it
+has, nor does it have any way of communicating with a GPU to retrieve this
+information or perform any memory management. CTest simply keeps track of a
+list of abstract resource types, each of which has a certain number of slots
+available for tests to use. Each test specifies the number of slots that it
+requires from a certain resource, and CTest then schedules them in a way that
+prevents the total number of slots in use from exceeding the listed capacity.
+When a test is executed and slots from a resource are allocated to it, the
+test may assume that it has exclusive use of those slots for the duration of
+the test's process.
+
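The slot bookkeeping described above can be illustrated with a small Python sketch. The ``SlotPool`` class and its method names are hypothetical and are not part of CTest. The first-fit strategy is an assumption made for brevity, and CTest's actual allocation strategy may differ.

```python
class SlotPool:
    """Track free slots per resource instance (hypothetical helper)."""

    def __init__(self, capacity):
        # capacity maps resource type -> {resource id: number of slots}
        self.free = {rtype: dict(ids) for rtype, ids in capacity.items()}

    def try_allocate(self, requirements):
        """Reserve slots for [(type, slots), ...]; return the grant or None."""
        # Work on a copy so a failed attempt leaves the pool unchanged.
        trial = {rtype: dict(ids) for rtype, ids in self.free.items()}
        grant = []
        for rtype, needed in requirements:
            # Each requirement must be satisfied by ONE resource instance.
            for rid, available in trial.get(rtype, {}).items():
                if available >= needed:
                    trial[rtype][rid] -= needed
                    grant.append((rtype, rid, needed))
                    break
            else:
                return None  # no single instance has enough free slots
        self.free = trial
        return grant

    def release(self, grant):
        """Return previously granted slots to the pool."""
        for rtype, rid, slots in grant:
            self.free[rtype][rid] += slots


pool = SlotPool({"gpus": {"0": 2, "1": 4}})
first = pool.try_allocate([("gpus", 2)])
second = pool.try_allocate([("gpus", 4)])
blocked = pool.try_allocate([("gpus", 1)])   # every slot is in use
pool.release(first)
retry = pool.try_allocate([("gpus", 1)])     # fits again after the release
print(first, second, blocked, retry)
```

A test whose requirements cannot be met is simply held back until another test releases its slots, which is the behavior the paragraph above describes.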
+The CTest hardware allocation feature consists of two inputs:
+
+* The :ref:`hardware specification file <ctest-hardware-specification-file>`,
+  described below, which describes the hardware resources available on the
+  system, and
+* The :prop_test:`PROCESSES` property of tests, which describes the resources
+  required by the test.
+
+When CTest runs a test, the hardware allocated to that test is passed in the
+form of a set of
+:ref:`environment variables <ctest-hardware-environment-variables>` as
+described below. Using this information to decide which resource to connect to
+is left to the test writer.
+
+Please note that these processes are not spawned by CTest. The ``PROCESSES``
+property merely tells CTest what processes the test expects to launch. It is up
+to the test itself to do this process spawning, and read the :ref:`environment
+variables <ctest-hardware-environment-variables>` to determine which resources
+each process has been allocated.
+
+.. _`ctest-hardware-specification-file`:
+
+Hardware Specification File
+---------------------------
+
+The hardware specification file is a JSON file which is passed to CTest, either
+on the :manual:`ctest(1)` command line as ``--hardware-spec-file``, or as the
+``HARDWARE_SPEC_FILE`` argument of :command:`ctest_test`. The hardware
+specification file must be a JSON object. All examples in this document assume
+the following hardware specification file:
+
+.. code-block:: json
+
+  {
+    "local": [
+      {
+        "gpus": [
+          {
+            "id": "0",
+            "slots": 2
+          },
+          {
+            "id": "1",
+            "slots": 4
+          },
+          {
+            "id": "2",
+            "slots": 2
+          },
+          {
+            "id": "3"
+          }
+        ],
+        "crypto_chips": [
+          {
+            "id": "card0",
+            "slots": 4
+          }
+        ]
+      }
+    ]
+  }
+
+The members are:
+
+``local``
+  A JSON array consisting of CPU sockets present on the system. Currently, only
+  one socket is supported.
+
+  Each socket is a JSON object with members whose names are equal to the
+  desired resource types, such as ``gpus``. These names must start with a
+  lowercase letter or an underscore, and subsequent characters can be a
+  lowercase letter, a digit, or an underscore. Uppercase letters are not
+  allowed, because certain platforms have case-insensitive environment
+  variables. See the `Environment Variables`_ section below for
+  more information. It is recommended that the resource type name be the plural
+  of a noun, such as ``gpus`` or ``crypto_chips`` (and not ``gpu`` or
+  ``crypto_chip``).
+
+  Please note that the names ``gpus`` and ``crypto_chips`` are just examples,
+  and CTest does not interpret them in any way. You are free to make up any
+  resource type you want to meet your own requirements.
+
+  The value for each resource type is a JSON array consisting of JSON objects,
+  each of which describes a specific instance of the specified resource. These
+  objects have the following members:
+
+  ``id``
+    A string consisting of an identifier for the resource. Each character in
+    the identifier can be a lowercase letter, a digit, or an underscore.
+    Uppercase letters are not allowed.
+
+    Identifiers must be unique within a resource type. However, they do not
+    have to be unique across resource types. For example, it is valid to have a
+    ``gpus`` resource named ``0`` and a ``crypto_chips`` resource named ``0``,
+    but not two ``gpus`` resources both named ``0``.
+
+    Please note that the IDs ``0``, ``1``, ``2``, ``3``, and ``card0`` are just
+    examples, and CTest does not interpret them in any way. You are free to
+    make up any IDs you want to meet your own requirements.
+
+  ``slots``
+    An optional unsigned number specifying the number of slots available on the
+    resource. For example, this could be megabytes of RAM on a GPU, or
+    cryptography units available on a cryptography chip. If ``slots`` is not
+    specified, a default value of ``1`` is assumed.
+
+In the example file above, there are four GPUs with IDs 0 through 3. GPU 0 has
+2 slots, GPU 1 has 4, GPU 2 has 2, and GPU 3 has a default of 1 slot. There is
+also one cryptography chip with 4 slots.
+
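As a rough illustration of the format described above, the following Python sketch loads a hardware specification file and applies the stated naming rules and the default of one slot. The ``load_spec`` function and the regular expressions are assumptions written for this example, not CTest's own validation code.

```python
import json
import re

# Naming rules as stated in the text above.
TYPE_RE = re.compile(r"[a-z_][a-z0-9_]*")
ID_RE = re.compile(r"[a-z0-9_]+")


def load_spec(text):
    """Return {resource_type: {resource_id: slots}} for the single socket."""
    sockets = json.loads(text)["local"]
    if len(sockets) != 1:
        raise ValueError("only one socket is currently supported")
    result = {}
    for rtype, instances in sockets[0].items():
        if not TYPE_RE.fullmatch(rtype):
            raise ValueError(f"invalid resource type name: {rtype!r}")
        slots_by_id = {}
        for instance in instances:
            rid = instance["id"]
            if not ID_RE.fullmatch(rid):
                raise ValueError(f"invalid resource id: {rid!r}")
            if rid in slots_by_id:
                raise ValueError(f"duplicate id {rid!r} in {rtype!r}")
            slots_by_id[rid] = instance.get("slots", 1)  # default: 1 slot
        result[rtype] = slots_by_id
    return result


# The example file from this section, condensed.
EXAMPLE = """
{"local": [{"gpus": [{"id": "0", "slots": 2}, {"id": "1", "slots": 4},
                     {"id": "2", "slots": 2}, {"id": "3"}],
            "crypto_chips": [{"id": "card0", "slots": 4}]}]}
"""

spec = load_spec(EXAMPLE)
print(spec["gpus"])                # slots per GPU id
print(sum(spec["gpus"].values()))  # total GPU slots
```

GPU ``3`` omits ``slots`` in the example file, so the sketch records it with the default of 1.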
+``PROCESSES`` Property
+----------------------
+
+See :prop_test:`PROCESSES` for a description of this property.
+
+.. _`ctest-hardware-environment-variables`:
+
+Environment Variables
+---------------------
+
+Once CTest has decided which resources to allocate to a test, it passes this
+information to the test executable as a series of environment variables. For
+each example below, we will assume that the test in question has a
+:prop_test:`PROCESSES` property of ``2,gpus:2;gpus:4,gpus:1,crypto_chips:2``.
+
+The following variables are passed to the test process:
+
+.. envvar:: CTEST_PROCESS_COUNT
+
+  The total number of processes specified by the :prop_test:`PROCESSES`
+  property. For example:
+
+  * ``CTEST_PROCESS_COUNT=3``
+
+  This variable will only be defined if :manual:`ctest(1)` has been given a
+  ``--hardware-spec-file``, or if :command:`ctest_test` has been given a
+  ``HARDWARE_SPEC_FILE``. If no hardware specification file has been given,
+  this variable will not be defined.
+
+.. envvar:: CTEST_PROCESS_<num>
+
+  The list of resource types allocated to each process, with each item
+  separated by a comma. ``<num>`` is a number from zero to
+  ``CTEST_PROCESS_COUNT`` minus one. ``CTEST_PROCESS_<num>`` is defined for
+  each ``<num>`` in this range. For example:
+
+  * ``CTEST_PROCESS_0=gpus``
+  * ``CTEST_PROCESS_1=gpus``
+  * ``CTEST_PROCESS_2=crypto_chips,gpus``
+
+.. envvar:: CTEST_PROCESS_<num>_<resource-type>
+
+  The list of resource IDs and number of slots from each ID allocated to each
+  process for a given resource type. This variable consists of a series of
+  pairs, each pair separated by a semicolon, and with the two items in the pair
+  separated by a comma. The first item in each pair is ``id:`` followed by the
+  ID of a resource of type ``<resource-type>``, and the second item is
+  ``slots:`` followed by the number of slots from that resource allocated to
+  the given process. For example:
+
+  * ``CTEST_PROCESS_0_GPUS=id:0,slots:2``
+  * ``CTEST_PROCESS_1_GPUS=id:2,slots:2``
+  * ``CTEST_PROCESS_2_GPUS=id:1,slots:4;id:3,slots:1``
+  * ``CTEST_PROCESS_2_CRYPTO_CHIPS=id:card0,slots:2``
+
+  In this example, process 0 gets 2 slots from GPU ``0``, process 1 gets 2 slots
+  from GPU ``2``, and process 2 gets 4 slots from GPU ``1``, 1 slot from GPU
+  ``3``, and 2 slots from cryptography chip ``card0``.
+
+  ``<num>`` is a number from zero to ``CTEST_PROCESS_COUNT`` minus one.
+  ``<resource-type>`` is the name of a resource type, converted to uppercase.
+  ``CTEST_PROCESS_<num>_<resource-type>`` is defined for the product of each
+  ``<num>`` in the range listed above and each resource type listed in
+  ``CTEST_PROCESS_<num>``.
+
+  Because some platforms have case-insensitive names for environment variables,
+  the names of resource types may not clash in a case-insensitive environment.
+  Because of this, for the sake of simplicity, all resource types must be
+  listed in all lowercase in the
+  :ref:`hardware specification file <ctest-hardware-specification-file>` and in
+  the :prop_test:`PROCESSES` property, and they are converted to all uppercase
+  in the ``CTEST_PROCESS_<num>_<resource-type>`` environment variable.
+
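A test could decode these variables along the following lines. This is a minimal Python sketch: ``read_allocation`` is a hypothetical helper, and the sample dictionary simply repeats the example values listed above. A real test would pass ``os.environ`` instead, after checking that ``CTEST_PROCESS_COUNT`` is defined.

```python
def read_allocation(env):
    """Return one {resource_type: [(id, slots), ...]} dict per process."""
    count = int(env["CTEST_PROCESS_COUNT"])
    processes = []
    for num in range(count):
        allocation = {}
        # CTEST_PROCESS_<num> lists the resource types, comma-separated.
        for rtype in env[f"CTEST_PROCESS_{num}"].split(","):
            # The type name is uppercased in the per-type variable name.
            value = env[f"CTEST_PROCESS_{num}_{rtype.upper()}"]
            grants = []
            for pair in value.split(";"):
                id_item, slots_item = pair.split(",")
                resource_id = id_item[len("id:"):]
                slots = int(slots_item[len("slots:"):])
                grants.append((resource_id, slots))
            allocation[rtype] = grants
        processes.append(allocation)
    return processes


# Example values copied from the variable descriptions above.
sample_env = {
    "CTEST_PROCESS_COUNT": "3",
    "CTEST_PROCESS_0": "gpus",
    "CTEST_PROCESS_1": "gpus",
    "CTEST_PROCESS_2": "crypto_chips,gpus",
    "CTEST_PROCESS_0_GPUS": "id:0,slots:2",
    "CTEST_PROCESS_1_GPUS": "id:2,slots:2",
    "CTEST_PROCESS_2_GPUS": "id:1,slots:4;id:3,slots:1",
    "CTEST_PROCESS_2_CRYPTO_CHIPS": "id:card0,slots:2",
}

allocations = read_allocation(sample_env)
for num, allocation in enumerate(allocations):
    print(num, allocation)
```

Resource IDs are kept as strings because the specification file defines them as strings, even when they look numeric.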
 See Also
 ========
 

+ 54 - 0
Help/prop_test/PROCESSES.rst

@@ -0,0 +1,54 @@
+PROCESSES
+----------
+
+Set to specify the number of processes spawned by a test, and the resources
+that they require. See :ref:`hardware allocation <ctest-hardware-allocation>`
+for more information on how this property integrates into the CTest hardware
+allocation feature.
+
+The ``PROCESSES`` property is a :ref:`semicolon-separated list <CMake Language
+Lists>` of process descriptions. Each process description consists of an
+optional number of processes for the description followed by a series of
+resource requirements for those processes. These requirements (and the number
+of processes) are separated by commas. The resource requirements consist of the
+name of a resource type, followed by a colon, followed by an unsigned integer
+specifying the number of slots required on one resource of the given type.
+
+Please note that these processes are not spawned by CTest. The ``PROCESSES``
+property merely tells CTest what processes the test expects to launch. It is up
+to the test itself to do this process spawning, and read the :ref:`environment
+variables <ctest-hardware-environment-variables>` to determine which resources
+each process has been allocated.
+
+Consider the following example:
+
+.. code-block:: cmake
+
+  add_test(NAME MyTest COMMAND MyExe)
+  set_property(TEST MyTest PROPERTY PROCESSES
+    "2,gpus:2"
+    "gpus:4,crypto_chips:2")
+
+In this example, there are two process descriptions (implicitly separated by a
+semicolon). The content of the first description is ``2,gpus:2``. This
+description spawns 2 processes, each of which requires 2 slots from a single
+GPU. The content of the second description is ``gpus:4,crypto_chips:2``. This
+description does not specify a process count, so a default of 1 is assumed.
+This single process requires 4 slots from a single GPU and 2 slots from a
+single cryptography chip. In total, 3 processes are spawned from this test,
+each with its own requirements.
+
+When CTest sets the :ref:`environment variables
+<ctest-hardware-environment-variables>` for a test, it numbers the processes
+in the order in which their descriptions appear in the property, from 0 for
+the leftmost process up to the total number of processes minus 1 for the
+rightmost. In the example above, the two processes in the first description
+would have IDs of 0 and 1, and the single process in the second description
+would have an ID of 2.
+
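To make the numbering concrete, the sketch below expands a ``PROCESSES`` value into one requirement list per process. ``parse_processes`` is a hypothetical helper that assumes well-formed input with the optional count given first. It is not CTest's parser.

```python
def parse_processes(value):
    """Expand a PROCESSES value into one requirement list per process."""
    processes = []
    for description in value.split(";"):
        items = description.split(",")
        count = 1  # default when the description gives no process count
        if items[0].isdigit():
            count = int(items[0])
            items = items[1:]
        requirements = []
        for item in items:
            rtype, _, slots = item.partition(":")
            requirements.append((rtype, int(slots)))
        # Each process of the description gets its own copy of the list.
        processes.extend(list(requirements) for _ in range(count))
    return processes


# The two quoted arguments in the CMake example form this list value.
procs = parse_processes("2,gpus:2;gpus:4,crypto_chips:2")
for number, requirements in enumerate(procs):
    print(number, requirements)
```

The list index of each entry corresponds to the process number used in the ``CTEST_PROCESS_<num>`` variables.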
+Both the ``PROCESSES`` and :prop_test:`RESOURCE_LOCK` properties serve similar
+purposes, but they are distinct and orthogonal. Resources specified by
+``PROCESSES`` do not affect :prop_test:`RESOURCE_LOCK`, and vice versa. Whereas
+:prop_test:`RESOURCE_LOCK` is a simpler property that is used for locking one
+global resource, ``PROCESSES`` is a more advanced property that allows multiple
+tests to simultaneously use multiple resources of the same type, specifying
+their requirements in a fine-grained manner.

+ 8 - 0
Help/prop_test/RESOURCE_LOCK.rst

@@ -8,3 +8,11 @@ not to run concurrently.
 
 
 See also :prop_test:`FIXTURES_REQUIRED` if the resource requires any setup or
 cleanup steps.
+
+Both the :prop_test:`PROCESSES` and ``RESOURCE_LOCK`` properties serve similar
+purposes, but they are distinct and orthogonal. Resources specified by
+:prop_test:`PROCESSES` do not affect ``RESOURCE_LOCK``, and vice versa. Whereas
+``RESOURCE_LOCK`` is a simpler property that is used for locking one global
+resource, :prop_test:`PROCESSES` is a more advanced property that allows
+multiple tests to simultaneously use multiple resources of the same type,
+specifying their requirements in a fine-grained manner.

+ 6 - 0
Help/release/dev/ctest-hardware-allocation.rst

@@ -0,0 +1,6 @@
+ctest-hardware-allocation
+-------------------------
+
+* :manual:`ctest(1)` now has the ability to serialize tests based on hardware
+  requirements for each test. See :ref:`ctest-hardware-allocation` for
+  details.