.. _batches:

Building Batches
================

While it is possible to send individual simulations to a :ref:`sim_bot`, the best
way to run simulations is to build them into batches. To do this, you create an
``OrcaFlexBatch`` and then add data files to it with the ``add`` method:

.. code-block:: python

    import qalx_orcaflex.data_models as dm
    from qalx_orcaflex.core import QalxOrcaFlex, OrcaFlexBatch, DirectorySource

    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",  # queue our batch bot will look at for jobs
        sim_queue="sim bot",  # queue our sim bot will look at for jobs
    )

    with OrcaFlexBatch(name="My Batch", session=qfx,
                       batch_options=batch_options) as batch:
        batch.add(source=DirectorySource(r"S:\Project123\OFX\Batch1"))

The reason for the ``with`` block is explained in :ref:`context_manager`, but it
is important to know that once the code has successfully exited the ``with``
block, the batch has been built and submitted.

.. _sources:

Sources
~~~~~~~

There are various options for the ``source`` argument:

DirectorySource
---------------

Data files will be submitted for every OrcaFlex file in a directory. The
following code will include all ``.dat`` files in the ``Batch1`` folder:

.. code-block:: python

    from qalx_orcaflex.core import DirectorySource

    my_files = DirectorySource(r"S:\Project 123\OFX\Batch1")

The following code will include all ``.dat``, ``.yml`` and ``.yaml`` files in
``Batch1``:

.. warning::

    ``qalx_orcaflex`` does not support the concept of ``BaseFile`` in text data
    files (yet)

.. code-block:: python

    from qalx_orcaflex.core import DirectorySource

    my_files = DirectorySource(r"S:\Project 123\OFX\Batch1", include_yaml=True)

The following code will include all ``.dat`` files in ``Batch1`` and **all the
subdirectories** of ``Batch1``:

.. code-block:: python

    from qalx_orcaflex.core import DirectorySource

    my_files = DirectorySource(r"S:\Project 123\OFX\Batch1", recursive=True)

ModelSource
-----------

Saves the model data from an ``OrcFxAPI.Model`` instance.
You need to give the model a name:

.. code-block:: python

    from qalx_orcaflex.core import ModelSource
    import OrcFxAPI as ofx

    model = ofx.Model()
    my_files = ModelSource(model, "My Model")

``ModelSource`` will save the data in the model instance at the time it is
created; any future changes to the ``OrcFxAPI.Model`` instance will not be
reflected. This is so you can happily use the same instance for all your load
cases and not worry about loading the base model from disk each time.

.. code-block:: python

    from qalx_orcaflex.core import ModelSource
    import OrcFxAPI as ofx

    model = ofx.Model(r"C:\MY_MASSIVE_MODEL.dat")

    model['My Line'].Length[0] = 100
    my_100_model_source = ModelSource(model, "Model with l=100")

    model['My Line'].Length[0] = 200
    my_200_model_source = ModelSource(model, "Model with l=200")

The sources above will have different line lengths even though the line in the
``model`` variable has a length of 200.

.. note::

    There is no need to add ".dat" to the name of the model. This will be added
    automatically.

FileSource
----------

This will add a single file from a path.

.. code-block:: python

    from qalx_orcaflex.core import FileSource

    my_files = FileSource(path=r"S:\Project 123\OFX\Tests\MY_TEST_MODEL.dat")

DataFileItemsSource
-------------------

This is for when you have data files already in qalx and you want to re-run
them for some reason.

.. admonition:: coming soon!

    This source will become more useful when we implement "model deltas" which
    will allow you to load a base model from qalx and specify all the load case
    details as they are added to the batch.

Batches with restarts
~~~~~~~~~~~~~~~~~~~~~

Batches which contain models that restart from another simulation should "just
work". That is, those models should run in the same way as every other model in
the batch and the results will be processed in the same way.
However, there are a few things that you should understand about how
qalx-OrcaFlex processes these to ensure you do not suffer unexpected behaviour.

Use OrcFxAPI on build
---------------------

By default, qalx-OrcaFlex uses ``OrcFxAPI`` to determine if a file is a restart
file and to obtain the chain of parent models. This means that you need to have
an OrcaFlex licence available on the machine you are using to build batches. If
you do not have a licence available, or do not want to use one, it is possible
to pass ``use_orcfxapi=False`` to any of the :ref:`sources` detailed above.
This will result in no licence being used, but at the cost of less robust
acquisition of the parent chain.

Parents "Inside" and "Outside" batches
--------------------------------------

qalx-OrcaFlex needs to have the parent simulation available to run a restart
model. The way it treats this parent is slightly different if the parent model
is not included in the same batch as the child. In this situation the parent
simulation will be run and saved to qalx, but none of the post-processing will
be applied to it.

Consider the batch of models below:

.. code-block:: text

    c:\Project1\base.dat
    c:\Project1\m1.yml [parent=c:\Project1\base.dat]
    c:\Project1\m2.yml [parent=c:\Project1\m1.yml]

If you create a batch with ``DirectorySource`` from ``c:\Project1`` then all
the simulations will be run and any post-processing you request will be
performed on all three simulations, as they are considered "inside" the batch.

However, if you create a batch using a single ``FileSource`` for
``c:\Project1\m2.yml`` then all three simulations will be run (they have to be
for ``m2.yml`` to run) but only the ``m2.yml`` simulation will be subjected to
post-processing. The ``base.dat`` and ``m1.yml`` simulations are considered to
be "outside" the batch.
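The "inside"/"outside" distinction above can be sketched in plain Python. This is a hypothetical helper, not part of the ``qalx_orcaflex`` API: given a mapping of each model to its parent, walk up the parent chain of every model explicitly added to the batch; anything reached that was not added directly must still be run but is "outside" the batch and skips post-processing.

.. code-block:: python

    def split_inside_outside(parents, added):
        """Return (to_run, outside) for a batch of restart models.

        ``parents`` maps each model to its parent (``None`` for a base model);
        ``added`` is the set of models explicitly added to the batch.
        ``to_run`` is every model that must be simulated; ``outside`` is the
        subset that will not be post-processed.
        """
        to_run = set()
        for model in added:
            # walk up the restart chain until we reach a base model
            while model is not None:
                to_run.add(model)
                model = parents.get(model)
        outside = to_run - set(added)
        return to_run, outside


    parents = {
        r"c:\Project1\base.dat": None,
        r"c:\Project1\m1.yml": r"c:\Project1\base.dat",
        r"c:\Project1\m2.yml": r"c:\Project1\m1.yml",
    }

    # FileSource for m2.yml only: both parents still run, but are "outside"
    to_run, outside = split_inside_outside(parents, {r"c:\Project1\m2.yml"})

With a ``DirectorySource`` that picks up all three files, ``added`` would equal the whole tree and ``outside`` would be empty, matching the behaviour described above.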
Distributed processing of complex trees
---------------------------------------

Consider the following directory of models that you want to run as a batch:

.. code-block:: text

    base.dat
    m1.yml [parent=base.dat]
    m2.yml [parent=m1.yml]
    another_base.dat
    q1.yml [parent=another_base.dat]
    q2.yml [parent=q1.yml]
    m3.yml [parent=m2.yml]
    x1.yml [parent=m1.yml]

qalx-OrcaFlex is used to run batches in multiple processes across distributed
servers in various environments. However, for restarts to work the parent
simulation file must be available on the server. Rather than running the
parents of every model on every server, the system will group chains of
restarts together and run them sequentially on the same server. If a batch
contains unrelated chains, however, it will split these up to be run in
parallel.

Given the example batch above, it is clear that the following all depend on
``base.dat``:

.. code-block:: text

    base.dat
    m1.yml [parent=base.dat]
    m2.yml [parent=m1.yml]
    m3.yml [parent=m2.yml]
    x1.yml [parent=m1.yml]

So you can expect these to all be run sequentially in the same process. The
remaining chain, which depends on ``another_base.dat``, may be run in another
process on the same server or on another server altogether:

.. code-block:: text

    another_base.dat
    q1.yml [parent=another_base.dat]
    q2.yml [parent=q1.yml]

Batch Options
~~~~~~~~~~~~~

When you create your batch you need to give some options and configuration
settings. Firstly, you need to say which queues you want to submit to. These
should correspond to the queues that are specified when you start your
:ref:`bots`. This flexibility allows you to create separate queues for
different projects or teams and have multiple bots running in parallel.
- ``batch_queue``: name of the batch queue
- ``sim_queue``: name of the simulation queue

The other options are passed to :ref:`batch_bot`:

- ``wait_between_completion_checks``: (default=30) if ``wait_to_complete`` has
  been set to ``True``, which it is by default, then this is the number of
  seconds that :ref:`batch_bot` will wait before checking whether the jobs
  have all completed.
- ``summarise_results``: (default=True) set this to ``False`` if you want
  :ref:`batch_bot` to skip making :ref:`results_summaries`.
- ``build_threads``: (default=10) the cases in the batch will be added to qalx
  in parallel. Because the bottleneck in submitting cases is usually waiting
  for an HTTP response from the API, it should be fine to use lots of threads.
  If you find that your machine grinds to a halt during this process you might
  want to reduce the number of threads. Equally, if you have to create many
  thousands of cases and are using a powerful machine you can increase the
  number of threads; having too many threads, though, will probably cause you
  to hit rate-limiting on the API.
- ``send_batch_to``: (default=[]) a list of names of queues that the batch
  will be sent to once it has finished processing. This will happen regardless
  of other settings such as ``summarise_results``.
- ``send_sim_to``: (default=[]) a list of names of queues that every sim will
  be sent to once it has completed.
- ``notifications``: see :ref:`notifications` below.
- ``timeout``: the time in seconds within which this batch is expected to
  complete. See :ref:`notifications` below.

.. _notifications:

Notifications
~~~~~~~~~~~~~

There is an option to get qalx-OrcaFlex to send notifications when certain
events happen to a batch. There are three events that have templated
notifications sent by email:

- ``data_models.notifications.NotificationSubmitted``: sent once the
  ``BatchBot`` has submitted all the simulations to the queue for simulation.
- ``data_models.notifications.NotificationCompleted``: sent once the batch has
  been marked as complete by the ``BatchBot``.
- ``data_models.notifications.NotificationTimedOut``: sent if the batch has
  not been marked as complete by the ``BatchBot`` after the specified
  ``timeout`` as described above.

By default, no notifications are sent. They can be enabled by adding them to
the ``BatchOptions`` as an argument like so:

.. code-block:: python

    import qalx_orcaflex.data_models as dm
    from qalx_orcaflex.core import QalxOrcaFlex, OrcaFlexBatch, ModelSource

    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",
        sim_queue="sim bot",
        notifications=dm.Notifications(
            notify_submitted=dm.notifications.NotificationSubmitted(),
            notify_completed=dm.notifications.NotificationCompleted(),
            notify_timed_out=dm.notifications.NotificationTimedOut(),
        )
    )

You can adjust who gets the notification email through keyword arguments to
each notification, e.g.:

.. code-block:: python

    notifications=dm.Notifications(
        notify_submitted=dm.notifications.NotificationSubmitted(
            include_creator=True,
            to=['bob@analysiscorp.co'],
            cc=['anne@analysiscorp.co'],
            bcc=['project_1235566@proj.analysiscorp.co'],
            subject='I think we finally have this working Bob!'
        )
    )

For more details see the :ref:`Notifications API docs `.

.. _context_manager:

Context manager
~~~~~~~~~~~~~~~

``OrcaFlexBatch`` is a context manager; this means that you need to use it in
a ``with`` block. Doing this means that the data files and associated items
will only be created in qalx when the code you run to build your batch has
completed successfully. For example, the following code will error when the
line length is set with the wrong data type:
.. code-block:: python

    import OrcFxAPI as ofx
    import qalx_orcaflex.data_models as dm
    from qalx_orcaflex.core import QalxOrcaFlex, OrcaFlexBatch, ModelSource

    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",
        sim_queue="sim bot",
    )

    m = ofx.Model()
    line = m.CreateObject(ofx.otLine, "My Line")

    with OrcaFlexBatch(name="I will never get built", session=qfx,
                       batch_options=batch_options) as batch:
        for length in [100, '120']:
            line.Length[0] = length
            batch.add(ModelSource(m, f"Case l={length}"))

In the above code even "Case l=100" will not be added to qalx, so you don't
have to worry about creating resources that contain errors, or partial
batches.

Complete example
~~~~~~~~~~~~~~~~

The following example shows that you can add to a batch from multiple sources:

.. code-block:: python

    import OrcFxAPI as ofx
    import qalx_orcaflex.data_models as dm
    from qalx_orcaflex.core import QalxOrcaFlex, OrcaFlexBatch, ModelSource, \
        DirectorySource, FileSource

    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",
        sim_queue="sim bot",
    )

    m = ofx.Model()
    line = m.CreateObject(ofx.otLine, "My Line")

    with OrcaFlexBatch(name="My Batch", session=qfx,
                       batch_options=batch_options) as batch:
        for length in [100, 120]:
            line.Length[0] = length
            batch.add(ModelSource(m, f"Case l={length}"))
        batch.add(DirectorySource(r"S:\Project 123\OFX\140-160m models"))
        batch.add(FileSource(r"C:\User\AnneAlysis\My Models\180m.dat"))

Advanced concepts
~~~~~~~~~~~~~~~~~

.. _orcaflex_job:

OrcaFlexJob
-----------

The ``OrcaFlexBatch`` object manages the creation of any number of
``OrcaFlexJob`` objects. Each is an entity of type ``pyqalx.Set``, which means
it is a collection of references to ``pyqalx.Item`` objects that contain all
the information about the simulation that you want to run. It may be useful to
know the structure of this object so that you know where to find certain
information about the jobs in your batch.
.. note::

    Some fields exist on ``qalx_orcaflex.data_models.OrcaFlexJob`` that are
    not detailed below; that is because they are not used or implemented in
    this version of ``qalx_orcaflex``.

- ``job_options``: a set of options for the :ref:`sim_bot`:

  - time_to_wait: (default=0) jobs will pause for this number of seconds
    before starting. This is useful if you are using a network dongle and the
    server hosting it can get overwhelmed by lots of simultaneous licence
    requests.
  - record_progress: (default=True) send updates on simulation progress
  - save_simulation: (default=True) save the simulation in qalx
  - licence_attempts: (default=3600) number of times to try getting a licence
    before failing
  - max_search_attempts: (default=10) number of attempts at
    :ref:`smart_statics`
  - max_wait: (default=60) the longest time to wait in seconds between
    attempts to get a licence
  - update_interval: (default=5) the time to wait between sending progress
    updates to qalx. It is better to set this to be longer if you are hitting
    your usage limits or the API rate limit.
  - delete_message_on_load: (default=False) delete the queue message in the
    bot ``onload`` function. This is useful to avoid the job being duplicated
    in the queue if it takes more than 12 hours to process. See
    https://docs.qalx.net/bots#onload.

- ``data_file``: an item containing an OrcaFlex data file:

  - file: the file item
  - file_name: the name of the file, used when saving back to disk
  - meta:

    - data_file_name: the full path to the file if it came from disk

- ``sim_file``: the saved simulation file
- ``results``: a mapping of result names to guids of the items that contain
  :ref:`results`
- ``model_views``: a mapping of model view name to details about the model
  view
- ``saved_views``: a mapping of model view name to the guid of the item with
  the image file
- ``progress``: a structure with information about the progress of the
  simulation:

  - progress: a summary of the current progress
  - start_time: time the job started
  - end_time: time the job ended
  - current_time: current time in the job
  - time_to_go: estimated time to completion in seconds
  - percent: progress as a percentage
  - pretty_time: a nice string of the time to go e.g. "3 hours, 4 mins"

- ``warnings``: an item containing all the text warnings from OrcaFlex as
  well as any warnings created by :ref:`sim_bot`
- ``load_case_info``: all the :ref:`load_case_info`

JobState
--------

Information about the state of a job is saved on the metadata of the
``OrcaFlexJob``. There is a Python `Enum
<https://docs.python.org/3/library/enum.html>`_ provided as
``qalx_orcaflex.data_models.JobState`` with the values relating to the states
of the job as per the table below.

+----------------------------------+------------------------------+---------------------------------------------+
| Enum                             | Value                        | Description                                 |
+==================================+==============================+=============================================+
| JobState.NEW                     | "New"                        | When a job has been created                 |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.QUEUED                  | "Queued"                     | When the job has been added to the queue    |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.PRE_PROCESSING          | "Pre-processing"             | The job has been loaded by a bot            |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.PROCESSING              | "Processing"                 | The job is about to be run                  |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.LOADING_MODEL_DATA      | "Loading model data"         | The model data is about to be loaded        |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.MODEL_DATA_LOADED       | "Model data loaded"          | The model data has loaded                   |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.RUNNING_STATICS         | "Running statics"            | Trying to find a static solution            |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.STATICS_FAILED          | "Statics failed"             | Couldn't find a static solution             |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.RUNNING_DYNAMICS        | "Running dynamics"           | Running simulation dynamics                 |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.SAVING_SIMULATION       | "Saving simulation"          | Saving the simulation data to qalx          |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.SIMULATION_SAVED        | "Simulation saved"           | Simulation data saved                       |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.EXTRACTING_RESULTS      | "Extracting results"         | Extracting results                          |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.RESULTS_EXTRACTED       | "Results extracted"          | All results extracted                       |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.EXTRACTING_MODEL_VIEWS  | "Extracting model views"     | Extracting model views                      |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.MODEL_VIEWS_EXTRACTED   | "Model views extracted"      | All model views extracted                   |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.EXTRACTING_MODEL_VIDEOS | "Extracting model videos"    | Extracting videos                           |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.MODEL_VIDEOS_EXTRACTED  | "Model videos extracted"     | Videos extracted                            |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.SIMULATION_UNSTABLE     | "Simulation unstable"        | Simulation was unstable                     |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.ERROR                   | "Error"                      | There was an error                          |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.USER_CANCELLED          | "User cancelled"             | A user cancelled the job                    |
+----------------------------------+------------------------------+---------------------------------------------+

Custom sources
--------------

Perhaps you have a separate system for storing OrcaFlex data files with fancy
features and you want to add data files to a batch from that system without
having to download them locally. You can do this by creating a custom source.
The only rules are that it should inherit from ``qalx_orcaflex.core.BaseSource``
and implement a ``to_jobs`` instance method. The code below should provide a
rough idea of how this can be achieved.
.. code-block:: python

    from io import BytesIO
    from typing import Iterator, Mapping

    from my_fancy_system import get_batch_list, get_file_object

    from qalx_orcaflex.core import BaseSource, QalxOrcaFlex, OrcaFlexBatch
    from qalx_orcaflex.helpers import clean_set_key
    import qalx_orcaflex.data_models as dm


    class FancySource(BaseSource):
        def __init__(self, project_code, batch_name):
            super(FancySource, self).__init__()  # initialise the parent class
            self.project_code = project_code
            self.batch_name = batch_name

        def to_jobs(self, base_job: Mapping) -> Iterator[Mapping]:
            for case in get_batch_list(f"{self.project_code}/{self.batch_name}"):
                # Here we assume that case is something like
                # "Project123/Batch456/Case1.dat"
                case_name = case.split("/")[-1]
                # We need to create a specific structure that can be passed to
                # `QalxSession().item.add`
                data_file = {
                    # needs to allow `.read()`
                    "input_file": BytesIO(get_file_object(case)),
                    "meta": {
                        "_class": "orcaflex.job.data_file",  # the standard class
                        "data_file_name": case,  # this can be the full path
                    },
                    # this is how it will be saved if it's downloaded later
                    "file_name": case_name,
                }
                job = self._update_copy(
                    # the `base_job` will contain all the info that is being
                    # passed to all the jobs, like results etc., so we update a
                    # copy of that with our data file
                    base_job,
                    {
                        "data_file": data_file,
                        # case_name is used to store the set on the group. It
                        # cannot have @ or . in the string so we clean it.
                        "case_name": clean_set_key(case_name),
                    },
                )
                # `to_jobs` must be a generator, so yield each job
                yield job


    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",
        sim_queue="sim bot",
    )

    with OrcaFlexBatch(name="My Batch", session=qfx,
                       batch_options=batch_options) as batch:
        batch.add(FancySource("Project 123", "Batch 3"))

Batch waiter
------------

A batch waiter provides the functionality to wait until all the processing of
a batch is complete before the next section of the code is executed.
This can be useful when some additional post-processing is required after a
batch has completed. Normally, some manual checking for completion would be
needed before the reporting or post-processing code could be executed. The
batch waiter automates this workflow and can be run as a context manager from
the ``when_complete`` method on a batch. This is shown in the example below:

.. code-block:: python

    import OrcFxAPI as ofx
    import qalx_orcaflex.data_models as dm
    from qalx_orcaflex.core import QalxOrcaFlex, OrcaFlexBatch, ModelSource, \
        DirectorySource, FileSource

    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",
        sim_queue="sim bot",
    )

    m = ofx.Model()
    line = m.CreateObject(ofx.otLine, "My Line")

    with OrcaFlexBatch(name="My Batch", session=qfx,
                       batch_options=batch_options) as batch:
        for length in [100, 120]:
            line.Length[0] = length
            batch.add(ModelSource(m, f"Case l={length}"))
        batch.add(DirectorySource(r"S:\Project 123\OFX\140-160m models"))
        batch.add(FileSource(r"C:\User\AnneAlysis\My Models\180m.dat"))

    with batch.when_complete(
        interval=20, timeout=1*60*60, run_with_gui=False
    ):
        # This section of the code will be executed once the batch processing
        # is complete. The waiter checks the status of the batch every 20
        # seconds. There is a specified timeout of one hour, after which the
        # waiter will exit anyway. The option `run_with_gui` can be set to
        # True to show the progress of the batch visually in a window.
        pass
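The waiting behaviour described above can be sketched as a simple polling loop. This is a hypothetical stand-in for illustration only, not the actual qalx-OrcaFlex implementation: a status check (here an injected ``is_complete`` callable, standing in for querying the batch status in qalx) is called every ``interval`` seconds until it reports complete or ``timeout`` seconds have elapsed, and only then does the body of the ``with`` block run.

.. code-block:: python

    import time
    from contextlib import contextmanager


    @contextmanager
    def when_complete_sketch(is_complete, interval=20, timeout=3600,
                             clock=time.monotonic, sleep=time.sleep):
        """Poll ``is_complete()`` every ``interval`` seconds until it returns
        True or ``timeout`` seconds elapse, then yield control to the block."""
        start = clock()
        while not is_complete():
            if clock() - start >= timeout:
                break  # give up waiting; the real waiter also exits on timeout
            sleep(interval)
        yield


    # Example with a fake batch that reports complete on the third check
    checks = {"n": 0}

    def fake_status():
        checks["n"] += 1
        return checks["n"] >= 3

    with when_complete_sketch(fake_status, interval=0, timeout=5):
        done_after = checks["n"]  # runs once the "batch" reports complete

Injecting ``clock`` and ``sleep`` keeps the sketch testable without real delays; the real waiter adds batch-status queries, progress display (``run_with_gui``) and error handling on top of this basic loop.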