.. _batches:

Building Batches
================

While it is possible to send individual simulations to a :ref:`sim_bot`, the best
way to run simulations is to build them into batches. To do this, you create an
``OrcaFlexBatch`` and then add data files to it with the ``add`` method:

.. code-block:: python

    import qalx_orcaflex.data_models as dm
    from qalx_orcaflex.core import QalxOrcaFlex, OrcaFlexBatch, DirectorySource

    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",  # queue our batch bot will look at for jobs
        sim_queue="sim bot",  # queue our sim bot will look at for jobs
    )

    with OrcaFlexBatch(name="My Batch", session=qfx,
                       batch_options=batch_options) as batch:
        batch.add(source=DirectorySource(r"S:\Project123\OFX\Batch1"))

The reason for the ``with`` block is explained in :ref:`context_manager`, but it
is important to know that once the code has successfully exited the ``with``
block, the batch has been built and submitted.

.. _sources:

Sources
~~~~~~~

There are various options for the ``source`` argument:

DirectorySource
---------------

Data files will be submitted for every OrcaFlex file in a directory. The
following code will include all ``.dat`` files in the ``Batch1`` folder:

.. code-block:: python

    from qalx_orcaflex.core import DirectorySource

    my_files = DirectorySource(r"S:\Project 123\OFX\Batch1")

The following code will include all ``.dat``, ``.yml`` and ``.yaml`` files in
``Batch1``:

.. warning::

    ``qalx_orcaflex`` does not support the concept of ``BaseFile`` in text data
    files (yet)

.. code-block:: python

    from qalx_orcaflex.core import DirectorySource

    my_files = DirectorySource(r"S:\Project 123\OFX\Batch1", include_yaml=True)

The following code will include all ``.dat`` files in ``Batch1`` and **all the
subdirectories** of ``Batch1``:

.. code-block:: python

    from qalx_orcaflex.core import DirectorySource

    my_files = DirectorySource(r"S:\Project 123\OFX\Batch1", recursive=True)

ModelSource
-----------

Saves the model data from an ``OrcFxAPI.Model`` instance.
You need to give the model a name:

.. code-block:: python

    from qalx_orcaflex.core import ModelSource
    import OrcFxAPI as ofx

    model = ofx.Model()
    my_files = ModelSource(model, "My Model")

``ModelSource`` will save the data in the model instance at the time it is
created; any future changes to the ``OrcFxAPI.Model`` instance will not be
reflected. This is so you can happily use the same instance for all your load
cases and not worry about loading the base model from disk each time.

.. code-block:: python

    from qalx_orcaflex.core import ModelSource
    import OrcFxAPI as ofx

    model = ofx.Model(r"C:\MY_MASSIVE_MODEL.dat")

    model['My Line'].Length[0] = 100
    my_100_model_source = ModelSource(model, "Model with l=100")

    model['My Line'].Length[0] = 200
    my_200_model_source = ModelSource(model, "Model with l=200")

The sources above will have different line lengths even though the line in the
``model`` variable has a length of 200.

.. note::

    There is no need to add ".dat" to the name of the model. This will be added
    automatically.

FileSource
----------

This will add a single file from a path.

.. code-block:: python

    from qalx_orcaflex.core import FileSource

    my_files = FileSource(path=r"S:\Project 123\OFX\Tests\MY_TEST_MODEL.dat")

DataFileItemsSource
-------------------

This is for when you have data files already in qalx and you want to re-run
them for some reason.

.. admonition:: coming soon!

    This source will become more useful when we implement "model deltas" which
    will allow you to load a base model from qalx and specify all the load case
    details as they are added to the batch.

Batches with restarts
~~~~~~~~~~~~~~~~~~~~~

Batches which contain models that restart from another simulation should "just
work". That is, those models should run in the same way as every other model in
the batch and the results will be processed in the same way.
However, there are a few things that you should understand about how
qalx-OrcaFlex processes these to ensure you do not suffer unexpected behaviour.

Use OrcFxAPI on build
---------------------

By default, qalx-OrcaFlex uses ``OrcFxAPI`` to determine if a file is a restart
file and to obtain the chain of parent models. This means that you need to have
an OrcaFlex licence available on the machine you are using to build batches. If
you do not have a licence available, or do not want to use one, it is possible
to pass ``use_orcfxapi=False`` to any of the :ref:`sources` detailed above.
This will result in no licence being used, but at the cost of less robust
acquisition of the parent chain.

Parents "Inside" and "Outside" batches
--------------------------------------

qalx-OrcaFlex needs to have the parent simulation available to run a restart
model. The way it treats this parent is slightly different if the parent model
is not included in the same batch as the child. In this situation the parent
simulation will be run and saved to qalx, but none of the post-processing will
be applied to it.

Consider the batch of models below:

.. code-block:: text

    c:\Project1\base.dat
    c:\Project1\m1.yml [parent=c:\Project1\base.dat]
    c:\Project1\m2.yml [parent=c:\Project1\m1.yml]

If you create a batch with ``DirectorySource`` from ``c:\Project1`` then all
the simulations will be run and any post-processing you request will be
performed on all three simulations, as they are considered "inside" the batch.

However, if you create a batch using a single ``FileSource`` for
``c:\Project1\m2.yml`` then all three simulations will be run (they have to be
for ``m2.yml`` to run) but only the ``m2.yml`` simulation will be subjected to
post-processing. The ``base.dat`` and ``m1.yml`` simulations are considered to
be "outside" the batch.
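The "inside"/"outside" distinction above can be sketched in plain Python. This is a hypothetical helper, not part of the ``qalx_orcaflex`` API: given a mapping of each model to its parent, walk up the parent chain of every model explicitly added to the batch; anything reached that was not added directly must still be run but is "outside" the batch and skips post-processing.

.. code-block:: python

    def split_inside_outside(parents, added):
        """Return (to_run, outside) for a batch of restart models.

        ``parents`` maps each model to its parent (``None`` for a base model);
        ``added`` is the set of models explicitly added to the batch.
        ``to_run`` is every model that must be simulated; ``outside`` is the
        subset that will not be post-processed.
        """
        to_run = set()
        for model in added:
            # walk up the restart chain until we reach a base model
            while model is not None:
                to_run.add(model)
                model = parents.get(model)
        outside = to_run - set(added)
        return to_run, outside


    parents = {
        r"c:\Project1\base.dat": None,
        r"c:\Project1\m1.yml": r"c:\Project1\base.dat",
        r"c:\Project1\m2.yml": r"c:\Project1\m1.yml",
    }

    # FileSource for m2.yml only: both parents still run, but are "outside"
    to_run, outside = split_inside_outside(parents, {r"c:\Project1\m2.yml"})

With a ``DirectorySource`` that picks up all three files, ``added`` would equal the whole tree and ``outside`` would be empty, matching the behaviour described above.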
Distributed processing of complex trees
---------------------------------------

Consider the following directory of models that you want to run as a batch:

.. code-block:: text

    base.dat
    m1.yml [parent=base.dat]
    m2.yml [parent=m1.yml]
    another_base.dat
    q1.yml [parent=another_base.dat]
    q2.yml [parent=q1.yml]
    m3.yml [parent=m2.yml]
    x1.yml [parent=m1.yml]

qalx-OrcaFlex is used to run batches in multiple processes across distributed
servers in various environments. However, for restarts to work the parent
simulation file must be available on the server. Rather than running the
parents of every model on every server, the system will group chains of
restarts together and run them sequentially on the same server. If a batch
contains unrelated chains, however, it will split these up to be run in
parallel.

Given the example batch above, it is clear that the following all depend on
``base.dat``:

.. code-block:: text

    base.dat
    m1.yml [parent=base.dat]
    m2.yml [parent=m1.yml]
    m3.yml [parent=m2.yml]
    x1.yml [parent=m1.yml]

So you can expect these to all be run sequentially in the same process. The
remaining chain, which depends on ``another_base.dat``, may be run in another
process on the same server or on another server altogether:

.. code-block:: text

    another_base.dat
    q1.yml [parent=another_base.dat]
    q2.yml [parent=q1.yml]

Batch Options
~~~~~~~~~~~~~

When you create your batch you need to give some options and configuration
settings. Firstly, you need to say which queues you want to submit to. These
should correspond to the queues that are specified when you start your
:ref:`bots`. This flexibility allows you to create separate queues for
different projects or teams and have multiple bots running in parallel.
- ``batch_queue``: name of the batch queue
- ``sim_queue``: name of the simulation queue

The other options are passed to :ref:`batch_bot`:

- ``wait_between_completion_checks``: (default=30) if ``wait_to_complete`` has
  been set to ``True``, which it is by default, then this is the number of
  seconds that :ref:`batch_bot` will wait before checking whether the jobs
  have all completed.
- ``summarise_results``: (default=True) set this to ``False`` if you want
  :ref:`batch_bot` to skip making :ref:`results_summaries`.
- ``build_threads``: (default=10) the cases in the batch will be added to qalx
  in parallel. Because the bottleneck in submitting cases is usually waiting
  for an HTTP response from the API, it should be fine to use lots of threads.
  If you find that your machine grinds to a halt during this process you might
  want to reduce the number of threads. Equally, if you have to create many
  thousands of cases and are using a powerful machine you can increase the
  number of threads; having too many threads, though, will probably cause you
  to hit rate-limiting on the API.
- ``send_batch_to``: (default=[]) a list of names of queues that the batch
  will be sent to once it has finished processing. This will happen regardless
  of other settings such as ``summarise_results``.
- ``send_sim_to``: (default=[]) a list of names of queues that every sim will
  be sent to once it has completed.
- ``notifications``: see :ref:`notifications` below.
- ``timeout``: the time in seconds within which this batch is expected to
  complete. See :ref:`notifications` below.

.. _notifications:

Notifications
~~~~~~~~~~~~~

There is an option to get qalx-OrcaFlex to send notifications when certain
events happen to a batch. There are three events that have templated
notifications sent by email:

- ``data_models.notifications.NotificationSubmitted``: sent once the
  ``BatchBot`` has submitted all the simulations to the queue for simulation.
- ``data_models.notifications.NotificationCompleted``: sent once the batch has
  been marked as complete by the ``BatchBot``.
- ``data_models.notifications.NotificationTimedOut``: sent if the batch has
  not been marked as complete by the ``BatchBot`` after the specified
  ``timeout`` as described above.

By default, no notifications are sent. They can be enabled by adding them to
the ``BatchOptions`` as an argument like so:

.. code-block:: python

    import qalx_orcaflex.data_models as dm
    from qalx_orcaflex.core import QalxOrcaFlex, OrcaFlexBatch, ModelSource

    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",
        sim_queue="sim bot",
        notifications=dm.Notifications(
            notify_submitted=dm.notifications.NotificationSubmitted(),
            notify_completed=dm.notifications.NotificationCompleted(),
            notify_timed_out=dm.notifications.NotificationTimedOut(),
        )
    )

You can adjust who gets the notification email through keyword arguments to
each notification, e.g.:

.. code-block:: python

    notifications=dm.Notifications(
        notify_submitted=dm.notifications.NotificationSubmitted(
            include_creator=True,
            to=['bob@analysiscorp.co'],
            cc=['anne@analysiscorp.co'],
            bcc=['project_1235566@proj.analysiscorp.co'],
            subject='I think we finally have this working Bob!'
        )
    )

For more details see the :ref:`Notifications API docs `.

.. _context_manager:

Context manager
~~~~~~~~~~~~~~~

``OrcaFlexBatch`` is a context manager; this means that you need to use it in
a ``with`` block. Doing this means that the data files and associated items
will only be created in qalx when the code you run to build your batch has
completed successfully. For example, the following code will error when the
line length is set with the wrong data type:
.. code-block:: python

    import OrcFxAPI as ofx
    import qalx_orcaflex.data_models as dm
    from qalx_orcaflex.core import QalxOrcaFlex, OrcaFlexBatch, ModelSource

    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",
        sim_queue="sim bot",
    )

    m = ofx.Model()
    line = m.CreateObject(ofx.otLine, "My Line")

    with OrcaFlexBatch(name="I will never get built", session=qfx,
                       batch_options=batch_options) as batch:
        for length in [100, '120']:
            line.Length[0] = length
            batch.add(ModelSource(m, f"Case l={length}"))

In the above code even "Case l=100" will not be added to qalx, so you don't
have to worry about creating resources that contain errors, or partial
batches.

Complete example
~~~~~~~~~~~~~~~~

The following example shows that you can add to a batch from multiple sources:

.. code-block:: python

    import OrcFxAPI as ofx
    import qalx_orcaflex.data_models as dm
    from qalx_orcaflex.core import QalxOrcaFlex, OrcaFlexBatch, ModelSource, \
        DirectorySource, FileSource

    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",
        sim_queue="sim bot",
    )

    m = ofx.Model()
    line = m.CreateObject(ofx.otLine, "My Line")

    with OrcaFlexBatch(name="My Batch", session=qfx,
                       batch_options=batch_options) as batch:
        for length in [100, 120]:
            line.Length[0] = length
            batch.add(ModelSource(m, f"Case l={length}"))
        batch.add(DirectorySource(r"S:\Project 123\OFX\140-160m models"))
        batch.add(FileSource(r"C:\User\AnneAlysis\My Models\180m.dat"))

Advanced concepts
~~~~~~~~~~~~~~~~~

.. _orcaflex_job:

OrcaFlexJob
-----------

The ``OrcaFlexBatch`` object manages the creation of any number of
``OrcaFlexJob`` objects. Each is an entity of type ``pyqalx.Set``, which means
it is a collection of references to ``pyqalx.Item`` objects that contain all
the information about the simulation that you want to run. It may be useful to
know the structure of this object so that you know where to find certain
information about the jobs in your batch.
.. note::

    Some fields exist on ``qalx_orcaflex.data_models.OrcaFlexJob`` that are
    not detailed below; that is because they are not used or implemented in
    this version of ``qalx_orcaflex``.

- ``job_options``: a set of options for the :ref:`sim_bot`:

  - time_to_wait: (default=0) jobs will pause for this number of seconds
    before starting. This is useful if you are using a network dongle and the
    server hosting it can get overwhelmed by lots of simultaneous licence
    requests.
  - record_progress: (default=True) send updates on simulation progress
  - save_simulation: (default=True) save the simulation in qalx
  - licence_attempts: (default=3600) number of times to try getting a licence
    before failing
  - max_search_attempts: (default=10) number of attempts at
    :ref:`smart_statics`
  - max_wait: (default=60) the longest time to wait in seconds between
    attempts to get a licence
  - update_interval: (default=5) the time to wait between sending progress
    updates to qalx. It is better to set this to be longer if you are hitting
    your usage limits or the API rate limit.
  - delete_message_on_load: (default=False) delete the queue message in the
    bot ``onload`` function. This is useful to avoid the job being duplicated
    in the queue if it takes more than 12 hours to process. See
    https://docs.qalx.net/bots#onload.

- ``data_file``: an item containing an OrcaFlex data file:

  - file: the file item
  - file_name: the name of the file, used when saving back to disk
  - meta:

    - data_file_name: the full path to the file if it came from disk

- ``sim_file``: the saved simulation file
- ``results``: a mapping of result names to guids of the items that contain
  :ref:`results`
- ``model_views``: a mapping of model view name to details about the model
  view
- ``saved_views``: a mapping of model view name to the guid of the item with
  the image file
- ``progress``: a structure with information about the progress of the
  simulation:

  - progress: a summary of the current progress
  - start_time: time the job started
  - end_time: time the job ended
  - current_time: current time in the job
  - time_to_go: estimated time to completion in seconds
  - percent: progress as a percentage
  - pretty_time: a nice string of the time to go e.g. "3 hours, 4 mins"

- ``warnings``: an item containing all the text warnings from OrcaFlex as
  well as any warnings created by :ref:`sim_bot`
- ``load_case_info``: all the :ref:`load_case_info`

JobState
--------

Information about the state of a job is saved on the metadata of the
``OrcaFlexJob``. There is a Python `Enum
<https://docs.python.org/3/library/enum.html>`_ provided as
``qalx_orcaflex.data_models.JobState`` with the values relating to the states
of the job as per the table below.

+----------------------------------+------------------------------+---------------------------------------------+
| Enum                             | Value                        | Description                                 |
+==================================+==============================+=============================================+
| JobState.NEW                     | "New"                        | When a job has been created                 |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.QUEUED                  | "Queued"                     | When the job has been added to the queue    |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.PRE_PROCESSING          | "Pre-processing"             | The job has been loaded by a bot            |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.PROCESSING              | "Processing"                 | The job is about to be run                  |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.LOADING_MODEL_DATA      | "Loading model data"         | The model data is about to be loaded        |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.MODEL_DATA_LOADED       | "Model data loaded"          | The model data has loaded                   |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.RUNNING_STATICS         | "Running statics"            | Trying to find a static solution            |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.STATICS_FAILED          | "Statics failed"             | Couldn't find a static solution             |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.RUNNING_DYNAMICS        | "Running dynamics"           | Running simulation dynamics                 |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.SAVING_SIMULATION       | "Saving simulation"          | Saving the simulation data to qalx          |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.SIMULATION_SAVED        | "Simulation saved"           | Simulation data saved                       |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.EXTRACTING_RESULTS      | "Extracting results"         | Extracting results                          |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.RESULTS_EXTRACTED       | "Results extracted"          | All results extracted                       |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.EXTRACTING_MODEL_VIEWS  | "Extracting model views"     | Extracting model views                      |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.MODEL_VIEWS_EXTRACTED   | "Model views extracted"      | All model views extracted                   |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.EXTRACTING_MODEL_VIDEOS | "Extracting model videos"    | Extracting videos                           |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.MODEL_VIDEOS_EXTRACTED  | "Model videos extracted"     | Videos extracted                            |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.SIMULATION_UNSTABLE     | "Simulation unstable"        | Simulation was unstable                     |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.ERROR                   | "Error"                      | There was an error                          |
+----------------------------------+------------------------------+---------------------------------------------+
| JobState.USER_CANCELLED          | "User cancelled"             | A user cancelled the job                    |
+----------------------------------+------------------------------+---------------------------------------------+

Custom sources
--------------

Perhaps you have a separate system for storing OrcaFlex data files with fancy
features and you want to add data files to a batch from that system without
having to download them locally. You can do this by creating a custom source.
The only rules are that it should inherit from ``qalx_orcaflex.core.BaseSource``
and implement a ``to_jobs`` instance method. The code below should provide a
rough idea of how this can be achieved.
.. code-block:: python

    from io import BytesIO
    from typing import Iterator, Mapping

    from my_fancy_system import get_batch_list, get_file_object

    from qalx_orcaflex.core import BaseSource, QalxOrcaFlex, OrcaFlexBatch
    from qalx_orcaflex.helpers import clean_set_key
    import qalx_orcaflex.data_models as dm


    class FancySource(BaseSource):
        def __init__(self, project_code, batch_name):
            super(FancySource, self).__init__()  # initialise the parent class
            self.project_code = project_code
            self.batch_name = batch_name

        def to_jobs(self, base_job: Mapping) -> Iterator[Mapping]:
            for case in get_batch_list(f"{self.project_code}/{self.batch_name}"):
                # Here we assume that case is something like
                # "Project123/Batch456/Case1.dat"
                case_name = case.split("/")[-1]
                # We need to create a specific structure that can be passed to
                # `QalxSession().item.add`
                data_file = {
                    # needs to allow `.read()`
                    "input_file": BytesIO(get_file_object(case)),
                    "meta": {
                        "_class": "orcaflex.job.data_file",  # the standard class
                        "data_file_name": case,  # this can be the full path
                    },
                    # this is how it will be saved if it's downloaded later
                    "file_name": case_name,
                }
                job = self._update_copy(
                    # the `base_job` will contain all the info that is being
                    # passed to all the jobs, like results etc., so we update a
                    # copy of that with our data file
                    base_job,
                    {
                        "data_file": data_file,
                        # case_name is used to store the set on the group. It
                        # cannot have @ or . in the string so we clean it.
                        "case_name": clean_set_key(case_name),
                    },
                )
                # `to_jobs` must be a generator, so yield each job
                yield job


    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",
        sim_queue="sim bot",
    )

    with OrcaFlexBatch(name="My Batch", session=qfx,
                       batch_options=batch_options) as batch:
        batch.add(FancySource("Project 123", "Batch 3"))

Batch waiter
------------

A batch waiter provides the functionality to wait until all the processing of
a batch is complete before the next section of the code is executed.
This can be useful when some additional post-processing is required after a
batch has completed. Normally, some manual checking for completion would be
needed before the reporting or post-processing code could be executed. The
batch waiter automates this workflow and can be run as a context manager from
the ``when_complete`` method on a batch. This is shown in the example below:

.. code-block:: python

    import OrcFxAPI as ofx
    import qalx_orcaflex.data_models as dm
    from qalx_orcaflex.core import QalxOrcaFlex, OrcaFlexBatch, ModelSource, \
        DirectorySource, FileSource

    qfx = QalxOrcaFlex()
    batch_options = dm.BatchOptions(
        batch_queue="batch bot",
        sim_queue="sim bot",
    )

    m = ofx.Model()
    line = m.CreateObject(ofx.otLine, "My Line")

    with OrcaFlexBatch(name="My Batch", session=qfx,
                       batch_options=batch_options) as batch:
        for length in [100, 120]:
            line.Length[0] = length
            batch.add(ModelSource(m, f"Case l={length}"))
        batch.add(DirectorySource(r"S:\Project 123\OFX\140-160m models"))
        batch.add(FileSource(r"C:\User\AnneAlysis\My Models\180m.dat"))

    with batch.when_complete(
        interval=20, timeout=1*60*60, run_with_gui=False
    ):
        # This section of the code will be executed once the batch processing
        # is complete. The waiter checks the status of the batch every 20
        # seconds. There is a specified timeout of one hour, after which the
        # waiter will exit anyway. The option `run_with_gui` can be set to
        # True to show the progress of the batch visually in a window.
        pass
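The waiting behaviour described above can be sketched as a simple polling loop. This is a hypothetical stand-in for illustration only, not the actual qalx-OrcaFlex implementation: a status check (here an injected ``is_complete`` callable, standing in for querying the batch status in qalx) is called every ``interval`` seconds until it reports complete or ``timeout`` seconds have elapsed, and only then does the body of the ``with`` block run.

.. code-block:: python

    import time
    from contextlib import contextmanager


    @contextmanager
    def when_complete_sketch(is_complete, interval=20, timeout=3600,
                             clock=time.monotonic, sleep=time.sleep):
        """Poll ``is_complete()`` every ``interval`` seconds until it returns
        True or ``timeout`` seconds elapse, then yield control to the block."""
        start = clock()
        while not is_complete():
            if clock() - start >= timeout:
                break  # give up waiting; the real waiter also exits on timeout
            sleep(interval)
        yield


    # Example with a fake batch that reports complete on the third check
    checks = {"n": 0}

    def fake_status():
        checks["n"] += 1
        return checks["n"] >= 3

    with when_complete_sketch(fake_status, interval=0, timeout=5):
        done_after = checks["n"]  # runs once the "batch" reports complete

Injecting ``clock`` and ``sleep`` keeps the sketch testable without real delays; the real waiter adds batch-status queries, progress display (``run_with_gui``) and error handling on top of this basic loop.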