Factory Plans

Simple local plan

The plan below is the simplest factory. It defines the two bots from a PyPI source, a single stage called test, and a local connection.

# simple-local-plan.yml
factory:
  name: qalx-orcaflex-local
  bots:
    batch_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:batch_bot
    sim_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:sim_bot

  stages:
    test:
      local_connection:
        type: local
        bots:
          sim_bot:
            queue-name: test-sim-queue
            processes: 2
          batch_bot:
            queue-name: test-batch-queue
            processes: 1

Build this factory with the command below:

qalx factory-build --plan simple-local-plan.yml --stage test

This starts a sim_bot in two processes and a batch_bot in one process. Each bot reads from the queue named by its queue-name key in the plan (test-sim-queue and test-batch-queue).

Local multiple stage plan

The plan below adds an additional stage to the plan above.

# local-multiple-stage-plan.yml
factory:
  name: qalx-orcaflex-local
  bots:
    batch_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:batch_bot
    sim_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:sim_bot

  stages:
    test:
      local_connection:
        type: local
        bots:
          sim_bot:
            queue-name: test-sim-queue
            processes: 2
          batch_bot:
            queue-name: test-batch-queue
            processes: 1
    production:
      local_connection:
        type: local
        bots:
          sim_bot:
            queue-name: sim-queue
            processes: 8
          batch_bot:
            queue-name: batch-queue
            processes: 1

Build the production stage of this factory with the command below:

qalx factory-build --plan local-multiple-stage-plan.yml --stage production

This now starts a sim_bot in eight processes and a batch_bot in one process. They read from different queues (sim-queue and batch-queue) than the bots in the test stage.
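
To make the effect of --stage concrete, the sketch below mirrors the stages section of the plan as a plain Python dictionary and lists what each stage would start. It is illustrative only: summarise_stage is a hypothetical helper, not part of qalx, and the dictionary is copied by hand from the plan rather than loaded from the YAML file.

```python
# Hand-copied from the "stages" section of local-multiple-stage-plan.yml.
STAGES = {
    "test": {
        "local_connection": {
            "type": "local",
            "bots": {
                "sim_bot": {"queue-name": "test-sim-queue", "processes": 2},
                "batch_bot": {"queue-name": "test-batch-queue", "processes": 1},
            },
        }
    },
    "production": {
        "local_connection": {
            "type": "local",
            "bots": {
                "sim_bot": {"queue-name": "sim-queue", "processes": 8},
                "batch_bot": {"queue-name": "batch-queue", "processes": 1},
            },
        }
    },
}


def summarise_stage(stages, stage_name):
    """Return (bot, queue, processes) tuples for every bot in one stage.

    Hypothetical helper for illustration; it is not a qalx function.
    """
    rows = []
    for connection in stages[stage_name].values():
        for bot_name, settings in connection["bots"].items():
            rows.append((bot_name, settings["queue-name"], settings["processes"]))
    return rows


if __name__ == "__main__":
    for row in summarise_stage(STAGES, "production"):
        print(row)
```

Only the stage named on the command line is built, so the test and production bots never share a queue unless the plan gives them the same queue-name.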

Remote plan

The plan below shows a stage which will deploy the bots to a server on AWS.

# remote-aws-plan.yml
factory:
  name: qalx-orcaflex-remote
  bots:
    batch_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:batch_bot
    sim_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:sim_bot

  stages:
    test:
      remote_on_aws:
        type: aws
        parameters:
          ImageId: ami-1a2bc3d4e5f6g7h8i
          InstanceType: m5.12xlarge
          # key-pair is needed if you want to connect to the remote server
          KeyName: my-key-pair
          NetworkInterfaces:
            - DeviceIndex: 0
              AssociatePublicIpAddress: true
              # subnet needs to exist and have access to the internet
              SubnetId: subnet-11aa22b
              # security group needs to exist and at least allow outbound traffic
              GroupSet:
                - sg-q9w8e7r6t5yg2g1h0
        bots:
          sim_bot:
            queue-name: test-sim-queue
            processes: 2
          batch_bot:
            queue-name: test-batch-queue
            processes: 1

The key parameters, all under the parameters key of the connection, are:

  • ImageId: this image needs to be a pyqalx base image (contact us if you don’t have access to these) with OrcaFlex installed and with access to a licence.

  • InstanceType: this is the type of server that the bots will be deployed to.

  • KeyName: if you want to be able to remote desktop to the servers, you need to specify an existing key pair name.

  • NetworkInterfaces: the server will need to be placed in a subnet with access to the internet and the licence server. It will also need a security group that allows this, and RDP access if required.

Build this factory with the command below:

qalx factory-build --plan remote-aws-plan.yml --stage test

Multiple clusters

This plan is a more complete example. It lets you run bots locally during development, deploy them to a small cloud server for testing, and then choose between production clusters in three different sizes.

# full-cluster-plan.yml
factory:
  name: qalx-orcaflex-cluster
  bots:
    batch_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:batch_bot
    sim_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:sim_bot

  stages:
    dev: # this is the stage to use when you are developing your automation locally
      development:
        type: local
        bots:
          sim_bot:
            queue-name: dev-sim-queue
            processes: 2
          batch_bot:
            queue-name: dev-batch-queue
            processes: 1
    test: # this is the stage to use when you are testing small batches and remote processing
      remote_on_aws:
        type: aws
        parameters:
          ImageId: ami-1a2bc3d4e5f6g7h8i
          InstanceType: c6i.2xlarge
          KeyName: my-key-pair
          NetworkInterfaces: &network_interface
            - DeviceIndex: 0
              AssociatePublicIpAddress: true
              SubnetId: subnet-11aa22b
              GroupSet:
                - sg-q9w8e7r6t5yg2g1h0
        bots:
          sim_bot:
            queue-name: test-sim-queue
            processes: 8
          batch_bot:
            queue-name: test-batch-queue
            processes: 1

    cluster_small: # this is a small production cluster with 1 large 128 core server
      node_1: &cluster_node
        type: aws
        parameters:
          ImageId: ami-1a2bc3d4e5f6g7h8i
          InstanceType: c6i.32xlarge
          KeyName: my-key-pair
          NetworkInterfaces: *network_interface
        bots:
          sim_bot:
            queue-name: prod-sim-queue
            processes: 128
          batch_bot:
            queue-name: prod-batch-queue
            processes: 1

    cluster_medium: # this cluster has two large servers
      node_1: *cluster_node
      node_2: *cluster_node

    cluster_large: # this cluster has 5 large servers
      node_1: *cluster_node
      node_2: *cluster_node
      node_3: *cluster_node
      node_4: *cluster_node
      node_5: *cluster_node

This plan uses YAML anchors and aliases to avoid repetition. The anchors &network_interface and &cluster_node are each defined once, and the aliases *network_interface and *cluster_node re-use the anchored content lower down the plan.

Run the command below:

qalx factory-build --plan full-cluster-plan.yml --stage cluster_large

This starts 5 servers in AWS with 128 cores each.
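
The arithmetic behind the cluster sizes can be checked with a short sketch. The dictionaries below are copied by hand from the plan, and reusing the same cluster_node object for every node plays the role of the *cluster_node alias. total_processes is a hypothetical helper written for this illustration, not a qalx function.

```python
# One node definition, reused for every node just as the YAML alias
# *cluster_node reuses the mapping anchored at &cluster_node.
cluster_node = {
    "type": "aws",
    "parameters": {"InstanceType": "c6i.32xlarge"},
    "bots": {
        "sim_bot": {"queue-name": "prod-sim-queue", "processes": 128},
        "batch_bot": {"queue-name": "prod-batch-queue", "processes": 1},
    },
}

stages = {
    "cluster_small": {"node_1": cluster_node},
    "cluster_medium": {"node_1": cluster_node, "node_2": cluster_node},
    "cluster_large": {f"node_{i}": cluster_node for i in range(1, 6)},
}


def total_processes(stage):
    """Sum the processes of each bot over all nodes in a stage."""
    totals = {}
    for node in stage.values():
        for bot_name, settings in node["bots"].items():
            totals[bot_name] = totals.get(bot_name, 0) + settings["processes"]
    return totals


if __name__ == "__main__":
    for name, stage in stages.items():
        print(name, len(stage), "server(s)", total_processes(stage))
```

For cluster_large this gives 640 sim_bot processes and 5 batch_bot processes across the five servers, all reading from prod-sim-queue and prod-batch-queue respectively.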

Terminating servers

You can _demolish_ a factory using the command:

qalx factory-demolish --name my-factory-name

Where my-factory-name is the value of the name key under factory (line 3 in each of the plans above). This command gracefully terminates the bots on each server and then terminates the server. Each bot should finish the job it was doing before the server terminates. This works well when you know that your batches have all finished. If you are running batches overnight, however, you might have some pricey servers sitting idle until someone checks that they are finished.

The plan below shows how you can use CloudWatch alarms to ensure that each server will terminate (and stop costing money) once it has been idle for a certain period.

# remote-aws-alarm-plan.yml
factory:
  name: qalx-orcaflex-remote
  bots:
    batch_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:batch_bot
    sim_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:sim_bot

  stages:
    test:
      remote_on_aws:
        type: aws
        alarm: terminate_on_idle_2pc
        parameters:
          ImageId: ami-1a2bc3d4e5f6g7h8i
          InstanceType: m5.12xlarge
          # key-pair is needed if you want to connect to the remote server
          KeyName: my-key-pair
          NetworkInterfaces:
            - DeviceIndex: 0
              AssociatePublicIpAddress: true
              # subnet needs to exist and have access to the internet
              SubnetId: subnet-11aa22b
              # security group needs to exist and at least allow outbound traffic
              GroupSet:
                - sg-q9w8e7r6t5yg2g1h0
        bots:
          sim_bot:
            queue-name: test-sim-queue
            processes: 2
          batch_bot:
            queue-name: test-batch-queue
            processes: 1

  alarms:
    terminate_on_idle_2pc:
      MetricName: CPUUtilization
      Statistic: Average
      Period: 60 # seconds
      ComparisonOperator: LessThanThreshold
      EvaluationPeriods: 10
      Threshold: '2'
      Namespace: AWS/EC2
      AlarmActions:
        - terminate

In this plan, the server is terminated once its average CPU utilisation has been below 2% for ten consecutive 60-second periods, that is, for ten minutes.

Attention

The CPU levels that define an “idle” state will vary based on the server size and the jobs the bots are doing. You should be careful not to define an alarm that will terminate a server that is finishing off one final long-running job. These alarms do not wait for jobs to finish before terminating the server.
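
A rough way to reason about an alarm definition before deploying it is to replay some CPU readings against it. The sketch below is a simplified model, not CloudWatch itself: it assumes one average CPU reading per Period, ignores missing data, and treats the alarm as firing when EvaluationPeriods consecutive readings are all below Threshold. The function names are hypothetical.

```python
# Values taken from the terminate_on_idle_2pc alarm in the plan above.
ALARM = {
    "Period": 60,             # seconds per reading
    "EvaluationPeriods": 10,  # consecutive readings that must breach
    "Threshold": 2.0,         # percent CPU (the plan stores this as the string '2')
}


def idle_window_seconds(alarm):
    """How long the CPU must stay below the threshold before the alarm fires."""
    return alarm["Period"] * alarm["EvaluationPeriods"]


def first_alarm_index(cpu_readings, alarm):
    """Index of the reading at which the simplified alarm fires, or None."""
    needed = alarm["EvaluationPeriods"]
    run = 0  # length of the current streak of below-threshold readings
    for index, value in enumerate(cpu_readings):
        run = run + 1 if value < alarm["Threshold"] else 0
        if run >= needed:
            return index
    return None


if __name__ == "__main__":
    print(idle_window_seconds(ALARM))
    # Five busy minutes followed by a final job that keeps the average CPU
    # at only 1.5%: the simplified alarm fires although work is still running.
    readings = [85.0] * 5 + [1.5] * 12
    print(first_alarm_index(readings, ALARM))
```

In this model a single reading at or above the threshold resets the count, so a job that briefly spikes the CPU every few minutes would keep the server alive, while a steady low-CPU job would not.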

Workflows

Often when a batch completes you will want to process the information in the batch to create plots or populate a report.

The plan below shows how to use workflows, a feature of factories, to send jobs from batch_bot on to a plot_bot.

# simple-workflow-plan.yml
factory:
  name: qalx-orcaflex-local
  bots:
    batch_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:batch_bot
    sim_bot:
      source:
        pypi: qalx_orcaflex
      bot_path: qalx_orcaflex.bots:sim_bot
    plot_bot:
      source:
        path: examples/bots
      bot_path: plots:plot_bot

  stages:
    test:
      workflow: plot_flow
      local_connection:
        type: local
        bots:
          sim_bot:
            queue-name: test-sim-queue
            processes: 2
          batch_bot:
            queue-name: test-batch-queue
            processes: 1
          plot_bot:
            queue-name: test-plot-queue
            processes: 1

  workflows:
    plot_flow:
      - batch_bot:
          plot_bot
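
Because a workflow refers to bots by name, a typo in the workflows section is easy to make. The sketch below shows one possible consistency check on a plan that has already been loaded into a dictionary. It is an illustration under assumptions: check_workflow is a hypothetical helper, not part of qalx, and it assumes each workflow is a list of single-key mappings from an upstream bot to one downstream bot, as in the plan above.

```python
# Reduced, hand-copied version of simple-workflow-plan.yml (bot details omitted).
PLAN = {
    "factory": {
        "bots": {"batch_bot": {}, "sim_bot": {}, "plot_bot": {}},
        "stages": {
            "test": {
                "workflow": "plot_flow",
                "local_connection": {
                    "type": "local",
                    "bots": {"sim_bot": {}, "batch_bot": {}, "plot_bot": {}},
                },
            }
        },
        "workflows": {"plot_flow": [{"batch_bot": "plot_bot"}]},
    }
}


def check_workflow(plan, stage_name):
    """Return a list of problems found in the workflow of one stage."""
    factory = plan["factory"]
    stage = factory["stages"][stage_name]
    workflow_name = stage.get("workflow")
    if workflow_name is None:
        return []  # the stage does not use a workflow
    if workflow_name not in factory.get("workflows", {}):
        return [f"workflow {workflow_name!r} is not defined"]

    # Collect the bots deployed by every connection in this stage.
    deployed = set()
    for connection in stage.values():
        if isinstance(connection, dict):
            deployed.update(connection.get("bots", {}))

    problems = []
    for step in factory["workflows"][workflow_name]:
        for upstream, downstream in step.items():
            for bot in (upstream, downstream):
                if bot not in factory["bots"]:
                    problems.append(f"{bot!r} is not declared under factory.bots")
                elif bot not in deployed:
                    problems.append(f"{bot!r} is not deployed in stage {stage_name!r}")
    return problems


if __name__ == "__main__":
    print(check_workflow(PLAN, "test"))
```

An empty list means every bot named in plot_flow is both declared under bots and deployed in the test stage.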

More details on creating custom bots can be found in Custom Bots.