Jobs
Job processing overview
A job is an instance of a software application (App) that stores its configuration and state. The Jobs service is designed to run applications on any host that supports container runtimes (Docker and Singularity) as well as plain executables. The service integrates with the Systems, Apps, Files, and Security Kernel services to handle job processing. Before delving into the specifics of constructing a job request, we outline the overall lifecycle of a job:
- Request Authorization: The tenant, owner, and user information from the request, along with the Tapis JWT, are used to authorize access to the application, execution system, and optional archive system.
- Request Validation: The request is validated to check for missing, conflicting, or improper values. Necessary paths are assigned or created on the execution system, and macro substitution is performed to finalize all job parameters.
- Job Creation: A Tapis job object is created and stored in the database.
- Job Queuing: The job is placed in an internal queue, serviced by one or more Job Worker processes.
- Response: The initial Job object is returned to the caller, marking the end of the synchronous portion of job submission.
After the response is sent, job processing continues asynchronously. Job workers pull jobs from the queues, with the number of workers and queues only constrained by available hardware. Each job is assigned to a worker thread, which shepherds the job through its lifecycle until completion, failure, or a block caused by resource limitations.
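The split between the synchronous submission steps and the asynchronous worker can be illustrated with a small, self-contained sketch. It is not the Jobs service implementation: the `Job` class, the `submit` and `worker` functions, the in-memory `job_store` and `job_queue`, and the status names are all assumptions made for illustration.

```python
import queue
import threading
import uuid
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical, heavily simplified stand-in for a Tapis job object."""
    name: str
    app_id: str
    owner: str
    job_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "PENDING"  # illustrative status name


job_store = {}             # stands in for the database
job_queue = queue.Queue()  # stands in for the internal job queue


def authorize(request, allowed_users):
    """Request Authorization: reject owners without access (toy check)."""
    if request.get("owner") not in allowed_users:
        raise PermissionError(f"{request.get('owner')!r} may not run this app")


def validate(request):
    """Request Validation: check for missing values."""
    missing = [key for key in ("name", "app_id", "owner") if not request.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")


def submit(request, allowed_users):
    """Synchronous part of submission; returns the initial job object."""
    authorize(request, allowed_users)
    validate(request)
    job = Job(request["name"], request["app_id"], request["owner"])
    job_store[job.job_id] = job   # Job Creation
    job_queue.put(job)            # Job Queuing
    return job                    # Response


def worker():
    """Asynchronous part: pull jobs from the queue and run them to completion."""
    while True:
        job = job_queue.get()
        try:
            if job is None:       # sentinel value: shut the worker down
                return
            job.status = "RUNNING"
            # A real worker would stage inputs, launch the app, and archive outputs here.
            job.status = "FINISHED"
        finally:
            job_queue.task_done()


job = submit(
    {"name": "image-classification", "app_id": "img-classify", "owner": "alice"},
    allowed_users={"alice"},
)
status_at_response = job.status   # no worker has run yet

thread = threading.Thread(target=worker, daemon=True)
thread.start()
job_queue.join()                  # wait until the queued job has been processed
job_queue.put(None)               # ask the worker to stop
thread.join()

print(status_at_response, "->", job.status)  # prints "PENDING -> FINISHED"
```

The worker is started only after `submit` returns so that the example is deterministic. In a running service, workers would already be waiting on the queue.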
Tutorial: Submitting a job for a scientific application on an HPC cluster or AWS system
In this tutorial, we will guide you through the steps of selecting an application, configuring input elements, and submitting jobs via a web-based platform. The example used here is an Image Classification application that runs on an AWS system.
Step 1: Selecting an application from the Apps menu
- Navigate to the "Apps" menu: Once logged in to the system, you can view all available applications in the "Apps" menu.
- Select the desired application: For this tutorial, we will use the Image Classification app. It is an application that performs image classification using TensorFlow.
- The application will be listed under available apps. Click on it to proceed.
Step 2: Viewing the application details and configuration options
- Viewing the "Image Classification" app:
On the app's main page, you will see a brief summary describing the app's functionality. For this example, it describes the application as a Batch - Non-Interactive Command Line Application that uses TensorFlow for image classification.
- Below the description, you will find two important buttons:
- Launch app: Use this button to submit a job.
- Configure app form: This is used to add or modify input elements that users will fill out during job submission.
Step 3: Launching the job and filling out the submission form
- Access the job submission page:
Clicking either Launch app or Configure app form will take you to the job submission page. Here, you can specify the resources needed for the job as well as the input parameters required by the application.
- Job resources:
- Number of nodes: Specify how many nodes the job will use (e.g., 1).
- Cores per node: Define the number of cores per node (e.g., 1).
- Memory (MB): Specify the memory allocation (e.g., 100 MB).
- Filling out job details:
- Max runtime (minutes): This is the maximum time allowed for the job to run. In this case, it is set to 10 minutes.
- System: Select the system where the job will run. Here, it is set to the AWS system.
- Job name: Assign a unique name to the job, such as Image classification.
- Image URL: Provide the URL of the image to be classified. Example:
https://s3.amazonaws.com/cdn-origin-etr.akc.org/wp-content/uploads/2017/11/09152345/Alaskan-Malamut….
- Submit the Job:
After filling out the form, click Submit to start the job. Alternatively, you can click Save Draft to save your progress and return later.
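For readers who prefer to see the form values as data, the following sketch collects them into a single request dictionary. The helper `build_job_request` is hypothetical. The keys `name`, `appId`, `execSystemId`, `nodeCount`, `coresPerNode`, `memoryMB`, and `maxMinutes` follow the naming style of Tapis job requests but should be checked against the Jobs API reference. The `imageUrl` key, the app ID, the system ID, and the example URL are placeholders, since the tutorial does not state how the web form passes them to the service.

```python
import json


def build_job_request(name, app_id, system_id, image_url,
                      node_count=1, cores_per_node=1,
                      memory_mb=100, max_minutes=10):
    """Collect the submission-form values into one request dictionary."""
    for label, value in (("node_count", node_count),
                         ("cores_per_node", cores_per_node),
                         ("memory_mb", memory_mb),
                         ("max_minutes", max_minutes)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{label} must be a positive integer, got {value!r}")
    return {
        "name": name,
        "appId": app_id,
        "execSystemId": system_id,
        "nodeCount": node_count,
        "coresPerNode": cores_per_node,
        "memoryMB": memory_mb,
        "maxMinutes": max_minutes,
        # Hypothetical key: how the app receives the image URL is not stated.
        "imageUrl": image_url,
    }


request = build_job_request(
    name="Image classification",
    app_id="img-classify",                    # hypothetical app ID
    system_id="aws-system",                   # hypothetical system ID
    image_url="https://example.com/dog.jpg",  # placeholder URL
)
print(json.dumps(request, indent=2))
```

The defaults mirror the example values in this step: 1 node, 1 core per node, 100 MB of memory, and a 10-minute maximum runtime.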
Step 4: Monitoring job status
Once the job is submitted, you can monitor its status.
- Job status monitoring:
After launching the job, you will be taken to a status page where you can see the job’s progress. The possible statuses include:
- Running: The job is actively being processed.
- Staging job: The job is being prepared for execution.
- Processing inputs: The inputs provided in the submission form are being processed.
- Here, you can also view resource details such as the number of nodes, cores, memory, and system name.
- Terminating a job:
If needed, you can terminate the job from this page by clicking Terminate Job.
- Reviewing job output:
You can review the job's output from this page by clicking the Output link.
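The status page does the monitoring for you, but the same idea can be expressed as a simple polling loop. The sketch below is illustrative only: `wait_for_job` and its `fetch_status` callback are hypothetical, the status strings are upper-case stand-ins for the statuses listed above, and the set of terminal statuses is an assumption. The status source is simulated with a fixed sequence instead of calling a real service.

```python
import time

# Assumed terminal statuses; check the service documentation for the real list.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "CANCELLED"}


def wait_for_job(fetch_status, poll_seconds=5.0, max_polls=100):
    """Poll fetch_status() until a terminal status appears.

    Returns the list of distinct statuses observed, in order.
    """
    history = []
    for _ in range(max_polls):
        status = fetch_status()
        if not history or history[-1] != status:
            history.append(status)      # record only status changes
        if status in TERMINAL_STATUSES:
            return history
        time.sleep(poll_seconds)
    raise TimeoutError(f"job still {history[-1]} after {max_polls} polls")


# Simulated status source standing in for a real status lookup.
simulated = iter(["PROCESSING_INPUTS", "STAGING_JOB", "RUNNING", "RUNNING", "FINISHED"])
history = wait_for_job(lambda: next(simulated), poll_seconds=0)
print(history)  # ['PROCESSING_INPUTS', 'STAGING_JOB', 'RUNNING', 'FINISHED']
```

Bounding the loop with `max_polls` keeps a stuck job from blocking the caller indefinitely.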
Step 5: Cloning and re-submitting jobs
- Accessing previous jobs:
From the My Jobs menu, you can view a list of all jobs that have been submitted. This list shows the job name, application used, system, submission date, and other details.
- Cloning a job:
You can clone an existing job by clicking the Clone button next to the job in the list. This will open the job submission page again with all the previously filled parameters.
- Modifying cloned job parameters:
On the cloned job submission page, you can update any of the parameters. For example, you can:
- Change the Job name.
- Modify other resource settings if needed.
- Submit or Save Draft:
Once the modifications are made, click Submit to run the cloned job or Save Draft to make adjustments later.
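Conceptually, cloning copies the previous job's parameters and applies a few changes before re-submission. A minimal sketch of that idea follows. The helper `clone_job_request` and the dictionary keys are hypothetical and do not describe how the platform implements the Clone button.

```python
import copy


def clone_job_request(original, **overrides):
    """Return a copy of a job request with selected parameters replaced."""
    unknown = set(overrides) - set(original)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    cloned = copy.deepcopy(original)   # leave the original request untouched
    cloned.update(overrides)
    return cloned


# Hypothetical parameters of a previously submitted job.
previous = {
    "name": "Image classification",
    "nodeCount": 1,
    "coresPerNode": 1,
    "memoryMB": 100,
    "maxMinutes": 10,
}

rerun = clone_job_request(previous, name="Image classification (rerun)", memoryMB=200)
print(rerun["name"], rerun["memoryMB"])  # Image classification (rerun) 200
```

Rejecting unknown keys catches misspelled parameter names before the cloned job is submitted.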