Introduction to Amazon Code Pipeline with Java part 9: the job agent continuation token
May 15, 2016
Introduction
In the previous post we went through the details behind the communication between CodePipeline and the third party action. The job agent is responsible for this communication on the side of the third party action. It must monitor a CodePipeline endpoint for new jobs and then acknowledge the job in some way, e.g. by sending a success or failure message. We also mentioned the role of AWS S3 which will store the artifact that enters the pipeline.
We’re at the point where the job agent will act upon the job details received from CP and start the actual work. There’s, however, at least one remaining implementation detail that is important to be aware of when designing this communication process. It’s called the continuation token and this is what we’ll look at in this post.
The continuation token
We’re at the very moment where CP has matched the authentication credentials that the job agent sent back. If the validation succeeds then CP will hand out the details of the job, such as the CP job ID and the list of key-value objects that hold the job configuration details. CP is expecting a signal back from the job agent to know whether the job has been executed with success or failure.
The question is when the job agent will send the response exactly. It’s natural to think that the response is sent at the very end of the job execution, right? Let’s take the example of the Apica Loadtest third party action:
The Apica Loadtest third party action will execute a load test according to the job properties in the aforementioned list of key-value objects. At first it seems natural that the job agent will execute the load test from start to finish and only then send a response back to CP. There are, however, at least three problems with this approach.
The first one is related to timeout. An Apica load test can take anywhere between 1 minute and several hours to finish depending on the test configuration. On the other hand CP has a timeout of 1 hour on every new job, i.e. it will wait 1 hour to get a signal back from the job agent. If it has received no response during this time frame then it automatically sets the job result to failure. Therefore if your third party action takes, say, 2 hours to complete then it’s bound to fail on the CP side even if your job agent manages to execute it normally.
The second problem is related to feedback to the AWS user. Recall the details link on the pipeline stage?
That link won’t just appear by magic. That link is called the execution URL template and consists of a base URL and a placeholder which is up to you to fill in. The execution URL template must be submitted to AWS when you want to have your third party action listed on their site. The Apica Loadtest execution URL is https://loadtest.apicasystem.com/jobs/placeholder_for_the_Apica_job_id. Every load test at Apica will also have a job id which is a simple integer like 153245. It’s very probable that the job that your third party action is executing will also have some internal job id, e.g. a GUID or an integer. The execution URL template is given but how can the job agent supply the job ID? That’s where the continuation token enters the scene. Without this continuation token the Details link will never appear during the job execution on the CP side. Your customers will therefore never see a link where they can check the job progress. They will only see that the job is ongoing but nothing else and that’s not a very clever idea. Before we continue with this discussion let’s look at the third point in the list.
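To make the placeholder idea concrete, here is a minimal sketch of how a Details link could be built from a template and a job id. The placeholder name {ExternalExecutionId} and the class and method names are assumptions made for illustration; the substitution itself happens on the CP side, not in the job agent.

```java
final class ExecutionUrlSketch {

    // Assumed placeholder name, used here only for illustration.
    static final String PLACEHOLDER = "{ExternalExecutionId}";

    // The base URL comes from the post; the placeholder syntax is an assumption.
    static final String TEMPLATE = "https://loadtest.apicasystem.com/jobs/" + PLACEHOLDER;

    /** Replaces the placeholder in the template with the job id of the external system. */
    static String detailsLink(String template, String externalJobId) {
        return template.replace(PLACEHOLDER, externalJobId);
    }
}
```

Calling ExecutionUrlSketch.detailsLink(ExecutionUrlSketch.TEMPLATE, "153245") yields a link that ends in /jobs/153245.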
The third major drawback is related to the deployability of the job agent. Imagine that one or more customers are executing a job using your third party action but you want to deploy a new version with an important bug fix. If you do that while CP is waiting for a response then those jobs will be cut, the communication process will discontinue and CP will eventually mark those jobs as failed when the timeout has been reached. This is not a very clever idea either.
The solution to all of the above is the continuation token that the job agent responds with to CP. The most important point to understand here is that a single job in your system will be broken up into a series of small communication steps between CP and the agent. Let’s illustrate this with the Apica load test process:
- Our job agent receives the execution details of a brand new load test job
- The details will include the AWS job ID, the list of job configuration details and a property called “continuationToken”
- At first the continuation token property will be null. That’s how our job agent knows that it is a brand new load test that at first must be initiated in our backend system
- The job agent therefore takes the steps necessary to start a new load test which will get an Apica job ID
- If the load test initialisation has been successful then our job agent responds with a success message and also fills in the continuation token field with the Apica job ID
- The response will also contain the AWS job ID by which AWS can find the job in its system, put the status to success or failure and also fill in the job id field in the execution URL template
- This is when the Details link in the stage appears on the screen of the AWS CP user
- The job agent then gets a second signal about a job
- This time, however, the job details will include a non-null continuation token
- Our job agent therefore knows that this is an on-going load test and therefore it won’t attempt to start a brand new load test but simply check the status of the on-going load test
- If the job is still being executed then the job agent will again send a success message with the same job id as the continuation token
- This process continues back and forth between CP and the job agent until the load test has been executed with some final outcome, i.e. a final success or failure
- This is when the job agent will send a final status of success or failure
After this point there will be no more jobs with the same continuation token coming from CP and the CP user will see the updated status on the screen. How does CP know whether a certain job success message is absolutely final vs. the job success message of a still on-going job? It’s indicated by the job agent using a job progress indicator from 0 to 100 where 100, i.e. 100%, means a final status. The job progress is also sent in the success message along with the job ID. At the time of writing the job progress value is not shown in CP at all, it’s only used for this purpose, i.e. to indicate that the job has really ended and CP shouldn’t send any more jobs with the same continuation token. I can imagine that in the future the CP stage will show this progress in some progress bar.
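The back-and-forth described above can be sketched in a few lines of Java. This is only an illustration under assumptions: JobResult, LoadTestBackend and handle are hypothetical names, not part of the AWS SDK or the Apica code base, and the real agent would translate the result into the corresponding CP success or failure call.

```java
import java.util.Map;

final class ContinuationTokenSketch {

    /** Hypothetical shape of what the agent reports back to CP for one round. */
    static final class JobResult {
        final String awsJobId;
        final boolean success;
        final String continuationToken; // null when the result is final
        final int percentComplete;      // 100 marks a final success
        final String failureMessage;    // only set on failure

        private JobResult(String awsJobId, boolean success, String continuationToken,
                          int percentComplete, String failureMessage) {
            this.awsJobId = awsJobId;
            this.success = success;
            this.continuationToken = continuationToken;
            this.percentComplete = percentComplete;
            this.failureMessage = failureMessage;
        }

        static JobResult inProgress(String awsJobId, String token, int percent) {
            return new JobResult(awsJobId, true, token, percent, null);
        }

        static JobResult finished(String awsJobId) {
            return new JobResult(awsJobId, true, null, 100, null);
        }

        static JobResult failed(String awsJobId, String reason) {
            return new JobResult(awsJobId, false, null, 0, reason);
        }
    }

    /** Stand-in for the third party backend, e.g. the load test system (assumed interface). */
    interface LoadTestBackend {
        String start(Map<String, String> jobConfiguration); // returns the backend job id
        int progress(String backendJobId);                  // 0..100, negative if the test failed
    }

    /** Handles one job received from CP; called once per polling round. */
    static JobResult handle(String awsJobId, Map<String, String> jobConfiguration,
                            String continuationToken, LoadTestBackend backend) {
        if (continuationToken == null) {
            // Brand new job: start the load test and hand its id back as the token.
            try {
                String backendJobId = backend.start(jobConfiguration);
                return JobResult.inProgress(awsJobId, backendJobId, 0);
            } catch (RuntimeException e) {
                return JobResult.failed(awsJobId, "Could not start load test: " + e.getMessage());
            }
        }
        // On-going job: only check the status, never start a second load test.
        int progress = backend.progress(continuationToken);
        if (progress < 0) {
            return JobResult.failed(awsJobId, "Load test " + continuationToken + " failed");
        }
        if (progress >= 100) {
            return JobResult.finished(awsJobId);
        }
        return JobResult.inProgress(awsJobId, continuationToken, progress);
    }
}
```

Because every call to handle returns quickly and all state lives in the backend and in the token, the agent itself stays stateless between rounds, which is what makes it safe to redeploy while load tests are still running.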
Note that a failure message from the job agent is always final. If you send a failure status in the response then the CP user will see that the job has failed. Here we can send in a string which describes the reason for the failure.
So the role of the continuation token is very significant in the execution and communication process. Keep it in mind when designing the job agent.
Read the next post here.
View all posts related to Amazon Web Services and Big Data here.