It’s raised by the .execute_async line of the sampler.
It doesn’t seem to be related to the status of the simulator (or QPU): it happens whether they’re “available” or “in deployment”. Does anyone know what the problem might be? And, more importantly, how to avoid it? I guess I could do something like a try/except inside a while loop that retries the same iteration whenever the exception is raised.
I’d prefer to fix the root of the problem, though, rather than just mitigating it.
I’m asking because if I intend to send ~200 jobs, I wouldn’t want my Python code to break like this in the middle of them.
Thanks for raising this issue. Our Cloud team is aware of the timeouts you are experiencing (which occur pretty often these days, as we have a higher load of requests to manage) and is already working on a fix.
When such a timeout occurs, the corresponding job is still created on the Cloud and run by the target platform you chose. It can be found in the Cloud website GUI, where you can copy/paste the Job ID and load the results back into your Perceval scripts.
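Loading a job back from its ID could look roughly like the sketch below. The exact resume call depends on your Perceval version, so treat resume_job here as an assumption and check the documentation of the release you are using:

import perceval as pcvl

... # your setup code
rp = pcvl.RemoteProcessor("my:platform")

job_id = "paste-the-job-id-from-the-cloud-gui-here"  # hypothetical placeholder value

# Reattach to the job that was created despite the client-side timeout.
# The helper name may differ between Perceval versions (e.g. RemoteJob.from_id);
# check the docs for your release.
job = rp.resume_job(job_id)

results = job.get_results()  # same result structure as a job launched from the script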
As long as the timeout issue persists, there is very little we can do from the Perceval client to improve the situation.
A (very imperfect) workaround would be to allow for a longer waiting time when sending requests to the Cloud. This will probably slow down your script overall, but it should at least avoid the crashes:
import perceval as pcvl

... # your setup code
rp = pcvl.RemoteProcessor("my:platform")
rp._rpc_handler.request_timeout = 30  # set the max waiting time for every HTTPS request to 30s
... # your experiment code
Thanks for your reply.
I guess the while loop won’t work then (or rather, it will cause my job to run multiple times). Although the only thing that means is that I have to pay for a job a few times rather than just once.
Getting the job ID manually doesn’t seem like a workable solution if I’m submitting hundreds of jobs.
I might set a longer waiting time for now and use the while loop anyway; hopefully there’ll be a fix on your side some time soon. Roughly what I have in mind is sketched below.
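This is only a sketch under a few assumptions: the submission call is the sampler’s execute_async as in my script, the exception is caught broadly (you’d narrow it to the actual timeout exception from your traceback), job.id is assumed to expose the Job ID, and every returned ID is recorded so any duplicated runs can be found on the Cloud GUI afterwards:

import time
import perceval as pcvl
from perceval.algorithm import Sampler

... # your setup code (circuit, input state, token, etc.)
rp = pcvl.RemoteProcessor("my:platform")
rp._rpc_handler.request_timeout = 30  # longer timeout, as suggested above
sampler = Sampler(rp)

MAX_RETRIES = 3
submitted_ids = []  # keep every ID so duplicated jobs can be spotted on the Cloud GUI

for i in range(200):  # ~200 jobs (set up each job's circuit/parameters here as needed)
    for attempt in range(MAX_RETRIES):
        try:
            job = sampler.sample_count.execute_async(10_000)  # shot count is illustrative
            submitted_ids.append(job.id)
            break
        except Exception as exc:  # narrow this to the timeout exception you actually see
            # NB: the job may still have been created server-side, so a retry can
            # duplicate it (and its cost); the recorded IDs help track that down later.
            print(f"job {i}, attempt {attempt + 1} failed: {exc}")
            time.sleep(10)  # back off a little before retrying
    else:
        print(f"job {i}: giving up after {MAX_RETRIES} attempts")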