Docker image pull timeout in OpenShift during build or deploy

The default time that OpenShift waits to pull an image during a build or deploy may not be long enough, especially if your images are getting quite large or the network is slow. In those cases you may see a message along the lines of:

Failed to pull image "your image": rpc error: code = 2 desc = net/http: request canceled

There is a thread that talks about this issue. The fix is quite simple, but be aware that it requires a service restart on all your application nodes, and possibly on your infra nodes too if you are seeing the issue there.

On each node, update the following file: /etc/origin/node/node-config.yaml.

In this file, find the kubeletArguments section and specify a higher image-pull-progress-deadline value (the upstream kubelet default is one minute without pull progress). Example:

kubeletArguments:
  image-pull-progress-deadline:
  - "10m"
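To double-check that the setting was actually saved, something like the following could be run on a node. This is only a minimal sketch: the function name show_pull_deadline is made up for illustration, and the default path is simply the node-config.yaml location mentioned above.

```shell
# Hypothetical helper (not part of OpenShift): print the deadline key
# and the line that follows it from a node config file.
show_pull_deadline() {
    # Defaults to the path used in this post; pass another path to override.
    grep -A1 'image-pull-progress-deadline' "${1:-/etc/origin/node/node-config.yaml}"
}
```

Calling show_pull_deadline with no arguments should print the key followed by the value you set. No output (and a non-zero exit status from grep) means the setting is missing from the file.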

After that, restart the node service. Depending on your install, it’s one of the following (origin-node on OpenShift Origin, atomic-openshift-node on OpenShift Container Platform):

$ systemctl restart origin-node
$ systemctl restart atomic-openshift-node
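If you are scripting this across many nodes and don't want to hard-code the service name, a small wrapper along these lines might help. Treat it as a sketch under assumptions: the function names are hypothetical, the two unit names are just the ones listed above, and it relies on `systemctl cat` failing when a unit is not installed.

```shell
# Hypothetical helper: echo the name of whichever node unit is installed.
# `systemctl cat` is used only as an existence check here; it exits
# non-zero when the unit is unknown to systemd.
pick_node_service() {
    for unit in origin-node atomic-openshift-node; do
        if systemctl cat "$unit.service" >/dev/null 2>&1; then
            echo "$unit"
            return 0
        fi
    done
    echo "no OpenShift node service found" >&2
    return 1
}

# Hypothetical wrapper: restart the detected unit (needs root).
restart_node_service() {
    svc=$(pick_node_service) || return 1
    systemctl restart "$svc"
}
```

After sourcing these definitions, running restart_node_service as root would restart whichever of the two services exists on that node, and return a non-zero status if neither is found.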

Be careful when doing this on live systems, as your pods may be affected during the service restart.

Alternatively, you can add this argument to your Ansible hosts file when you deploy the platform. While you can theoretically re-run the playbook against an existing deployment, in my experience some updates, such as config changes, don’t always apply cleanly. Your mileage may vary.

[OSEv3:vars]
openshift_node_kubelet_args={'pods-per-core': ['10'], 'max-pods': ['100'], 'resolv-conf': ['/etc/resolv.conf'], 'image-pull-progress-deadline': ['10m'] }
