Tuesday, December 19, 2017

Timing Out Waiting For An HTTP Connections

HTTP Connection Timeout


As software engineers we always run into some interesting and frustrating issues. I ran into this one just a few days ago while load testing an application.
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
The issue doesn't occur during function tests. Right around 30 to 40 minutes into load tests we start seeing this error and from that point on all request to the target service fails.

Environment/setup

  • Java 1.8
  • Dropwizard 1.1.1
  • Apache httpclient 4.5.3
  • javax.ws.rs 2.0.1

We looked at the usual suspects:
  1. Determine the size of the connection pool and increase it. 
  2. Determine the current timeout setting and increase it. 
  3. Find out what state the current connections are in and see if they can be returned to the pool

Connection Pool  size:

We don't set this value. It turns out that when the property
jersey.config.client.async.threadPoolSize
 is not set in the ClientProperties  according to the docs:
If the property is absent then thread pool used for async requests will be initialized as default cached thread pool, which creates new thread for every new request, see Executors. When a value > 0 is provided, the created cached thread pool limited to that number of threads will be utilized.
A default value is not set.
We are using the CachedThreadPool, it will continue to create new connections for each request. Naturally this can't go on forever unless connections are closed, as there are limited resources on the client and on the server.

Connection Timeout setting:

This was set to 50000 ms (50 seconds) which is very generous. Note that this is not the actual response time from the request. This timeout is just to establish the connection with the target server. There is no point in increasing this limit, we needed to locate the real cause of the problem. So this was ruled out as a possible solution.

State of the current connections

A quick check with "netstat" command below showed that there are 1023 connections
netstat -a  | grep CLOSE_WAIT | wc -l
These connections are just awaiting close as CLOSE_WAIT is a terminal state and this connection can not be used to data transfer. Now I see the problem,