queue size - writes don't keep up with volume

Asked by bhardy

We have two Linux servers, each running both carbon and carbon-relay for redundant storage of data and clustering. About 200k metrics are received each minute, and after roughly two hours the writes to disk slow considerably. It's as if the writes cannot keep up with the volume we are receiving, so the queue keeps growing. We are using SAN mounts for storage. Is there anything that would cause this issue? We feel that our carbon.conf entries are fairly generous with regard to creates and updates. Please advise.

Lines from the carbon.conf file:
MAX_QUEUE_SIZE = 500000
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 100000
MAX_CREATES_PER_MINUTE = inf

Question information

Language: English
Status: Solved
For: Graphite
Assignee: No assignee
Solved by: bhardy

chrismd (chrismd) said :
#1

Hi Bryan. Each time an application like carbon tries to write data to disk, the data doesn't actually go to disk immediately; the kernel puts it in a buffer and writes it out later. The kernel does this for efficiency, and it results in very low write latency as long as there is free memory for the kernel to allocate buffers from. This matters so much to carbon because carbon depends on I/O latency being very low. If the I/O latency increases significantly, the rate at which carbon writes data drops dramatically, the cache grows, and carbon starts dropping datapoints (except in your case, carbon will simply crash because your MAX_CACHE_SIZE is infinite).

Carbon needs to keep writing data to disk very quickly, otherwise the influx of new data will overwhelm it. What is probably happening is that your Graphite server's kernel runs out of free memory to use for buffering, and when that happens carbon's writes become synchronous. Disks are obviously much slower than memory, even on a fast SAN, and the resulting jump in write latency (which you can monitor with the carbon.agents.<server>.avgUpdateTime metric) causes the cache to start growing. If you look at the history of this metric alongside the cache size, you will probably see avgUpdateTime stay fairly low and then suddenly jump much higher, at which point the cache starts growing until it hits a critical point and the app crashes or just stops working.
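
To watch for that pattern you can graph the two together. As a rough sketch (the hostname is a placeholder, and the exact metric path for the cache size may vary by carbon version):

http://yourgraphiteserver/render?target=carbon.agents.<server>.avgUpdateTime&target=carbon.agents.<server>.cache.size

A sudden rise in the first series followed by steady growth in the second is the signature of the buffering problem described above.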

There are many ways to solve this problem, but in your case the easiest would be some carbon.conf changes:

1) Don't use MAX_CACHE_SIZE = inf; it just means the inevitable outcome of an I/O latency problem will be a crash. If you put a limit on the cache size (I used 10 million on machines with 24 GB of memory) then the outcome will be sporadic dropped datapoints until the latency comes back down and the cache drops below the limit.
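
As a concrete carbon.conf line, that limit would look like this (the number is illustrative and should be scaled to your own memory):

MAX_CACHE_SIZE = 10000000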

2) MAX_UPDATES_PER_SECOND is way too high. Here is the comment from carbon.conf.example that explains why:
# Limits the number of whisper update_many() calls per second, which effectively
# means the number of write requests sent to the disk. This is intended to
# prevent over-utilizing the disk and thus starving the rest of the system.
# When the rate of required updates exceeds this, then carbon's caching will
# take effect and increase the overall throughput accordingly.

The idea is to deliberately slow down the rate of write calls to avoid starving the kernel's I/O buffers. It may sound counter-intuitive, but it helps in practice. Essentially it lets you strike a balance between carbon's caching mechanism and the kernel's buffering mechanism. Look at your updates.log to see how many updates per second your systems actually perform, and set your value to about 80% of that. Experiment with this to see what works for you. I use a value of 800.
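
As a sketch of that arithmetic (the 1,000 figure is purely illustrative): if updates.log shows your disks sustaining roughly 1,000 updates per second, 80% of that gives 800, and the line becomes:

MAX_UPDATES_PER_SECOND = 800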

3) MAX_CREATES_PER_MINUTE = inf is the culprit for your problems. Each time a new metric is received by carbon, it has to allocate a new whisper file (which can be a few megabytes, depending on your configuration). This creation process is a big write that the kernel puts into a buffer just like every other write. The trouble is that hundreds of new metrics mean hundreds of new files being allocated, and thus hundreds of big writes filling up the available buffers. That leaves no room for buffering the updates to existing metrics, carbon's writes become synchronous, and carbon goes boom. Here is the warning from carbon.conf:
# Softly limits the number of whisper files that get created each minute.
# Setting this value low (like at 50) is a good way to ensure your graphite
# system will not be adversely impacted when a bunch of new metrics are
# sent to it. The trade off is that it will take much longer for those metrics'
# database files to all get created and thus longer until the data becomes usable.
# Setting this value high (like "inf" for infinity) will cause graphite to create
# the files quickly but at the risk of slowing I/O down considerably for a while.

I use a value of 60. It is probably very bad of me to have a default value of 'inf' in the example config file, sorry about that :)
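
That is, the single line I'd suggest (a sketch; tune the number to how quickly you need new metrics' database files to become usable):

MAX_CREATES_PER_MINUTE = 60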

Note that if you receive large sets of new metrics frequently, they can still overwhelm carbon, because they'll drain from the cache at a slow rate of 60 metrics/minute (or whatever you set), which can cause the same problem over a longer period of time. This is a problem I solved fairly recently, so if you're running a version of carbon older than, say, a month or two, you probably don't have the fix applied.

In summary, I suggest adjusting your carbon.conf settings as I've described and also updating to the latest version of carbon.
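
Putting those together against the lines quoted in the question, the adjusted settings would look something like this (the cache-size and updates numbers are sketches to be tuned to your hardware and to the rate you observe in updates.log):

MAX_QUEUE_SIZE = 500000
MAX_CACHE_SIZE = 10000000
MAX_UPDATES_PER_SECOND = 800
MAX_CREATES_PER_MINUTE = 60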

bhardy (bhardy) said :
#2

I thank you very much for the thorough explanation.

Our two systems are CentOS x86_64 with 8 CPUs at 2327 MHz, 31 GB of memory, and, as I mentioned, SAN mounts. I did take the default values in carbon.conf as "recommended settings". I imagined our systems could handle the volume, since in a previous answer on Launchpad you stated you had a system handling 250k metrics/min.

After the initial population we do not frequently receive a bunch of new metrics. For the first 3-4 days of turning things up it will need to build a lot of new metrics; after that it should be primarily updates. Maybe we can just ramp things up slowly. This issue has been killing us, so I will apply the adjustments to carbon.conf and test. I will also see how feasible it is to install the latest version. Thanks again.

bhardy (bhardy) said :
#3

Also, as a follow-up, is there a recommended setting for MAX_QUEUE_SIZE in relation to carbon-relay? Our two systems are running both carbon and carbon-relay for data redundancy, and we have also seen the relay die due to its queue being full.

Our settings for this on each server were:
CACHE_SERVERS = localhost:2004, theotherserver:2006
MAX_QUEUE_SIZE = 300000

chrismd (chrismd) said :
#4

The MAX_QUEUE_SIZE setting is only used by carbon-relay to determine how many datapoints to queue up when the carbon-cache it's sending to is not receiving them quickly enough. This does not affect the performance of carbon-cache at all, and your value looks fine; I only use a value of 50,000. If you've seen this buffer fill up, it is because of a separate problem that is causing carbon-cache to slow down.
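
For reference, a relay configuration along the lines of yours, just with the smaller queue I use, would look something like this (illustrative only; your existing host/port values are kept as quoted above):

CACHE_SERVERS = localhost:2004, theotherserver:2006
MAX_QUEUE_SIZE = 50000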

bhardy (bhardy) said :
#5

Thanks a lot, Chris. This is exactly the information I needed.
.......solved.......