November 21, 2011 ・ Sphinx
Sphinx RT-indexes memory consumption issue
The Sphinx team recently published a new article about Sphinx memory consumption.
It provides the formula to estimate the memory consumption for RT indexes:
For RT-index you can estimate memory consumption by calculating the size of all on-disk chunks (minus .spd & .spp sizes as noted above) plus RAM-chunk size (rt_mem_limit)
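As a rough sketch, the quoted formula can be written as a small Python function. The function name, the file names and the sizes below are made up for illustration; none of them come from Sphinx itself.

```python
MB = 1024 ** 2

# Per the quoted formula, .spd and .spp files are not counted.
EXCLUDED_EXTENSIONS = (".spd", ".spp")


def estimate_rt_index_memory(disk_chunk_files, rt_mem_limit):
    """Estimate memory (in bytes) for one RT-index.

    disk_chunk_files: dict mapping on-disk chunk file name -> size in bytes
    rt_mem_limit:     RAM-chunk limit in bytes
    """
    counted = sum(
        size
        for name, size in disk_chunk_files.items()
        if not name.endswith(EXCLUDED_EXTENSIONS)
    )
    return counted + rt_mem_limit


# Hypothetical sizes for a single on-disk chunk of one index.
example_files = {
    "rt.0.spa": 200 * MB,
    "rt.0.spi": 50 * MB,
    "rt.0.spd": 600 * MB,  # excluded
    "rt.0.spp": 150 * MB,  # excluded
}

estimate = estimate_rt_index_memory(example_files, rt_mem_limit=1024 * MB)
print(f"estimated memory: {estimate / MB:.0f} MB")  # estimated memory: 1274 MB
```

Note that rt_mem_limit is added in full, whether or not the RAM-chunk is actually full at the moment.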
The formula looks good, but notice that it also adds in the size of the RAM-file (the RAM-chunk).
What is that RAM-file?
The RAM-file is used to store all necessary data for Sphinx RT-indexes.
Sphinx keeps the RAM-file in memory to support real time updates for RT-indexes.
The RAM-file data is copied to an on-disk chunk file only once the RAM-file reaches rt_mem_limit in size.
Each RT-index has its own RAM-file.
So, what's wrong with the RAM-file?
Imagine we have 30 indexes, each 3 GB in size.
If we want to keep the number of chunks per index low, say fewer than 5, we need to set rt_mem_limit to 1 GB.
In this case we will have 3 chunks for each index.
Now let's estimate how much memory we need to support this configuration.
For 30 indexes we will have 30 RAM-files of up to 1 GB (rt_mem_limit) each, which makes 30 GB.
So 30 GB of free memory is required to support our configuration, and that's without counting the .spa and .spi files.
Of course, in a real system with a random data distribution the RAM-files will not all be full at the same time, so Sphinx will consume less memory, probably about 1.5 times less.
OK, let's decrease rt_mem_limit 10 times, to 100 MB.
In this case we will need only 3 GB of memory, but the number of chunks for each index will grow from 3 to 30 (900 chunks overall for 30 indexes).
Now imagine how fast Sphinx can run a query that has to touch 30 or more chunks.
It will be very slow because of the many disk I/O operations, especially if we query more than one index at a time!
A low rt_mem_limit is good for memory, but hurts Sphinx performance.
A high rt_mem_limit can give good performance, but requires a lot of free memory.
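To see the trade-off in numbers, here is a minimal Python sketch of the arithmetic used in this post. It assumes the same simplifications as the text: every RAM-file is filled up to rt_mem_limit, each on-disk chunk is about rt_mem_limit in size, and 1 GB is taken as 1000 MB to match the round numbers above. The function name worst_case is my own.

```python
import math


def worst_case(num_indexes, index_size_mb, rt_mem_limit_mb):
    """Return (RAM for all RAM-files in MB, chunks per index, total chunks)."""
    # Every index keeps its own RAM-file, which can grow up to rt_mem_limit.
    ram_mb = num_indexes * rt_mem_limit_mb
    # Assumption: each on-disk chunk is roughly rt_mem_limit in size.
    chunks_per_index = math.ceil(index_size_mb / rt_mem_limit_mb)
    total_chunks = num_indexes * chunks_per_index
    return ram_mb, chunks_per_index, total_chunks


# 30 indexes of 3 GB each, with a few candidate rt_mem_limit values.
for limit_mb in (1000, 500, 100):
    ram_mb, per_index, total = worst_case(30, 3000, limit_mb)
    print(
        f"rt_mem_limit={limit_mb} MB: RAM-files up to {ram_mb / 1000:.1f} GB, "
        f"{per_index} chunks per index, {total} chunks total"
    )
```

With these inputs, 1000 MB gives 30 GB of RAM-files and 3 chunks per index, while 100 MB gives 3 GB and 30 chunks per index (900 in total), the same numbers as above.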
The Sphinx team definitely needs to optimize this feature, for example by adding a new option to control RAM-file size (e.g. rt_RAM_limit).
rt_RAM_limit in conjunction with rt_mem_limit could give more flexibility to set up a large number of RT-indexes on an average server with 16 GB of memory on board.
Another option is to exclude .spp and .spd data from the RAM-file, so that it keeps in memory only the things that are supposed to be there: .spa and .spi.