Should you switch to Sphinx real-time indexes?

The problem with regular indexes

The main inconvenience of regular indexes is their update speed: to update one, you have to rebuild it entirely.

For large amounts of data we usually use main+delta indexes. The main index contains most of the data, and the delta contains only recent changes. To keep the whole index up to date, we rebuild the delta index every 3-5 minutes. But the larger the delta grows, the longer it takes to rebuild. That’s why we recommend flushing the delta into the main index every day or every week, depending on your data growth rate.

But this approach has a couple of drawbacks:

  • high average system load due to frequent index rebuilds;
  • in the worst case, fresh data becomes searchable only after several minutes.
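
The main+delta scheme described above could look roughly like the following sphinx.conf sketch. The documents table, the sph_counter helper table, the connection settings, and all paths are hypothetical placeholders rather than a real setup; the layout follows the pattern commonly shown in the Sphinx documentation.

```
# Hypothetical names throughout: documents, sph_counter, credentials, paths
source main
{
    type     = mysql
    sql_host = localhost
    sql_user = sphinx
    sql_pass =
    sql_db   = test

    # remember the highest document id covered by the main index
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM documents
    sql_query     = SELECT id, title, body FROM documents \
        WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}

# delta inherits the connection settings and picks up only newer rows
source delta : main
{
    # override the inherited pre-query so the counter is not moved
    sql_query_pre = SET NAMES utf8
    sql_query     = SELECT id, title, body FROM documents \
        WHERE id > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}

index main
{
    source = main
    path   = /var/lib/sphinx/main
}

index delta : main
{
    source = delta
    path   = /var/lib/sphinx/delta
}
```

Under these assumptions, a cron job would run something like indexer --rotate delta every few minutes and indexer --rotate main once a day or once a week. Every one of those runs is a rebuild, which is where the load and the indexing delay come from.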

Real-time indexes

Sphinx 1.10 introduces real-time (RT) index support. The main idea is the ability to insert and update index records on the fly. RT indexes are accessed over the MySQL protocol, which lets existing MySQL client applications work with them using SELECT, DELETE, INSERT and REPLACE statements.
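
As a rough illustration, an RT index is declared in sphinx.conf without any source section, and searchd gets a listener that speaks the MySQL protocol. The index name rt_articles, its fields and attribute, the paths, the port, and the memory limit below are all assumptions chosen for the example.

```
# Hypothetical index: names, paths and limits are illustrative only
index rt_articles
{
    type         = rt
    path         = /var/lib/sphinx/rt_articles

    # full-text fields and attributes are declared here, not in a source
    rt_field     = title
    rt_field     = content
    rt_attr_uint = category_id

    # size limit of the in-memory chunk; illustrative value
    rt_mem_limit = 256M
}

searchd
{
    # MySQL-protocol listener for SphinxQL clients
    listen   = 9306:mysql41
    log      = /var/log/sphinx/searchd.log
    pid_file = /var/run/sphinx/searchd.pid
}
```

With a setup like this, a stock mysql client connected to port 9306 could run statements such as INSERT INTO rt_articles (id, title, content, category_id) VALUES (1, 'first title', 'first body', 10) and SELECT * FROM rt_articles WHERE MATCH('first'), with no indexer run in between.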

Currently there are some performance issues with real-time indexes on large data sets. But for smaller ones (say, up to 500,000 Wikipedia documents), their speed is comparable to that of regular indexes.

So the performance and simplicity of real-time indexes make them a preferable choice for indexing relatively small but frequently changing data.

Conclusion

Real-time indexes can be used as a replacement for the main+delta bundle of regular indexes. They can reduce server workload and simplify the index update routine.

We can also use mixed indexes to plug real-time indexes into an existing application.

Here’s an example of a mixed index:

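index distributed
{
    type  = distributed
    # existing plain (disk) index, rebuilt by indexer
    local = plain_main_index
    # real-time index that receives new and changed documents
    local = real_time_increment_index
}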

In this example we query both the regular and the real-time index through one distributed index. This way, migration to real-time indexes can be performed seamlessly, without significant modifications to the production system.

Good luck and have fun with real-time indexes.

2 Comments

Shankar Krish, October 25th, 2012 at 4:15 pm

Hello,
This comment is more than 2 years old. You have commented “Currently there are some performance issues with real-time indexes on large data sets.”
Would it be a fair assumption that the current version has addressed these performance issues?
Looking at the Sphinx forum, I have not found any postings indicating performance issues with real-time indexes in recent versions.
Would appreciate your thoughts/comments on the performance aspect of RT indexes.

Thanks & Regards
Shankar Krish

Sergey Nikolaev, October 26th, 2012 at 3:30 am

Hello Shankar

Please read our post http://www.ivinco.com/blog/sphinx-in-action-good-and-bad-in-sphinx-real-time-indexes/. RT indexes can actually perform just as well as traditional indexes. It’s all about a high enough rt_mem_limit value, but the cost of good RT index performance is possibly excessive RAM consumption.
