Sphinx replication

Sphinx doesn’t yet support any replication for plain or RT indexes out of the box, so you have to implement it yourself if you need a copy of your Sphinx data somewhere. Why you may need it:

  • you want to balance load on your servers (e.g. you can send half of all Sphinx queries to one server and the rest to another one)
  • you want to have Sphinx index backups that are immediately available once the main Sphinx storage becomes unavailable for some reason
  • you want to combine both of the above with automatic problem detection and switching between servers (e.g. when one server crashes another one automatically starts handling all of the queries)

The easiest way to replicate plain indexes is to copy your Sphinx config to another server, index everything there and keep searchd running. The drawback is that it’s hard to keep the data well synchronized: the Sphinx indexes are rebuilt separately from the same source (e.g. your main database), so they should be identical in the end, but one copy may lag behind the other because one server is a bit more loaded or has a weaker configuration, among other reasons.

Here’s another way to do it, which gives minimal latency:

  1. run indexing in only one place (let’s call it MASTER)
  2. use rsync on the SLAVEs to sync from the MASTER
  3. tell Sphinx in all places to start using the newly rebuilt data

Here’s how it can be done in detail:

1) Indexing

The challenge here is to NOT use ‘indexer --rotate’, which would rebuild the data in one place while it’s still old in another, but to do the rotation at the end instead. For each Sphinx index you want to replicate, e.g.:

index idx
{
...
path      = /path/to/sphinx/indexes/idx
...
}

You need to add an inherited index whose only modification is the ‘path’:

index idx_new : idx
{
path = /path/to/sphinx/indexes/idx.new
}

and when you want to reindex ‘idx’ you reindex ‘idx_new’ instead. Note that the new index path must end with exactly ‘.new’, because that is what lets Sphinx rotate the indexes automatically when you want it to (see below).

How it works:

You run the indexer:

[snikolaev@MASTER ~]$ indexer -c sphinx_simple.conf idx_new
Sphinx 0.9.10-dev (r1996)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file 'sphinx_simple.conf'...
indexing index 'idx_new'...
collected 1 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 1 docs, 28 bytes
total 0.007 sec, 3958 bytes/sec, 141.38 docs/sec
total 4 reads, 0.000 sec, 24.0 kb/call avg, 0.0 msec/call avg
total 6 writes, 0.000 sec, 4.7 kb/call avg, 0.0 msec/call avg

Here’s what you have in the dir after that:

[snikolaev@MASTER ~]$ ls -la sphinx_tmp/
total 184
drwxrwxr-x   2 snikolaev snikolaev  4096 Oct 14 07:23 .
drwx------  57 snikolaev snikolaev 16384 Oct 14 07:20 ..
-rw-r--r--   1 snikolaev snikolaev     0 Oct 14 07:23 idx.new.spa
-rw-r--r--   1 snikolaev snikolaev   211 Oct 14 07:23 idx.new.spd
-rw-r--r--   1 snikolaev snikolaev 28401 Oct 14 07:23 idx.new.sph
-rw-r--r--   1 snikolaev snikolaev   316 Oct 14 07:23 idx.new.spi
-rw-r--r--   1 snikolaev snikolaev     0 Oct 14 07:23 idx.new.spk
-rw-r--r--   1 snikolaev snikolaev     0 Oct 14 07:23 idx.new.spm
-rw-r--r--   1 snikolaev snikolaev     1 Oct 14 07:23 idx.new.spp
-rw-r--r--   1 snikolaev snikolaev     1 Oct 14 07:23 idx.new.sps
-rw-r--r--   1 snikolaev snikolaev     0 Oct 14 07:20 idx.spa
-rw-r--r--   1 snikolaev snikolaev   211 Oct 14 07:20 idx.spd
-rw-r--r--   1 snikolaev snikolaev 28401 Oct 14 07:20 idx.sph
-rw-r--r--   1 snikolaev snikolaev   316 Oct 14 07:20 idx.spi
-rw-r--r--   1 snikolaev snikolaev     0 Oct 14 07:20 idx.spk
-rw-------   1 snikolaev snikolaev     0 Oct 14 07:23 idx.spl
-rw-r--r--   1 snikolaev snikolaev     0 Oct 14 07:20 idx.spm
-rw-r--r--   1 snikolaev snikolaev     1 Oct 14 07:20 idx.spp
-rw-r--r--   1 snikolaev snikolaev     1 Oct 14 07:20 idx.sps

2) Data synchronization

There are different ways to synchronize data between servers. I like rsync, and here’s how it can be done with it:

[snikolaev@SLAVE ~]$ rsync -t --stats --exclude "*.spl" --rsh=ssh "MASTER:~/sphinx_tmp/*" ~/sphinx_tmp/ >> /tmp/sphinx_syncing 2>&1

You should exclude “*.spl” because .spl is a Sphinx lock file and it doesn’t make sense to copy it.

You can add it to cron to run periodically. The period may be a few minutes or even 1 minute, because rsync does no heavy work when there’s nothing to synchronize: when the two directories are equal it won’t overload your servers, and you don’t need a preliminary check of whether the copies differ, rsync handles that itself. When you add it to cron, remember to prevent two instances of rsync from running at once; any common locking technique will do.
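
One simple locking technique is a lock directory: mkdir either creates the directory or fails, so only one run can hold the lock. This is a minimal sketch, not the only way to do it; the lock path /tmp/sphinx_sync.lock and the function name run_locked are made up for this example.

```shell
#!/bin/sh
# Sketch: run a command only if no other run currently holds the lock.
# LOCK_DIR and run_locked are illustrative names, not part of Sphinx or rsync.
LOCK_DIR="${LOCK_DIR:-/tmp/sphinx_sync.lock}"

run_locked() {
    # mkdir fails if the directory already exists,
    # which means another run is still in progress.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "another sync is running, skipping" >&2
        return 75
    fi
    run_locked_status=0
    "$@" || run_locked_status=$?
    # Release the lock whether the command succeeded or not.
    rmdir "$LOCK_DIR"
    return "$run_locked_status"
}
```

The cron script would then call the rsync command through it, e.g. run_locked rsync -t --stats ... as above. If a run is killed before it removes the directory, the stale lock has to be deleted by hand; where flock from util-linux is available it avoids that, because the lock goes away with the process.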

3) Sphinx indexes rotating

Now that your data is synchronized you need to rotate the Sphinx instances in all places at the same time. Here’s how it can be done if you have one SLAVE:

[snikolaev@SLAVE ~]$ test `tail -n 50 /tmp/sphinx_syncing|grep "Number of files transferred"|tail -1|awk -F: '{print $2}'|perl -p -e 's/ //g'` != '0' && (echo Rotating Sphinx instances on MASTER and SLAVE; echo MASTER; ssh MASTER "[ -f /home/snikolaev/sphinx.pid ] && cat /home/snikolaev/sphinx.pid|xargs kill -SIGHUP"; echo SLAVE; [ -f /home/snikolaev/sphinx.pid ] && cat /home/snikolaev/sphinx.pid|xargs kill -SIGHUP)

You need to run this right after rsync finishes; a small script can do both.

It looks a bit complex, but it’s actually not. The following

test `tail -n 50 /tmp/sphinx_syncing|grep "Number of files transferred"|tail -1|awk -F: '{print $2}'|perl -p -e 's/ //g'` != '0'

checks whether we need to rotate: if the rsync log ends with “Number of files transferred: 0”, nothing was copied and we don’t need to rotate; otherwise we do.
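
The same check can be written as a small function, which is easier to read and test. This is only a sketch under a few assumptions: the helper names files_transferred and needs_rotation are invented here, and the pattern also accepts the “Number of regular files transferred” wording that newer rsync releases print, where the number may contain thousands separators.

```shell
#!/bin/sh
# Sketch: read the file count from the last rsync --stats summary in a log.
# files_transferred and needs_rotation are illustrative helper names.
files_transferred() {
    # Accepts "Number of files transferred: N" and the
    # "Number of regular files transferred: N" wording of newer rsync.
    tail -n 50 "$1" \
        | grep -E 'Number of (regular )?files transferred' \
        | tail -n 1 \
        | awk -F: '{ gsub(/[ ,]/, "", $2); print $2 }'
}

# Prints "rotate" when the last run copied something, "skip" otherwise
# (including when the log holds no summary line at all).
needs_rotation() {
    count=$(files_transferred "$1")
    if [ -n "$count" ] && [ "$count" != "0" ]; then
        echo rotate
    else
        echo skip
    fi
}
```

With the log used above, needs_rotation /tmp/sphinx_syncing tells the calling script whether to send the SIGHUP signals.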

The following

ssh MASTER "[ -f /home/snikolaev/sphinx.pid ] && cat /home/snikolaev/sphinx.pid|xargs kill -SIGHUP"

opens an ssh connection to the MASTER, gets the PID of the running Sphinx instance if it exists and sends the SIGHUP signal to the process. There’s no need to use & or send the process to the background by other means, as Sphinx index rotation is very fast even on huge indexes.

And we do the rotating locally:

[ -f /home/snikolaev/sphinx.pid ] && cat /home/snikolaev/sphinx.pid|xargs kill -SIGHUP
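
If you prefer a function to the one-liner, the local step could look roughly like this. It is a sketch, with rotate_searchd as a made-up name; it additionally checks that the PID in the file still belongs to a live process before signalling it.

```shell
#!/bin/sh
# Sketch: send SIGHUP to the process named in a pid file.
# rotate_searchd is an illustrative helper name.
rotate_searchd() {
    pidfile=$1
    # No pid file means searchd is not running here; nothing to rotate.
    [ -f "$pidfile" ] || { echo "no pid file: $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")
    # kill -0 only checks that the process exists; it sends no signal.
    kill -0 "$pid" 2>/dev/null || { echo "stale pid: $pid" >&2; return 1; }
    kill -HUP "$pid"
}
```

On the SLAVE this would be called as rotate_searchd /home/snikolaev/sphinx.pid.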

And here’s what you have in the dir after the rotating:

[snikolaev@MASTER ~]$ ls -la sphinx_tmp/
total 108
drwxrwxr-x   2 snikolaev snikolaev  4096 Oct 14 07:24 .
drwx------  57 snikolaev snikolaev 16384 Oct 14 07:20 ..
-rw-r--r--   1 snikolaev snikolaev     0 Oct 14 07:23 idx.spa
-rw-r--r--   1 snikolaev snikolaev   211 Oct 14 07:23 idx.spd
-rw-r--r--   1 snikolaev snikolaev 28401 Oct 14 07:23 idx.sph
-rw-r--r--   1 snikolaev snikolaev   316 Oct 14 07:23 idx.spi
-rw-r--r--   1 snikolaev snikolaev     0 Oct 14 07:23 idx.spk
-rw-------   1 snikolaev snikolaev     0 Oct 14 07:24 idx.spl
-rw-r--r--   1 snikolaev snikolaev     0 Oct 14 07:23 idx.spm
-rw-r--r--   1 snikolaev snikolaev     1 Oct 14 07:23 idx.spp
-rw-r--r--   1 snikolaev snikolaev     1 Oct 14 07:23 idx.sps

As you can see, idx.new.sp* were moved to idx.sp* by Sphinx and the old idx.sp* files were removed.

Notes:

  • if your Sphinx instance restarts earlier than expected your replication will be broken, because Sphinx automatically rotates on start. To avoid this you might want to rename the ‘*.new.*’ files right after reindexing and rename them back before rotating
  • when you start Sphinx you will see the following warning:

    [snikolaev@MASTER ~]$ searchd -c sphinx_simple.conf
    Sphinx 0.9.10-dev (r1996)
    Copyright (c) 2001-2009, Andrew Aksyonoff
    
    using config file 'sphinx_simple.conf'...
    listening on 127.0.0.1:9314
    listening on 127.0.0.1:9313
    precaching index 'idx'
    precaching index 'idx_new'
    WARNING: index 'idx_new': preload: failed to open /home/snikolaev/sphinx_tmp/idx.new.sph: No such file or directory; NOT SERVING
    precached 2 indexes in 0.013 sec
    

    The warning is ok, because we don’t need idx_new to be served by Sphinx; we need this index only as temporary storage for new data.

  • rsyncing huge indexes may take a long time, but syncing data at the file system level is still an easier task for your servers than running ‘indexer’.
  • be careful with UpdateAttributes() if you use this replication scheme: once Sphinx flushes the updated attributes to disk in one place, the data there differs from the other place and your automation will start synchronizing. You can try calling UpdateAttributes() against both places to avoid this
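
The renaming suggested in the first note could be sketched as follows. The ‘.pending.’ infix and the function names are assumptions made for this example; any name that Sphinx doesn’t treat as a ‘.new’ index would do.

```shell
#!/bin/sh
# Sketch: hide freshly built "*.new.sp?" files from searchd, restore them later.
# The ".pending." infix and all function names are illustrative assumptions.
rename_infix() {
    dir=$1 from=$2 to=$3
    for f in "$dir"/*."$from".sp?; do
        [ -e "$f" ] || continue          # the glob matched nothing
        base=${f##*/}                    # e.g. idx.new.spa
        name=${base%."$from".sp?}        # e.g. idx
        ext=${base##*.}                  # e.g. spa
        mv "$f" "$dir/$name.$to.$ext"
    done
}

# Call right after indexer finishes.
park_new_files()   { rename_infix "$1" new pending; }
# Call right before sending SIGHUP.
unpark_new_files() { rename_infix "$1" pending new; }
```

In this variant rsync copies the ‘.pending.’ files, and unpark_new_files has to run on every server just before the rotation.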

11 Comments

Barry Hunter, January 27th, 2011 at 12:22 pm

Nice writeup. I like the idea of the _new index, to automatically add the .new into the index name 🙂
I’ve previously just added the .new extension during the rsync process.

One small suggestion, could/should you make the rsync exclude the current index, i.e. --exclude="idx.s*"? (or just make it copy *.new.*)

Otherwise it will also overwrite the index files currently being served with sphinx on the slave. Which will be one rotation behind if you don’t have searchd running on the master. If you have sphinx running on master too, it probably makes little difference tho.

Sergey Nikolaev, January 27th, 2011 at 1:31 pm

Thanks for the comment, Barry. Yes, it’d be good to rsync only the files that need to be copied, just in case something goes wrong. But normally it shouldn’t be a problem, because rsync won’t copy idx.s* since they will always be identical. The only moment when they can differ is while the rotation is running, but that’s a very short period of time, a few seconds.

Christian, September 23rd, 2011 at 11:08 am

Great post! That’s exactly what I was looking for. Nevertheless, are there any plans to integrate a replication feature into Sphinx Search directly?

Sergey Nikolaev, September 23rd, 2011 at 1:16 pm

Hi Christian
As far as I know yes, Andrew mentioned this at the last few conferences, but since it’s not yet in any roadmap or beta version I’m not sure how soon this feature will arrive.

Fabio, November 24th, 2011 at 3:07 pm

Isn’t it simpler to write the index files to an NFS file system (which removes the rsync problem)? Then every searchd can read the same index. Or do you see any problem?

Sergey Nikolaev, November 24th, 2011 at 4:23 pm

I think it won’t work, because Sphinx locks an index when it’s running (the lock is in .spl file). I.e. if you start searchd with index located in an NFS shared dir the .spl file will be created and locked and you won’t be able to start searchd using the same index on another box.
Another thing is that the NFS scheme is less reliable, because if you lose the share you lose all the data while when you have 2 copies you’re protected from this.

Colin, August 5th, 2012 at 6:49 pm

Thank you for this write-up it has been very helpful. This is a little cleaner in my opinion…

I made the changes to sphinx.conf in step 1, and all the rest of the process is in a script that runs from cron every 30 min on the master. This removes any cron scripts that need to be on the slaves.
Also remember to note you have to add the ssh keys of root to any slaves so rsync can complete, and you have to make the changes to sphinx.conf on both master and slave servers.

#!/bin/bash
# Reindex catalogs into "_new" files that will be rotated in
/opt/sphinx/bin/indexer catalog_new -c /opt/sphinx/etc/sphinx.conf
# Rsync the "_new" files to other servers so we can rotate them all in at the same time
rsync -av --exclude "*.spl" /opt/sphinx/var/data/ catalog2.prod:/opt/sphinx/var/data/
# Reload searchd on the slave, this will automatically rotate in the new indexes
ssh catalog2.prod "[ -f /opt/sphinx/var/log/searchd/searchd.pid ] && cat /opt/sphinx/var/log/searchd/searchd.pid | xargs kill -SIGHUP"
# Reload searchd on master, this will automatically rotate in the new indexes
cat /opt/sphinx/var/log/searchd/searchd.pid | xargs kill -SIGHUP

Cristian, November 1st, 2012 at 11:33 am

Replicating/load balancing a Sphinx RT index the way presented actually makes the index non-RT, in the sense that the data won’t be instantly available for search.

So this solution might be ok for some sites, but not for “real-time” sites…

Sergey Nikolaev, November 7th, 2012 at 6:07 am

Hello Cristian

Yes, this article is about replicating traditional indexes, not RT.

Arun, November 24th, 2016 at 11:11 am

Very helpful article, thanks for this.

How can we implement http://sphinxsearch.com/blog/2013/04/01/high-availability-built-in-mirroring/comment-page-1/ after 2) Data synchronization?

Do we still need to rotate the indexes manually?

Sergey Nikolaev, November 28th, 2016 at 1:27 am

Hello Arun

There’s no problem with using agent mirroring if you follow the index replication approach I described in the article. We actually did both together in many projects. Yes, you would still need to rotate “manually”, or better to say synchronously; otherwise, since agent mirroring does load balancing and splits your Sphinx traffic, the end user may get fluctuating results.
