Sphinx: building 1M docs index having no one real doc

Hi guys

Just want to share an interesting trick on how to easily index something with Sphinx without need of populating database with a lot of data or doing smth like that. The below is a full Sphinx config which lets you build a 1M docs index consisting of random 3-char words and one numeric attribute with a random value. All you need is just any connection to any db (in this case ‘mysql -u root’ works).

source min
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = test
    sql_attr_uint = attr
    sql_query_range = select 1, 1000000
    sql_range_step = 1
    sql_query = select $start, mid(md5(rand()), 1, 3) body, floor(rand() * 100000 * $end) attr
}
index idx
{
    path = idx
    source = min
}
searchd
{
    listen = 3306
    log = sphinx.log
    pid_file = sphinx.pid
}

As you can see the tricky part is to utilize Sphinx’ directives sql_query_range and sql_range_step to let Sphinx loop until it makes 1M docs collection. The drawback is slower indexing comparing to real fetching the same amount of data from db, but come on, you’re not going to use this in production, right?

I hope you’ll find it helpful when you decide to play with Sphinx.

Leave a comment

Your comment

Notify me of followup comments via e-mail. You can also subscribe without commenting.