Sergey Nikolaev
February 12, 2013 ・ Sphinx
Sphinx: building 1M docs index having no one real doc
Hi guys
Just want to share an interesting trick on how to easily index something with Sphinx without need of populating database with a lot of data or doing smth like that. The below is a full Sphinx config which lets you build a 1M docs index consisting of random 3-char words and one numeric attribute with a random value. All you need is just any connection to any db (in this case 'mysql -u root' works).
source min
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =
sql_db = test
sql_attr_uint = attr
sql_query_range = select 1, 1000000
sql_range_step = 1
sql_query = select $start, mid(md5(rand()), 1, 3) body, floor(rand() * 100000 * $end) attr
}
index idx
{
path = idx
source = min
}
searchd
{
listen = 3306
log = sphinx.log
pid_file = sphinx.pid
}
As you can see the tricky part is to utilize Sphinx' directives sql_query_range and sql_range_step to let Sphinx loop until it makes 1M docs collection. The drawback is slower indexing comparing to real fetching the same amount of data from db, but come on, you're not going to use this in production, right?
I hope you'll find it helpful when you decide to play with Sphinx.
- Sphinx
- Code