Sphinx in Action: Did you mean … ?

This is a post from a series of posts called “Sphinx in Action”. See the whole series by clicking here.

Many of our customers want to incorporate “did you mean … ?” functionality into their applications. How it works is that when a typo is made in the query, a corrected version of the typo is suggested:

This can be done using a special technique with Sphinx that has been successfully used in many different projects.

There’s a demo of this technique in misc/suggest/ dir in the Sphinx source and an article (in Russian) where Andrew Aksyonoff describes the main idea behind this. The following is based on his original idea: Read the rest of this entry »

Sphinx in Action: Good and bad in Sphinx real time indexes

This is a post from a series of posts called “Sphinx in Action”. See the whole series by clicking here.

Sphinx has supported real time indexes since version 1.10 was released. Ever since, they have been getting more stable and robust, and now it is ready for use in production. Many people like this because it’s really simple to understand (i.e. no indexing, crontasks, main + delta schemes, and so on), but anyone who wants to use them should also be aware of the drawbacks of this when comparing it to traditional monolithic indexes.

Read the rest of this entry »

Meet Ivinco at Sphinx Search Day 2012!

The Sphinx team has just announced The Sphinx Search Day, which will take place in Santa Clara, California on April 13 -  just after the MySQL Conference And Expo.

This is first Sphinx event in USA and a great opportunity for Sphinx users in US to get together and meet each other and the Sphinx team. As the organizers say:

The aim of the Sphinx Search Day 2012 is to provide a technical forum to educate those who are new to Sphinx and drive innovation for long-time users. The majority of the talks will be technical in nature and more importantly delivered by real-world users and Sphinx community members.

I am also very glad that Ivinco was invited to this event. In our talk we will share our experience in building Sphinx systems from small projects to multi-terabyte search engines.

This is a free event – if you are around the Silicon Valley in April – go ahead and register. We look forward to meeting you at the Sphinx Search Day!

SphinxSearch kill list feature

In using the SphinxSearch sql_query_killlist I am faced with a problem – the kill list doesn’t always work as expected for several distributed indexes.
I found the cause of this problem in the order of the physical indexes within the distributed index.

Lets say we have a main index which contains the majority of the data and updates once a day. And we have delta index which contains the latest changes and updates every 3 minutes.

In order to ignore the old records in the main index we need to add a kill list into the delta index. The kill list should select the IDs which you want to remove from the delta index.

The problem is that the kill list will take effect only if the delta index is placed after the main index, so in a nutshell the kill list removes all IDs from all preceding indexes.

The correct index definition using a kill list should look like this:

index dist
{
   type = distributed
   local = main
   local = delta
}

Group by MVA in SphinxQL

This is a just a quick note I’d like to share since this might be confusing how one should do “group by” mva attribute in SphinxQL. Starting v 2.0.1-beta there’s a compat_sphinxql_magics directive which makes SphinxQL similar to standard SQL although there’re still few things that differ especially when it comes to MVA which is a bit unusual thing for standard SQL. Anyway back to the problem: if you have MVA attribute ‘tags’ with some integers and want to group by ‘tags’ your first intuitively written command will be probably like this:

mysql> select tags, count(*) c from idx where match('word') group by tags order by c desc limit 10;
+--------------+------+
| tags         | c    |
+--------------+------+
| 210,348      |  366 |
| 204          |  116 |
| 206          |   73 |
| 132,348      |   71 |
| 210,348      |   40 |
| 29           |   36 |
| 25,29,270    |   30 |
| 208          |   28 |
| 180          |   24 |
| 25,348       |   23 |
+--------------+------+
10 rows in set (0.00 sec)

But as you can see the first column contains few values instead of one that you want. To see the needed value used by Sphinx for grouping you need to use special @groupby word:

mysql> select @groupby, tags, count(*) c from idx where match('word') group by tags order by c desc limit 10;
+----------+--------------+------+
| @groupby | tags         | c    |
+----------+--------------+------+
|      348 | 210,348      |  366 |
|      204 | 204          |  116 |
|      206 | 206          |   73 |
|      132 | 132,348      |   71 |
|      210 | 210,348      |   40 |
|       29 | 29           |   36 |
|      270 | 25,29,270    |   30 |
|      208 | 208          |   28 |
|      180 | 180          |   24 |
|       25 | 25,348       |   23 |
+----------+--------------+------+
10 rows in set (0.00 sec)

Now it’s clear that the ‘tags’ was not changed during grouping at all, this is just what you have in one record our of the group while the value which the dataset was grouped by can be seen in the @groupby column.
Hope this will help someone to save some time when he faces this situation.