<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ivinco</title>
	<atom:link href="http://www.ivinco.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ivinco.com</link>
	<description>Advanced Web Development Services</description>
	<lastBuildDate>Tue, 31 Jan 2012 11:13:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Group by MVA in SphinxQL</title>
		<link>http://www.ivinco.com/blog/group-by-mva-in-sphinxql/</link>
		<comments>http://www.ivinco.com/blog/group-by-mva-in-sphinxql/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 11:13:16 +0000</pubDate>
		<dc:creator>Sergey Nikolaev</dc:creator>
				<category><![CDATA[Sphinx search engine]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[GROUP BY]]></category>
		<category><![CDATA[MVA]]></category>
		<category><![CDATA[SphinxQL]]></category>

		<guid isPermaLink="false">http://www.ivinco.com/?p=1020</guid>
		<description><![CDATA[This is just a quick note I&#8217;d like to share, since it can be confusing how to &#8220;group by&#8221; an MVA attribute in SphinxQL. Starting with v2.0.1-beta there&#8217;s a compat_sphinxql_magics directive which makes SphinxQL more similar to standard SQL, although a few things still differ, especially when it comes to MVA, which is [...]]]></description>
			<content:encoded><![CDATA[<p>This is just a quick note I&#8217;d like to share, since it can be confusing how to &#8220;group by&#8221; an MVA attribute in SphinxQL. Starting with v2.0.1-beta there&#8217;s a <a href="http://sphinxsearch.com/docs/current.html#conf-compat-sphinxql-magics">compat_sphinxql_magics</a> directive which makes SphinxQL more similar to standard SQL, although a few things still differ, especially when it comes to MVA, which is rather unusual for standard SQL. Anyway, back to the problem: if you have an MVA attribute &#8216;tags&#8217; holding some integers and want to group by &#8216;tags&#8217;, the first query you intuitively write will probably look like this:</p>
<pre class="brush:bash">
mysql> select tags, count(*) c from idx where match('word') group by tags order by c desc limit 10;
+--------------+------+
| tags         | c    |
+--------------+------+
| 210,348      |  366 |
| 204          |  116 |
| 206          |   73 |
| 132,348      |   71 |
| 210,348      |   40 |
| 29           |   36 |
| 25,29,270    |   30 |
| 208          |   28 |
| 180          |   24 |
| 25,348       |   23 |
+--------------+------+
10 rows in set (0.00 sec)
</pre>
<p>But as you can see, the first column contains several values instead of the single one you want. To see the value Sphinx actually used for grouping, select the special @groupby column:</p>
<pre class="brush:bash">
mysql> select @groupby, tags, count(*) c from idx where match('word') group by tags order by c desc limit 10;
+----------+--------------+------+
| @groupby | tags         | c    |
+----------+--------------+------+
|      348 | 210,348      |  366 |
|      204 | 204          |  116 |
|      206 | 206          |   73 |
|      132 | 132,348      |   71 |
|      210 | 210,348      |   40 |
|       29 | 29           |   36 |
|      270 | 25,29,270    |   30 |
|      208 | 208          |   28 |
|      180 | 180          |   24 |
|       25 | 25,348       |   23 |
+----------+--------------+------+
10 rows in set (0.00 sec)
</pre>
<p>Now it&#8217;s clear that &#8216;tags&#8217; was not changed during grouping at all: it is simply what one record out of the group contains, while the value the dataset was grouped by can be seen in the @groupby column.<br />
Hope this helps someone save some time when they face this situation.</p>
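<p>A rough way to picture this behaviour is the Python sketch below. It is only a conceptual model, not Sphinx code: the sample documents are made up, and showing the first document of each group as its representative row is an assumption made for illustration.</p>
<pre class="brush:python">
from collections import defaultdict

def group_by_mva(docs):
    """Group documents by every value of their multi-valued attribute."""
    groups = defaultdict(list)
    for doc_id, tags in docs.items():
        for tag in tags:
            # a document joins one group per MVA value it holds
            groups[tag].append(doc_id)
    rows = []
    for key, ids in groups.items():
        # the 'tags' column shows one unchanged row out of the group
        representative = docs[ids[0]]
        rows.append((key, representative, len(ids)))
    # biggest groups first, ties broken by the grouping key
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows

docs = {1: [210, 348], 2: [204], 3: [132, 348]}
for key, tags, count in group_by_mva(docs):
    print(key, ",".join(str(t) for t in tags), count)
</pre>
<p>The first printed row has key 348, tags 210,348 and count 2: the key plays the role of @groupby, while the tags are just whatever the representative document contains.</p>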
]]></content:encoded>
			<wfw:commentRss>http://www.ivinco.com/blog/group-by-mva-in-sphinxql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Company meeting in Turkey</title>
		<link>http://www.ivinco.com/blog/company-meeting-in-turkey/</link>
		<comments>http://www.ivinco.com/blog/company-meeting-in-turkey/#comments</comments>
		<pubDate>Fri, 09 Dec 2011 09:23:28 +0000</pubDate>
		<dc:creator>Sergey Nikolaev</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[meeting]]></category>

		<guid isPermaLink="false">http://www.ivinco.com/?p=970</guid>
		<description><![CDATA[Ivinco is a virtual company &#8211; we have no office, all our employees work from their homes. Being virtual allows us to keep our company costs low and it also allows us to hire talents all around the world. However, it&#8217;s very important for people who work together to meet in person regularly. So every [...]]]></description>
			<content:encoded><![CDATA[<p>Ivinco is a virtual company &#8211; we have no office, all our employees work from their homes. Being virtual allows us to keep our company costs low and it also allows us to hire talents all around the world. However, it&#8217;s very important for people who work together to meet in person regularly. So every year we organize company meetings in cool places where our staff can meet and spend some time together to work and have fun.</p>
<p>This year we went to Turkey, to a beautiful 5-star all-inclusive hotel on the coast of the Mediterranean Sea. When we arrived, Turkey met us with near-hurricane weather, which was a bit disappointing (but allowed us to concentrate on our work plans better <img src='http://www.ivinco.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ), but after a few days it was all nice and sunny, so we were able to relax on the beach and took one day off our schedule for a boat trip near the old city of Alanya with its breathtaking nature.</p>
<p><img src="http://www.ivinco.com/wp-content/themes/ivincowp/img/turkey_smaller.jpg" alt="Ivinco company photo" /><br />
<span id="more-970"></span><br />
Turkey was a great place to meet &#8211; we relaxed well and had very good conditions for our group work sessions and presentations &#8211; the hotel provided us with a conference room with internet and a projector.</p>
<p>In 2010 and 2009 we had company meetings in Moscow, where we attended the <a href="http://www.highload.ru/2011/news/13360.html">HighLoad</a> and <a href="http://www.ivinco.com/blog/talking-at-sphinx-user-conference-2010/">Sphinx User</a> conferences and gave a few talks about using the Sphinx search engine.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ivinco.com/blog/company-meeting-in-turkey/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Converting Sphinx original indexes to real-time indexes</title>
		<link>http://www.ivinco.com/blog/converting-sphinx-original-indexes-to-real-time-indexes/</link>
		<comments>http://www.ivinco.com/blog/converting-sphinx-original-indexes-to-real-time-indexes/#comments</comments>
		<pubDate>Fri, 02 Dec 2011 13:32:28 +0000</pubDate>
		<dc:creator>Yaroslav Vorozhko</dc:creator>
				<category><![CDATA[Sphinx search engine]]></category>
		<category><![CDATA[Sphinx Search]]></category>

		<guid isPermaLink="false">http://www.ivinco.com/?p=996</guid>
		<description><![CDATA[If you are using the Sphinx server with many indexes and you decide to move to real-time indexes then this article is for you.
I will describe how to simply convert a large number of original indexes to real-time using indexer and the new Sphinx command ATTACH INDEX.]]></description>
			<content:encoded><![CDATA[<p>If you are using the Sphinx server with many indexes and you decide to move to real-time indexes then this article is for you. I will describe how to simply convert a large number of original indexes to real-time using indexer and the new Sphinx command ATTACH INDEX.</p>
<p>Before Sphinx version 2.0.2-beta was available, the only way to update real-time (RT) indexes was to execute SphinxQL commands (INSERT/DELETE/REPLACE) through the MySQL protocol. The original indexes, by contrast, had the indexer tool, which made updating them very simple. No such tool existed for real-time indexes.</p>
<p>The only solution was to write a <a href="http://www.ivinco.com/blog/indexer-for-real-time-indexes/">custom script</a>.<br />
Why isn’t a custom script good?</p>
<p>Compare a custom script to the indexer: indexer is written in C++, is very well optimized, and has been tested by many users on different amounts of data. If you write a custom script, you will probably use the language you know best (PHP/Ruby/Python), and you will need to take care of:</p>
<ul>
<li>long term stability </li>
<li>memory leaks</li>
<li>data preparation and escaping</li>
<li>performance optimization, nobody would wait a month to index a terabyte of data</li>
</ul>
<p>Furthermore, this kind of script will load searchd which causes RT indexes to use a massive amount of memory. For example, 300 indexes with 100mb rt_mem_limit could require up to 30GB of free RAM.</p>
<p>Good news! Sphinx version 2.0.2-beta has added a new command which allows you to convert original indexes into real-time indexes. This command looks like: ATTACH INDEX diskindex TO RTINDEX rtindex.</p>
<blockquote><p>After a successful ATTACH, the data originally stored in the source disk index becomes a part of the target RT index, and the source disk index becomes unavailable (until the next rebuild). ATTACH does not result in any index data changes. Basically, it just renames the files (making the source index a new disk chunk of the target RT index), and updates the metadata. So it is a generally quick operation which might (frequently) complete as fast as under a second.
</p></blockquote>
<p>So now you can convert an original index into a real-time one over the MySQL protocol. To do this you could write a custom script which executes the ATTACH command for each index using a MySQL client, but I found a better way.</p>
<p>I suggest solving this task in a more traditional way using the indexer tool.<br />
To keep the example simple, I will show how to convert just one index; you can use the code below to extend your sphinx.conf.<br />
<span id="more-996"></span><br />
Some terminology I use in the sphinx.conf:</p>
<ul>
<li>orig &#8211; original type of index</li>
<li>attach &#8211; special empty index which will execute the ATTACH command</li>
<li>rtindex &#8211; real-time type of index</li>
</ul>
<p>Before you start, the real-time index should be empty and, of course, the original index should contain some data. The sphinx.conf structure should be as follows:</p>
<pre class="brush:bash">
source orig
{
    type            = mysql

    sql_host        = localhost
    sql_user        = root
    sql_pass        =
    sql_db          = test
    sql_port        = 3306  # optional, default is 3306

    sql_query       = \
        SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        FROM documents

    sql_attr_uint       = group_id
    sql_attr_timestamp  = date_added

    sql_query_info      = SELECT * FROM documents WHERE id=$id
}

index orig
{
    source          = orig
    path            = idx/orig
    docinfo         = extern
    charset_type        = sbcs
}

index rtindex
{
    type            = rt
    rt_mem_limit        = 32M

    path            = idx/rtindex
    charset_type        = utf-8

    rt_field        = title
    rt_field        = content
    rt_attr_uint        = group_id
    rt_attr_timestamp = date_added
}

source attach
{
    type            = mysql

    sql_host        = 127.0.0.1
    sql_user        =
    sql_pass        =
    sql_db          =
    sql_port        = 9306  # searchd's SphinxQL (MySQL protocol) port, not MySQL's 3306

    sql_query       = select 1 from rtindex
    sql_query_post = ATTACH INDEX orig TO RTINDEX rtindex
}

index attach
{
    source          = attach
    path            = idx/attach
    docinfo         = extern
    charset_type        = sbcs
}
</pre>
<p>Notice that the &#8216;attach&#8217; source connects to searchd, not to MySQL, and<br />
uses sql_query_post to execute the ATTACH INDEX command.</p>
<p>The configuration is ready, so let&#8217;s start converting:</p>
<p>Start searchd:</p>
<pre brush="bash">
./bin/searchd  -c ./etc/sphinx.conf
</pre>
<p>Ignore the warning about the ‘attach’ index being empty.</p>
<p>I didn&#8217;t have any data in the &#8216;orig&#8217; index, so I indexed it before running the conversion.<br />
Skip this command if you already have data in your &#8216;orig&#8217; index:</p>
<pre brush="bash">
./bin/indexer -c ./etc/sphinx.conf orig --rotate
</pre>
<p>Time to convert:</p>
<pre brush="bash">
 ./bin/indexer -c ./etc/sphinx.conf attach
</pre>
<p>Pretty simple.</p>
<p>Let&#8217;s now check RT:</p>
<pre brush="bash">
mysql -P9306 -h127.0.0.1
mysql> select * from rtindex;
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |      1 |        1 | 1322419937 |
|    2 |      1 |        1 | 1322419937 |
|    3 |      1 |        2 | 1322419937 |
|    4 |      1 |        2 | 1322419937 |
+------+--------+----------+------------+
</pre>
<p>Yes, the data is in place, so the index was converted successfully!</p>
<p>Conclusion<br />
The big improvement is that indexing and converting can now be done using the same indexer tool. One thing you need to be sure of is to extend the sphinx.conf generator script with the  &#8216;attach&#8217; and &#8216;rtindex&#8217; indexes.</p>
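<p>As an illustration of extending a sphinx.conf generator, here is a minimal Python sketch that emits the &#8216;attach&#8217; source and index sections for a given pair of indexes. The function name, the attach_ naming scheme and the idx/ path prefix are assumptions made for this example; the directives themselves are the ones used in the configuration above.</p>
<pre class="brush:python">
def attach_sections(disk_index, rt_index, host="127.0.0.1", port=9306):
    """Return sphinx.conf text for one 'attach' source/index pair."""
    name = "attach_" + disk_index   # hypothetical naming scheme
    lines = [
        "source " + name,
        "{",
        "    type            = mysql",
        "    sql_host        = " + host,
        "    sql_user        =",
        "    sql_pass        =",
        "    sql_db          =",
        "    sql_port        = " + str(port),
        # dummy query against searchd; the real work happens in sql_query_post
        "    sql_query       = select 1 from " + rt_index,
        "    sql_query_post  = ATTACH INDEX " + disk_index + " TO RTINDEX " + rt_index,
        "}",
        "",
        "index " + name,
        "{",
        "    source          = " + name,
        "    path            = idx/" + name,
        "    docinfo         = extern",
        "    charset_type    = sbcs",
        "}",
    ]
    return "\n".join(lines)

print(attach_sections("orig", "rtindex"))
</pre>
<p>Calling the function once per disk index and appending the result to the generated sphinx.conf gives one indexer target per conversion.</p>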
]]></content:encoded>
			<wfw:commentRss>http://www.ivinco.com/blog/converting-sphinx-original-indexes-to-real-time-indexes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Email me mysql/sphinx query result</title>
		<link>http://www.ivinco.com/blog/email-me-mysqlsphinx-query-result/</link>
		<comments>http://www.ivinco.com/blog/email-me-mysqlsphinx-query-result/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 06:14:52 +0000</pubDate>
		<dc:creator>Sergey Nikolaev</dc:creator>
				<category><![CDATA[Tips]]></category>

		<guid isPermaLink="false">http://www.ivinco.com/?p=989</guid>
		<description><![CDATA[Hi. Here&#8217;s a simple trick which can be useful if you run a long-lasting mysql/sphinx query in a screen session or similar and want to be informed as soon as it&#8217;s finished. This works at the mysql client level and is applicable to both MySQL and SphinxQL: The trick is to use the &#8220;pager&#8221; directive to redirect the output [...]]]></description>
			<content:encoded><![CDATA[<p>Hi. Here&#8217;s a simple trick which can be useful if you run a long-lasting mysql/sphinx query in a screen session or similar and want to be informed as soon as it&#8217;s finished. This works at the mysql client level and is applicable to both MySQL and SphinxQL:</p>
<p>The trick is to use the &#8220;pager&#8221; directive to redirect the output to a mail program:</p>
<pre class="brush:bash">
mysql> pager mail -s "subject" yourname@yourdomain.com
PAGER set to 'mail -s "subject" yourname@yourdomain.com'
</pre>
<p>The above will redirect the output to yourname@yourdomain.com and the subject will be &#8220;subject&#8221;.</p>
<p>Then just start your long-lasting query to MySQL:</p>
<pre class="brush:bash">
mysql> select count(*) from feed where ext_key like 'a%';
1 row in set (10 min 35.88 sec)
</pre>
<p>or SphinxQL:</p>
<pre class="brush:bash">
mysql> select * from huge_index;
20 rows in set (1 min 14.00 sec)
</pre>
<p>and be informed via email.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ivinco.com/blog/email-me-mysqlsphinx-query-result/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sphinx RT-indexes memory consumption issue</title>
		<link>http://www.ivinco.com/blog/sphinx-rt-indexes-memory-consumption-issue/</link>
		<comments>http://www.ivinco.com/blog/sphinx-rt-indexes-memory-consumption-issue/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 12:07:12 +0000</pubDate>
		<dc:creator>Yaroslav Vorozhko</dc:creator>
				<category><![CDATA[Performance optimization]]></category>
		<category><![CDATA[Sphinx search engine]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Sphinx Search]]></category>

		<guid isPermaLink="false">http://www.ivinco.com/?p=974</guid>
		<description><![CDATA[The Sphinx team recently published a new article about <a href="http://sphinxsearch.com/blog/2011/11/11/sphinx-memory-consumption/">Sphinx memory consumption</a>.
It provides the formula to estimate the memory consumption for RT indexes:

<blockquote>For RT-index you can estimate memory consumption by calculating the size of all on-disk chunks (minus .spd &#038; .spp sizes as noted above) plus RAM-chunk size (rt_mem_limit)</blockquote>

The formula looks good, but notice that it also includes the size of the RAM-file.


What is that RAM-file?]]></description>
			<content:encoded><![CDATA[<p>The Sphinx team recently published a new article about <a href="http://sphinxsearch.com/blog/2011/11/11/sphinx-memory-consumption/">Sphinx memory consumption</a>.<br />
It provides the formula to estimate the memory consumption for RT indexes:</p>
<blockquote><p>For RT-index you can estimate memory consumption by calculating the size of all on-disk chunks (minus .spd &#038; .spp sizes as noted above) plus RAM-chunk size (rt_mem_limit)</p></blockquote>
<p>The formula looks good, but notice that it also includes the size of the RAM-file.</p>
<p>What is that RAM-file?</p>
<p>The RAM-file is used to store all necessary data for Sphinx RT-indexes.<br />
Sphinx keeps the RAM-file in memory to support real-time updates for RT-indexes.<br />
The RAM-file data is copied to a regular chunk file only once the RAM-file reaches rt_mem_limit in size.<br />
Each RT-index has its own RAM-file.</p>
<p>So, what&#8217;s wrong with the RAM-file?</p>
<p>Imagine we have 30 indexes, each 3 GB in size.<br />
If we want to keep the number of index chunks low, i.e. fewer than 5, we need to set rt_mem_limit to 1 GB.<br />
In this case we will have 3 chunks for each index.<br />
Now let&#8217;s estimate how much memory we need to support this configuration.<br />
For 30 indexes we will have 30 RAM-files of up to 1 GB each (rt_mem_limit), which makes 30 GB.<br />
So 30 GB of free memory is required to support our configuration, and that&#8217;s without counting the .spa and .spi files.<br />
Of course, in a real system with random data distribution Sphinx will consume less memory, probably about 1.5 times less.</p>
<p>OK, let&#8217;s decrease rt_mem_limit 10 times, to 100 MB.<br />
In this case only 3 GB of memory is required, but the number of chunks for each index will grow from<br />
3 to 30 (900 chunks overall for 30 indexes).<br />
Now imagine how fast Sphinx will be when every query has to go through 30 or more chunks.<br />
It will be very slow because of the many disk I/O operations, especially if we query more than one index at a time!</p>
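<p>The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up, sizes are in MB with 1 GB counted as 1000 MB to match the round numbers used here, and it assumes every RAM-file grows all the way to rt_mem_limit.</p>
<pre class="brush:python">
def estimate(index_count, index_size_mb, rt_mem_limit_mb):
    """Worst-case RAM-file memory (MB), chunks per index, total chunks."""
    ram_mb = index_count * rt_mem_limit_mb
    # ceiling division: a partly filled chunk still counts as a chunk
    chunks_per_index = -(-index_size_mb // rt_mem_limit_mb)
    return ram_mb, chunks_per_index, chunks_per_index * index_count

print(estimate(30, 3000, 1000))   # (30000, 3, 90)
print(estimate(30, 3000, 100))    # (3000, 30, 900)
</pre>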
<p>Conclusion<br />
A low rt_mem_limit is good for memory, but hurts Sphinx performance.<br />
A high rt_mem_limit gives good performance, but requires a lot of free memory.</p>
<p>The Sphinx team definitely needs to optimize this feature, e.g. by adding a new option to control the RAM-file size (say, rt_RAM_limit).<br />
rt_RAM_limit in conjunction with rt_mem_limit could give more flexibility when setting up a massive number of RT-indexes<br />
on an average server with 16 GB of memory on board.<br />
Another option is to exclude .spp and .spd data from the RAM-file, so that it keeps in memory only what is supposed to be there &#8211; .spa and .spi.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ivinco.com/blog/sphinx-rt-indexes-memory-consumption-issue/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interesting thing about BM25 in Sphinx Search</title>
		<link>http://www.ivinco.com/blog/interesting-thing-about-bm25-in-sphinx-search/</link>
		<comments>http://www.ivinco.com/blog/interesting-thing-about-bm25-in-sphinx-search/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 07:44:31 +0000</pubDate>
		<dc:creator>Yaroslav Vorozhko</dc:creator>
				<category><![CDATA[Sphinx search engine]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Sphinx Search]]></category>

		<guid isPermaLink="false">http://www.ivinco.com/?p=962</guid>
		<description><![CDATA[I recently faced some strange behaviour in Sphinx Search. As I investigated, the problem turned out to be in the default ranking mode SPH_RANK_PROXIMITY_BM25, which uses the BM25 algorithm. Update: according to the latest documentation SPH_RANK_SPH04 solves the problem. Anyway, I suggest reading the article; it was an interesting investigation. Here&#8217;s the quote from Sphinx [...]]]></description>
			<content:encoded><![CDATA[<p>I recently faced some strange behaviour in Sphinx Search. As I investigated, the problem turned out to be in the default ranking mode SPH_RANK_PROXIMITY_BM25, which uses the BM25 algorithm.</p>
<blockquote><p>Update: according to the latest documentation <a href="http://sphinxsearch.com/docs/current.html#weighting">SPH_RANK_SPH04</a> solves the problem. Anyway, I suggest reading the article; it was an interesting investigation.</p></blockquote>
<p>Here&#8217;s the quote from Sphinx documentation: </p>
<blockquote><p>Statistical rank is based on classic BM25 function which only takes word frequencies into account. If the word is rare in the whole database (ie. low frequency over document collection) or mentioned a lot in specific document (ie. high frequency over matching document), it receives more weight. Final BM25 weight is a floating point number between 0 and 1.</p></blockquote>
<p>So it states that a word which is mentioned a lot receives more weight. But it&#8217;s not always true.<br />
Let&#8217;s say we have the following table with some words repeated in every row: </p>
<pre class="brush:sql">
mysql>select * from titles;

| id | title |

| 1 | Test Category - Test Article |
| 2 | Test CategoryA |
| 3 | Test CategoryB |
</pre>
<p>Please notice that &#8220;Test&#8221; occurs two times in document #1 and only one time in documents #2 and #3.<br />
So we can expect that a search for the word &#8220;Test&#8221; will match #1 as the first result.</p>
<p>Let&#8217;s test it using SphinxQL (Sphinx query language):</p>
<pre class="brush:sql">
mysql> select * from test1 where match('Test') ;

| id | weight |

| 2 | 1319 |
| 3 | 1319 |
| 1 | 1252 |
</pre>
<p>Hmm, something&#8217;s wrong here, document #1 got lower weight than documents #2 and #3.</p>
<p>The investigation showed that the key problem is that the word &#8220;Test&#8221; occurs in every document. See the quote from Andrew Aksyonoff (creator of Sphinx Search):</p>
<blockquote><p>This is actually by BM25 design. It penalizes (!) the keywords that are overly frequent, ie. occur in more than 50% of the collection documents.</p></blockquote>
<p>In my case the word &#8220;Test&#8221; occurred in 100% of the documents, and that&#8217;s why it was penalized.</p>
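<p>The 50% threshold can be seen in the textbook BM25 inverse document frequency. The Python sketch below uses the classic Robertson/Sparck Jones formula; this is an assumption for illustration only, since Sphinx&#8217;s own implementation is not shown here and may use a different variant.</p>
<pre class="brush:python">
import math

def classic_idf(total_docs, docs_with_term):
    """Textbook BM25 IDF: log((N - n + 0.5) / (n + 0.5))."""
    return math.log((total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))

print(classic_idf(1000, 10))   # rare term: positive
print(classic_idf(4, 2))       # term in exactly half of the documents: 0.0
print(classic_idf(3, 3))       # term in every document: negative
</pre>
<p>With a negative IDF, every extra occurrence of the keyword pulls the score further down, which is consistent with document #1 ranking below documents #2 and #3.</p>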
<p>A possible workaround is to use another ranking mode which doesn&#8217;t use BM25, e.g. SPH_RANK_WORDCOUNT or SPH_RANK_MATCHANY:</p>
<p><strong>SPH_RANK_WORDCOUNT</strong>:</p>
<pre class="brush:sql">
mysql> select * from test1 where match('Test') option ranker=wordcount;

| id   | weight |

|    1 |      2 |
|    2 |      1 |
|    3 |      1 |
</pre>
<p><strong>SPH_RANK_MATCHANY</strong>:</p>
<pre class="brush:sql">
mysql> select * from test1 where match('Test') option ranker=MATCHANY;

| id   | weight |

|    1 |      1 |
|    2 |      1 |
|    3 |      1 |
</pre>
<p>It looks like SPH_RANK_WORDCOUNT solves our problem.<br />
In some cases it can be a solution, but in general it can&#8217;t, because for 99% of queries we would want a ranking algorithm as smart as the default SPH_RANK_PROXIMITY_BM25 mode.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ivinco.com/blog/interesting-thing-about-bm25-in-sphinx-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>We are speaking at Percona Live London, Oct 24-25</title>
		<link>http://www.ivinco.com/blog/we-are-speaking-at-percona-live-london-oct-24-25/</link>
		<comments>http://www.ivinco.com/blog/we-are-speaking-at-percona-live-london-oct-24-25/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 10:22:49 +0000</pubDate>
		<dc:creator>Mindaugas</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://www.ivinco.com/?p=948</guid>
		<description><![CDATA[We’re thrilled that we will be speaking at Percona Live conference. The conference takes place in London, UK, on October 24-25: Percona Live is an intensive two-days MySQL summit. There are many tracks of expert speakers, including Percona consultants and hand-picked guests. The sessions are 100% technical—even the sponsored sessions. It will be an honor [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.percona.com/live/london-2011/"><img class="alignright" title="Discover the Power of MySQL" src="http://www.percona.com/static/images/percona-live/London2011/promote/PL_Badge_Large_Speaker.jpg" alt="Percona Live MySQL Conference, London, Oct 24th and 25th, 2011" width="118" height="239" /></a>We’re thrilled that we will be speaking at <a href="http://www.percona.com/live/london-2011/">Percona Live conference</a>. The conference takes place in London, UK, on October 24-25:</p>
<blockquote><p>Percona Live is an intensive two-days MySQL summit. There are many tracks of expert speakers, including Percona consultants and hand-picked guests. The sessions are 100% technical—even the sponsored sessions.</p></blockquote>
<p>It will be an honor to speak among so many top-level professionals working on the largest projects in the world &#8211; the speaker list and schedule are really impressive, <a href="http://www.percona.com/live/london-2011/schedule-conference/">check them out </a>at the official conference website.</p>
<p><span id="more-948"></span>Our session is titled &#8220;Building 50TB-scale search engine with MySQL and Sphinx&#8221; and shares the experience of using MySQL and Sphinx to build a search engine &#8211; the architecture of a stable, high-performance, easy-to-scale system that has low maintenance costs. We&#8217;ll discuss technical details, from hardware and software configuration to maintenance, monitoring and high-availability solutions &#8211; how we ensure the system is stable and what we do to keep up with data and usage growth.</p>
<p>If you are visiting the conference do not hesitate to say hello! If you are not attending, but in London during Oct 24-25 and would like to meet us, <a href="http://www.ivinco.com/contact-us/">please let us know</a>!</p>
<p>P.S. organizers provided a discount code &#8220;<strong>come-c-talk</strong>&#8221; &#8211; use it on registration to get £40 off for your tickets.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ivinco.com/blog/we-are-speaking-at-percona-live-london-oct-24-25/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Contribute to Ivinco Open Source Projects on GitHub</title>
		<link>http://www.ivinco.com/blog/contribute-to-ivinco-open-source-projects-on-github/</link>
		<comments>http://www.ivinco.com/blog/contribute-to-ivinco-open-source-projects-on-github/#comments</comments>
		<pubDate>Wed, 10 Aug 2011 15:09:34 +0000</pubDate>
		<dc:creator>Yaroslav Vorozhko</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[GitHub]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://www.ivinco.com/?p=932</guid>
		<description><![CDATA[We have published our open source projects on GitHub &#8211; to make it easier for people who use our projects to contribute. You can find a list of our projects on the Ivinco Software page. If you have any questions, feel free to contact us or leave a comment.]]></description>
			<content:encoded><![CDATA[<p>We have published our open source projects on <a href="https://github.com/Ivinco">GitHub</a> &#8211; to make it easier for people who use our projects to contribute.</p>
<p>You can find a list of our projects on the <a href="http://www.ivinco.com/software/">Ivinco Software</a> page.</p>
<p>If you have any questions, feel free to <a href="http://www.ivinco.com/contact-us/">contact us</a> or leave a comment.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ivinco.com/blog/contribute-to-ivinco-open-source-projects-on-github/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>High performance BuildExcerpts() with Sphinx Search</title>
		<link>http://www.ivinco.com/blog/high-performance-buildexcerpts-with-sphinx-search/</link>
		<comments>http://www.ivinco.com/blog/high-performance-buildexcerpts-with-sphinx-search/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 08:40:18 +0000</pubDate>
		<dc:creator>Yaroslav Vorozhko</dc:creator>
				<category><![CDATA[Performance optimization]]></category>
		<category><![CDATA[Sphinx search engine]]></category>
		<category><![CDATA[Tips]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Search excerpts]]></category>
		<category><![CDATA[Sphinx Search]]></category>

		<guid isPermaLink="false">http://www.ivinco.com/?p=915</guid>
		<description><![CDATA[Overview Since version 2.0.1 Sphinx has the ability to build snippets in parallel mode. It means that Sphinx will use several CPUs to do that. Below are instructions on how to do that efficiently, but I recommend this only if you need to build excerpts for a large amount of text like 10-100 MB. Sphinx parallel [...]]]></description>
			<content:encoded><![CDATA[<h3>Overview</h3>
<p>Since version 2.0.1 Sphinx has the ability to build snippets in parallel mode. It means that Sphinx will use several CPUs to do that. Below are instructions on how to do that efficiently, but I recommend this only if you need to build excerpts for a large amount of text like 10-100 MB.</p>
<p>Sphinx parallel processing is controlled by the &#8216;dist_threads&#8217; option, which tells searchd how many CPUs to use for search processing. The BuildExcerpts() API call also uses this parameter, in combination with the &#8216;load_files&#8217; option. By default the first parameter of BuildExcerpts() is expected to be an array of text strings, but if &#8216;load_files&#8217; is set to &#8216;1&#8217; it should contain an array of file names instead. Each file should contain the text for which you want to build an excerpt. Together these two options allow Sphinx to build excerpts in parallel, which is much faster when there is a huge amount of text to process.</p>
<p>But this implementation has a bottleneck – it requires a file system to read and write the files. On disk this may be 1000 times slower than doing it in memory, so the right solution is to use an in-memory file system.</p>
<p><a href="http://en.wikipedia.org/wiki/Tmpfs">tmpfs</a> does the job: it is an in-memory file system, supported by the Linux kernel from version 2.4 and up. So I used it to work around the file read/write performance issue.</p>
<h3>File system</h3>
<p>How to mount the in-memory file system tmpfs:</p>
<pre class="brush:bash">
mkdir /space
mount -t tmpfs -o size=1G,nr_inodes=10k,mode=0700 tmpfs /space
</pre>
<p>First I created the directory and then mounted tmpfs on it.<br />
Among the parameters, I specified a file system size of 1 GB and access permissions for the owner of the directory /space only.</p>
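<p>To check that the mount took effect, one option is to look at the file system type the directory reports. The helper below is only a sketch: the name is_tmpfs is made up for this example, and it assumes GNU coreutils, where stat -f -c %T prints the file system type.</p>
<pre class="brush:bash">
# Hypothetical helper: succeeds only if the given path lives on tmpfs.
# Assumes GNU stat; other systems may need different flags.
is_tmpfs() {
    fstype=$(stat -f -c %T "$1" 2>/dev/null)
    [ "$fstype" = "tmpfs" ]
}

if is_tmpfs /space; then
    echo "/space is on tmpfs"
else
    echo "/space is NOT on tmpfs, excerpt files would hit the disk"
fi
</pre>
<p>If the check fails, the excerpt files would still be created, only on a regular disk, so the speed-up would quietly disappear.</p>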
<h3>My BuildExcerpts() function based on files</h3>
<pre class="brush:php">
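// SphinxClient is defined in sphinxapi.php, which ships with Sphinx.
// Adjust the path to your setup; skip this line if the PECL sphinx extension is loaded.
require_once 'sphinxapi.php';

function buildExcerptFile($documents, $keywords, $options = array())
{
        // Stage 1: write each document to its own temporary file on tmpfs.
        $files = array();
        foreach($documents as $doc){
            $file = "/space/".'snip_'.md5($doc).'_'.time();
            file_put_contents($file, $doc);
            $files[] = $file;
        }

        $client = new SphinxClient();
        $client->setServer('localhost', 9312);

        // Stage 2: 'load_files' makes searchd treat the first argument as file names.
        // Caller-supplied $options override the defaults, but never 'load_files'.
        $res = $client->BuildExcerpts( $files, 'index', $keywords,
                array_merge(
                    array(
                        'around'=>10,
                        'limit' => 300
                        ),
                    $options,
                    array('load_files' => 1)
                    )
                );

        // Stage 3: remove the temporary files to free the memory they occupy.
        foreach($files as $file){
            unlink($file);
        }

        return $res;
}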
</pre>
<p>The function works in three stages:</p>
<ol>
<li>Convert text documents into temporary files. I chose dynamic file names (a hash of the content plus a timestamp) to prevent file name collisions.</li>
<li>Call the BuildExcerpts() function. The first parameter contains the list of file names instead of the list of documents, and the fourth parameter (the options array) contains the &#8216;load_files&#8217; option set to &#8216;1&#8217;, which tells BuildExcerpts() to process the documents as files.</li>
<li>Remove the temporary files to free the memory they occupy.</li>
</ol>
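<p>If the PHP process dies between the first and the third stage, the temporary files stay in memory until something deletes them. A periodic cleanup, for example from cron, can cover that case. This is a sketch under stated assumptions: the function name and the 10-minute default are arbitrary choices for illustration, the snip_ prefix matches the file names used above, and the -maxdepth and -delete flags require GNU or BSD find.</p>
<pre class="brush:bash">
# Hypothetical cleanup: delete snip_* files older than N minutes from a directory.
# Only regular files directly inside the directory are touched.
cleanup_stale_snippets() {
    dir=$1
    max_age_min=${2:-10}
    find "$dir" -maxdepth 1 -type f -name 'snip_*' -mmin "+$max_age_min" -delete
}

# Example call, e.g. from a cron job:
# cleanup_stale_snippets /space 10
</pre>
<p>The age limit should be comfortably longer than the slowest BuildExcerpts() call, so that files still in use are never removed.</p>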
<h3>Set up the dist_threads option</h3>
<p>Add the following in the searchd section of your Sphinx config:</p>
<pre class="brush:bash">
dist_threads = 2
</pre>
<p>I prefer to set dist_threads equal to the number of CPUs in the system.</p>
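<p>The CPU count can be read from the system rather than typed by hand. A small sketch, assuming nproc from GNU coreutils is available, with getconf as a fallback; the function name is made up for this example:</p>
<pre class="brush:bash">
# Hypothetical helper: print a dist_threads line matching the number of online CPUs.
suggest_dist_threads() {
    cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
    echo "dist_threads = $cpus"
}

suggest_dist_threads
</pre>
<p>The printed line still has to be placed in the searchd section of the config, and searchd restarted, before it has any effect.</p>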
<h3>Conclusion</h3>
<p>In my testing environment, this approach was twice as fast as the default BuildExcerpts() call.<br />
The average document size was about 3-10 MB, and I passed 100 documents per BuildExcerpts() call.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ivinco.com/blog/high-performance-buildexcerpts-with-sphinx-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WordPress Sphinx Search plugin version 3.0 released</title>
		<link>http://www.ivinco.com/blog/wordpress-sphinx-search-plugin-version-3.0-released/</link>
		<comments>http://www.ivinco.com/blog/wordpress-sphinx-search-plugin-version-3.0-released/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 10:24:17 +0000</pubDate>
		<dc:creator>Yaroslav Vorozhko</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Plugin]]></category>
		<category><![CDATA[Sphinx Search]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.ivinco.com/?p=573</guid>
		<description><![CDATA[We&#8217;re glad to announce the third version of the WordPress Sphinx Search plugin. The WordPress Sphinx Search plugin lets you use the power of Sphinx Search Server to enable ultra-fast and feature-rich search on WordPress-based websites. It is especially useful when your WordPress site becomes very large. The new version enables a range of new tools that improve search [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Wordpress Sphinx Search plugin" rel="attachment wp-att-574" href="http://www.ivinco.com/blog/wordpress-sphinx-search-plugin-version-3.0-released/sphinx_wp_search/"><img class="alignright size-medium wp-image-574" title="Wordpress Sphinx Search plugin" src="http://www.ivinco.com/wp-content/uploads/2011/04/sphinx_wp_search-300x228.png" alt="Wordpress Sphinx Search plugin" /></a></p>
<p>We&#8217;re glad to announce the third version of the WordPress Sphinx Search plugin.</p>
<p>The WordPress Sphinx Search plugin lets you use the power of Sphinx Search Server to enable ultra-fast and feature-rich search on WordPress-based websites. It is especially useful when your WordPress site becomes very large.</p>
<p>The new version enables a range of new tools that improve search quality and can help you increase your site&#8217;s rankings in search engines like Google and Bing.</p>
<p>With this version, search results are more relevant: you can search posts, pages and comments using flexible search syntax and sort the results by freshness and relevance. The plugin comes with sidebar widgets that display the most recent searches, top searches and related searches.<br />
<span id="more-573"></span></p>
<h2>New &#8220;freshness &amp; relevance&#8221; search mode</h2>
<p><a title="Wordpress Advanced Search form" rel="attachment wp-att-575" href="http://www.ivinco.com/blog/wordpress-sphinx-search-plugin-version-3.0-released/wp_sphinx_search_modes_widget/"><img class="alignright size-full wp-image-575" style="border: 1px solid black;" title="Wordpress" src="http://www.ivinco.com/wp-content/uploads/2011/04/wp_sphinx_search_modes_widget.png" alt="Wordpress Advanced Search form" /></a></p>
<p>The new &#8220;freshness &amp; relevance&#8221; search mode lets you sort results by relevance within time segments. It is perfect for blogs and news sites, as new relevant posts rank higher than old ones, so outdated information does not take the top positions in the search results. This mode is on by default.</p>
<div style="clear: both;"><!-- do not remove --></div>
<h2>Related search terms widget</h2>
<p><a title="Wordpress Related Search Terms" rel="attachment wp-att-576" href="http://www.ivinco.com/blog/wordpress-sphinx-search-plugin-version-3.0-released/wp_sphinx_related_searches_widget/"><img class="alignright size-full wp-image-576" style="border: 1px solid black;" title="Wordpress Related Search Terms" src="http://www.ivinco.com/wp-content/uploads/2011/04/wp_sphinx_related_searches_widget.png" alt="Wordpress Related Search Terms" /></a></p>
<p>This is a great tool for SEO and navigation on your site. It shows relevant search phrases for different sections of your blog:</p>
<ul>
<li>Search result pages</li>
<li>Posts</li>
<li>Pages</li>
</ul>
<p>For example, if a site visitor is on your product page, the widget will analyze the page title and show the most relevant search terms. When visitors search for &#8220;ipad&#8221;, other relevant search terms that include &#8220;ipad&#8221; will be displayed in this sidebar widget.<br />
And the best part:</p>
<ul>
<li>This widget helps search engines like Google and Bing find more relevant pages on your site, which can improve your search engine rankings</li>
<li>It also helps visitors find relevant information on your site more easily</li>
</ul>
<h2>Search terms management</h2>
<p><a title="Wordpress Search Terms Management tools" rel="attachment wp-att-577" href="http://www.ivinco.com/blog/wordpress-sphinx-search-plugin-version-3.0-released/wp_sphinx_search_mng_tools/"><img class="alignright size-medium wp-image-577" style="border: 1px solid black;" title="Wordpress Search Terms Management tools" src="http://www.ivinco.com/wp-content/uploads/2011/04/wp_sphinx_search_mng_tools-300x207.png" alt="Wordpress Search Terms Management tools" /></a></p>
<p>All searches performed by your visitors are tracked, and we&#8217;ve developed a range of tools to manage the tracked search terms. You can control which search terms are displayed in the widgets.<br />
With the search terms management tools you can:</p>
<ul>
<li>Show only approved search terms</li>
<li>Block search terms</li>
<li>Create a blacklist to hide all search terms containing the blacklisted phrases</li>
<li>Import your own search terms that best fit your blog</li>
<li>Export search terms to an Excel-like document</li>
</ul>
<p><a name="friendly_urls"></a></p>
<h2>Search engine friendly URLs</h2>
<p><em>Human readable or search engine friendly URLs are the URLs that make sense to both humans and search engines because they explain the path to the particular page they point to.</em></p>
<p>A few advantages of search engine friendly URLs:</p>
<ul>
<li>Letting users know what the URL is about</li>
<li>Keyword-rich URLs impact SEO (can improve your page rankings on search engines like Google).</li>
<li>URLs are cleaner, more meaningful and more descriptive. For example, <a href="#">http://yourblog.com/search/product+name/</a> looks better than <a href="#">http://yourblog.com/?s=product+name</a></li>
</ul>
<h3>How to enable search engine friendly URLs?</h3>
<p>By default this option is turned off, so here&#8217;s how you can enable it.</p>
<p>Activate WordPress permalinks. Go to <em>WP Admin → Settings → Permalinks</em> and choose a non-default option.<br />
<a href="http://www.ivinco.com/?attachment_id=846" rel="attachment wp-att-846"><img src="http://www.ivinco.com/wp-content/uploads/2011/07/permalinks.jpg" alt="permalinks settings" title="permalinks settings" width="665" height="255" class="aligncenter size-full wp-image-846" style="border: 1px solid black;" /></a></p>
<p>Then download and install <a href="http://wordpress.org/extend/plugins/wordpress-sphinx-plugin/">WordPress Sphinx Search plugin</a> version 3.3 or higher. Follow this <a href="http://www.ivinco.com/software/wordpress-sphinx-search-tutorial/">tutorial to install Sphinx Search</a> plugin.</p>
<p>After installation is complete, go to <em>Settings → Sphinx Search → Search settings (tab)</em> and set the &#8220;<em>Enable friendly URLs</em>&#8221; option to ON.<br />
<a href="http://www.ivinco.com/?attachment_id=852" rel="attachment wp-att-852"><img src="http://www.ivinco.com/wp-content/uploads/2011/07/enable_friendly_urls.jpg" alt="Search engine friendly URLs" title="Search engine friendly URLs" width="302" height="171" class="aligncenter size-full wp-image-852" style="border: 1px solid black;" /></a></p>
<h2>And many more&#8230;</h2>
<ul>
<li><strong>Sphinx in each widget component</strong>. All parts of the plugin now use Sphinx, which means no more heavy MySQL full-text search queries. Everything works extremely fast.</li>
<li><strong>Search terms statistics</strong>. This is a good tool to analyze your search terms. It helps you better understand what your visitors look for on your site.</li>
<li><strong>Improved top searches widget</strong>. Lots of new settings have been added to the top searches widget. You now have more control over what is shown on different types of pages on your site. It is also easier to manage using the search terms management tool. You can add your own terms to always display in the top searches list, which is a good way to promote something on your site, for example your new product.</li>
</ul>
<p>You can find more information on the plugin <a href="http://www.ivinco.com/software/wordpress-sphinx-search/">official page</a>.</p>
<h3>Support</h3>
<p>This plugin is developed by <a href="http://www.ivinco.com/">Ivinco</a>. If you need commercial support, or if you’d like WordPress Sphinx Search Plugin customized for your needs, we <a href="http://www.ivinco.com/contact-us/">can help</a>. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.ivinco.com/blog/wordpress-sphinx-search-plugin-version-3.0-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

