[BNM] Suggest in search - way of doing with MySQL?
Alastair James
al.james at gmail.com
Thu Oct 2 16:27:17 BST 2008
HI...
+1 Not using mysql fulltext. Ever.
If you dont want to use lucene, I would suggest looking at sphinx
search. http://www.sphinxsearch.com/.
Its a nice little full text engine. Its biggest problem is that it is
hard to set up to work with live updates (i.e. you need to batch
update every so often so make sure your code can cope with the
fulltext index being out of sync without crashing). When you update
you can get a list of words (and their frequencies) and store in a
file / table. Then its pretty easy to split the query into words and
compare each to the words in the index file using the levenshtein
function.
Note, however, that solutions based on individual words (as opposed to
n-grams, groups of words) might suggest combinations of words that
dont occur together in the dataset. I.e. it might suggest a sentence
that does not occur in that order. If you are searching in a strict
mode, this suggested query will still produce no results!
I would have a look at the Zend implementation of lucene, I have not
used it but I hear its good, but not that fast.
Al
2008/10/2 Mark Ng <mark at markng.me.uk>:
> Oh, one extra bonus - mySQL full text searching algorithms can
> sometimes be quite slow (depending on the search). If you're working
> on a large (traffic wise) site, externalising the search into Lucene
> is 1) faster and 2) means you can take the weight of search away from
> the DB (which is the hardest thing to parallellize.)
>
> 2008/10/2 Mark Ng <mark at markng.me.uk>:
>> First, I'd suggest indexing whatever it is you need to index with some
>> form of Lucene implementation (Zend has one for PHP). The mysql full
>> text indexing is useful in some situations, but it requires myISAM
>> tables (which aren't always the best move) and Full-Text indexing is a
>> pain when you have to search across multiple fields and even more so
>> across multiple tables. Lucene indexing allows you to prioritise
>> certain fields (so, for example, you can automatically weight words in
>> a title field), and does a much better job of relevancy searching.
>>
>> You can then use levenshtein (as suggested elsewhere) distances to
>> suggest other words from the index if you want to, but Lucene has this
>> fuzzy matching already built in, so if you add a tilde to the search
>> query, it'll use those techniques to do this automatically for you.
>>
>> HTH,
>>
>> Mark
>>
>> 2008/10/2 Paul Silver <paul at tenpastmidnight.com>:
>>> Hi,
>>>
>>> I'm making a search within a database driven site soon. We'd like to be
>>> able to suggest corrections or just show results for the suggestions if
>>> nothing is returned from a straight search.
>>>
>>> E.g. visitor searches for "bananas in pijamas"
>>>
>>> Results say: "No results for 'bananas in pijamas', would you like to
>>> search for 'banans in pyjamas'"
>>>
>>> Is there an easy way of doing this with PHP 5 & MySQL 5?
>>>
>>> Thanks
>>>
>>> Paul
>>> --
>>> Paul Silver - tel: 01273 906020 or mobile: 07813 654285
>>> Web Development: http://www.paulsilver.co.uk
>>> Search Engine Promotion: http://webpositioningcentre.co.uk
>>> Check your web pages: http://www.spidertest.com
>>>
>>> --
>>>
>>> BNM Subscribe/Unsubscribe:
>>> http://www.brightonnewmedia.org/options/bnmlist
>>>
>>> BNM powered by Wessex Networks:
>>> http://www.wessexnetworks.com
>>>
>>
> --
>
> BNM Subscribe/Unsubscribe:
> http://www.brightonnewmedia.org/options/bnmlist
>
> BNM powered by Wessex Networks:
> http://www.wessexnetworks.com
>
--
Dr Alastair James
CTO James Publishing Ltd.
www.worldreviewer.com
Winner Yahoo! Finds of the Year
WINNER Travolution Awards Best New Online Travel Company 2008
"In a market increasingly crowded with new content and experience-led
sites, Worldreviewer stands out a mile. It has used exclusive
editorial and excellent tools to wonderful effect and ensured its
commercial partnerships are relevant to the target market."
Blogs: onewheeledbicycle.com, traveltelegraph.com
"Utinam logica falsa tuam philosophiam totam suffodiant!"
More information about the BNMlist
mailing list. Powered by Wessex Networks