MOSS Search Word Stemming - Part 2

So How Does MOSS Expand Search Query Terms to Related Words?

 

Here is how this works in MOSS:

 

In MOSS, stemming works in combination with the word breaker component, which determines where word boundaries are. The word breaker is used at both index time and query time. For most languages, the stemmer is used only at query time, where it performs both morphological analysis and morphological generation. The current exceptions are Arabic and Hebrew, where stemming is restricted to morphological analysis and runs at both query and index time. A stemmer links word forms to their base form. For example, "running," "ran," and "runs" are all variants of the verb "to run." Stemming is currently turned off by default for some languages, including English. Stemmers are available only for languages that have significant morphological variation among their word forms. For languages where no stemmer is available (such as Vietnamese), turning on this feature in the Search Result Page (CoreResult Web Part) has no effect, since exact match is all that is needed in such languages.
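To make the division of labor concrete, here is a minimal Python sketch of the flow described above for most languages: word breaking at index and query time, stemming at query time only. It is an illustration, not MOSS code. The lexicon, the function names (break_words, build_index, expand, search) and the sample documents are all hypothetical, and a real stemmer uses language-specific morphology rather than a lookup table.

```python
import re

# Hypothetical toy lexicon: base form -> inflectional variants.
# A real stemmer uses language-specific morphology, not a lookup table.
LEXICON = {
    "run": ["run", "runs", "ran", "running"],
    "page": ["page", "pages", "paged", "paging"],
}
# Reverse map used for morphological analysis (word form -> base form).
BASE_FORM = {form: base for base, forms in LEXICON.items() for form in forms}


def break_words(text):
    """Toy word breaker: lowercase the text and split on non-letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Index time: word breaking only, so the index holds exact word forms."""
    index = {}
    for doc_id, text in docs.items():
        for word in break_words(text):
            index.setdefault(word, set()).add(doc_id)
    return index


def expand(term):
    """Query time: analysis (form -> base), then generation (base -> forms)."""
    base = BASE_FORM.get(term, term)
    return LEXICON.get(base, [term])


def search(index, query, stemming=True):
    """Return ids of documents matching any query term (a toy simplification)."""
    hits = set()
    for term in break_words(query):  # the word breaker also runs at query time
        forms = expand(term) if stemming else [term]
        for form in forms:
            hits |= index.get(form, set())
    return hits


docs = {1: "She ran the report.", 2: "Run the crawl again.", 3: "Paging is slow."}
index = build_index(docs)
print(sorted(search(index, "run", stemming=False)))  # [2]
print(sorted(search(index, "run")))                  # [1, 2]
```

Because the index in this sketch stores only the exact word forms, switching stemming on or off changes nothing at index time. The difference shows up purely in how the query is expanded.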

 

Word Stemming is NOT the same thing as Wild Card Searching, which our engine also supports. A Wild Card search is a search with * in the query. It asks the search engine to find all words that start with the given text string and end with anything, since * means match any character any number of times until the end of the word, which in most languages (excluding most East Asian languages) is indicated by white space. So a wild card query such as "Share*" will return results including "SharePoint", while a query for "share" using morphological processing would bring back "sharing", which is an inflectional variant of "share". The terms Wild Card Searching and Word Stemming are often used interchangeably, but they are in fact separate mechanisms that can return different results.

 

Word Stemming would bring back words closely related to the query terms (usually inflectional variants for most languages, but for some languages derivational variants as well).

 

For example, here are sample results for the following queries:

  • If you type in "run" --> in addition to exact matches on "run", it will bring back matches on "runs", "ran" and "running"
  • If you type in "page" --> in addition to exact matches on "page", it will bring back matches on "pages", "paged" and "paging"
  • If you type in "basket" --> in addition to exact matches on "basket", it will find "baskets", but it will not find "basketball". A wild card search for "basket*" would find "basketball"; our engine supports wild card searches, and I will discuss them in another article. Word Stemming does not currently handle this because we have focused on matching inflectional variants of words only, rather than derivational variants.
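The contrast between the two mechanisms in the examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary, the inflection table and the two function names are hypothetical, and neither function reflects how MOSS implements wild cards or stemming internally.

```python
# Hypothetical vocabulary of words that appear in the index.
VOCABULARY = ["basket", "baskets", "basketball", "share", "sharing", "SharePoint"]

# Hypothetical inflection table standing in for a real stemmer.
INFLECTIONS = {
    "basket": {"basket", "baskets"},
    "share": {"share", "shares", "shared", "sharing"},
}


def wildcard_match(pattern, vocabulary):
    """Prefix wild card: 'basket*' matches every word that starts with 'basket'."""
    if not pattern.endswith("*"):
        raise ValueError("this sketch only handles a trailing *")
    prefix = pattern[:-1].lower()
    return [word for word in vocabulary if word.lower().startswith(prefix)]


def stemming_match(term, vocabulary):
    """Stemming: match only the inflectional variants of the query term."""
    forms = INFLECTIONS.get(term.lower(), {term.lower()})
    return [word for word in vocabulary if word.lower() in forms]


print(wildcard_match("basket*", VOCABULARY))  # ['basket', 'baskets', 'basketball']
print(stemming_match("basket", VOCABULARY))   # ['basket', 'baskets']
print(wildcard_match("Share*", VOCABULARY))   # ['share', 'SharePoint']
print(stemming_match("share", VOCABULARY))    # ['share', 'sharing']
```

The wild card match is purely string based, so it picks up "basketball" and "SharePoint" but misses "sharing", which does not start with "share". The stemming match does the opposite.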

However, stemming is turned off by default out of the box for English and some other languages. You can turn it on by going to the Search Results Web Part, then Options, and turning on the feature called "Enable Search Term Stemming".

 

Thanks to Ian Johnson from the Natural Language Group at Microsoft for providing his feedback on this.

 

Hope that helps

Mike

 


Reposted from: https://www.cnblogs.com/wenjielee/archive/2010/12/29/1921152.html
