Showing posts with label marks. Show all posts

Wednesday, March 21, 2012

Punctuation marks

Does anyone have a list of all punctuation marks ignored by the full-text
indexing service by default? The noise files (e.g. noise.dat) explicitly
list only the dollar sign ($) and the underscore (_) as noise "words".
And another observation - the Windows implementation of MS Search (compared
to the MS SQL Server implementation) yields different results - try searching
for files with the "|" character in the file name. Ok, it's an illegal
character, but the result is at least 'interesting'.
As far as the rest of the characters ignored in SQL FTS are concerned, they
don't bother Windows search. Has anyone else come across these (or other)
discrepancies?
ML
I take it you are only talking about SQL FTS. You mention Indexing Services
and MSSearch here, which are two separate products, although SQL FTS uses
the MSSearch engine.
SQL FTS indexes alphanumeric characters. Most other characters are not
indexed, but the engine is aware that something existed there. So a search on
AT&T will match AT&T, AT!T, AT*T, AT$T, and AT T, provided A, T, and AT are
not in your noise word list.
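To make the AT&T example concrete, here is a minimal Python sketch of a toy word breaker. It is not the MSSearch algorithm. It only assumes that every non-alphanumeric character acts as a separator, so the engine knows something was there but not what. The function name toy_break is made up for this illustration.

```python
import re

def toy_break(text):
    """Toy model: split on runs of non-alphanumeric characters, upper-case the pieces."""
    return [piece.upper() for piece in re.split(r"[^0-9A-Za-z]+", text) if piece]

# The variants named in the post above all reduce to the same token sequence.
variants = ["AT&T", "AT!T", "AT*T", "AT$T", "AT T"]
for v in variants:
    print(v, "->", toy_break(v))
```

Under this assumption every variant yields the same two tokens, which is why a search for any one of them would match the others.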
..,!:; are discarded.
|||Thank you very much. Yes, I'm mainly referring to SQL FTS, and I'm aware
that SQL FTS and Windows Indexing Services are two separate products.
I'm just baffled that the two implementations of the MSSearch
engine differ in such a way. Any idea why?
Thanks for the list as well.
ML

Punctuation

I am trying to improve searching performance ... but I am storing data that
contains punctuation marks ... such as "E.L.O." and "R.E.M." (names of song
artists/groups).
Does this mean that I cannot use full-text searching at all for searching
for these artist names?
Is there a word-breaker that will allow the punctuation marks (full stops in
particular), or is this a search issue rather than a word-breaker issue (i.e.
the CONTAINS clause does not allow punctuation anyway)?
Wozza,
Can you post the full output of -- SELECT @@version -- where you have this
problem?
Have you removed all single letters from the language-specific noise word
files (under \FTDATA\SQLServer\Config where you have SQL Server installed)
and run a Full Population after these modifications? If not, then please do
this. The default wordbreaker behavior for punctuation depends on the
OS-supplied wordbreaker, and the @@version info will tell us which one that is.
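As a rough illustration of why single letters in the noise file matter for a name like "R.E.M.", here is a small Python sketch. It assumes the word breaker splits on full stops into single letters (the thread implies this but does not confirm it) and that noise words are dropped from the query. The names SINGLE_LETTERS and searchable_terms are hypothetical, and real noise files hold more than single letters.

```python
import re

# Hypothetical stand-in for the single-letter entries of a noise word file.
SINGLE_LETTERS = {chr(c) for c in range(ord("a"), ord("z") + 1)}

def searchable_terms(phrase, noise_words):
    """Toy model: break on non-alphanumerics, then drop anything listed as noise."""
    tokens = [t.lower() for t in re.split(r"[^0-9A-Za-z]+", phrase) if t]
    return [t for t in tokens if t not in noise_words]

# With single letters treated as noise, nothing is left to search for.
print(searchable_terms("R.E.M.", SINGLE_LETTERS))
# With the single letters removed from the noise list, all three letters survive.
print(searchable_terms("R.E.M.", set()))
```

In this model an empty result corresponds to a query clause made up entirely of ignored words.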
Thanks,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
|||Hi John,
select @@version produces ...
Microsoft SQL Server 2000 - 8.00.760 (Intel X86)
Dec 17 2002 14:22:05
Copyright (c) 1988-2003 Microsoft Corporation
Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 1)
|||John,
I have also cleared the Noise.dat file (my index is set up to use the Neutral
language).
Having done this, how do I search for "r.e.m.", for instance?
Warren
|||Wozza,
Ok, as you're using Win2003 (Windows NT 5.2) and therefore the
langwrbk.dll wordbreaker (vs. Win2K's infosoft.dll), you can search for the
three single letters using CONTAINS. Note the use of double quotes to
enclose all the single letters, for example:
SELECT * FROM MyTable where CONTAINS(*,'"R.E.M"')
If you continue to get an error, then add back a single space character in
the noise.dat file under \FTDATA where SQL Server 2000 is installed and run
a Full Population, then re-run the above query.
Thanks,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
|||ok, I tried
SELECT * FROM Track where CONTAINS(*,'"R.E.M."')
and
SELECT * FROM Track where CONTAINS(*,'"R.E.M"')
and got the same error each time ...
Server: Msg 7619, Level 16, State 1, Line 1
Execution of a full-text operation failed. A clause of the query contained
only ignored words.
... so I added a space to Noise.dat and am repopulating.