|
You can search for any word or phrase on a Web site by typing it into a query form and clicking the button that executes the query (for example, the Execute Query button on the sample query form).
A search produces a list of files that contain the word or phrase, no matter where it appears in the text. The following rules apply when formulating queries:
- Consecutive
words are treated as a phrase; they
must appear in the same order within
a matching document.
- Queries
are case-insensitive, so you can type
your query in uppercase or lowercase.
- You
can search for any word except for those
in the exception list (for English,
this includes a, an,
and, as, and other
common words), which are ignored during
a search.
- Words
in the exception list are treated as
placeholders in phrase and proximity
queries. For example, if you searched
for Word for Windows, the
results could give you Word for
Windows and Word and Windows,
because for is a noise word
and appears in the exception list.
- Punctuation
marks such as the period (.), colon
(:), semicolon (;), and comma (,) are
ignored during a search.
- To use specially treated characters such as &, |, ^, #, @, $, (, ), in a query, enclose your query in quotation marks (").
- To search for a word or phrase containing quotation marks, enclose the entire phrase in quotation marks and then double the quotation marks around the word or words you want to surround with quotes. For example, "World-Wide Web or ""Web""" searches for World-Wide Web or "Web".
- You
can insert Boolean
operators (AND,
OR, and NOT)
and the proximity
operator (NEAR)
to specify additional search information.
- The
wildcard
character (*) can match words with
a given prefix. The query esc* matches
the terms ESC, escape,
and so on.
- Free-text
queries can be specified without
regard to query syntax.
- Vector
space queries can be specified.
- ActiveX
(OLE) and file attribute property
value queries can be issued.
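The quoting rule above (enclose the phrase in quotation marks and double any quotation marks inside it) can be sketched in a few lines of Python. This is only an illustration: `quote_phrase` is a hypothetical helper name, not part of Index Server.

```python
def quote_phrase(phrase: str) -> str:
    """Wrap a phrase in quotation marks, doubling any embedded ones.

    Hypothetical helper that follows the quoting rule described above.
    """
    return '"' + phrase.replace('"', '""') + '"'


# A phrase that itself contains quotation marks.
print(quote_phrase('World-Wide Web or "Web"'))
# A phrase that contains a specially treated character (&).
print(quote_phrase("access & basic"))
```

Quoting a phrase this way keeps characters such as & from being read as operators.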
Boolean
and proximity operators can create a more
precise query.
| To Search For | Example | Results |
| Both terms in the same page | access and basic -or- access & basic | Pages with both the words access and basic |
| Either term in a page | cgi or isapi -or- cgi | isapi | Pages with the words cgi or isapi |
| The first term without the second term | access and not basic -or- access & ! basic | Pages with the word access but not basic |
| Pages not matching a property value | not @size = 100 -or- ! @size = 100 | Pages that are not 100 bytes |
| Both terms in the same page, close together | excel near project -or- excel ~ project | Pages with the word excel near the word project |
Hints:
- You
can add parentheses to nest expressions
within a query. The expressions in parentheses
are evaluated before the rest of the
query.
- Use double quotes (") to indicate that a Boolean or NEAR operator keyword should be ignored in your query. For example, "Abbott and Costello" will match pages with the phrase, not pages that match the Boolean expression. In addition to being an operator, the word and is a noise word in English.
- The
NEAR operator is similar to
the AND operator in
that NEAR returns a
match if both words being searched for
are in the same page. However, the NEAR
operator differs from AND
because the rank assigned by NEAR
depends on the proximity of words. That
is, the rank of a page with the searched-for
words closer together is greater than
or equal to the rank of a page where
the words are farther apart. If the
searched-for words are more than 50
words apart, they are not considered
near enough, and the page is assigned
a rank of zero.
- The
NOT operator can be
used only after an AND
operator in content queries; it can
be used only to exclude pages that match
a previous content restriction. For
property value queries, the NOT
operator can be used apart from the
AND operator.
- The AND operator has a higher precedence than OR. For example, the first three queries are equivalent, but the fourth is not:
a AND b OR c
c OR a AND b
c OR (a AND b)
(c OR a) AND b
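The precedence rule can be checked by brute force. The sketch below uses Python's own `and` and `or`, which happen to have the same relative precedence, as a stand-in for the query operators. It is an analogy, not the Index Server parser, and the names `queries`, `truth_table`, and `tables` are made up for the illustration.

```python
from itertools import product

# Each query is modeled as a Python expression over three booleans,
# where a, b, c mean "the page contains term a / b / c".
queries = {
    "a AND b OR c": lambda a, b, c: a and b or c,
    "c OR a AND b": lambda a, b, c: c or a and b,
    "c OR (a AND b)": lambda a, b, c: c or (a and b),
    "(c OR a) AND b": lambda a, b, c: (c or a) and b,
}


def truth_table(fn):
    """Evaluate fn for all eight combinations of (a, b, c)."""
    return tuple(bool(fn(a, b, c)) for a, b, c in product([False, True], repeat=3))


tables = {text: truth_table(fn) for text, fn in queries.items()}
for text, table in tables.items():
    print(f"{text:16} {table}")
```

The first three rows print identical truth tables. The fourth differs, for example for a page that contains only c.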
Note: The symbols (&, |, !, ~) and the English keywords AND, OR, NOT, and NEAR work the same way in all languages supported by Index Server. Localized keywords are also available when the browser locale is set to one of the following six languages:
| Language | Keywords |
| German | UND, ODER, NICHT, NAH |
| French | ET, OU, SANS, PRES |
| Spanish | Y, O, NO, CERCA |
| Dutch | EN, OF, NIET, NABIJ |
| Swedish | OCH, ELLER, INTE, NÄRA |
| Italian | E, O, NO, VICINO |
Note: The NEAR operator can be applied only to words or phrases.
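The 50-word cutoff for NEAR can be modeled as a simple distance check. The sketch below only captures the cutoff. The actual rank formula is not given here, and measuring distance as the difference between word positions is an assumption. The function names are hypothetical.

```python
def min_distance(positions_a, positions_b):
    """Smallest gap, in words, between any occurrence of the two terms."""
    return min(abs(a - b) for a in positions_a for b in positions_b)


def near_matches(positions_a, positions_b, max_distance=50):
    """Return True if both terms occur and are at most max_distance words apart.

    Assumption: distance is the difference between word positions in the page.
    """
    if not positions_a or not positions_b:
        return False  # NEAR, like AND, needs both terms in the page
    return min_distance(positions_a, positions_b) <= max_distance


# "excel" at word 3, "project" at words 10 and 400.
print(near_matches([3], [10, 400]))
# Terms 51 words apart are not considered near.
print(near_matches([0], [51]))
```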
Wildcard
operators help you find pages containing
words similar to a given word.
The
query engine finds pages that best match
the words and phrases in a free-text query.
This is done by automatically finding
pages that match the meaning, not the
exact wording, of the query. Boolean,
proximity, and wildcard operators are
ignored within a free-text query. Free-text
queries are prefixed with $contents.
The
query engine supports vector space queries.
Vector queries return pages that match
a list of words and phrases. The rank
of each page indicates how well the page
matched the query.
| To Search For | Example | Results |
| Pages that contain specific words | light, bulb | Files with words that best match the words being searched for |
| Pages that contain weighted prefixes, words, and phrases | invent*, light[50], bulb[10], "light bulb"[400] | Files that contain words prefixed by invent, the words light, bulb, and the phrase light bulb (the terms are weighted) |
- Components
in vector queries are separated by commas.
- Components
in vector queries can be weighted by
using the [weight] syntax.
- Pages
returned by vector queries do not necessarily
match every term in the query.
- Vector
queries work best when the results are
sorted by rank.
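As a sketch of the syntax described above, the following Python helper assembles a vector query from (term, weight) pairs, joining components with commas and appending `[weight]` where a weight is given. `vector_query` is a hypothetical name used only for this illustration.

```python
def vector_query(components):
    """Build a vector query string from (term, weight) pairs.

    A weight of None leaves the component unweighted.
    """
    parts = []
    for term, weight in components:
        parts.append(term if weight is None else f"{term}[{weight}]")
    return ", ".join(parts)


query = vector_query([
    ("invent*", None),        # prefix match, unweighted
    ("light", 50),
    ("bulb", 10),
    ('"light bulb"', 400),    # quoted phrase with the highest weight
])
print(query)
```

This reproduces the weighted example from the table: invent*, light[50], bulb[10], "light bulb"[400].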
With property value queries, you can find files whose property values match given criteria. The properties you can query include basic file information, such as file name and file size, and ActiveX properties, including the document summary information stored in files created by ActiveX-aware applications.
There
are two types of property queries:
- Relational
property queries consist of an
at character (@), a property
name, a relational
operator, and a property
value. For example, to find all
of the files larger than one million
bytes, issue the query @size > 1000000.
- Regular expression property queries consist of a number sign (#), a property name, and a regular expression for the property value. For example, to find all of the video (.avi) files, issue the query #filename *.avi. Regular expressions never match the special properties contents (#contents) and all (#all). Properties that are not retrievable at query time cannot be used in # queries; these include HTML META properties not stored in the property cache.
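The two query shapes can be sketched as small string builders. The helper names below are hypothetical and the code only assembles text; it does not talk to Index Server. The operator set is the one used in the relational examples on this page.

```python
RELATIONAL_OPERATORS = {"<", "<=", "=", "!=", ">=", ">"}


def relational_query(prop, op, value):
    """Build a relational property query: @ + name + operator + value."""
    if op not in RELATIONAL_OPERATORS:
        raise ValueError(f"unsupported relational operator: {op!r}")
    return f"@{prop} {op} {value}"


def regex_query(prop, pattern):
    """Build a regular expression property query: # + name + pattern."""
    if prop.lower() in ("contents", "all"):
        # Regular expressions never match these special properties.
        raise ValueError(f"#{prop} cannot be used with a regular expression")
    return f"#{prop} {pattern}"


print(relational_query("size", ">", 1000000))
print(regex_query("filename", "*.avi"))
```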
Property
names are preceded by either the at
(@) or number sign (#) character. Use
@ for relational queries, and # for regular
expression queries.
If
no property name is specified, @contents
is assumed.
Properties
available for all files include:
| Property Name | Description |
| All | Matches words, phrases, and any property |
| Contents | Words and phrases in the file |
| Filename | Name of the file |
| Size | File size |
| Write | Last time the file was modified |
ActiveX
property values can also be used in queries.
Web sites with files created by most ActiveX-aware
applications can be queried for these
properties:
| Property Name | Description |
| DocTitle | Title of the document |
| DocSubject | Subject of the document |
| DocAuthor | The document's author |
| DocKeywords | Keywords for the document |
| DocComments | Comments about the document |
For
a complete list of property names, see
the List
of Property Names later on this page.
Relational
operators are used in relational property
queries.
| To Search For | Example | Results |
| Property values in relation to a fixed value | @size < 100; @size <= 100; @size = 100; @size != 100; @size >= 100; @size > 100 | Files whose size matches the query |
| Property values with all of a set of bits on | @attrib ^a 0x820 | Compressed files with the archive bit on |
| Property values with some of a set of bits on | @attrib ^s 0x20 | Files with the archive bit on |
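The ^a ("all of") and ^s ("some of") operators correspond to two familiar bitmask tests. A minimal Python sketch follows, using the Win32 attribute values for the archive (0x20) and compressed (0x800) bits. The function names are hypothetical.

```python
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_COMPRESSED = 0x800


def all_bits_on(value, mask):
    """Model of ^a: every bit in mask is set in value."""
    return (value & mask) == mask


def some_bits_on(value, mask):
    """Model of ^s: at least one bit in mask is set in value."""
    return (value & mask) != 0


compressed_archive = FILE_ATTRIBUTE_COMPRESSED | FILE_ATTRIBUTE_ARCHIVE  # 0x820
archive_only = FILE_ATTRIBUTE_ARCHIVE

print(all_bits_on(compressed_archive, 0x820))  # both bits set
print(all_bits_on(archive_only, 0x820))        # compressed bit missing
print(some_bits_on(archive_only, 0x20))        # archive bit set
```

The parentheses around `value & mask` matter in Python, because comparison operators bind more tightly than `&`.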
| To Search For | Example | Results |
| A specific value | @DocAuthor = Bill Barnes | Files authored by Bill Barnes |
| Values beginning with a prefix | #DocAuthor George* | Files whose author property begins with George |
| Files with any of a set of extensions | #filename *.|(exe|,dll|,sys|) | Files with .exe, .dll, or .sys extensions |
| Files modified after a certain date | @write > 96/2/14 10:00:00 | Files modified after February 14, 1996 at 10:00 GMT |
| Files modified after a relative date | @write > -1d2h | Files modified in the last 26 hours |
| Vectors matching a vector | @vectorprop = { 10, 15, 20 } | ActiveX documents with a vectorprop value of { 10, 15, 20 } |
| Vectors where each value matches a criterion | @vectorprop >^a 15 | ActiveX documents with a vectorprop value in which all values in the vector are greater than 15 |
| Vectors where at least one value matches a criterion | @vectorprop =^s 15 | ActiveX documents with a vectorprop value in which at least one value is 15 |
- Be sure to use the number sign (#) character before the property name when using a regular expression in a property value, and an at (@) character otherwise. The equal (=) relational operator is assumed for regular-expression queries.
- File
name (#filename) is the only property
that efficiently supports regular expressions
with wildcards to the left
of text.
- Date and time values are of the form yyyy/mm/dd hh:mm:ss or yyyy-mm-dd hh:mm:ss. The first two characters of the year and the entire time can be omitted. If you omit the first two characters of the year, a value of 29 or less is interpreted as a year in the 2000s, and a value of 30 or greater is interpreted as a year in the 1900s. All dates and times are in Greenwich Mean Time (GMT).
- Dates and times relative to the current time can be expressed with a minus (-) character followed by zero or more pairs of an integer and a time unit. Time units are expressed as: (y) for years, (m) for months, (w) for weeks, (d) for days, (h) for hours, (n) for minutes, and (s) for seconds. In date expressions, a three-digit millisecond value can optionally be specified after the seconds value, for example, 1997/12/8 10:10:03:452.
- Currency
values are of the form x.y,
where x is the whole value
amount and y is the fractional
amount. There is no assumption about
units.
- Boolean
values are (t) or (true) for TRUE
and (f) or (false) for FALSE.
- Vectors
(VT_VECTOR) are expressed as an opening
brace ({), followed by a comma-separated
list of values, then a closing brace
(}).
- Single-value
expressions that are compared against
vectors are expressed as a relational
operator, then a (^a) for all
of or a (^s) for some of.
- Numeric
values can be in decimal or hexadecimal
(preceded by 0x).
- The
contents property does not
support relational operators. If a relational
operator is specified, no results will
be found. For example, @contents Microsoft
will find documents containing Microsoft,
but @contents=Microsoft
will find none.
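Two of the rules above lend themselves to a short sketch: relative date expressions such as -1d2h, and the pivot for two-digit years. The Python below is a rough model under stated assumptions. The function names are hypothetical, and years (y) and months (m) are rejected because they need calendar arithmetic that this sketch does not attempt.

```python
import re
from datetime import timedelta

# Time units that map directly onto timedelta arguments.
_UNIT_TO_ARG = {"w": "weeks", "d": "days", "h": "hours", "n": "minutes", "s": "seconds"}


def parse_relative(expr):
    """Turn a relative expression such as '-1d2h' into a timedelta before now."""
    if not expr.startswith("-"):
        raise ValueError("relative expressions start with a minus sign")
    body = expr[1:].lower()
    pairs = re.findall(r"(\d+)([ymwdhns])", body)
    if "".join(number + unit for number, unit in pairs) != body:
        raise ValueError(f"malformed relative expression: {expr!r}")
    total = timedelta()
    for number, unit in pairs:
        if unit in "ym":
            raise ValueError("years and months are not handled in this sketch")
        total += timedelta(**{_UNIT_TO_ARG[unit]: int(number)})
    return total


def expand_two_digit_year(yy):
    """Apply the pivot rule: 29 or less is 20yy, 30 or greater is 19yy."""
    return 2000 + yy if yy <= 29 else 1900 + yy


print(parse_relative("-1d2h"))    # one day and two hours, that is, 26 hours
print(expand_two_digit_year(96))  # the year in 96/2/14
```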
Regular
expressions in property queries are defined
as follows:
- Any
character except asterisk (*), period
(.), question mark (?), and vertical
bar (|) defaults to matching just itself.
- Regular expressions can be enclosed in matching quotes ("), and must be enclosed in quotes if they contain a space ( ) or closing parenthesis ()).
- The
characters *, ., and ? behave as they
behave in Windows; they match any number
of characters, match (.) or end of string,
and match any one character, respectively.
- The character | is an escape character. After |, the following characters have special meaning:
  ( opens a group. Must be followed by a matching ).
  ) closes a group. Must be preceded by a matching (.
  [ opens a character class. Must be followed by a matching (un-escaped) ].
  { opens a counted match. Must be followed by a matching }.
  } closes a counted match. Must be preceded by a matching {.
  , separates OR clauses.
  * matches zero or more occurrences of the preceding expression.
  ? matches zero or one occurrences of the preceding expression.
  + matches one or more occurrences of the preceding expression.
  Anything else, including |, matches itself.
- Between square brackets ([]) the following characters have special meaning:
  ^ matches everything but following classes. Must be the first character.
  ] matches ]. May only be preceded by ^, otherwise it closes the class.
  - range operator. Preceded and followed by normal characters.
  Anything else matches itself (or begins or ends a range at itself).
- Between curly braces ({}) the following syntax applies:
  |{m|} matches exactly m occurrences of the preceding expression. (0 < m < 256).
  |{m,|} matches at least m occurrences of the preceding expression. (1 < m < 256).
  |{m,n|} matches between m and n occurrences of the preceding expression, inclusive. (0 < m < 256, 0 < n < 256).
- To
match *, ., and ?, enclose them in brackets
(for example, |[*]sample will match
*sample).
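To make the rules above concrete, here is a rough Python translator from this regular expression dialect to the syntax of Python's `re` module. It is a sketch, not Index Server's matcher. The function names are hypothetical, and two behaviors are assumptions: the expression must match the whole property value, and matching is case-insensitive.

```python
import re


def to_python_regex(pattern):
    """Translate a property-query regular expression into Python re syntax."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "*":
            out.append(".*")                # any number of characters
        elif ch == "?":
            out.append(".")                 # any one character
        elif ch == ".":
            out.append(r"(?:\.|$)")         # a period or end of string
        elif ch == "|" and i + 1 < n:
            i += 1
            nxt = pattern[i]
            if nxt in "()*?+":
                out.append(nxt)             # group delimiters and quantifiers
            elif nxt == ",":
                out.append("|")             # separator between OR clauses
            elif nxt == "[":
                j = i + 1
                if j < n and pattern[j] == "^":
                    j += 1
                if j < n and pattern[j] == "]":
                    j += 1                  # a leading ] is a literal
                end = pattern.index("]", j)  # ValueError if the class is unclosed
                body = pattern[i + 1:end].replace("\\", "\\\\").replace("[", "\\[")
                out.append("[" + body + "]")
                i = end
            elif nxt == "{":
                end = pattern.index("|}", i)  # ValueError if the count is unclosed
                count = pattern[i + 1:end]
                if not re.fullmatch(r"\d+(,\d*)?", count):
                    raise ValueError(f"bad counted match: {count!r}")
                out.append("{" + count + "}")
                i = end + 1
            else:
                out.append(re.escape(nxt))  # anything else matches itself
        else:
            out.append(re.escape(ch))       # ordinary characters match themselves
        i += 1
    return "".join(out)


def matches(pattern, value):
    """Assumption: whole-value, case-insensitive matching."""
    return re.fullmatch(to_python_regex(pattern), value, re.IGNORECASE) is not None


print(matches("*.|(exe|,dll|,sys|)", "SETUP.EXE"))
print(matches("*.|(exe|,dll|,sys|)", "readme.txt"))
print(matches("|[*]sample", "*sample"))
```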
| Example | Results |
| @size > 1000000 | Pages larger than one million bytes |
| @write > 95/12/23 | Pages modified after the date |
| Apple tree | Pages with the phrase apple tree |
| "apple tree" | Same as above |
| @contents apple tree | Same as above |
| Microsoft and @size > 1000000 | Pages with the word Microsoft that are larger than one million bytes |
| "microsoft and @size > 1000000" | Pages with the phrase specified (not the same as above) |
| #filename *.avi | Video files (the # prefix is used because the query contains a regular expression) |
| @attrib ^s 32 | Pages with the archive attribute bit on |
| @docauthor = John Smith | Pages with the given author |
| $contents why is the sky blue? | Pages that match the query |
| @size < 100 & #filename *.gif | Graphics Interchange Format (GIF) files less than 100 bytes in size |
The following properties are always available for queries. Additional properties may also be available, depending on the configuration of the Web server.
| Friendly Name | Datatype | Description |
| A_HRef | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML HREF. This property name was created for Microsoft® Site Server and corresponds with the Index Server property name HtmlHRef. Can be queried but not retrieved. |
| Access | VT_FILETIME | Last time file was accessed. |
| All | (not applicable) | Searches every property for a string. Can be queried but not retrieved. |
| AllocSize | DBTYPE_I8 | Size of disk allocation for file. |
| Attrib | DBTYPE_UI4 | File attributes. Documented in Win32 SDK. |
| ClassId | DBTYPE_GUID | Class ID of object, for example, WordPerfect, Word, and so on. |
| Characterization | DBTYPE_WSTR|DBTYPE_BYREF | Characterization, or abstract, of document. Computed by Index Server. |
| Contents | (not applicable) | Main contents of file. Can be queried but not retrieved. |
| Create | VT_FILETIME | Time file was created. |
| Directory | DBTYPE_WSTR|DBTYPE_BYREF | Physical path to the file, not including the file name. |
| DocAppName | DBTYPE_WSTR|DBTYPE_BYREF | Name of application that created the file. |
| DocAuthor | DBTYPE_WSTR|DBTYPE_BYREF | Author of document. |
| DocByteCount | DBTYPE_I4 | Number of bytes in a document. |
| DocCategory | DBTYPE_STR|DBTYPE_BYREF | Type of document such as a memo, schedule, or whitepaper. |
| DocCharCount | DBTYPE_I4 | Number of characters in document. |
| DocComments | DBTYPE_WSTR|DBTYPE_BYREF | Comments about document. |
| DocCompany | DBTYPE_STR|DBTYPE_BYREF | Name of the company for which the document was written. |
| DocCreatedTm | VT_FILETIME | Time document was created. |
| DocEditTime | VT_FILETIME | Total time spent editing document. |
| DocHiddenCount | DBTYPE_I4 | Number of hidden slides in a Microsoft® PowerPoint document. |
| DocKeywords | DBTYPE_WSTR|DBTYPE_BYREF | Document keywords. |
| DocLastAuthor | DBTYPE_WSTR|DBTYPE_BYREF | Most recent user who edited document. |
| DocLastPrinted | VT_FILETIME | Time document was last printed. |
| DocLastSavedTm | VT_FILETIME | Time document was last saved. |
| DocLineCount | DBTYPE_I4 | Number of lines contained in a document. |
| DocManager | DBTYPE_STR|DBTYPE_BYREF | Name of the manager of the document's author. |
| DocNoteCount | DBTYPE_I4 | Number of pages with notes in a PowerPoint document. |
| DocPageCount | DBTYPE_I4 | Number of pages in document. |
| DocParaCount | DBTYPE_I4 | Number of paragraphs in a document. |
| DocPartTitles | DBTYPE_STR|DBTYPE_VECTOR | Names of document parts. For example, in Excel part titles are the names of spread sheets, in PowerPoint slide titles, and in Word for Windows the names of the documents in the master document. |
| DocPresentationTarget | DBTYPE_STR|DBTYPE_BYREF | Target format (35mm, printer, video, and so on) for a presentation in PowerPoint. |
| DocRevNumber | DBTYPE_WSTR|DBTYPE_BYREF | Current version number of document. |
| DocSlideCount | DBTYPE_I4 | Number of slides in a PowerPoint document. |
| DocSubject | DBTYPE_WSTR|DBTYPE_BYREF | Subject of document. |
| DocTemplate | DBTYPE_WSTR|DBTYPE_BYREF | Name of template for document. |
| DocTitle | DBTYPE_WSTR|DBTYPE_BYREF | Title of document. |
| DocWordCount | DBTYPE_I4 | Number of words in document. |
| FileIndex | DBTYPE_I8 | Unique ID of file. |
| FileName | DBTYPE_WSTR|DBTYPE_BYREF | Name of file. |
| HitCount | DBTYPE_I4 | Number of hits (words matching query) in file. |
| HtmlHRef | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML HREF. Can be queried but not retrieved. |
| HtmlHeading1 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H1. Can be queried but not retrieved. |
| HtmlHeading2 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H2. Can be queried but not retrieved. |
| HtmlHeading3 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H3. Can be queried but not retrieved. |
| HtmlHeading4 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H4. Can be queried but not retrieved. |
| HtmlHeading5 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H5. Can be queried but not retrieved. |
| HtmlHeading6 | DBTYPE_WSTR|DBTYPE_BYREF | Text of HTML document in style H6. Can be queried but not retrieved. |
| Img_Alt | DBTYPE_WSTR|DBTYPE_BYREF | Alternate text for <IMG> tags. Can be queried but not retrieved. |
| Path | DBTYPE_WSTR|DBTYPE_BYREF | Full physical path to file, including file name. |
| Rank | DBTYPE_I4 | Rank of row. Ranges from 0 to 1000. Larger numbers indicate better matches. |
| RankVector | DBTYPE_I4|DBTYPE_VECTOR | Ranks of individual components of a vector query. |
| ShortFileName | DBTYPE_WSTR|DBTYPE_BYREF | Short (8.3) file name. |
| Size | DBTYPE_I8 | Size of file, in bytes. |
| USN | DBTYPE_I8 | Update Sequence Number. NTFS drives only. |
| VPath | DBTYPE_WSTR|DBTYPE_BYREF | Full virtual path to file, including file name. If there is more than one possible path, the best match for the specific query is chosen. |
| WorkId | DBTYPE_I4 | Internal ID for file. Used within Index Server. |
| Write | VT_FILETIME | Last time file was written. |
To use properties that are not in the previous list in a restriction, in a sort specification, or as a retrieved column, you must define them in a [Names] section of the .idq file, using the following format:
[Names]
#Properties that are not in the standard list
Propertyname ( Datatype ) = GUID ["Name" | propid]
In
the syntax, "Name"
is the property name ("Sales"
in the following example), and propid
is the property ID in hexadecimal. Note
that you need to surround the friendly
name with quotation marks, but the property
ID does not take quotation marks.
For
example, suppose you want to define an
HTML meta tag as a property name that
somebody can search for. The property
you want to define is Sales.
To
define the Sales property
- In
the .idq file, under the [Names] section,
add the following line.
MetaDescription(DBTYPE_WSTR) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "Sales"
The GUID number comes from the MetaTagClsid parameter in the registry, at the following location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\HtmlFilter\MetaTagClsid
- Then,
in the HTML files where you want the
tag to appear, define the meta description.
For
example, say you want to search for
all files that give sales projections
for the future:
In File1.htm:
<META NAME="Sales" CONTENT="Projections for 1998">
In File2.htm:
<META NAME="Sales" CONTENT="Projections for 1999">
In File3.htm:
<META NAME="Sales" CONTENT="Sales in 1997">
Note: Be sure to add your META NAME tags between the <head> and </head> HTML tags at the beginning of the file.
You
can now search for all files that show
sales projections. Send the following
query:
@metadescription projections
This
query returns all the files with the word
projections in the CONTENT field
of the meta tag. In this example, File1.htm
and File2.htm are returned.
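The behavior of this query can be mimicked with a toy model in Python. This is not how Index Server evaluates queries: the dictionary simply mirrors the CONTENT values of the three example files, and `files_matching` is a hypothetical helper that does a case-insensitive whole-word lookup.

```python
import re

# CONTENT values of the "Sales" meta tag in the three example files.
META_SALES = {
    "File1.htm": "Projections for 1998",
    "File2.htm": "Projections for 1999",
    "File3.htm": "Sales in 1997",
}


def files_matching(word, contents=META_SALES):
    """Return the files whose meta content contains the word (case-insensitive)."""
    word = word.lower()
    return sorted(
        name
        for name, text in contents.items()
        if word in re.findall(r"\w+", text.lower())
    )


print(files_matching("projections"))
```

Because queries are case-insensitive, `files_matching("PROJECTIONS")` returns the same two files.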
But suppose you want to search for sales by year, for example a list of sales in 1997. Send the following query:
@metadescription 1997
File3.htm is returned.
|