Full-text search finds matches for your search terms in the content of a document, whereas standard search locates matches only in document fields. For performance reasons, full-text search returns only the first 10,000 matching documents.
Searching Document Content
Full-text search is available only from the Advanced Search dialog box. To include document content in a search, complete the following steps:
- Select Advanced Search (the binoculars icon) in the search bar to open the Advanced Search dialog box.
- Select the Include Content checkbox in the Search Scope section.
- Complete any remaining fields as needed, then select Search.
If SiteVault finds a match for your search terms in the document content, the search results page displays an excerpt from the document to provide context for the matching term.
Indexing for Full-Text Search
SiteVault automatically indexes the full text of documents with supported source file formats. Document content is typically available for search within minutes of upload, but there may be a delay when many documents are being uploaded simultaneously. SiteVault also indexes document and object attachments.
Searchable Scanned Documents
SiteVault can extract and index text in scanned source documents that users upload as images or .PDF files. This functionality, called Optical Character Recognition (OCR), enables you to use full-text search on these documents. SiteVault extracts only typed, English-language text.
Supported Formats for Text Extraction
OCR automatically attempts to extract text from files with the following supported formats:
- Portable Document Format (.PDF), only if the .PDF does not already contain text
- Portable Network Graphics (.PNG)
- Tagged Image File Format (.TIF and .TIFF)
- Joint Photographic Experts Group (.JPEG and .JPG)
- Graphics Interchange Format (.GIF)
- Bitmap (.BMP)
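If you prepare files for upload with a script, a quick pre-check against the list above can flag which files OCR is likely to attempt. The following is a minimal sketch, not part of SiteVault: the function name `is_ocr_candidate` and the `pdf_has_text` flag are hypothetical, and the check looks only at the file extension, not at the file contents.

```python
from pathlib import Path

# Image extensions taken from the supported-format list above (lowercased).
OCR_IMAGE_EXTENSIONS = {".png", ".tif", ".tiff", ".jpeg", ".jpg", ".gif", ".bmp"}


def is_ocr_candidate(filename: str, pdf_has_text: bool = False) -> bool:
    """Return True if the file name matches a format that OCR attempts to process.

    pdf_has_text is a hypothetical flag supplied by the caller; this sketch
    does not open the file to detect an existing text layer.
    """
    ext = Path(filename).suffix.lower()
    if ext == ".pdf":
        # A .PDF qualifies only when it does not already contain text.
        return not pdf_has_text
    return ext in OCR_IMAGE_EXTENSIONS
```

For example, `is_ocr_candidate("consent_form.TIF")` returns `True`, while `is_ocr_candidate("protocol.pdf", pdf_has_text=True)` and `is_ocr_candidate("notes.docx")` both return `False`. A matching extension does not guarantee searchable results, because only typed, English-language text is extracted.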