public final class EmailTextExtractor extends EmailTextExtractorBase implements ISearchable, IHighlightExtractor, IRegexSearchable, IStructuredExtractor
Provides the text extractor for email messages.
Supported formats:
Extension | Description |
---|---|
.MSG | Microsoft Outlook message |
.EML | Email message |
.EMLX | Apple macOS Mail message |
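Before constructing an extractor from a file name, it can help to check that the file has one of the extensions listed above. The following is a minimal sketch that uses only the JDK. `EmailFormats` and `isSupported` are hypothetical names and not part of the library, and the source does not say whether the extractor itself looks at the extension.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of the library): checks a file name against the listed formats. */
final class EmailFormats {
    // The three extensions from the "Supported formats" table, without the leading dot.
    private static final Set<String> SUPPORTED = Set.of("msg", "eml", "emlx");

    private EmailFormats() {
    }

    /** Returns true when fileName ends with .msg, .eml or .emlx (case-insensitive). */
    static boolean isSupported(String fileName) {
        if (fileName == null) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        // No dot at all, or a trailing dot, means there is no extension to check.
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(extension);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("invoice.MSG"));
        System.out.println(isSupported("notes.txt"));
    }
}
```

A check like this only filters by name; a file with a matching extension may still fail to load if its content is not a valid message.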
Extracting an email:
// Create a text extractor for emails.
// `stream` (an InputStream with the message) and `extractorFactory` are assumed to be created elsewhere.
EmailTextExtractor extractor = new EmailTextExtractor(stream);
// Extract the body of the message
System.out.println(extractor.extractAll());
// Iterate over the attachments
for (int i = 0; i < extractor.getAttachmentCount(); i++) {
    // Get the attachment
    Container.Entity entity = extractor.getEntities().get(i);
    // Print the content type of the attachment
    System.out.println(entity.getMediaType());
    // Create a text extractor for the attachment's stream
    TextExtractor attachmentExtractor = extractorFactory.createTextExtractor(entity.openStream());
    // The factory returns null if the content type is not supported
    if (attachmentExtractor != null) {
        // Extract the text from the attachment
        System.out.println(attachmentExtractor.extractAll());
    }
}
Constructor and Description |
---|
EmailTextExtractor(InputStream stream): Initializes a new instance of the EmailTextExtractor class from a stream. |
EmailTextExtractor(InputStream stream, LoadOptions loadOptions): Initializes a new instance of the EmailTextExtractor class from a stream, using the given load options. |
EmailTextExtractor(String fileName): Initializes a new instance of the EmailTextExtractor class from a file path. |
EmailTextExtractor(String fileName, LoadOptions loadOptions): Initializes a new instance of the EmailTextExtractor class from a file path, using the given load options. |
Modifier and Type | Method and Description |
---|---|
List<String> | extractHighlights(HighlightOptions... highlightOptions): Extracts highlights. |
void | extractStructured(StructuredHandler handler): Extracts structured text. |
protected String | extractText(): Extracts all characters from the current position to the end of the text extractor and returns them as one string. |
void | search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords): Searches for the given keywords using the given search engine. |
void | search(SearchOptions options, ISearchHandler handler, List<String> keywords): Searches for the given keywords. |
void | searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions): Searches for matches of the regular expression. |
Methods inherited from class EmailTextExtractorBase: dispose, getAttachmentCount, getEntities, getStream, openEntityStream, prepareLine, reset
Methods inherited from class TextExtractor: checkDisposed, close, dispose, extractAll, extractLine, extractTextLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
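The two `search` overloads report results through a handler rather than returning them. The source does not show the members of `ISearchHandler` or `SearchOptions`, so the sketch below does not call the library. It illustrates the same callback shape with plain JDK code: `KeywordSearchSketch`, `HitHandler`, `onHit` and `collect` are hypothetical names, and matching case-insensitively is an assumption of this sketch, not documented library behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in (not the library API): keyword search that reports hits to a callback. */
final class KeywordSearchSketch {

    /** Hypothetical callback; plays the role a search handler would play. */
    interface HitHandler {
        void onHit(String keyword, int index);
    }

    private KeywordSearchSketch() {
    }

    /** Reports every non-overlapping, case-insensitive occurrence of each keyword. */
    static void search(String text, List<String> keywords, HitHandler handler) {
        for (String keyword : keywords) {
            if (keyword.isEmpty()) {
                continue; // an empty keyword would match everywhere and never advance
            }
            int length = keyword.length();
            int i = 0;
            while (i + length <= text.length()) {
                // regionMatches compares in place, so reported indexes refer to the original text
                if (text.regionMatches(true, i, keyword, 0, length)) {
                    handler.onHit(keyword, i);
                    i += length;
                } else {
                    i++;
                }
            }
        }
    }

    /** Convenience wrapper: collects hits as "keyword@index" strings. */
    static List<String> collect(String text, List<String> keywords) {
        List<String> hits = new ArrayList<>();
        search(text, keywords, (keyword, index) -> hits.add(keyword + "@" + index));
        return hits;
    }
}
```

With the real extractor, the text would typically come from `extractAll()`, and the handler would be an `ISearchHandler` implementation passed to `search` instead of the lambda used here.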
public EmailTextExtractor(String fileName)
Initializes a new instance of the EmailTextExtractor class.
Parameters:
fileName - The path to the file.

public EmailTextExtractor(String fileName, LoadOptions loadOptions)
Initializes a new instance of the EmailTextExtractor class.
Parameters:
fileName - The path to the file.
loadOptions - The options for loading the file.

public EmailTextExtractor(InputStream stream)
Initializes a new instance of the EmailTextExtractor class.
Parameters:
stream - The stream of the document.

public EmailTextExtractor(InputStream stream, LoadOptions loadOptions)
Initializes a new instance of the EmailTextExtractor class.
Parameters:
stream - The stream of the document.
loadOptions - The options for loading the file.

public void extractStructured(StructuredHandler handler)
Extracts structured text.
Specified by:
extractStructured in interface IStructuredExtractor
Parameters:
handler - Structured text extraction handler.

public void search(SearchOptions options, ISearchHandler handler, List<String> keywords)
Searches for the given keywords.
Specified by:
search in interface ISearchable
Parameters:
options - Options for searching.
handler - An instance of the search handler.
keywords - A collection of words to search for.

public void search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords)
Searches for the given keywords using the given search engine.
Specified by:
search in interface ISearchable
Parameters:
options - Options for searching.
handler - An instance of the search handler.
searchEngine - An instance of the search engine.
keywords - A collection of words to search for.

public void searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions)
Searches for matches of the regular expression.
Specified by:
searchWithRegex in interface IRegexSearchable
Parameters:
expression - A regular expression.
handler - An instance of the search handler.
searchOptions - Options for searching.

public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights.
Specified by:
extractHighlights in interface IHighlightExtractor
Parameters:
highlightOptions - A collection of HighlightOptions.

protected String extractText()
Extracts all characters from the current position to the end of the text extractor and returns them as one string.
extractText in class TextExtractor
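`searchWithRegex` takes a regular expression as a plain `String`. To show what such a search produces, the sketch below runs a pattern over already-extracted text with `java.util.regex` from the JDK. It does not call the library: `RegexSearchSketch`, `Hit` and `find` are hypothetical names, and whether the library uses the same regular-expression dialect as `java.util.regex` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in (not the library API): regex search over extracted text. */
final class RegexSearchSketch {

    /** One match: the index where it starts and the matched text. */
    static final class Hit {
        final int start;
        final String value;

        Hit(int start, String value) {
            this.start = start;
            this.value = value;
        }
    }

    private RegexSearchSketch() {
    }

    /** Returns every match of expression in text, in order of appearance. */
    static List<Hit> find(String text, String expression) {
        List<Hit> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(expression).matcher(text);
        while (matcher.find()) {
            hits.add(new Hit(matcher.start(), matcher.group()));
        }
        return hits;
    }

    public static void main(String[] args) {
        // In real use the text would come from the extractor, for example extractAll().
        String text = "Contact: alice@example.com, bob@example.org";
        for (Hit hit : find(text, "[\\w.]+@[\\w.]+")) {
            System.out.println(hit.start + ": " + hit.value);
        }
    }
}
```

The pattern used in `main` is a deliberately loose match for address-like tokens, chosen to keep the example short; it is not a validator for email addresses.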
Copyright © 2019. All rights reserved.