public final class EpubFormattedTextExtractor extends EpubTextExtractorBase implements IHighlightExtractor, ITextExtractorWithFormatter
Provides the formatted text extractor for EPUB documents.
Extracts a line of characters from a document:
// Create a text extractor for EPUB documents
TextExtractor extractor = new EpubFormattedTextExtractor(stream);
// Extract a line of the text
String line = extractor.extractLine();
// If the line is null, then the end of the file is reached
while (line != null) {
    // Print a line to the console
    System.out.println(line);
    // Extract another line
    line = extractor.extractLine();
}
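The loop above can be wrapped in a small helper that collects lines until the null sentinel. The following is a minimal sketch, not library code: `LineSource`, `LineLoopSketch`, `readAllLines` and `fromLines` are hypothetical stand-ins used so the example runs without the library. It only assumes the documented contract that `extractLine()` returns null at the end of the document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an extractor (not a library type):
// returns the next line, or null once the end of the document is reached.
interface LineSource {
    String extractLine();
}

final class LineLoopSketch {
    // Reads lines until the source returns null, mirroring the documented loop.
    static List<String> readAllLines(LineSource source) {
        List<String> lines = new ArrayList<>();
        String line = source.extractLine();
        while (line != null) {
            lines.add(line);
            line = source.extractLine();
        }
        return lines;
    }

    // Builds an in-memory LineSource for demonstration purposes only.
    static LineSource fromLines(String... lines) {
        Iterator<String> it = Arrays.asList(lines).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        List<String> lines = readAllLines(fromLines("Chapter 1", "", "It was a dark night."));
        for (String l : lines) {
            System.out.println(l);
        }
    }
}
```

The check is against null rather than an empty string, so blank lines inside the document do not end the loop early.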
Extracts all characters from a document:
// Create a text extractor for EPUB documents
TextExtractor extractor = new EpubFormattedTextExtractor(stream);
// Extract all text
System.out.println(extractor.extractAll());
To set a formatter, use the DocumentFormatter property (setDocumentFormatter):
// Create a formatted text extractor for EPUB documents
EpubFormattedTextExtractor extractor = new EpubFormattedTextExtractor(stream);
// Set a markdown formatter for formatting
extractor.setDocumentFormatter(new MarkdownDocumentFormatter()); // all the text will be formatted as Markdown
By default, text is formatted as plain text by Formatters.Plain.PlainDocumentFormatter.
Constructor | Description |
---|---|
EpubFormattedTextExtractor(InputStream stream) | Initializes a new instance of the EpubFormattedTextExtractor class from the stream of the document. |
EpubFormattedTextExtractor(String fileName) | Initializes a new instance of the EpubFormattedTextExtractor class from the path to the file. |
Modifier and Type | Method | Description |
---|---|---|
List<String> | extractHighlights(HighlightOptions... highlightOptions) | Extracts highlights. |
protected String | extractItem(String path) | Extracts a text from the document's item. |
protected String | extractText() | Extracts all characters from the current position to the end of the text extractor and returns them as one string. |
protected String | extractTextLine() | Extracts a line of characters from the text extractor and returns the data as a string. |
DocumentFormatter | getDocumentFormatter() | Gets a DocumentFormatter. |
void | reset() | Resets the current document. |
void | setDocumentFormatter(DocumentFormatter value) | Sets a DocumentFormatter. |
Inherited methods:
get_Item, getCount, openContainerItem, prepareLine
checkDisposed, close, dispose, dispose, extractAll, extractLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
public EpubFormattedTextExtractor(String fileName)
Initializes a new instance of the EpubFormattedTextExtractor class.
fileName - The path to the file.
public EpubFormattedTextExtractor(InputStream stream)
Initializes a new instance of the EpubFormattedTextExtractor class.
stream - The stream of the document.
public DocumentFormatter getDocumentFormatter()
Gets a DocumentFormatter.
getDocumentFormatter in interface ITextExtractorWithFormatter
Returns an instance of the DocumentFormatter. The default is PlainDocumentFormatter. You can set any other formatter, or null to use the default formatter.
public void setDocumentFormatter(DocumentFormatter value)
Sets a DocumentFormatter.
setDocumentFormatter in interface ITextExtractorWithFormatter
value - An instance of the DocumentFormatter. The default is PlainDocumentFormatter. You can set any other formatter, or null to use the default formatter.
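The null-means-default behavior described for the formatter property can be illustrated with a small sketch. This is an assumption-labeled illustration and not the library's implementation: `SketchFormatter`, `FormatterFallbackSketch`, `PLAIN` and `MARKDOWN` are hypothetical stand-ins for DocumentFormatter, the extractor, PlainDocumentFormatter and MarkdownDocumentFormatter.

```java
// Hypothetical stand-in for a document formatter (not a library type).
interface SketchFormatter {
    String heading(String text);
}

final class FormatterFallbackSketch {
    // Stand-in for the plain formatter: leaves the text unchanged.
    static final SketchFormatter PLAIN = text -> text;
    // Stand-in for a Markdown formatter: renders a first-level heading.
    static final SketchFormatter MARKDOWN = text -> "# " + text;

    // The plain formatter is the default, as described for the property.
    private SketchFormatter formatter = PLAIN;

    SketchFormatter getFormatter() {
        return formatter;
    }

    // Passing null selects the default formatter instead of storing null.
    void setFormatter(SketchFormatter value) {
        formatter = (value != null) ? value : PLAIN;
    }

    public static void main(String[] args) {
        FormatterFallbackSketch extractor = new FormatterFallbackSketch();
        extractor.setFormatter(MARKDOWN);
        System.out.println(extractor.getFormatter().heading("Title"));
        extractor.setFormatter(null); // falls back to PLAIN
        System.out.println(extractor.getFormatter().heading("Title"));
    }
}
```

Resolving null inside the setter keeps the getter from ever returning null, which is one plausible way to satisfy the documented contract.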
public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights.
extractHighlights in interface IHighlightExtractor
highlightOptions - A collection of HighlightOptions. Only Mode = FixedWidth is supported.
UnsupportedOperationException - Mode is not FixedWidth.
public void reset()
Resets the current document.
After a reset, the extractLine method returns the first line of the document.
reset in class EpubTextExtractorBase
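The effect of reset() on line-by-line extraction can be sketched with an in-memory stand-in. `ResetSketch` is a hypothetical class, not part of the library. It only models the documented behavior that, after a reset, the next extractLine() call returns the first line again.

```java
// Hypothetical in-memory stand-in for an extractor with a read position.
final class ResetSketch {
    private final String[] lines;
    private int position = 0;

    ResetSketch(String... lines) {
        this.lines = lines.clone();
    }

    // Returns the next line, or null once all lines have been read.
    String extractLine() {
        return position < lines.length ? lines[position++] : null;
    }

    // Moves the read position back to the start of the document.
    void reset() {
        position = 0;
    }

    public static void main(String[] args) {
        ResetSketch extractor = new ResetSketch("first", "second");
        System.out.println(extractor.extractLine());
        System.out.println(extractor.extractLine());
        extractor.reset();
        // After the reset, reading starts again from the first line.
        System.out.println(extractor.extractLine());
    }
}
```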
protected String extractText()
Extracts all characters from the current position to the end of the text extractor and returns them as one string.
extractText in class TextExtractor
protected String extractTextLine()
Extracts a line of characters from the text extractor and returns the data as a string.
extractTextLine in class TextExtractor
protected String extractItem(String path)
Extracts a text from the document's item.
extractItem in class EpubTextExtractorBase
path - A path to the document's item.
Copyright © 2018. All rights reserved.