public final class FictionBookFormattedTextExtractor extends TextExtractor implements IHighlightExtractor, ITextExtractorWithFormatter
Provides the formatted text extractor for FB2 (FictionBook) documents.
Extracts a line of characters from a document:
// Create a text extractor for FB2 (FictionBook) documents
TextExtractor extractor = new FictionBookFormattedTextExtractor(stream);
// Extract a line of the text
String line = extractor.extractLine();
// If the line is null, then the end of the file is reached
while (line != null) {
    // Print a line to the console
    System.out.println(line);
    // Extract another line
    line = extractor.extractLine();
}
Extracts all characters from a document:
// Create a text extractor for FB2 (FictionBook) documents
TextExtractor extractor = new FictionBookFormattedTextExtractor(stream);
// Extract a text
System.out.println(extractor.extractAll());
Use the DocumentFormatter property to set a formatter:
// Create a formatted text extractor for FB2 (FictionBook) documents
FictionBookFormattedTextExtractor extractor = new FictionBookFormattedTextExtractor(stream);
// Set a Markdown formatter; all extracted text will be formatted as Markdown
extractor.setDocumentFormatter(new MarkdownDocumentFormatter());
By default, text is formatted as plain text by PlainDocumentFormatter.
Constructor and Description |
---|
FictionBookFormattedTextExtractor(InputStream stream): Initializes a new instance of the FictionBookFormattedTextExtractor class from the stream of the document. |
FictionBookFormattedTextExtractor(String fileName): Initializes a new instance of the FictionBookFormattedTextExtractor class from the path to the file. |
Modifier and Type | Method and Description |
---|---|
List<String> | extractHighlights(HighlightOptions... highlightOptions): Extracts highlights. |
protected String | extractText(): Extracts all characters from the current position to the end of the text extractor and returns them as one string. |
protected String | extractTextLine(): Extracts a line of characters from the text extractor and returns the data as a string. |
DocumentFormatter | getDocumentFormatter(): Gets a DocumentFormatter. |
protected String | prepareLine(): Returns a line of the text. |
void | reset(): Resets the current document. |
void | setDocumentFormatter(DocumentFormatter value): Sets a DocumentFormatter. |
Methods inherited from class TextExtractor: checkDisposed, close, dispose, dispose, extractAll, extractLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
public FictionBookFormattedTextExtractor(String fileName)
Initializes a new instance of the FictionBookFormattedTextExtractor class.
Parameters:
fileName - The path to the file.
Throws:
UnsupportedDocumentFormatException - File format isn't supported.
public FictionBookFormattedTextExtractor(InputStream stream)
Initializes a new instance of the FictionBookFormattedTextExtractor class.
Parameters:
stream - The stream of the document.
Throws:
UnsupportedDocumentFormatException - File format isn't supported.
public DocumentFormatter getDocumentFormatter()
Gets a DocumentFormatter.
getDocumentFormatter in interface ITextExtractorWithFormatter
Returns:
An instance of the DocumentFormatter. The default is PlainDocumentFormatter.
By default, text is formatted as plain text by the PlainDocumentFormatter class. You can set any other formatter, or null if you want to use the default formatter.
public void setDocumentFormatter(DocumentFormatter value)
Sets a DocumentFormatter.
setDocumentFormatter in interface ITextExtractorWithFormatter
Parameters:
value - An instance of the DocumentFormatter. The default is PlainDocumentFormatter.
By default, text is formatted as plain text by the PlainDocumentFormatter class. You can set any other formatter, or null if you want to use the default formatter.
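The null-means-default behavior described above can be illustrated with a minimal, self-contained sketch. The types below (Formatter, PlainFormatter, MarkdownFormatter, Extractor) are hypothetical stand-ins written for this illustration, not the library's classes, and the sketch assumes that the getter reports a plain formatter again once null has been set.

```java
public class FormatterFallbackSketch {
    // Hypothetical stand-in for a document formatter; not the library's DocumentFormatter.
    interface Formatter {
        String heading(String text);
    }

    // Stand-in for a plain-text formatter: headings are emitted unchanged.
    static final class PlainFormatter implements Formatter {
        @Override
        public String heading(String text) {
            return text;
        }
    }

    // Stand-in for a Markdown formatter: headings get a leading "# ".
    static final class MarkdownFormatter implements Formatter {
        @Override
        public String heading(String text) {
            return "# " + text;
        }
    }

    // Stand-in for an extractor that owns a formatter.
    static final class Extractor {
        // The default formatter is the plain one.
        private Formatter formatter = new PlainFormatter();

        Formatter getFormatter() {
            return formatter;
        }

        // Assumption: passing null restores the default plain formatter.
        void setFormatter(Formatter value) {
            formatter = (value != null) ? value : new PlainFormatter();
        }
    }

    public static void main(String[] args) {
        Extractor extractor = new Extractor();
        // Default formatter: plain text
        System.out.println(extractor.getFormatter().heading("Chapter 1"));
        // Switch to the Markdown stand-in
        extractor.setFormatter(new MarkdownFormatter());
        System.out.println(extractor.getFormatter().heading("Chapter 1"));
        // Passing null falls back to the default again
        extractor.setFormatter(null);
        System.out.println(extractor.getFormatter().heading("Chapter 1"));
    }
}
```

With the real class, the corresponding calls would be setDocumentFormatter(new MarkdownDocumentFormatter()) followed by setDocumentFormatter(null) to return to plain-text output.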
public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights.
extractHighlights in interface IHighlightExtractor
Parameters:
highlightOptions - A collection of HighlightOptions.
Supported only when Mode = FixedWidth.
Throws:
UnsupportedOperationException - Mode is not FixedWidth.
public void reset()
Resets the current document.
After a reset, the extractLine method will return the first line of the document.
reset in class TextExtractor
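The reset contract can be sketched with a small stand-in class. LineSource below is hypothetical and only models the documented behavior (extractLine returns null at the end of the document, and reset rewinds to the first line); it is not the library's TextExtractor.

```java
import java.util.Arrays;
import java.util.List;

public class ResetSketch {
    // Hypothetical stand-in for a line-based extractor; not the library's TextExtractor.
    static final class LineSource {
        private final List<String> lines;
        private int position = 0;

        LineSource(List<String> lines) {
            this.lines = lines;
        }

        // Returns the next line, or null once every line has been consumed.
        String extractLine() {
            return position < lines.size() ? lines.get(position++) : null;
        }

        // Rewinds to the start so the next extractLine() call returns the first line.
        void reset() {
            position = 0;
        }
    }

    public static void main(String[] args) {
        LineSource source = new LineSource(Arrays.asList("Title", "First paragraph"));
        // Read to the end of the document
        String line;
        while ((line = source.extractLine()) != null) {
            System.out.println(line);
        }
        // After a reset, reading starts from the first line again
        source.reset();
        System.out.println(source.extractLine());
    }
}
```

Resetting is useful when the same document has to be read twice, for example once line by line and once more from the beginning, without constructing a second extractor.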
protected String extractText()
Extracts all characters from the current position to the end of the text extractor and returns them as one string.
extractText in class TextExtractor
protected String extractTextLine()
Extracts a line of characters from the text extractor and returns the data as a string.
extractTextLine in class TextExtractor
protected String prepareLine()
Returns a line of the text.
prepareLine in class TextExtractor
Copyright © 2019. All rights reserved.