public final class ChmFormattedTextExtractor extends TextExtractor implements IPageTextExtractor, ITextExtractorWithFormatter
Provides the formatted text extractor for CHM documents.
Constructor | Description |
---|---|
ChmFormattedTextExtractor(InputStream stream) | Initializes a new instance of the ChmFormattedTextExtractor class from a document stream. |
ChmFormattedTextExtractor(String fileName) | Initializes a new instance of the ChmFormattedTextExtractor class from a file path. |
Modifier and Type | Method | Description |
---|---|---|
protected void | dispose(boolean disposing) | Releases the unmanaged resources used by the extractor. |
String | extractPage(int pageIndex) | Extracts all characters from the page at the given pageIndex and returns them as a string. |
DocumentFormatter | getDocumentFormatter() | Gets a DocumentFormatter. |
int | getPageCount() | Gets the total number of pages. |
List<TableOfContentsItem> | getTableOfContents() | Gets a collection of table of contents items. |
protected String | prepareLine() | Returns a line of the text. |
void | reset() | Resets the current document. |
void | setDocumentFormatter(DocumentFormatter value) | Sets a DocumentFormatter. |
Methods inherited from class TextExtractor: checkDisposed, close, dispose, extractAll, extractLine, extractText, extractTextLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
public ChmFormattedTextExtractor(String fileName)
Initializes a new instance of the ChmFormattedTextExtractor class.
fileName - The path to the file.
public ChmFormattedTextExtractor(InputStream stream)
Initializes a new instance of the ChmFormattedTextExtractor class.
stream - The stream of the document.
public int getPageCount()
Gets the total number of pages.
getPageCount in interface IPageTextExtractor
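A typical use of getPageCount() is to drive a loop over extractPage(int). The sketch below is only an illustration under stated assumptions: it does not call the library itself, but uses a hypothetical PageSource interface that mirrors the two IPageTextExtractor methods documented on this page, and it assumes page indexes are zero-based and run up to getPageCount() - 1. With the real library, a ChmFormattedTextExtractor would take the place of the in-memory stand-in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageLoopSketch {
    /** Hypothetical stand-in mirroring getPageCount() and extractPage(int). */
    public interface PageSource {
        int getPageCount();
        String extractPage(int pageIndex);
    }

    /** Builds an in-memory PageSource for demonstration (not part of the library). */
    public static PageSource fromPages(final List<String> pages) {
        return new PageSource() {
            @Override public int getPageCount() { return pages.size(); }
            @Override public String extractPage(int pageIndex) { return pages.get(pageIndex); }
        };
    }

    /** Extracts every page in order; assumes zero-based page indexes. */
    public static List<String> extractAllPages(PageSource source) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < source.getPageCount(); i++) {
            result.add(source.extractPage(i));
        }
        return result;
    }

    public static void main(String[] args) {
        PageSource source = fromPages(Arrays.asList("Introduction", "Installation"));
        for (String page : extractAllPages(source)) {
            System.out.println(page);
        }
    }
}
```

Reading the count once per iteration keeps the sketch short; if getPageCount() turned out to be expensive for a given document, it could be cached in a local variable before the loop.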
public List<TableOfContentsItem> getTableOfContents()
Gets a collection of table of contents items.
public DocumentFormatter getDocumentFormatter()
Gets a DocumentFormatter.
getDocumentFormatter in interface ITextExtractorWithFormatter
Returns a DocumentFormatter. The default is PlainDocumentFormatter. You can set any other formatter, or null to use the default formatter.
public void setDocumentFormatter(DocumentFormatter value)
Sets a DocumentFormatter.
setDocumentFormatter in interface ITextExtractorWithFormatter
value - An instance of DocumentFormatter. The default is PlainDocumentFormatter. You can set any other formatter, or null to use the default formatter.
public String extractPage(int pageIndex)
Extracts all characters from the page at the given pageIndex and returns them as a string.
extractPage in interface IPageTextExtractor
pageIndex - The index of the page.
public void reset()
Resets the current document.
After a reset, the extractLine method returns the first line of the document.
reset in class TextExtractor
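One way to picture reset() is as rewinding a line-by-line read. The sketch below is an illustration, not the library's implementation: LineSource is a hypothetical stand-in that exposes only extractLine() and reset(), and it assumes that extractLine() returns null once the document is exhausted, which this page does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResetSketch {
    /** Hypothetical stand-in for the line-reading part of a text extractor. */
    public interface LineSource {
        String extractLine(); // assumed to return null at the end of the document
        void reset();
    }

    /** In-memory LineSource for demonstration only (not part of the library). */
    public static LineSource fromLines(final List<String> lines) {
        return new LineSource() {
            private int next = 0;
            @Override public String extractLine() {
                return next < lines.size() ? lines.get(next++) : null;
            }
            @Override public void reset() { next = 0; }
        };
    }

    /** Reads all remaining lines until the source signals the end with null. */
    public static List<String> readRemaining(LineSource source) {
        List<String> out = new ArrayList<>();
        String line;
        while ((line = source.extractLine()) != null) {
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        LineSource source = fromLines(Arrays.asList("first", "second"));
        System.out.println(readRemaining(source)); // first pass consumes every line
        source.reset();                            // rewind to the start of the document
        System.out.println(source.extractLine()); // reads the first line again
    }
}
```

The same pattern allows two passes over one document, for example a first pass to count lines and a second to process them, without constructing a new extractor.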
protected void dispose(boolean disposing)
Releases the unmanaged resources used by the extractor.
dispose in class TextExtractor
disposing - true if invoked from dispose; otherwise, false.
protected String prepareLine()
Returns a line of the text.
prepareLine in class TextExtractor
Copyright © 2018. All rights reserved.