public final class SlidesTextExtractor extends SlidesTextExtractorBase implements ISearchable, IHighlightExtractor, IRegexSearchable, IStructuredExtractor, IPageTextExtractor
Provides the text extractor for presentations.
Supported formats:
Extension | Format |
---|---|
.PPT | Microsoft PowerPoint Presentation |
.PPTX | Microsoft Office Open XML Presentation |
.PPS | Microsoft PowerPoint Slideshow |
.PPSX | Microsoft Office Open XML Auto-Play Presentation |
.PPSM | PowerPoint Open XML Macro-Enabled Slideshow |
.ODP | OpenDocument presentation |
Extracting text from a presentation:
// Create a text extractor for presentations
SlidesTextExtractor extractor = new SlidesTextExtractor(stream);
// Extract all text
System.out.println(extractor.extractAll());
Extracting text by slides:
// Create a text extractor for presentations
SlidesTextExtractor extractor = new SlidesTextExtractor(stream);
// Iterate over the slides
for (int slideIndex = 0; slideIndex < extractor.getSlideCount(); slideIndex++) {
    // Extract text from the slide whose index is slideIndex
    System.out.println(extractor.extractSlide(slideIndex));
}
Constructor and Description |
---|
SlidesTextExtractor(InputStream stream) Initializes a new instance of the SlidesTextExtractor class from a document stream. |
SlidesTextExtractor(InputStream stream, LoadOptions loadOptions) Initializes a new instance of the SlidesTextExtractor class from a document stream, using the given load options. |
SlidesTextExtractor(String fileName) Initializes a new instance of the SlidesTextExtractor class from a file path. |
SlidesTextExtractor(String fileName, LoadOptions loadOptions) Initializes a new instance of the SlidesTextExtractor class from a file path, using the given load options. |
Modifier and Type | Method and Description |
---|---|
List<String> | extractHighlights(HighlightOptions... highlightOptions) Extracts highlights. |
String | extractPage(int pageIndex) Extracts all characters from the page with the given pageIndex and returns them as a string. |
void | extractStructured(StructuredHandler handler) Extracts structured text. |
protected String | extractText() Extracts all characters from the current position to the end of the text extractor and returns them as one string. |
int | getExtractMode() Gets a value indicating the mode of text extraction. |
int | getPageCount() Gets the total number of pages. |
void | search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords) Searches for the keywords using the given search engine. |
void | search(SearchOptions options, ISearchHandler handler, List<String> keywords) Searches for the keywords. |
void | searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions) Searches for matches of the regular expression. |
void | setExtractMode(int value) Sets a value indicating the mode of text extraction. |
Methods inherited from class SlidesTextExtractorBase: dispose, extractSlide, getSlideCount, nextSlide, prepareLine, reset
Methods inherited from class TextExtractor: checkDisposed, close, dispose, extractAll, extractLine, extractTextLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
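The inherited members include close, dispose and isDisposed, which suggests that an extractor should be released once it is no longer needed. The following is a minimal sketch of that pattern, not the library's own code: StubExtractor is a hypothetical stand-in that models only extractAll, close and isDisposed, and its behaviour after close (throwing an exception) is an assumption made for illustration.

```java
// Hypothetical stand-in for the extractor. It models only three members named
// on this page (extractAll, close, isDisposed) and is NOT the library class.
class StubExtractor {
    private final String text;
    private boolean disposed;

    StubExtractor(String text) {
        this.text = text;
    }

    String extractAll() {
        // Assumption: a closed extractor refuses further work.
        if (disposed) {
            throw new IllegalStateException("extractor is closed");
        }
        return text;
    }

    void close() {
        disposed = true;
    }

    boolean isDisposed() {
        return disposed;
    }
}

class CloseSketch {
    // Reads all text and always releases the extractor, even if extraction throws.
    static String readAndClose(StubExtractor extractor) {
        try {
            return extractor.extractAll();
        } finally {
            extractor.close();
        }
    }

    public static void main(String[] args) {
        StubExtractor extractor = new StubExtractor("Title slide\nAgenda");
        System.out.println(readAndClose(extractor));
        System.out.println(extractor.isDisposed());
    }
}
```

With the real class, the same try/finally shape would wrap the constructor call and the extraction calls shown in the examples at the top of this page.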
public SlidesTextExtractor(String fileName)
Initializes a new instance of the SlidesTextExtractor class.
fileName - The path to the file.
public SlidesTextExtractor(String fileName, LoadOptions loadOptions)
Initializes a new instance of the SlidesTextExtractor class.
fileName - The path to the file.
loadOptions - The options for loading the file.
public SlidesTextExtractor(InputStream stream)
Initializes a new instance of the SlidesTextExtractor class.
stream - The stream of the document.
public SlidesTextExtractor(InputStream stream, LoadOptions loadOptions)
Initializes a new instance of the SlidesTextExtractor class.
stream - The stream of the document.
loadOptions - The options for loading the file.
public int getExtractMode()
Gets a value indicating the mode of text extraction. The default is Standard.
public void setExtractMode(int value)
Sets a value indicating the mode of text extraction.
value - The mode of text extraction. The default is Standard.
public int getPageCount()
Gets the total number of pages.
getPageCount in interface IPageTextExtractor
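The page count is typically combined with extractPage to walk a document page by page. The sketch below illustrates that loop under stated assumptions: PageSource is a hypothetical interface that mirrors only the two documented methods, the in-memory fake replaces the real extractor so the code runs without the library, and page indexes are assumed to be zero-based, as in the slide example at the top of this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the two page methods documented on this page.
interface PageSource {
    int getPageCount();

    String extractPage(int pageIndex);
}

class PageWalk {
    // Collects the text of every page, in order, assuming zero-based indexes.
    static List<String> extractAllPages(PageSource source) {
        List<String> pages = new ArrayList<>();
        for (int pageIndex = 0; pageIndex < source.getPageCount(); pageIndex++) {
            pages.add(source.extractPage(pageIndex));
        }
        return pages;
    }

    // In-memory fake so the sketch runs without the library.
    static PageSource fake(String... texts) {
        return new PageSource() {
            public int getPageCount() {
                return texts.length;
            }

            public String extractPage(int pageIndex) {
                return texts[pageIndex];
            }
        };
    }

    public static void main(String[] args) {
        for (String page : extractAllPages(fake("Intro", "Summary"))) {
            System.out.println(page);
        }
    }
}
```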
public void extractStructured(StructuredHandler handler)
Extracts structured text.
extractStructured in interface IStructuredExtractor
handler - The structured text extraction handler.
public void search(SearchOptions options, ISearchHandler handler, List<String> keywords)
Searches for the keywords.
search in interface ISearchable
options - Options for searching.
handler - An instance of the search handler.
keywords - A collection of words to search for.
public void search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords)
Searches for the keywords.
search in interface ISearchable
options - Options for searching.
handler - An instance of the search handler.
searchEngine - An instance of the search engine.
keywords - A collection of words to search for.
public void searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions)
Searches for matches of the expression.
searchWithRegex in interface IRegexSearchable
expression - A regular expression.
handler - An instance of the search handler.
searchOptions - Options for searching.
public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights.
extractHighlights in interface IHighlightExtractor
highlightOptions - A collection of HighlightOptions.
public String extractPage(int pageIndex)
Extracts all characters from the page with the given pageIndex and returns them as a string.
extractPage in interface IPageTextExtractor
pageIndex - The index of the page.
protected String extractText()
Extracts all characters from the current position to the end of the text extractor and returns them as one string.
extractText in class TextExtractor
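The callback methods of ISearchHandler are not listed on this page, so the sketch below does not call searchWithRegex. Instead it shows, as a plain-Java illustration, what a regular-expression search amounts to when applied to strings such as those returned by extractPage. The class RegexOverPages, its method, and the "pageIndex:match" result format are all hypothetical; only java.util.regex is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexOverPages {
    // Returns "pageIndex:match" for each regex match found in the given page texts.
    static List<String> findMatches(List<String> pageTexts, String expression) {
        Pattern pattern = Pattern.compile(expression);
        List<String> hits = new ArrayList<>();
        for (int pageIndex = 0; pageIndex < pageTexts.size(); pageIndex++) {
            Matcher matcher = pattern.matcher(pageTexts.get(pageIndex));
            while (matcher.find()) {
                hits.add(pageIndex + ":" + matcher.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample strings standing in for text extracted from two pages.
        List<String> pages = Arrays.asList("Q1 revenue 2017", "Forecast 2018 and 2019");
        // prints [0:2017, 1:2018, 1:2019]
        System.out.println(findMatches(pages, "\\d{4}"));
    }
}
```

Where match positions or surrounding context matter, the library's own search methods with a handler are likely the better fit, since this sketch only sees flat page strings.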
Copyright © 2018. All rights reserved.