public final class SlidesFormattedTextExtractor extends SlidesTextExtractorBase implements IHighlightExtractor, IPageTextExtractor, ITextExtractorWithFormatter
Provides the formatted text extractor for presentations.
Supported formats:
.PPT | Microsoft PowerPoint Presentation |
.PPTX | Microsoft Office Open XML Presentation |
.PPS | Microsoft PowerPoint Slideshow |
.PPSX | Microsoft Office Open XML Auto-Play Presentation |
.PPSM | PowerPoint Open XML Macro-Enabled Slideshow |
.ODP | OpenDocument presentation |
Extracting text from a presentation:
// Create a formatted text extractor for presentations
SlidesFormattedTextExtractor extractor = new SlidesFormattedTextExtractor(stream);
// Extract the formatted text
System.out.println(extractor.extractAll());
Extracting text by slides:
// Create a formatted text extractor for presentations
SlidesFormattedTextExtractor extractor = new SlidesFormattedTextExtractor(stream);
// Iterate slides
for (int slideIndex = 0; slideIndex < extractor.getSlideCount(); slideIndex++) {
// Extract the formatted text from the slide at index slideIndex
System.out.println(extractor.extractSlide(slideIndex));
}
Use the DocumentFormatter property to set a formatter:
// Create a formatted text extractor for presentations
SlidesFormattedTextExtractor extractor = new SlidesFormattedTextExtractor(stream);
// Set a markdown formatter for formatting
extractor.setDocumentFormatter(new MarkdownDocumentFormatter()); // all the text will be formatted as Markdown
By default, text is formatted as plain text by Formatters.Plain.PlainDocumentFormatter.
Constructor and Description |
---|
SlidesFormattedTextExtractor(InputStream stream)
Initializes a new instance of the SlidesFormattedTextExtractor class. |
SlidesFormattedTextExtractor(InputStream stream, LoadOptions loadOptions)
Initializes a new instance of the SlidesFormattedTextExtractor class. |
SlidesFormattedTextExtractor(String fileName)
Initializes a new instance of the SlidesFormattedTextExtractor class. |
SlidesFormattedTextExtractor(String fileName, LoadOptions loadOptions)
Initializes a new instance of the SlidesFormattedTextExtractor class. |
Modifier and Type | Method and Description |
---|---|
List<String> | extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights. |
String | extractPage(int pageIndex)
Extracts all characters from the page with pageIndex and returns the data as a string. |
protected String | extractText()
Extracts all characters from the current position to the end of the text extractor and returns them as one string. |
protected String | extractTextLine()
Extracts a line of characters from the text extractor and returns the data as a string. |
DocumentFormatter | getDocumentFormatter()
Gets a DocumentFormatter. |
int | getPageCount()
Gets a total count of the pages. |
void | reset()
Resets the current document. |
void | setDocumentFormatter(DocumentFormatter value)
Sets a DocumentFormatter. |
Methods inherited from class SlidesTextExtractorBase: dispose, extractSlide, getSlideCount, nextSlide, prepareLine
Methods inherited from class TextExtractor: checkDisposed, close, dispose, extractAll, extractLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
public SlidesFormattedTextExtractor(String fileName)
Initializes a new instance of the SlidesFormattedTextExtractor class.
fileName - The path to the file.
public SlidesFormattedTextExtractor(String fileName, LoadOptions loadOptions)
Initializes a new instance of the SlidesFormattedTextExtractor class.
fileName - The path to the file.
loadOptions - The options for loading the file.
public SlidesFormattedTextExtractor(InputStream stream)
Initializes a new instance of the SlidesFormattedTextExtractor class.
stream - The stream of the document.
public SlidesFormattedTextExtractor(InputStream stream, LoadOptions loadOptions)
Initializes a new instance of the SlidesFormattedTextExtractor class.
stream - The stream of the document.
loadOptions - The options for loading the file.
public DocumentFormatter getDocumentFormatter()
Gets a DocumentFormatter.
getDocumentFormatter in interface ITextExtractorWithFormatter
Returns an instance of DocumentFormatter. The default is PlainDocumentFormatter.
By default, text is formatted by the PlainDocumentFormatter class. You can set any other formatter, or null to use the default formatter.
public void setDocumentFormatter(DocumentFormatter value)
Sets a DocumentFormatter.
setDocumentFormatter in interface ITextExtractorWithFormatter
value - An instance of DocumentFormatter. The default is PlainDocumentFormatter.
By default, text is formatted by the PlainDocumentFormatter class. You can set any other formatter, or null to use the default formatter.
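The null-means-default rule described above can be illustrated with a small, self-contained sketch. It does not use the library: Formatter, PLAIN, HEADING, and Extractor are hypothetical stand-ins, and the setter shown is only one way such behavior could be implemented.

```java
public class FormatterFallbackSketch {
    // Hypothetical stand-in for a formatter type; not the library's DocumentFormatter.
    interface Formatter {
        String format(String text);
    }

    // Stand-in for the plain default: returns the text unchanged.
    static final Formatter PLAIN = text -> text;

    // Stand-in for an alternative formatter: renders the text as a Markdown heading.
    static final Formatter HEADING = text -> "# " + text;

    // Minimal holder showing a setter that treats null as "use the default".
    static class Extractor {
        private Formatter formatter = PLAIN;

        Formatter getFormatter() {
            return formatter;
        }

        void setFormatter(Formatter value) {
            formatter = (value != null) ? value : PLAIN;
        }

        String extract(String rawText) {
            return formatter.format(rawText);
        }
    }

    public static void main(String[] args) {
        Extractor extractor = new Extractor();
        extractor.setFormatter(HEADING);
        System.out.println(extractor.extract("Agenda")); // formatted as a heading
        extractor.setFormatter(null);                    // back to the plain default
        System.out.println(extractor.extract("Agenda"));
    }
}
```

With a setter of this shape, callers never need a null check before formatting, because the getter always returns a usable formatter.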
public int getPageCount()
Gets a total count of the pages.
getPageCount in interface IPageTextExtractor
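A minimal sketch of walking every page with the two page-oriented methods, getPageCount and extractPage. PageSource and the in-memory fake are hypothetical stand-ins so the example runs without the library. It also assumes page indexes are zero-based, like the slide indexes in the class overview.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageIterationSketch {
    // Hypothetical stand-in mirroring the two page methods documented here.
    interface PageSource {
        int getPageCount();
        String extractPage(int pageIndex);
    }

    // Collects the text of every page, in order, using zero-based indexes (assumed).
    static List<String> extractAllPages(PageSource source) {
        List<String> pages = new ArrayList<>();
        for (int pageIndex = 0; pageIndex < source.getPageCount(); pageIndex++) {
            pages.add(source.extractPage(pageIndex));
        }
        return pages;
    }

    public static void main(String[] args) {
        // In-memory fake standing in for a real extractor.
        List<String> slides = Arrays.asList("Title slide", "Agenda");
        PageSource fake = new PageSource() {
            public int getPageCount() { return slides.size(); }
            public String extractPage(int pageIndex) { return slides.get(pageIndex); }
        };
        for (String page : extractAllPages(fake)) {
            System.out.println(page);
        }
    }
}
```

Coding against a narrow interface like this keeps the loop reusable for any extractor that exposes a page count and per-page extraction.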
public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights.
extractHighlights in interface IHighlightExtractor
highlightOptions - A collection of HighlightOptions. Only Mode = FixedWidth is supported.
UnsupportedOperationException - Mode is not FixedWidth.
public void reset()
Resets the current document. After a reset, the extractLine method returns the first line of the document.
reset in class SlidesTextExtractorBase
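The effect of a reset on line-by-line reading can be pictured with a small cursor model. This is only an illustrative sketch: LineCursor is a hypothetical stand-in, not the library's implementation, and returning null at the end of the text is an assumed convention.

```java
public class ResetSketch {
    // Hypothetical line cursor; a stand-in for an extractor's line-by-line reading.
    static class LineCursor {
        private final String[] lines;
        private int position = 0;

        LineCursor(String text) {
            this.lines = text.split("\n");
        }

        // Returns the next line, or null once every line has been read (assumed convention).
        String extractLine() {
            return position < lines.length ? lines[position++] : null;
        }

        // Moves the cursor back to the start so the next read yields the first line.
        void reset() {
            position = 0;
        }
    }

    public static void main(String[] args) {
        LineCursor cursor = new LineCursor("Title\nSubtitle");
        System.out.println(cursor.extractLine()); // Title
        System.out.println(cursor.extractLine()); // Subtitle
        cursor.reset();
        System.out.println(cursor.extractLine()); // Title
    }
}
```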
public String extractPage(int pageIndex)
Extracts all characters from the page with pageIndex and returns the data as a string.
extractPage in interface IPageTextExtractor
pageIndex - The index of the page.
protected String extractText()
Extracts all characters from the current position to the end of the text extractor and returns them as one string.
extractText in class TextExtractor
protected String extractTextLine()
Extracts a line of characters from the text extractor and returns the data as a string.
extractTextLine in class TextExtractor
Copyright © 2018. All rights reserved.