public final class CellsFormattedTextExtractor extends CellsTextExtractorBase implements IHighlightExtractor, IPageTextExtractor, ITextExtractorWithFormatter
Provides the formatted text extractor for spreadsheets.
Supported formats:
.XLS | Microsoft Excel Spreadsheet |
.XLSX | Microsoft Office Open XML Workbook |
.XLSM | Microsoft Excel 2007 Macro-Enabled Workbook |
.XLSB | Microsoft Excel 2007 Binary Workbook |
.ODS | OpenDocument spreadsheet |
.CSV | Comma Separated Values text file |
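A caller may want to check a file name against the table above before constructing an extractor. The following is a minimal sketch using only the Java standard library; `SpreadsheetFormats` and `isSupported` are hypothetical names and are not part of the documented API.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the documented API. */
final class SpreadsheetFormats {
    // Extensions copied from the "Supported formats" table above.
    private static final Set<String> SUPPORTED =
            Set.of("xls", "xlsx", "xlsm", "xlsb", "ods", "csv");

    private SpreadsheetFormats() { }

    /** Returns true if the file name ends with one of the listed extensions (case-insensitive). */
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension present
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(extension);
    }
}
```

The check looks only at the file name; it does not inspect the file contents.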
Extracting text from a spreadsheet:
// Create a formatted text extractor for spreadsheets
CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
// Extract the formatted text
System.out.println(extractor.extractAll());
Extracting by sheets:
// Create a formatted text extractor for spreadsheets
CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
// Extract formatted text from the sheet whose index is sheetIndex
System.out.println(extractor.extractSheet(sheetIndex));
Extracting the information about the sheet:
// Create a formatted text extractor for spreadsheets
CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
// Get the sheet info for the sheet whose index is sheetIndex
CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
System.out.println(String.format("Name: %s", sheetInfo.getName())); // sheet's name
System.out.println(String.format("Index: %d", sheetInfo.getIndex())); // sheet's index
System.out.println(String.format("Rows Count: %d", sheetInfo.getRowCount())); // total number of rows in the sheet
System.out.println("Columns");
// Iterate over the sheet's columns
for (int i = 0; i < sheetInfo.getColumnNames().size(); i++)
{
    // Print the name of the column whose index is i
    System.out.println(String.format("%s%s",
        sheetInfo.getColumnNames().get(i), i + 1 < sheetInfo.getColumnNames().size() ? ";" : ""));
}
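The loop above prints each column name on its own line, followed by a semicolon for all but the last. If a single semicolon-separated line is preferred, the names can be joined with the standard library. This is a sketch that assumes getColumnNames() returns a List<String>, which the sample suggests but this page does not state; `ColumnNameJoiner` is a hypothetical helper.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the documented API. */
final class ColumnNameJoiner {
    private ColumnNameJoiner() { }

    /** Joins column names into one line, separated by semicolons. */
    static String joinColumnNames(List<String> columnNames) {
        // String.join places the separator between elements only,
        // so no trailing semicolon is produced.
        return String.join(";", columnNames);
    }
}
```

Under that assumption, a call might look like `System.out.println(ColumnNameJoiner.joinColumnNames(sheetInfo.getColumnNames()));`.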
Extracting by rows:
// Create a formatted text extractor for spreadsheets
CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
// Get the sheet info for the sheet whose index is sheetIndex
CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
// Extract formatted text for the row whose index is rowIndex
System.out.println(sheetInfo.extractRow(rowIndex));
Extracting only selected columns:
// Create a formatted text extractor for spreadsheets
CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
// Get the sheet info for the sheet whose index is sheetIndex
CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
// Extract formatted text for the row whose index is rowIndex (columns A1 and C1 only)
System.out.println(sheetInfo.extractRow(rowIndex, "A1", "C1"));
// Extract formatted text for the entire sheet (columns A1 and C1 only)
System.out.println(sheetInfo.extractSheet("A1", "C1"));
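The column arguments in the sample ("A1", "C1") look like conventional spreadsheet cell references. This page does not say how the library interprets them, so the following is only a sketch of the usual letters-to-index convention (A is column 0, Z is column 25, AA is column 26); `ColumnRefs` is a hypothetical helper and not part of the documented API.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class ColumnRefs {
    private ColumnRefs() { }

    /**
     * Converts the leading letters of a cell reference such as "C1"
     * to a zero-based column index. Assumes the conventional base-26
     * spreadsheet column naming (A..Z, AA..AZ, ...).
     */
    static int columnIndex(String cellRef) {
        int oneBased = 0;
        int i = 0;
        while (i < cellRef.length() && Character.isLetter(cellRef.charAt(i))) {
            char c = Character.toUpperCase(cellRef.charAt(i));
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("Unexpected column letter in: " + cellRef);
            }
            // Each letter is one base-26 digit with values 1..26.
            oneBased = oneBased * 26 + (c - 'A' + 1);
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("No column letters in: " + cellRef);
        }
        return oneBased - 1;
    }
}
```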
To set a formatter, use the DocumentFormatter property.
// Create a formatted text extractor for spreadsheets
CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
// Set a markdown formatter for formatting
extractor.setDocumentFormatter(new MarkdownDocumentFormatter()); // all the text will be formatted as Markdown
By default, text is formatted as plain text by PlainDocumentFormatter.
Constructor and Description |
---|
CellsFormattedTextExtractor(InputStream stream) Initializes a new instance of the CellsFormattedTextExtractor class. |
CellsFormattedTextExtractor(InputStream stream, LoadOptions loadOptions) Initializes a new instance of the CellsFormattedTextExtractor class. |
CellsFormattedTextExtractor(String fileName) Initializes a new instance of the CellsFormattedTextExtractor class. |
CellsFormattedTextExtractor(String fileName, LoadOptions loadOptions) Initializes a new instance of the CellsFormattedTextExtractor class. |
Modifier and Type | Method and Description |
---|---|
List<String> | extractHighlights(HighlightOptions... highlightOptions) Extracts highlights. |
String | extractPage(int pageIndex) Extracts all characters from the page with pageIndex and returns the data as a string. |
DocumentFormatter | getDocumentFormatter() Gets a DocumentFormatter. |
int | getPageCount() Gets a total count of the pages. |
void | reset() Resets the current document. |
void | setDocumentFormatter(DocumentFormatter value) Sets a DocumentFormatter. |
dispose, extractSheet, extractText, extractTextLine, getSheetCount, getSheetInfo, prepareLine, setSheetCount
checkDisposed, close, dispose, extractAll, extractLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
public CellsFormattedTextExtractor(String fileName)
Initializes a new instance of the CellsFormattedTextExtractor class.
fileName - The path to the file.
public CellsFormattedTextExtractor(String fileName, LoadOptions loadOptions)
Initializes a new instance of the CellsFormattedTextExtractor class.
fileName - The path to the file.
loadOptions - The options for loading the file.
public CellsFormattedTextExtractor(InputStream stream)
Initializes a new instance of the CellsFormattedTextExtractor class.
stream - The stream of the document.
public CellsFormattedTextExtractor(InputStream stream, LoadOptions loadOptions)
Initializes a new instance of the CellsFormattedTextExtractor class.
stream - The stream of the document.
loadOptions - The options for loading the file.
public DocumentFormatter getDocumentFormatter()
Gets a DocumentFormatter.
getDocumentFormatter in interface ITextExtractorWithFormatter
An instance of the DocumentFormatter. The default is PlainDocumentFormatter.
By default, text is formatted as plain text by the PlainDocumentFormatter class. You can set any other formatter, or null if you want to use the default formatter.
public void setDocumentFormatter(DocumentFormatter value)
Sets a DocumentFormatter.
setDocumentFormatter in interface ITextExtractorWithFormatter
value - An instance of the DocumentFormatter. The default is PlainDocumentFormatter.
By default, text is formatted as plain text by the PlainDocumentFormatter class. You can set any other formatter, or null if you want to use the default formatter.
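The null-means-default behavior described above can be pictured with a small stand-alone sketch. Every type below (`SketchFormatter`, `PlainSketchFormatter`, `MarkdownSketchFormatter`, `FormatterHolder`) is a hypothetical stand-in written for illustration; none of them is a library class, and the library's actual implementation is not shown on this page.

```java
/** Stand-in for a formatter abstraction; NOT the library's DocumentFormatter. */
interface SketchFormatter {
    String bold(String text);
}

/** Stand-in plain formatter: returns text unchanged. */
final class PlainSketchFormatter implements SketchFormatter {
    @Override
    public String bold(String text) {
        return text;
    }
}

/** Stand-in Markdown formatter: wraps text in double asterisks. */
final class MarkdownSketchFormatter implements SketchFormatter {
    @Override
    public String bold(String text) {
        return "**" + text + "**";
    }
}

/** Holds a formatter and falls back to the plain one when null is set. */
final class FormatterHolder {
    private SketchFormatter formatter = new PlainSketchFormatter();

    SketchFormatter getFormatter() {
        return formatter;
    }

    void setFormatter(SketchFormatter value) {
        // Assumption mirrored from the text: null selects the default (plain) formatter.
        formatter = (value != null) ? value : new PlainSketchFormatter();
    }
}
```

With this arrangement the getter never returns null, so code that uses the formatter needs no null checks.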
public int getPageCount()
Description copied from interface: IPageTextExtractor
Gets the total number of pages.
getPageCount in interface IPageTextExtractor
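The getPageCount() and extractPage(int pageIndex) methods documented on this page are naturally used together to walk a document page by page. The sketch below shows that loop against a small stand-in interface, `PageSource`, which copies only those two signatures; it is not the library's IPageTextExtractor. Zero-based page indices are an assumption, since this page does not state the index base.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in with the two page-related signatures from this page; NOT the library interface. */
interface PageSource {
    int getPageCount();

    String extractPage(int pageIndex);
}

/** Hypothetical helper for illustration; not part of the documented API. */
final class PageLoop {
    private PageLoop() { }

    /** Extracts every page in order and returns the page texts as a list. */
    static List<String> extractAllPages(PageSource source) {
        int pageCount = source.getPageCount();
        List<String> pages = new ArrayList<>(pageCount);
        // Assumption: page indices run from 0 to pageCount - 1.
        for (int pageIndex = 0; pageIndex < pageCount; pageIndex++) {
            pages.add(source.extractPage(pageIndex));
        }
        return pages;
    }
}
```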
public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights.
extractHighlights in interface IHighlightExtractor
highlightOptions - A collection of HighlightOptions. Only Mode = FixedWidth is supported.
UnsupportedOperationException - Mode is not FixedWidth.
public void reset()
Resets the current document.
After a reset, the extractLine method will return the first line of the document.
reset in class CellsTextExtractorBase
public String extractPage(int pageIndex)
Description copied from interface: IPageTextExtractor
Extracts all characters from the page with pageIndex and returns the data as a string.
extractPage in interface IPageTextExtractor
pageIndex - The index of the page.
Copyright © 2019. All rights reserved.