public final class CellsTextExtractor extends CellsTextExtractorBase implements ISearchable, IHighlightExtractor, IRegexSearchable, IStructuredExtractor, IPageTextExtractor, IDocumentContentExtractor, IFastTextExtractor
Provides the text extractor for spreadsheets.
Supported formats:
.XLS | Microsoft Excel Spreadsheet |
.XLSX | Microsoft Office Open XML Workbook |
.XLSM | Microsoft Excel 2007 Macro-Enabled Workbook |
.XLSB | Microsoft Excel 2007 Binary Workbook |
.ODS | OpenDocument spreadsheet |
.CSV | Comma Separated Values text file |
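Before opening a file with the extractor, it can be useful to check its extension against the list above. The following is a minimal sketch that uses only the Java standard library; the class name `SupportedSpreadsheetFormats` and the method `isSupported` are hypothetical helpers for illustration and are not part of the library.

```java
import java.util.Locale;
import java.util.Set;

public class SupportedSpreadsheetFormats {
    // Extensions copied from the "Supported formats" list above (hypothetical helper, not library API).
    private static final Set<String> EXTENSIONS =
            Set.of("xls", "xlsx", "xlsm", "xlsb", "ods", "csv");

    /** Returns true when fileName ends with one of the listed extensions (case-insensitive). */
    public static boolean isSupported(String fileName) {
        if (fileName == null) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        // No dot, or a trailing dot, means there is no extension to check.
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(extension);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("report.XLSX"));
        System.out.println(isSupported("notes.docx"));
    }
}
```

The check looks only at the file name, so it cannot tell whether the content really is a spreadsheet; it is a cheap pre-filter, not validation.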
Extracting text from a spreadsheet:
// Create a text extractor for spreadsheets
CellsTextExtractor extractor = new CellsTextExtractor(stream);
// Extract all text
System.out.println(extractor.extractAll());
Extracting by sheets:
// Create a text extractor for spreadsheets
CellsTextExtractor extractor = new CellsTextExtractor(stream);
// Extract the text of the sheet whose index is sheetIndex
System.out.println(extractor.extractSheet(sheetIndex));
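To process every sheet rather than a single one, the per-sheet call can be wrapped in a loop. The sketch below is self-contained and does not use the library: `SheetSource` is a hypothetical stand-in interface that mirrors the `getSheetCount` and `extractSheet` methods listed on this page, and it assumes that `getSheetCount` returns an `int` and that sheet indexes start at 0.

```java
import java.util.List;

public class AllSheetsDemo {
    /** Hypothetical stand-in for the two extractor methods used below (not library API). */
    interface SheetSource {
        int getSheetCount();
        String extractSheet(int sheetIndex);
    }

    /** Concatenates the text of every sheet, labelling each block with its index. */
    static String extractAllSheets(SheetSource source) {
        StringBuilder text = new StringBuilder();
        // Assumption: sheet indexes are zero-based.
        for (int i = 0; i < source.getSheetCount(); i++) {
            text.append("Sheet ").append(i).append(":\n");
            text.append(source.extractSheet(i)).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // In-memory fake that plays the role of a two-sheet workbook.
        List<String> fakeSheets = List.of("a,b", "c,d");
        SheetSource source = new SheetSource() {
            public int getSheetCount() { return fakeSheets.size(); }
            public String extractSheet(int sheetIndex) { return fakeSheets.get(sheetIndex); }
        };
        System.out.print(extractAllSheets(source));
    }
}
```

With a real extractor, the same loop body would presumably call `extractor.getSheetCount()` and `extractor.extractSheet(i)` directly.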
Extracting information about a sheet:
// Create a text extractor for spreadsheets
CellsTextExtractor extractor = new CellsTextExtractor(stream);
// Get the sheet info for the sheet whose index is sheetIndex
CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
System.out.println(String.format("Name: %s", sheetInfo.getName())); // the sheet's name
System.out.println(String.format("Index: %d", sheetInfo.getIndex())); // the sheet's index
System.out.println(String.format("Rows Count: %d", sheetInfo.getRowCount())); // the total number of rows in the sheet
System.out.println("Columns");
// Iterate over the sheet's columns
for (int i = 0; i < sheetInfo.getColumnNames().size(); i++)
{
    // Print the name of the column whose index is i, followed by ";" unless it is the last column.
    // Column names are strings, so they are formatted with %s rather than %d.
    System.out.println(String.format("%s%s",
        sheetInfo.getColumnNames().get(i),
        i + 1 < sheetInfo.getColumnNames().size() ? ";" : ""));
}
Extracting by rows:
// Create a text extractor for spreadsheets
CellsTextExtractor extractor = new CellsTextExtractor(stream);
// Get the sheet info for the sheet whose index is sheetIndex
CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
// Extract the text of the row whose index is rowIndex
System.out.println(sheetInfo.extractRow(rowIndex));
Extracting only selected columns:
// Create a text extractor for spreadsheets
CellsTextExtractor extractor = new CellsTextExtractor(stream);
// Get the sheet info for the sheet whose index is sheetIndex
CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
// Extract the text of the row whose index is rowIndex (only the A1 and C1 columns)
System.out.println(sheetInfo.extractRow(rowIndex, "A1", "C1"));
// Extract the text of the entire sheet (only the A1 and C1 columns)
System.out.println(sheetInfo.extractSheet("A1", "C1"));
Constructor and Description |
---|
CellsTextExtractor(InputStream stream): Initializes a new instance of the CellsTextExtractor class. |
CellsTextExtractor(InputStream stream, LoadOptions loadOptions): Initializes a new instance of the CellsTextExtractor class. |
CellsTextExtractor(String fileName): Initializes a new instance of the CellsTextExtractor class. |
CellsTextExtractor(String fileName, LoadOptions loadOptions): Initializes a new instance of the CellsTextExtractor class. |
Modifier and Type | Method and Description |
---|---|
protected void | dispose(boolean disposing): Releases the unmanaged resources used by the extractor. |
List<String> | extractHighlights(HighlightOptions... highlightOptions): Extracts highlights. |
String | extractPage(int pageIndex): Extracts all characters from the page with pageIndex and returns the data as a string. |
String | extractSheet(int sheetIndex): Extracts all characters from the sheet with sheetIndex and returns the data as a string. |
void | extractStructured(StructuredHandler handler): Extracts structured text. |
protected String | extractText(): Extracts all characters from the current position to the end of the text extractor and returns them as one string. |
protected String | extractTextLine(): Extracts a line of characters from the text extractor and returns the data as a string. |
DocumentContent | getDocumentContent(): Gets access to the document's content. |
int | getExtractMode(): Gets a value indicating the mode of text extraction. |
int | getPageCount(): Gets the total number of pages. |
void | search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords): Searches the keywords. |
void | search(SearchOptions options, ISearchHandler handler, List<String> keywords): Searches the keywords. |
void | searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions): Searches the expression. |
void | setExtractMode(int value): Sets a value indicating the mode of text extraction. |
Methods inherited from class CellsTextExtractorBase: getSheetCount, getSheetInfo, prepareLine, reset, setSheetCount
Other inherited methods: checkDisposed, close, dispose, extractAll, extractLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
public CellsTextExtractor(String fileName)
Initializes a new instance of the CellsTextExtractor class.
Parameters:
fileName - The path to the file.

public CellsTextExtractor(String fileName, LoadOptions loadOptions)
Initializes a new instance of the CellsTextExtractor class.
Parameters:
fileName - The path to the file.
loadOptions - The options for loading the file.

public CellsTextExtractor(InputStream stream)
Initializes a new instance of the CellsTextExtractor class.
Parameters:
stream - The stream of the document.

public CellsTextExtractor(InputStream stream, LoadOptions loadOptions)
Initializes a new instance of the CellsTextExtractor class.
Parameters:
stream - The stream of the document.
loadOptions - The options for loading the file.

public DocumentContent getDocumentContent()
Gets access to the document's content.
Specified by:
getDocumentContent in interface IDocumentContentExtractor
Returns:
An instance of the DocumentContent class.

public int getExtractMode()
Gets a value indicating the mode of text extraction.
Specified by:
getExtractMode in interface IFastTextExtractor
Returns:
The mode of text extraction. The default is Standard.

public void setExtractMode(int value)
Sets a value indicating the mode of text extraction.
Specified by:
setExtractMode in interface IFastTextExtractor
Parameters:
value - The mode of text extraction. The default is Standard.

public int getPageCount()
Description copied from interface: IPageTextExtractor
Gets the total number of pages.
Specified by:
getPageCount in interface IPageTextExtractor

public String extractSheet(int sheetIndex)
Extracts all characters from the sheet with sheetIndex and returns the data as a string.
extractSheet in class CellsTextExtractorBase
Parameters:
sheetIndex - The index of the sheet.

public void extractStructured(StructuredHandler handler)
Extracts structured text.
Specified by:
extractStructured in interface IStructuredExtractor
Parameters:
handler - Structured text extraction handler.

public void search(SearchOptions options, ISearchHandler handler, List<String> keywords)
Searches the keywords.
Specified by:
search in interface ISearchable
Parameters:
options - Options for searching.
handler - An instance of the search handler.
keywords - A collection of words to search.

public void search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords)
Searches the keywords.
Specified by:
search in interface ISearchable
Parameters:
options - Options for searching.
handler - An instance of the search handler.
searchEngine - An instance of the search engine.
keywords - A collection of words to search.

public void searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions)
Searches the expression.
Specified by:
searchWithRegex in interface IRegexSearchable
Parameters:
expression - A regular expression.
handler - An instance of the search handler.
searchOptions - Options for searching.

public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights.
Specified by:
extractHighlights in interface IHighlightExtractor
Parameters:
highlightOptions - A collection of HighlightOptions.

public String extractPage(int pageIndex)
Description copied from interface: IPageTextExtractor
Extracts all characters from the page with pageIndex and returns the data as a string.
Specified by:
extractPage in interface IPageTextExtractor
Parameters:
pageIndex - The index of the page.

protected void dispose(boolean disposing)
Releases the unmanaged resources used by the extractor.
dispose in class CellsTextExtractorBase
Parameters:
disposing - true if invoked from dispose(); otherwise, false.

protected String extractText()
Extracts all characters from the current position to the end of the text extractor and returns them as one string.
extractText in class CellsTextExtractorBase

protected String extractTextLine()
Description copied from class: CellsTextExtractorBase
Extracts a line of characters from the text extractor and returns the data as a string.
extractTextLine in class CellsTextExtractorBase
Copyright © 2019. All rights reserved.