public final class EncodingDetector extends Object
Provides the functionality for detecting the encoding of the java.io.InputStream.
The constructor accepts a default charset that is used for non-Unicode (ANSI) encodings:
EncodingDetector detector = new EncodingDetector(java.nio.charset.Charset.forName("windows-1251"));
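The default ANSI charset matters because bytes in a single-byte code page carry no marker saying which code page produced them. The minimal sketch below is a hypothetical JDK-only illustration (AnsiFallbackDemo and decodeAnsi are illustrative names, not part of EncodingDetector): it decodes the same six bytes with two different code pages and gets two different strings.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the library.
public class AnsiFallbackDemo {

    // Decodes BOM-less bytes with the given single-byte "ANSI" code page.
    public static String decodeAnsi(byte[] bytes, Charset ansiCharset) {
        return new String(bytes, ansiCharset);
    }

    public static void main(String[] args) {
        // The word "Привет" encoded in windows-1251; nothing in the bytes identifies the code page.
        byte[] bytes = {(byte) 0xCF, (byte) 0xF0, (byte) 0xE8, (byte) 0xE2, (byte) 0xE5, (byte) 0xF2};

        String cyrillic = decodeAnsi(bytes, Charset.forName("windows-1251")); // decodes to "Привет"
        String western = decodeAnsi(bytes, Charset.forName("windows-1252"));  // decodes to "Ïðèâåò"

        System.out.println(cyrillic);
        System.out.println(western);
    }
}
```

Only the caller's choice of default decides which reading is correct, which is why the detector asks for it up front.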
Detect the encoding only by BOM:
// Create an encoding detector
EncodingDetector detector = new EncodingDetector(java.nio.charset.Charset.forName("windows-1251"));
// Detect a charset
java.nio.charset.Charset charset = detector.detect(stream);
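This page does not describe how the BOM check is implemented. As an illustration only, a BOM check written against the JDK alone might look like the following; BomSniffer, detectByBom and fromBom are hypothetical names, and this is not the library's code. The sketch assumes the stream supports mark/reset so that the peeked bytes are not consumed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical helper, not part of the library: detects a charset by BOM only.
public final class BomSniffer {

    private BomSniffer() {
    }

    // Returns the charset indicated by a leading BOM, or null if no known BOM is found.
    // Assumes mark/reset support; the stream position is restored before returning.
    public static Charset detectByBom(InputStream stream) throws IOException {
        Objects.requireNonNull(stream, "stream");
        if (!stream.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        stream.mark(4);
        byte[] head = new byte[4];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 4 bytes or end of stream
        while (n < head.length) {
            int r = stream.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        stream.reset();
        return fromBom(head, n);
    }

    // Maps the first n bytes to a charset by BOM signature.
    public static Charset fromBom(byte[] b, int n) {
        // UTF-32LE must be tested before UTF-16LE because both start with FF FE
        if (n >= 4 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE && b[2] == 0 && b[3] == 0) {
            return Charset.forName("UTF-32LE");
        }
        if (n >= 4 && b[0] == 0 && b[1] == 0 && b[2] == (byte) 0xFE && b[3] == (byte) 0xFF) {
            return Charset.forName("UTF-32BE");
        }
        if (n >= 3 && b[0] == (byte) 0xEF && b[1] == (byte) 0xBB && b[2] == (byte) 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (n >= 2 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        if (n >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        return null; // no BOM: BOM-only detection cannot decide
    }
}
```

Returning null when no BOM is found mirrors the documented behaviour of detect(InputStream), which returns null if the encoding cannot be detected.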
Detect the encoding by BOM, or by content if no BOM is present:
// Create an encoding detector
EncodingDetector detector = new EncodingDetector(java.nio.charset.Charset.forName("windows-1251"));
// Detect a charset
java.nio.charset.Charset charset = detector.detect(stream, true);
If a BOM is present, this method works like the previous one. If no BOM is present, it tries to detect the encoding from the content. Content-based detection relies on indirect (heuristic) methods, so it is slower and less accurate than BOM detection.
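This page does not say which heuristics the content-based mode uses. A common and much simpler stand-in is to check whether BOM-less bytes are valid UTF-8 and otherwise fall back to the default ANSI charset. The sketch below does that with a strict JDK CharsetDecoder; ContentGuesser and guess are hypothetical names, and the heuristic is a swapped-in technique, not the library's algorithm.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical heuristic, not the library's algorithm.
public final class ContentGuesser {

    private ContentGuesser() {
    }

    // Guesses UTF-8 if the bytes are strictly valid UTF-8 and contain non-ASCII data;
    // otherwise returns the caller's default ANSI charset.
    public static Charset guess(byte[] content, Charset defaultAnsiCharset) {
        if (hasNonAscii(content) && isStrictUtf8(content)) {
            return StandardCharsets.UTF_8;
        }
        return defaultAnsiCharset;
    }

    // A decoder set to REPORT throws on malformed input instead of substituting U+FFFD.
    private static boolean isStrictUtf8(byte[] content) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Pure ASCII decodes the same way in UTF-8 and in common ANSI code pages.
    private static boolean hasNonAscii(byte[] content) {
        for (byte b : content) {
            if (b < 0) { // bytes 0x80..0xFF are negative in Java
                return true;
            }
        }
        return false;
    }
}
```

Single-byte Cyrillic text such as windows-1251 is rarely valid UTF-8, so it falls through to the default; text that happens to be valid in both would be misclassified, which illustrates why content-based detection is less accurate.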
Constructor and Description |
---|
EncodingDetector(Charset defaultAnsiCharset) Initializes a new instance of the EncodingDetector class. |
Modifier and Type | Method and Description |
---|---|
Charset | detect(InputStream stream) Detects the character encoding of the stream by byte order mark (BOM). |
Charset | detect(InputStream stream, boolean detectByContent) Detects the character encoding of the stream. |
Charset | getDefaultAnsiCharset() Gets the character encoding that is used for ANSI encodings. |
public EncodingDetector(Charset defaultAnsiCharset)
Initializes a new instance of the EncodingDetector class.
Parameters:
defaultAnsiCharset - The character encoding that is used for ANSI encodings.
Throws:
ArgumentNullException - defaultAnsiCharset is null.

public Charset getDefaultAnsiCharset()
Gets the character encoding that is used for ANSI encodings.
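Because detect may return null when the encoding cannot be detected, a caller might fall back to the charset returned by getDefaultAnsiCharset() when reading the text. The sketch below shows that fallback with JDK calls only; DetectedTextReader and readAll are hypothetical names, and it assumes the stream is still positioned at its first byte when passed in (this page does not say whether detect advances the stream). It also strips a leading U+FEFF, since JDK decoders for UTF-8, UTF-16LE/BE and UTF-32LE/BE keep the BOM as a character.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the library.
public final class DetectedTextReader {

    private DetectedTextReader() {
    }

    // 'detected' is the result of a detector (may be null); 'defaultAnsiCharset' is the fallback.
    // Assumes the stream is positioned at its first byte (requires Java 9+ for readAllBytes).
    public static String readAll(InputStream stream, Charset detected, Charset defaultAnsiCharset)
            throws IOException {
        // Fall back to the ANSI default when detection returned null
        Charset charset = (detected != null) ? detected : defaultAnsiCharset;
        String text = new String(stream.readAllBytes(), charset);
        // Drop a decoded BOM so it does not leak into the text as U+FEFF
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1);
        }
        return text;
    }
}
```

With the detector, the arguments might be detector.detect(stream, true) and detector.getDefaultAnsiCharset(), provided the positioning assumption above holds.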

public Charset detect(InputStream stream)
Detects the character encoding of the stream by byte order mark (BOM).
Parameters:
stream - The stream for which the character encoding must be detected.
Returns:
The character encoding of the stream, or null if the encoding cannot be detected.
Throws:
ArgumentNullException - stream is null.

public Charset detect(InputStream stream, boolean detectByContent)
Detects the character encoding of the stream.
Set detectByContent to true to detect the encoding by content when no BOM is present. Detection by content may not always identify the encoding accurately.
Parameters:
stream - The stream for which the character encoding must be detected.
detectByContent - Indicates whether to detect the encoding by content when no byte order mark (BOM) is present. If false, the encoding is detected only by BOM.
Returns:
The character encoding of the stream, or null if the encoding cannot be detected.
Throws:
ArgumentNullException - stream is null.

Copyright © 2018. All rights reserved.