encoding.js

Converts character encoding in JavaScript.

README(Japanese)

Installation

In Browser:

<script src="encoding.js"></script>

or

<script src="encoding.min.js"></script>

The Encoding object will be defined in the global scope.

Conversion and detection work on arrays and array-like objects.

In Node.js:

encoding.js is published on npm under the module name encoding-japanese.

npm install encoding-japanese
var encoding = require('encoding-japanese');

Every method is also available for Buffer objects in Node.js.

bower:

bower install encoding-japanese

Convert character encoding (convert):

  • {Array.<number>|string} Encoding.convert ( data, to_encoding [, from_encoding ] )
    Converts character encoding.
    @param {Array.<number>|TypedArray|Buffer|string} data The target data.
    @param {(string|Object)} to_encoding The encoding name of conversion destination.
    @param {(string|Array.<string>)=} [from_encoding] The encoding name of source or 'AUTO'.
    @return {Array|string} Return the converted array/string.
// Convert character encoding to Shift_JIS from UTF-8.
// utf8Array may be a Uint8Array, a plain Array, or (in Node.js) a Buffer of UTF-8 bytes.
var utf8Array = [227, 129, 147, 227, 130, 147]; // UTF-8 bytes of 'こん'
var sjisArray = Encoding.convert(utf8Array, 'SJIS', 'UTF8');

// Convert character encoding by automatic detection (AUTO detect).
var sjisArray = Encoding.convert(utf8Array, 'SJIS');
// or  
var sjisArray = Encoding.convert(utf8Array, 'SJIS', 'AUTO');

// Detect the character encoding.
// The return value will be one of the "Available Encodings" below.
var detected = Encoding.detect(utf8Array);
if (detected === 'UTF8') {
  console.log('Encoding is UTF-8');
}
Available Encodings:
  • 'UTF32' (detect only)
  • 'UTF16'
  • 'UTF16BE'
  • 'UTF16LE'
  • 'BINARY' (detect only)
  • 'ASCII' (detect only)
  • 'JIS'
  • 'UTF8'
  • 'EUCJP'
  • 'SJIS'
  • 'UNICODE' (JavaScript Unicode Array)

Note: UNICODE is an array of the values returned by String.charCodeAt() in JavaScript.
(Each value in the array may be greater than 255.)
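
The difference between a UNICODE array and an encoded byte array can be seen without the library. The following is a minimal sketch, independent of encoding.js, that assumes a runtime providing the standard TextEncoder (modern browsers and recent Node.js versions).

```javascript
// Illustration only: compares String.charCodeAt() values with UTF-8 bytes.
var text = 'こ'; // U+3053

// 'UNICODE' style: one number per String.charCodeAt() value.
var unicodeCodes = [text.charCodeAt(0)];

// 'UTF8' style: one number per encoded byte (each value is 0-255).
var utf8Bytes = Array.prototype.slice.call(new TextEncoder().encode(text));

console.log(unicodeCodes); // [12371]
console.log(utf8Bytes);    // [227, 129, 147]
```

The single value 12371 (0x3053) does not fit in one byte, which is why a UNICODE array must not be treated as a byte array.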

Specify the Object argument
var sjisArray = Encoding.convert(utf8Array, {
  to: 'SJIS', // to_encoding
  from: 'UTF8' // from_encoding
});

Passing an object as the second argument improves readability.

Specify the string argument and 'type' option
var utf8String = 'ã\u0081\u0093ã\u0082\u0093ã\u0081«ã\u0081¡ã\u0081¯';
var unicodeString = Encoding.convert(utf8String, {
  to: 'UNICODE',
  from: 'UTF8',
  type: 'string' // Specify 'string' type. (Return as string)
});
console.log(unicodeString); // こんにちは

The following 'type' options are available:

  • 'string': Return as string.
  • 'arraybuffer': Return as ArrayBuffer.
  • 'array': Return as Array (default).
Specify BOM in UTF-16

It's possible to add the UTF16 BOM by specifying the bom option for conversion.

var utf16Array = Encoding.convert(utf8Array, {
  to: 'UTF16', // to_encoding
  from: 'UTF8', // from_encoding
  bom: true // With BOM
});

The byte order of UTF16 is big-endian by default.

Specify 'LE' for the bom option to convert as little-endian.

var utf16leArray = Encoding.convert(utf8Array, {
  to: 'UTF16', // to_encoding
  from: 'UTF8', // from_encoding
  bom: 'LE' // With BOM (little-endian)
});

You can specify UTF16LE or UTF16BE if the BOM is not required.

var utf16beArray = Encoding.convert(utf8Array, {
  to: 'UTF16BE',
  from: 'UTF8'
});

Note: UTF16, UTF16BE and UTF16LE are not JavaScript internal encodings; they are byte arrays.
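
To make the byte-order and BOM distinction concrete, here is a minimal sketch of how UTF-16 code units are laid out as bytes. The helper toUtf16Bytes is hypothetical and is not part of encoding.js; it only illustrates the layout and ignores nothing the library may do beyond that.

```javascript
// Illustration only: lays out UTF-16 code units as bytes, optionally with a BOM.
// toUtf16Bytes is a hypothetical helper, not part of encoding.js.
function toUtf16Bytes(str, littleEndian, withBom) {
  var bytes = [];
  if (withBom) {
    // The BOM is U+FEFF written in the chosen byte order.
    if (littleEndian) {
      bytes.push(0xFF, 0xFE);
    } else {
      bytes.push(0xFE, 0xFF);
    }
  }
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    var high = unit >> 8;
    var low = unit & 0xFF;
    if (littleEndian) {
      bytes.push(low, high);
    } else {
      bytes.push(high, low);
    }
  }
  return bytes;
}

var beBytes = toUtf16Bytes('こ', false, true); // [254, 255, 48, 83]
var leBytes = toUtf16Bytes('こ', true, true);  // [255, 254, 83, 48]
var noBom = toUtf16Bytes('こ', false, false);  // [48, 83]
```

Every value stays within 0-255, unlike a UNICODE array, where 'こ' is the single value 12371.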

Detect character encoding (detect):

  • {string|boolean} Encoding.detect ( data [, encodings ] )
    Detect character encoding.
    @param {Array.<number>|TypedArray|string} data Target data
    @param {(string|Array.<string>)} [encodings] The encoding name(s) to restrict detection to.
    @return {string|boolean} Return the detected character encoding, or false.
// Detect character encoding automatically. (AUTO detect).
var detected = Encoding.detect(utf8Array);
if (detected === 'UTF8') {
  console.log('Encoding is UTF-8');
}

// Detect character encoding by specific encoding name.
var isSJIS = Encoding.detect(sjisArray, 'SJIS');
if (isSJIS) {
  console.log('Encoding is SJIS');
}
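
Because detect can return false, it may be worth guarding a conversion that relies on the detected name. The following is a minimal sketch: decodeToString is a hypothetical wrapper, the Encoding object is passed in as a parameter so the sketch does not assume how the library was loaded, and treating 'BINARY' as not convertible to text is an assumption of this sketch.

```javascript
// decodeToString is a hypothetical wrapper, not part of encoding.js.
// `Encoding` is passed in so the sketch works in the browser or in Node.js.
function decodeToString(Encoding, bytes) {
  var detected = Encoding.detect(bytes);

  // detect returns false when no encoding matches.
  // 'BINARY' is listed as detect-only, so it is skipped here (assumption).
  if (!detected || detected === 'BINARY') {
    return null;
  }

  return Encoding.convert(bytes, {
    to: 'UNICODE',
    from: detected,
    type: 'string'
  });
}
```

A caller would then check for null before using the result, for example before assigning it to a textarea.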
URL Encode/Decode:
  • {string} Encoding.urlEncode ( data )
    URL(percent) encode.
    @param {Array.<number>|TypedArray} data Target data.
    @return {string} Return the encoded string.

  • {Array.<number>} Encoding.urlDecode ( string )
    URL(percent) decode.
    @param {string} string Target data.
    @return {Array.<number>} Return the decoded array.

// URL-encode an array of character codes.
var sjisArray = [
  130, 177, 130, 241, 130, 201, 130, 191, 130, 205, 129,
  65, 130, 217, 130, 176, 129, 153, 130, 210, 130, 230
];

var encoded = Encoding.urlEncode(sjisArray);
console.log(encoded);
// output:
// '%82%B1%82%F1%82%C9%82%BF%82%CD%81A%82%D9%82%B0%81%99%82%D2%82%E6'

var decoded = Encoding.urlDecode(encoded);
console.log(decoded);
// output: [
//   130, 177, 130, 241, 130, 201, 130, 191, 130, 205, 129,
//    65, 130, 217, 130, 176, 129, 153, 130, 210, 130, 230
// ]
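
The output above shows that byte 65 stays as 'A' while the other bytes become %XX escapes. The following minimal sketch reproduces that behavior for byte arrays. The helpers percentEncodeBytes and percentDecodeToBytes are hypothetical, and the set of characters left unescaped (the RFC 3986 unreserved set) is an assumption, not a statement about the library's exact rules.

```javascript
// Hypothetical helpers illustrating percent-encoding of raw bytes.
// Assumption: only RFC 3986 unreserved characters are left unescaped.
function percentEncodeBytes(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    var ch = String.fromCharCode(b);
    if (/^[A-Za-z0-9\-._~]$/.test(ch)) {
      out += ch;
    } else {
      out += '%' + (b < 16 ? '0' : '') + b.toString(16).toUpperCase();
    }
  }
  return out;
}

function percentDecodeToBytes(str) {
  var bytes = [];
  for (var i = 0; i < str.length; i++) {
    if (str.charAt(i) === '%') {
      // Read the two hex digits that follow the percent sign.
      bytes.push(parseInt(str.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      bytes.push(str.charCodeAt(i));
    }
  }
  return bytes;
}

var sampleBytes = [130, 177, 129, 65];
var sampleEncoded = percentEncodeBytes(sampleBytes); // '%82%B1%81A'
var sampleDecoded = percentDecodeToBytes(sampleEncoded); // [130, 177, 129, 65]
```

Working on bytes rather than on a JavaScript string is what allows Shift_JIS data to be escaped, since encodeURIComponent always produces UTF-8 escapes.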
Base64 Encode/Decode:
  • {string} Encoding.base64Encode ( data )
    Base64 encode.
    @param {Array.<number>|TypedArray} data Target data.
    @return {string} Return the Base64 encoded string.

  • {Array.<number>} Encoding.base64Decode ( string )
    Base64 decode.
    @param {string} string Target data.
    @return {Array.<number>} Return the Base64 decoded array.

var sjisArray = [
  130, 177, 130, 241, 130, 201, 130, 191, 130, 205
];
var encoded = Encoding.base64Encode(sjisArray);
console.log(encoded); // 'grGC8YLJgr+CzQ=='

var decoded = Encoding.base64Decode(encoded);
console.log(decoded);
// [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]
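
In Node.js, the same byte array can be cross-checked against the built-in Buffer class. This is a sketch for comparison only and does not use encoding.js; it assumes a Node.js runtime where Buffer.from is available.

```javascript
// Node.js only: cross-checks the Base64 example with the built-in Buffer class.
var b64Source = [130, 177, 130, 241, 130, 201, 130, 191, 130, 205];

// Byte array to Base64 string.
var b64Text = Buffer.from(b64Source).toString('base64');

// Base64 string back to a plain Array of byte values.
var b64RoundTrip = Array.prototype.slice.call(Buffer.from(b64Text, 'base64'));

console.log(b64Text); // 'grGC8YLJgr+CzQ=='
console.log(b64RoundTrip);
// [130, 177, 130, 241, 130, 201, 130, 191, 130, 205]
```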

Example:

Example using the XMLHttpRequest and Typed arrays (Uint8Array):

This sample reads a text file written in Shift_JIS as binary data, converts it to Unicode with Encoding.convert, and displays the resulting string.

var req = new XMLHttpRequest();
req.open('GET', '/my-shift_jis.txt', true);
req.responseType = 'arraybuffer';

req.onload = function (event) {
  var buffer = req.response;
  if (buffer) {
    // Shift_JIS Array
    var sjisArray = new Uint8Array(buffer);

    // Convert encoding to UNICODE (JavaScript Unicode Array).
    var unicodeArray = Encoding.convert(sjisArray, {
      to: 'UNICODE',
      from: 'SJIS'
    });

    // Join to string.
    var unicodeString = Encoding.codeToString(unicodeArray);
    console.log(unicodeString);
  }
};

req.send(null);
Convert encoding for file using the File APIs:

Reads a file using the File APIs, detects its encoding, converts it to Unicode, and displays the result.

<input type="file" id="file">
<div id="encoding"></div>
<textarea id="result" rows="5" cols="80"></textarea>

<script>
function onFileSelect(event) {
  var file = event.target.files[0];

  var reader = new FileReader();
  reader.onload = function(e) {
    var codes = new Uint8Array(e.target.result);
    var encoding = Encoding.detect(codes);
    document.getElementById('encoding').textContent = encoding;

    // Convert encoding to unicode
    var unicodeString = Encoding.convert(codes, {
      to: 'unicode',
      from: encoding,
      type: 'string'
    });
    document.getElementById('result').value = unicodeString;
  };

  reader.readAsArrayBuffer(file);
}

document.getElementById('file').addEventListener('change', onFileSelect, false);
</script>

Demo

Example of the character encoding conversion:
var eucjpArray = [
  164, 179, 164, 243, 164, 203, 164, 193, 164, 207, 161,
  162, 164, 219, 164, 178, 161, 249, 164, 212, 164, 232
];

var utf8Array = Encoding.convert(eucjpArray, {
  to: 'UTF8',
  from: 'EUCJP'
});
console.log( utf8Array );
// output: [
//   227, 129, 147, 227, 130, 147, 227, 129, 171,
//   227, 129, 161, 227, 129, 175, 227, 128, 129,
//   227, 129, 187, 227, 129, 146, 226, 152, 134,
//   227, 129, 180, 227, 130, 136
// ]
//   => 'こんにちは、ほげ☆ぴよ'
Example of converting a character code by automatic detection (Auto detect):
var sjisArray = [
  130, 177, 130, 241, 130, 201, 130, 191, 130, 205, 129,
   65, 130, 217, 130, 176, 129, 153, 130, 210, 130, 230
];
var unicodeArray = Encoding.convert(sjisArray, {
  to: 'UNICODE',
  from: 'AUTO'
});
// codeToString is a utility method that joins a character code array into a string.
console.log( Encoding.codeToString(unicodeArray) );
// output: 'こんにちは、ほげ☆ぴよ'

Utilities

  • {string} Encoding.codeToString ( {Array.<number>|TypedArray} data )
    Joins a character code array to string.

  • {Array.<number>} Encoding.stringToCode ( {string} string )
    Splits string to an array of character codes.
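
Conceptually, these utilities map between a string and its String.charCodeAt() values. The sketch below shows one way to do that in plain JavaScript. joinCodes and splitCodes are hypothetical stand-ins, not the library's code, and the chunk size is an arbitrary assumption chosen to stay under engine limits on the number of arguments to String.fromCharCode.

```javascript
// joinCodes and splitCodes are hypothetical stand-ins, not part of encoding.js.
function joinCodes(codes, chunkSize) {
  chunkSize = chunkSize || 8192; // assumed safe argument count per call
  var result = '';
  for (var i = 0; i < codes.length; i += chunkSize) {
    // Works for plain Arrays and TypedArrays alike.
    var chunk = Array.prototype.slice.call(codes, i, i + chunkSize);
    result += String.fromCharCode.apply(null, chunk);
  }
  return result;
}

function splitCodes(str) {
  var codes = [];
  for (var i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i));
  }
  return codes;
}

var joined = joinCodes([12371, 12435]); // 'こん'
var split = splitCodes('こん');          // [12371, 12435]
```

Joining in chunks avoids passing a very large array to a single apply call, which some engines reject.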

Japanese Zenkaku/Hankaku

  • {Array.<number>|string} Encoding.toHankakuCase ( {Array.<number>|string} data )
    Convert the zenkaku symbols and alphanumeric characters to the ascii (hankaku) symbols and alphanumeric characters.

  • {Array.<number>|string} Encoding.toZenkakuCase ( {Array.<number>|string} data )
    Convert to the zenkaku symbols and alphanumeric characters from the ascii symbols and alphanumeric characters.

  • {Array.<number>|string} Encoding.toHiraganaCase ( {Array.<number>|string} data )
    Convert to the zenkaku hiragana from the zenkaku katakana.

  • {Array.<number>|string} Encoding.toKatakanaCase ( {Array.<number>|string} data )
    Convert to the zenkaku katakana from the zenkaku hiragana.

  • {Array.<number>|string} Encoding.toHankanaCase ( {Array.<number>|string} data )
    Convert to the hankaku katakana from the zenkaku katakana.

  • {Array.<number>|string} Encoding.toZenkanaCase ( {Array.<number>|string} data )
    Convert to the zenkaku katakana from the hankaku katakana.

  • {Array.<number>|string} Encoding.toHankakuSpace ( {Array.<number>|string} data )
    Convert the full-width (ideographic) space (U+3000) to the single space (U+0020).

  • {Array.<number>|string} Encoding.toZenkakuSpace ( {Array.<number>|string} data )
    Convert the single space (U+0020) to the full-width (ideographic) space (U+3000).
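
Several of these conversions correspond to fixed offsets between Unicode blocks. The following minimal sketch illustrates three of them for string input. The helper names are hypothetical and this is not the library's implementation; in particular it does not cover hankaku katakana, which has no fixed offset.

```javascript
// Hypothetical helpers illustrating Unicode offsets; not part of encoding.js.

// Full-width U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII U+0021..U+007E.
function zenkakuToHankaku(str) {
  return str.replace(/[\uFF01-\uFF5E]/g, function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) - 0xFEE0);
  });
}

// Hiragana U+3041..U+3096 sit exactly 0x60 below katakana U+30A1..U+30F6.
function hiraganaToKatakana(str) {
  return str.replace(/[\u3041-\u3096]/g, function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) + 0x60);
  });
}

// The full-width space U+3000 maps to the ASCII space U+0020.
function zenkakuSpaceToHankaku(str) {
  return str.replace(/\u3000/g, ' ');
}

console.log(zenkakuToHankaku('ＡＢＣ１２３'));  // 'ABC123'
console.log(hiraganaToKatakana('こんにちは')); // 'コンニチハ'
```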

Demo

License

MIT
