are represented in four bytes each. For a file that's mostly Latin text, this effectively halves the file size from what it would be in UCS-2. However, for a file that's primarily Japanese, Chinese, or Korean, the file size can grow by 50%, because each of those characters takes three bytes in UTF-8 instead of two. For most other living languages, the file size is about the same.
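The size trade-off can be checked directly. The following is a minimal sketch in Python, not part of the original text. The sample strings and the `encoded_sizes` helper are illustrative assumptions, and `utf-16-be` stands in for UCS-2, which it matches byte for byte for the characters used here.

```python
# Hypothetical sample strings, chosen only to illustrate the three cases.
SAMPLES = {
    "Latin": "Hello, world",
    "Greek": "\u03b1\u03b2\u03b3\u03b4",                 # four Greek letters
    "Japanese": "\u3053\u3093\u306b\u3061\u306f",        # five hiragana
}

def encoded_sizes(text):
    """Return (utf8_bytes, two_byte_bytes) for text, with no byte-order mark.

    utf-16-be is used as a stand-in for UCS-2; the two agree for every
    character in the samples above.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

for name, text in SAMPLES.items():
    utf8, two_byte = encoded_sizes(text)
    print(f"{name}: UTF-8 {utf8} bytes, two-byte encoding {two_byte} bytes")
```

With these samples, the Latin string is half the size in UTF-8, the Greek string is the same size, and the Japanese string is 50% larger.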
UTF-8 is probably the most broadly supported encoding of Unicode. For instance, it's how Java .class files store strings (in a slightly modified form); it's the native encoding of BeOS; and it's the default encoding an XML processor assumes unless told otherwise by a byte-order mark or an encoding declaration. Chances are pretty good that if a program tells you it's saving Unicode, it's really saving UTF-8.
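The XML default can be observed with any conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree` module as one example. The `parse_text` helper and the sample documents are assumptions made for illustration, not something the original text prescribes.

```python
import xml.etree.ElementTree as ET

def parse_text(data: bytes) -> str:
    """Parse an XML document given as raw bytes; return the root element's text."""
    return ET.fromstring(data).text

word = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# No byte-order mark and no encoding declaration: the parser assumes UTF-8.
no_hint = f"<p>{word}</p>".encode("utf-8")

# Python's "utf-16" codec writes a byte-order mark, which tells the parser
# that the document is UTF-16 rather than UTF-8.
with_bom = f"<p>{word}</p>".encode("utf-16")

# An explicit encoding declaration overrides the UTF-8 default.
with_decl = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'

for label, data in [("no hint", no_hint), ("BOM", with_bom), ("declaration", with_decl)]:
    print(label, parse_text(data) == word)
```

All three byte sequences differ, but each should parse to the same four-character string, because the parser falls back to UTF-8 only when neither hint is present.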
Copyright © 2002 O'Reilly & Associates. All rights reserved.