Bringing in More of the World with Unicode

FRAMINGHAM (03/24/2000) - One of the most overlooked aspects of e-commerce is international trade. E-mail and the Web are largely based on written language, and it has taken a long time for the world to agree on how to represent the many forms of writing in a single standard. The recent release of Unicode Version 3 goes a long way toward solving the problem of too many local standards, making it easier for companies to internationalize their Internet communications.

Although Unicode has been around for almost 10 years, it has come into the limelight only recently now that the Internet has made it almost trivial to engage in trans-border communication and commerce. Sending messages to people who use the same alphabet as you has always been quite easy, but if you started to use characters outside of your "normal" alphabet, things could get pretty rocky. Each operating system had its own way of coding different character sets, and international communicators often had no idea if the person receiving the message would see it the way it looked when it was sent.
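The mismatch described above is easy to reproduce. The following is a minimal Python sketch, not anything taken from the Unicode standard itself: it takes one byte value and decodes it under three older 8-bit encodings. The byte value and the three encodings are arbitrary choices made for illustration.

```python
# One byte, three legacy 8-bit encodings, three different characters.
# The byte value 0xE9 and the encodings below are illustrative choices.
raw = bytes([0xE9])


def decode_with(encoding: str) -> str:
    """Interpret the same raw byte under the given legacy encoding."""
    return raw.decode(encoding)


latin1 = decode_with("latin-1")      # Western European: e with acute accent
cyrillic = decode_with("koi8-r")     # Russian: a Cyrillic capital letter
greek = decode_with("iso8859-7")     # Greek: a lowercase Greek letter

for label, text in [("latin-1", latin1), ("koi8-r", cyrillic), ("iso8859-7", greek)]:
    # Print the Unicode code point each interpretation maps to.
    print(f"{label:10} -> U+{ord(text):04X} {text}")
```

A sender who meant the accented "e" has no way of knowing whether a recipient's software will show the Cyrillic or the Greek letter instead, unless both sides agree on the encoding.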

Without Unicode, creating Web pages or e-mail for non-U.S. customers is hit-and-miss because there are usually many competing standards for each written language, particularly for Asian languages. Unicode solves that by defining a single set of characters for all the world's written languages and describing how to use these characters in computer-based writing. Of course, it is unlikely that your company needs to create Web pages in 50 languages today. But it is good to know that when you want to expand from U.S. English to any other language, you can do so without having to guess which character sets the recipient can use.
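As a rough illustration of the "single set of characters" idea, the Python sketch below looks up the code point and official name of a few characters from different scripts, then writes them all out in one encoding. The sample characters are arbitrary, the helper name `describe` is made up for this example, and the choice of UTF-8 as the byte encoding is an assumption; the column itself does not name one.

```python
import unicodedata

# Sample characters from four scripts (arbitrary choices for illustration):
# Latin, accented Latin, Cyrillic, and a Chinese/Japanese/Korean ideograph.
samples = ["A", "\u00e9", "\u0418", "\u4e2d"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in samples:
    # Every character has exactly one code point, whatever the language.
    encoded = ch.encode("utf-8")
    print(f"{describe(ch):45} UTF-8 bytes: {encoded.hex()}")

# A mixed-language string survives a round trip through a single encoding.
mixed = "".join(samples)
assert mixed.encode("utf-8").decode("utf-8") == mixed
```

Because each character has one number regardless of language, a single page or message can mix scripts without switching between local standards.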

Unicode Version 3 contains more than 10,000 characters that were not in Version 2.1.

More than half the new characters are additions to the Chinese/Japanese/Korean set of ideographs, many of which are used in personal names. Some of the new characters are whole alphabets that weren't coded before, while others are rarely seen characters for alphabets already in Unicode. These characters will probably be added to the fonts used by word processors, e-mail clients and Web browsers in the coming months, if they haven't been added already.

More important, Unicode Version 3 has been reorganized, and many sections have been rewritten, to make it easier for novices to understand Unicode and to see how complicated it is. Topics such as compatibility with other standards, bidirectional writing of the kind used in the Middle East, and implementation hints are all covered in enough detail for programmers, but with clear overviews that are useful to IS managers who need to understand what they are getting into.
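To give a sense of what bidirectional writing involves for a programmer, here is a small Python sketch that asks the Unicode character database for the direction class of a few characters. The sample characters and the helper name `direction_classes` are chosen for illustration only, and the sketch shows just the per-character property, not the full algorithm the standard describes for laying out mixed-direction text.

```python
import unicodedata


def direction_classes(text: str) -> list[str]:
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Sample characters (illustrative): a Latin letter, a Hebrew letter,
# an Arabic letter, and a European digit.
latin_a = "A"
hebrew_alef = "\u05d0"
arabic_alef = "\u0627"
digit_one = "1"

for ch in (latin_a, hebrew_alef, arabic_alef, digit_one):
    # "L" is left-to-right, "R" and "AL" are right-to-left,
    # "EN" marks a European number.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):30} {unicodedata.bidirectional(ch)}")
```

Software that displays a line mixing English with Hebrew or Arabic uses these classes to decide which runs of text read left to right and which read right to left.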

If the problem of too many standards had you hesitating to internationalize, you can stop waiting. Standards groups such as the Internet Engineering Task Force and World Wide Web Consortium use Unicode, and so should you. The Unicode Standard, Version 3.0 is in bookstores now, and many important parts of the standard are available on the Unicode Consortium's Web site (www.unicode.org).

Hoffman is director of the Internet Mail Consortium and the VPN Consortium. He can be reached at phoffman@imc.org.
