Unicode Common to Latest Scripting Language

SAN FRANCISCO (05/10/2000) - It takes a scorecard to keep track of all the version numbers in computing these days. There are full-time consultants who do nothing but explain the differences between Sun Microsystems Inc.'s SunOS and Solaris, OSF/1 and Digital Unix, Tru64 Unix and VMS, and the many flavors of Microsoft Corp.'s ActiveX, DLLs, and Windows. Open source projects haven't had the luxury of generating so much confusion, but even with the relatively straightforward series of Perl and Python versions it pays to speak precisely.

The Perl community generally regards Perl 4, Perl 5, Perl 6, and so on, as the language's major versions. For example, according to principal developer Chip Salzenberg, Perl 6 will be a complete rewrite. A significant degree of incompatibility exists between major versions -- the developers do what they can to maintain compatibility, but no one is surprised if scripts must be touched up to move from 4 to 5, or from 5 to 6.

Within a major version, the handling of maintained versions, developer versions, and other subtleties is more complex than a single explanation can encompass. The main point, though, is that 5.000, 5.001, 5.002, 5.003, 5.004, and 5.005 were released as "stable" in the last few years.

The follow-up to 5.005 is 5.6.0, which practitioners informally call 5.6.

Version number 5.6.0 conforms to the tripartite numbering scheme that Linux has conditioned open source consumers to expect. It is characteristic of Perl that 5.6.0 doesn't merely have a different number; it also includes a new version-number computational interface, so it's easy for developers to compare version 5.6.0 with version 5.004. In fact, the interface is considerably more general and cogently specified than this small example can convey. In a development world where many tools seem capricious about compatibility, this is the sort of amenity that makes fanatics of many Perl users.
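To make the comparison concrete, here is a minimal sketch in Python rather than Perl. It is not Perl's own interface; the function name `parse_version` is hypothetical, and the sketch assumes the convention that an old-style decimal version packs the minor and patch numbers into three digits each, so that 5.006 and 5.6.0 name the same release.

```python
def parse_version(text):
    """Return a (major, minor, patch) tuple for either numbering style.

    Assumption: an old-style decimal version such as "5.004" carries the
    minor and patch numbers as three digits each after the decimal point.
    """
    text = text.lstrip("v")
    parts = text.split(".")
    if len(parts) >= 3:
        # Tripartite style, e.g. "5.6.0": each dotted field is one component.
        return tuple(int(part) for part in parts[:3])
    major = int(parts[0])
    fraction = parts[1] if len(parts) == 2 else ""
    # Pad to six digits: the first three are the minor, the next three the patch.
    fraction = fraction.replace("_", "").ljust(6, "0")
    return (major, int(fraction[:3]), int(fraction[3:6]))


# Tuples compare component by component, so the two styles become comparable.
newer = parse_version("5.6.0") > parse_version("5.004")
```

Once both styles reduce to the same tuple form, ordinary comparison operators do the rest, which is the convenience the paragraph above describes.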

Python is even more conservative, at least cosmetically. Python source written in the early '90s often works with no changes in the latest Python releases.

Python's numbering hints at this gentle evolution -- alpha versions of 1.6 became available at the end of last winter, with a final release scheduled for the first of June.

The deliberate pace of Python's official version numbers hides intense ferment below the surface. Even while Python creator Guido van Rossum methodically prepares official releases, part of his time is devoted to a major upgrade. The in-joke of the day asks whether Pythoneers will refer to this planned version as Python 2, Python 2000, or Python 3000.

Perl development focuses all its attention on the improvement of a single, carefully maintained set of sources. In contrast, Python's invention has spawned at least six loosely coupled implementations of Python, which occasionally bifurcate and rejoin. By the time an idea makes its appearance in van Rossum's master sources, it might have matured during a couple of years of experimentation in one of these laboratories of alternative implementation.

Where's the beef?

The new versions hold many benefits for Perl and Python programmers. Perl 5.6 includes a compiler that's been needed for years, while Python 1.6 builds in knowledge about memory-mapped files and zip archives. The release notes for each language list the hundreds of improvements and corrections in these latest releases.
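For a feel of what built-in support for zip archives and memory-mapped files looks like, here is a small sketch. It is written against the `zipfile` and `mmap` modules as they exist in current Python 3, not the 1.6 alpha; the helper names `zip_roundtrip` and `mmap_slice` are hypothetical.

```python
import io
import mmap
import tempfile
import zipfile


def zip_roundtrip(name, payload):
    """Write payload into an in-memory zip archive, then read it back."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, payload)
    with zipfile.ZipFile(buffer) as archive:
        return archive.read(name)


def mmap_slice(payload, start, stop):
    """Store payload in a temporary file and read a slice through a memory map."""
    with tempfile.TemporaryFile() as handle:
        handle.write(payload)
        handle.flush()
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[start:stop]


restored = zip_roundtrip("notes.txt", b"scripting languages")
middle = mmap_slice(b"scripting languages", 0, 9)
```

In both cases the script treats the archive or the mapped file much like an ordinary Python object, with no external tools involved.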

One change in particular is likely to affect all users: Perl 5.6.0 and Python 1.6 (1.6 alpha 2, to be precise about the latter) both support Unicode. Over the last quarter century most computer specifications have relied on the English alphabet -- Unicode is a standard that promises to make it equally easy to sort, select, print, and manage texts in all other human languages.
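As a rough illustration of what sorting and managing such text involves, the sketch below uses current Python 3 and its standard `unicodedata` module, not the 1.6 alpha. The helper names are hypothetical, and the approach (normalize, then case-fold) is only an approximation of proper language-aware collation.

```python
import unicodedata


def sort_key(text):
    """Decompose accented characters and fold case so related words sort together."""
    return unicodedata.normalize("NFKD", text).casefold()


def sort_words(words):
    """Sort words with the approximate Unicode-aware key above."""
    return sorted(words, key=sort_key)


words = ["zebra", "\u00c9lan", "apple", "\u00e9clair"]  # "Élan", "éclair"

by_code_point = sorted(words)       # accented words land after "zebra"
by_folded_key = sort_words(words)   # accented words land among the e's

# Managing text also means choosing a byte encoding when it leaves the program.
utf8_bytes = "\u00e9".encode("utf-8")
```

A plain code-point sort pushes the accented words past "zebra", while the normalized key places them where an English-speaking reader would look for them. Real collation rules vary by language, which is part of why this belongs in the interpreter or a library rather than in each script.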

Like most new technologies, Unicode is both more and less than this summary promises. Unicode falls short of its ambition and already requires patching to do many tasks for people who work with Hindi or Korean. At the same time, Unicode is an enormous advance compared to previous encodings, and because XML mandates Unicode for its expression, thousands of projects with no explicit internationalization component are incorporating Unicode by way of XML.

Scripting languages have a complex relation to XML and Unicode because scripts tend to be minimal in size and computational load. Applications with Unicode and/or XML tend to fill quite a bit more memory than those of previous generations, and it's been difficult for scripting languages to take on this extra weight. On the other hand, scripting languages aim at expressiveness and power, and it's much easier for the Perl, Python, or Tcl interpreter to handle Unicode correctly than to teach all the individual Perl, Python, and Tcl developers how to write their own Unicode-implementing routines. An amusing outcome of Perl's latest upgrades is that Perl can now express the actual version numbers, such as 5.6.0, in the characters of any human language, and they'll sort and display correctly.
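The version-number remark can be illustrated outside Perl as well. The sketch below is Python 3, not Perl's mechanism, and relies on the fact that Python's built-in `int()` accepts decimal digits from any script; the function name `parse_localized_version` is hypothetical.

```python
def parse_localized_version(text):
    """Turn a dotted version written in any script's decimal digits into a tuple."""
    return tuple(int(part) for part in text.split("."))


# "5.6.0" written with Devanagari digits and with Arabic-Indic digits.
devanagari = parse_localized_version("\u096b.\u096c.\u0966")
arabic_indic = parse_localized_version("\u0665.\u0666.\u0660")
ascii_form = parse_localized_version("5.6.0")
```

Because the interpreter already knows which characters are digits, the script itself needs no per-language tables, which is the division of labor the paragraph above argues for.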

It's also telling that Tcl, which usually shows the most concern for the compactness required of an embedded language, was the first of these languages to support Unicode fully, which it did in its 8.1 release last year.

Unicode and the internationalization it facilitates are here to stay. With the latest wave of scripting language releases, it's time to learn how to exploit Unicode in your own programming.

Upcoming installments of Regular Expressions will show simple exercises you can do with each of these languages to begin practicing your Unicode technique.

We're also working on introductions to graphical user interface (GUI) programming and principles of network management, a look at why there aren't enough books on scripting, and much more. See you then.

About the author

Cameron Laird and Kathryn Soraiz manage their own software consultancy, Phaseit, from just outside Houston, Texas.
