PHT

(PHp with embedded HTml)

Introduction

As a programmer, I find myself writing more and more programs that generate output in HTML or XML (in what follows, I'll use XML as a generic term for any flavor of HTML as well as XML itself). None of the languages I use, or know of, is well adapted to producing this kind of output. For most programming languages, XML is just a stream of arbitrary characters going to standard output. This made it possible to adopt XML quickly, since all our programs had to do was output plain characters, but in doing so we miss a lot. As far as the program is concerned, the XML output is just an arbitrary string; its structure is completely ignored. It shouldn't be so.

XML is a structured language. So are our programs. Both have strict syntax rules. Program translators (compilers or interpreters) take care of checking the syntax of the programming language, but it is up to us to check the syntax of the XML our programs produce, a most error-prone and tiring task. Fortunately, most of us rarely output XML directly: we configure ready-made packages, tie together modules and components, use template engines or occasionally use the DOM to manipulate the tree. Nevertheless, at one point or another some part of this whole package does have to produce some sort of XML. Most open source packages, such as CMSs, picture galleries and wikis, do not rely on any runtime support, neither runtime libraries nor template engines, since such dependencies would reduce their portability; they produce XML directly. Yes, lots of XML is still produced directly, and there has to be a better way to do it.

So, we have two well-structured languages: one is procedural (any flavor of C, JavaScript, PHP, Perl, etc.), the other descriptive. Their blocks are quite compatible: to start with, they nest nicely within each other. If an XML block is contained within an if() block, it has to be completely within it; the boundaries of the two kinds of blocks should not overlap.

Code is usually embedded into XML via processing instructions, as in PHP's <?php ... ?>, via other markers such as <% ... %> in ASP, or as plain text within an element such as <script>. Nevertheless, since most languages have print or echo instructions that output arbitrary strings to the XML stream, no structure can be enforced on such a program. To my knowledge, no code beautifier or highlighter can make sense of the mix: they can pretty-print either the code or the XML, but not both.
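
To make the problem concrete, here is a minimal sketch in plain PHP of how echo lets the markup structure and the program structure drift apart. The function name renderList() and the sample data are made up for illustration. PHP runs this without complaint, even though one closing tag is only emitted inside an if() block and the item text is never escaped.

```php
<?php
// Hypothetical example: renderList() is an illustrative name, not part of PHT.
function renderList(array $items): string
{
    ob_start();                    // capture everything echoed below
    echo "<ul>";
    foreach ($items as $item) {
        echo "<li>" . $item;       // item text goes out unescaped
        if ($item !== '') {
            echo "</li>";          // closing tag depends on the if() block
        }
    }
    echo "</ul>";
    return ob_get_clean();
}

$out = renderList(['a', '', 'b < c']);
echo $out, "\n";
// prints: <ul><li>a</li><li><li>b < c</li></ul>
```

The interpreter checks that the braces balance, but it has no idea that the second <li> is never closed or that the bare < in the last item is not valid character data.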

Alternatively, other environments such as XML stylesheets (XSLT) have processing instructions embedded as XML elements, but this is quite cumbersome. Just the namespace prefix, xsl: or whatever you choose, in front of every instruction adds too much typing. On top of that, instruction arguments must be held in attributes, with each option named explicitly, as if you didn't know that the natural argument to a while instruction is a condition. It makes COBOL's verbosity look not so bad.

So I looked at the problem the other way around: I added XML-specific instructions to an existing language. I called those instructions TAG and ATTRIBUTE and, since I find no pleasure in hitting more keys than absolutely required, I decided to shorten them to a couple of symbols: < for TAG and @ for ATTRIBUTE. I was working with PHP at the time, and the @ sign was already in use there in a way that prevented my particular use, so I switched to the & sign. Both < and & are used in PHP, but in ways that let a parser tell the native usage from my extended one. And since I was daydreaming, I opted to use the ? sign as a synonym for print or echo, but with a twist: since XML tags are already output via the TAG statement, the output of ? is escaped.
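
The idea behind the three instructions can be approximated in ordinary PHP. The sketch below is not PHT syntax; the helper names tag() and text() are assumptions introduced only for illustration. It shows the division of labor described above: one construct produces elements and attributes and always closes what it opens, while plain text output is escaped by default (here with PHP's standard htmlspecialchars()).

```php
<?php
// Hypothetical helpers in plain PHP. These are NOT the PHT instructions,
// only an approximation of the roles the text gives to TAG, ATTRIBUTE and ?.

// Escaped text output: the role of `?`.
function text(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// An element with its attributes: the roles of TAG and ATTRIBUTE.
// The closing tag is always produced together with the opening one.
function tag(string $name, array $attributes, string $innerXml): string
{
    $attrs = '';
    foreach ($attributes as $key => $value) {
        $attrs .= ' ' . $key . '="' . text((string) $value) . '"';
    }
    return "<$name$attrs>$innerXml</$name>";
}

$html = tag('p', ['class' => 'note'], text('1 < 2 & "quotes"'));
echo $html, "\n";
// prints: <p class="note">1 &lt; 2 &amp; &quot;quotes&quot;</p>
```

Because markup can only come out of tag() and everything else goes through text(), an unbalanced element or an unescaped < cannot be produced by accident, which is the property the extended language aims to provide at the syntax level.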
