Validation

I also wanted a more formal check on the XML produced. How could I integrate the definition of an XML document type into this? An XML Schema file describes what an XML document may and may not contain. So I added (though not yet in this release) one more declarative instruction, XMLSchema:

XMLSchema [<path to .xsd file>] [path <simple xPath expression> (, ....)*] [ignore case]

The XMLSchema declaration tells the pre-compiler which schema file to check against. As the tags and attributes are parsed, the pre-compiler can then check that they follow the schema. At some point I will have to either add a DTD instruction to point to DTDs or rely on a DTD-to-XSD translator, of which there are many.

The XMLSchema declaration can be either a separate statement, usually placed at the beginning of the file or at least before the first tag is output, or a modifier after a function declaration. Functions within a program usually generate just fragments of XML, not full documents, for example a table row. You wouldn't expect such a function to start with the root element; instead, you would want to declare that it generates a <tr> tag. You indicate that with the path clause. This declaration serves two purposes: to check that the tags generated within the function are of the right types, and to check that the function is called at a point in the enclosing source where a <tr> is expected. Several xPaths can be declared, since a single function can generate, for example, either a <td> or a <th> tag, and both would be acceptable. The xPath expression will often be partial, for example "td" or "th" rather than a full path from the document root, since a tag might be nested an arbitrary number of levels deep within optional tags.

The <path to .xsd file> may be omitted if the path clause is given, in which case the global schema is used; this is the usual case for functions. A source file may look like this:

<?php

XMLSchema "my/own/html.xsd" ignore case;

function tableRow($record) XMLSchema path 'tr' {
    <tr {
        <td ? $record['fieldn'];
        // other columns
    }
}

<html {
    <body {
        &onLoad = 'setTimeout("updateClock()",1000)';
        // ... etc 

        // query the database
        <table {
            while ($record = mysql_fetch_assoc($recordSet)) {
                tableRow($record);
            }
        }
    }
}
?>

In this example, the whole PHP page is expected to follow an imaginary schema for HTML. The function tableRow is expected to produce just a single row under that same schema, ignoring upper or lower case as in the global declaration. The function is used further down, and it has to be called within a <table> tag or the pre-compiler will complain.
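The call-site half of that check can be sketched in plain PHP. This is only an illustration, not the pre-compiler's actual code: the $allowedChildren table is a hand-written, hypothetical stand-in for what would be read from the .xsd file, and misplacedPaths() is a name made up for this example.

```php
<?php
// Hypothetical stand-in for the schema: for each element, the child
// elements it allows. A real pre-compiler would derive this from the .xsd.
$allowedChildren = [
    'body'  => ['table', 'p'],
    'table' => ['tr'],
    'tr'    => ['td', 'th'],
];

/**
 * Returns those declared paths of a function that are NOT acceptable
 * inside $enclosingTag. An empty array means the call site is fine.
 */
function misplacedPaths(array $allowedChildren, string $enclosingTag, array $declaredPaths, bool $ignoreCase = false): array
{
    // With "ignore case", every name is compared in lower case.
    $norm = $ignoreCase
        ? 'strtolower'
        : function (string $s): string { return $s; };

    // Find the children allowed under the enclosing tag.
    $allowed = [];
    foreach ($allowedChildren as $parent => $children) {
        if ($norm($parent) === $norm($enclosingTag)) {
            $allowed = array_map($norm, $children);
        }
    }

    // Collect every declared path the enclosing tag does not accept.
    $bad = [];
    foreach ($declaredPaths as $path) {
        if (!in_array($norm($path), $allowed, true)) {
            $bad[] = $path;
        }
    }
    return $bad;
}
```

Under these assumptions, a call to tableRow() (declared with path 'tr') placed directly inside <body> would yield ['tr'], which is what the pre-compiler would complain about; inside <table> the result is empty.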

So far, the examples deal mostly with a sequential flow through the program. Things get interesting with include files, conditionals and loops. What is sequential in the source file might not be so at execution time. For example, though the else part of an if() statement comes after the then part in the source, the two are alternatives at the same level at execution time. The number of occurrences is also important: a tag within an if() block must be declared with minOccurs=0 in the schema, and a tag within a while() loop must also have maxOccurs=unbounded, while for a do ... while() loop a missing minOccurs or minOccurs=1 is fine.

I doubt it will make sense to produce anything more than warning messages, since the final outcome of a piece of code cannot be determined from the source alone. It wouldn't make sense to abort the pre-compiler because a tag produced within a while() loop was not declared with minOccurs=0, since you might already know that the loop will always execute at least once.
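As a rough sketch of how such occurrence checks might look, here is plain PHP that maps each control structure to the occurrence limits it implies and returns warnings instead of aborting. The function names and the construct labels ('if', 'while', 'do-while') are assumptions made for this example; the defaults of 1 for minOccurs and maxOccurs are those of XML Schema.

```php
<?php
/**
 * Occurrence limits a control structure implies for the tags it emits:
 * 'min' is the largest minOccurs the schema may safely declare, and
 * 'unbounded' says whether maxOccurs has to be "unbounded".
 */
function impliedOccurrence(string $construct): array
{
    switch ($construct) {
        case 'if':       return ['min' => 0, 'unbounded' => false];
        case 'while':    return ['min' => 0, 'unbounded' => true];
        case 'do-while': return ['min' => 1, 'unbounded' => true];
        default:         return ['min' => 1, 'unbounded' => false]; // sequential code
    }
}

/**
 * Compares the minOccurs/maxOccurs declared for a tag with what the
 * enclosing construct implies. Only collects warnings, never aborts.
 */
function occurrenceWarnings(string $tag, string $construct, int $minOccurs = 1, string $maxOccurs = '1'): array
{
    $implied  = impliedOccurrence($construct);
    $warnings = [];
    if ($minOccurs > $implied['min']) {
        $warnings[] = "<$tag> inside $construct might occur fewer than minOccurs=$minOccurs times";
    }
    if ($implied['unbounded'] && $maxOccurs !== 'unbounded') {
        $warnings[] = "<$tag> inside $construct might occur more than maxOccurs=$maxOccurs times";
    }
    return $warnings;
}
```

For instance, occurrenceWarnings('tr', 'while') with the schema defaults produces two warnings, while occurrenceWarnings('tr', 'while', 0, 'unbounded') produces none. Whether to act on a warning is left to the programmer, who may know that the loop always runs at least once.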

Of course, none of this checking is possible if the tag or attribute name is given as an expression, since its value can seldom be predicted by analyzing the source code. This is one more reason to discourage generic expressions in tag names.

It is common practice to have a couple of functions or small include files generate the header and trailer of a document. For HTML, a header() function or include file would provide the beginning of the document up to the <body> tag, while a footer() function would take over from the </body> tag to the end of the document. These two functions would violate the block structure. Solving this problem is not hard, though: either you call a function that provides the enclosing sections, passing it a callback function that provides the body contents, or, if programming OO-style, you inherit from a Document class and override its bodyContents() method. In either case, the block structure is preserved.
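Both alternatives can be sketched in ordinary PHP. The sketch builds strings instead of using the pre-compiler's tag syntax so that it runs as is. The Document class and bodyContents() come from the description above; document(), render() and ClockPage are names invented for this illustration.

```php
<?php
// Callback style: one function owns the whole document skeleton, so the
// opening and closing tags stay together in a single block.
function document(string $title, callable $bodyContents): string
{
    return '<html><head><title>' . htmlspecialchars($title) . '</title></head><body>'
         . $bodyContents()
         . '</body></html>';
}

// OO style: the base class owns the skeleton, subclasses supply the body.
abstract class Document
{
    abstract protected function bodyContents(): string;

    public function render(): string
    {
        return '<html><body>' . $this->bodyContents() . '</body></html>';
    }
}

// Hypothetical page that only has to describe its own body.
class ClockPage extends Document
{
    protected function bodyContents(): string
    {
        return '<p>It is now noon.</p>';
    }
}
```

A page would then be produced with something like echo document('Clock', function () { return '<p>It is now noon.</p>'; }); or echo (new ClockPage())->render();. In both cases no function ever opens a tag that some other function has to close.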
