So, finally, we reach the part that converts the extended instructions into standard PHP. The source is located in the files pht.h and pht.cpp.
I won't comment on all the code, since the source files already carry their own comments, just on a couple of methods that give a good taste of how it is done.
The Tree_visitor and Tree_transform classes are provided by PHC. The first one allows the full tree to be visited in an ordered way. Each has methods to override, one for each kind of node. While Tree_visitor lets you visit each node and modify its contents, Tree_transform further lets you change the tree itself, not just the contents of each node. I use both: in a first pass I gather some information about each PHP function defined in the file being pre-compiled, while in the second I have to restructure the tree.
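To make the difference between the two styles concrete, here is a minimal stand-alone sketch. It does not use the PHC classes: Stmt, StmtList, ToyVisitor, ToyTransform and the two derived classes are names I made up for the illustration, and the real PHC hooks have one method per node type rather than a single one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy types; these are not the PHC classes.
struct Stmt { std::string text; };
using StmtList = std::vector<Stmt>;

// Visitor style: each node is seen once and may be edited in place,
// but the number and order of nodes stay the same.
struct ToyVisitor {
    virtual ~ToyVisitor() = default;
    virtual void visit(Stmt& s) = 0;
    void run(StmtList& list) {
        for (Stmt& s : list) visit(s);
    }
};

// Transform style: each node is handed over together with an output list,
// so it can be dropped, kept, or replaced by several nodes.
struct ToyTransform {
    virtual ~ToyTransform() = default;
    virtual void transform(const Stmt& in, StmtList& out) = 0;
    StmtList run(const StmtList& list) {
        StmtList result;
        for (const Stmt& s : list) transform(s, result);
        return result;
    }
};

// First pass: gathers information only (counts "function" statements).
struct CountFunctions : ToyVisitor {
    int count = 0;
    void visit(Stmt& s) override {
        if (s.text == "function") ++count;
    }
};

// Second pass: restructures the list by expanding "tag" into two statements.
struct ExpandTags : ToyTransform {
    void transform(const Stmt& in, StmtList& out) override {
        if (in.text == "tag") {
            out.push_back(Stmt{"open"});
            out.push_back(Stmt{"close"});
        } else {
            out.push_back(in);
        }
    }
};

// Usage: run both passes over the same three-statement list.
void demo_two_passes() {
    StmtList list = {Stmt{"function"}, Stmt{"tag"}, Stmt{"other"}};
    CountFunctions counter;
    counter.run(list);
    assert(counter.count == 1);  // contents inspected
    assert(list.size() == 3);    // list length untouched by the visitor
    StmtList result = ExpandTags().run(list);
    assert(result.size() == 4);  // "tag" became two statements
}
```

The visitor can only look at or edit what is already there, which is enough for collecting information; the transform is needed as soon as one node has to become several.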
The Convert_xml_elements class is the main conversion class and it inherits from Tree_transform. A section of the class declaration is shown here:
class Convert_xml_elements : public Tree_transform {
private:
    enum State {
        ANY_TAG,
        START_TAG_OPEN,
        START_TAG_CLOSED,
        TAG_CLOSED, // never used, when the tag is closed, the stack is popped so it is never stored
        PI_OPEN,    // processing instruction
        PI_CLOSED   // never used, when the tag is closed, the stack is popped so it is never stored
    };
    stack<State> s;
    .....
public:
    void pre_xml_element(AST_xml_element* in, AST_statement_list* out);
    void post_xml_element(AST_xml_element* in, AST_statement_list* out);
    .....
};
I'll just show the methods that process an xml_element, that is, a <tag instruction. Tree_transform provides hooks to visit a node both before visiting the nodes further down the tree and after visiting them. In both cases, a pointer to the node being visited is the first argument. The second argument is not a node of the same kind but an AST_statement_list. This provides amazing flexibility: if nothing is copied from in to out, the node is deleted; if you just copy the in node to the out list (actually, push it into the list), nothing changes; and you can also add any other valid statements to the out list, either before, after or instead of the in node.
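The three outcomes can be illustrated with a small stand-alone sketch. It does not use the PHC classes; Stmt, StmtList, Action and rewrite are hypothetical names invented for the illustration, and only the in/out idea is taken from the hooks described here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for an AST node and for AST_statement_list.
struct Stmt { std::string text; };
using StmtList = std::vector<Stmt>;

enum class Action { Drop, Keep, Surround };

// Mimics one transform hook: whatever ends up in `out` takes the place of `in`.
void rewrite(const Stmt& in, StmtList& out, Action action) {
    switch (action) {
    case Action::Drop:
        // Nothing is pushed, so the node disappears from the tree.
        break;
    case Action::Keep:
        // Pushing the node unchanged leaves the tree as it was.
        out.push_back(in);
        break;
    case Action::Surround:
        // Extra statements may go before and after the original node.
        out.push_back(Stmt{"echo \"<b>\";"});
        out.push_back(in);
        out.push_back(Stmt{"echo \"</b>\";"});
        break;
    }
}

// Usage: the same input statement under each of the three actions.
void demo_out_list() {
    const Stmt in{"print_title();"};
    StmtList dropped, kept, surrounded;
    rewrite(in, dropped, Action::Drop);
    rewrite(in, kept, Action::Keep);
    rewrite(in, surrounded, Action::Surround);
    assert(dropped.empty());
    assert(kept.size() == 1);
    assert(surrounded.size() == 3);
}
```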
Notice the stack<State> s; member, which keeps track of the state of the XML part. The AST fully represents the PHP code, but the structure of the XML part is not fully reflected in the tree, and what is there is lost during the transformation. This stack therefore records the state of the underlying XML, in particular which kind of tag it is and whether we are within the start tag, fully within the element, or out of it. Actually, when we are out of an element we don't get to record that: once it is closed, we pop the stack and return to the previous state, so those two 'closed' states are formally there but never used. PI stands for 'Processing Instruction'.
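The life cycle of one entry in that stack can be sketched on its own. The state names below are the ones from the class declaration; the TagTracker wrapper and its three methods are hypothetical, and starting the stack with a single ANY_TAG entry is my assumption so that top() is always valid.

```cpp
#include <cassert>
#include <stack>

// State names as in Convert_xml_elements.
enum State { ANY_TAG, START_TAG_OPEN, START_TAG_CLOSED, TAG_CLOSED, PI_OPEN, PI_CLOSED };

// Hypothetical wrapper that only models the push / update / pop pattern.
struct TagTracker {
    std::stack<State> s;

    // Assumed initial state, so that top() can always be called.
    TagTracker() { s.push(ANY_TAG); }

    // "<name" has been emitted; attributes may still follow.
    void open_element() { s.push(START_TAG_OPEN); }

    // ">" has been emitted; we are now fully within the element.
    void close_start_tag() {
        assert(s.top() == START_TAG_OPEN);
        s.top() = START_TAG_CLOSED;
    }

    // The element is finished: drop its entry and return to the enclosing state.
    // This is why TAG_CLOSED is never actually stored.
    void close_element() {
        assert(s.size() > 1);
        s.pop();
    }
};
```

The top entry is overwritten when the start tag is completed and removed when the element ends, so the stack depth always equals the nesting depth plus the initial entry.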
void Convert_xml_elements::pre_xml_element(AST_xml_element* in, AST_statement_list* out) {
    // close any previous open start tag
    if (s.top() == START_TAG_OPEN) close_start_tag(out);
    // push the new state of the tags
    s.push(START_TAG_OPEN);
    // The start tag is left open since attributes might follow.
    out->push_back(echo("<"));
    print_element_name(in->xml_element_name,out);
    out->push_back(in);
}
So, in this method we are pre-processing the node corresponding to an XML element. First we check the top of the stack and, if we see the previous tag is still open (presumably waiting for further attributes), we close it. The close_start_tag private method inserts an echo ">"; into the out statement list and updates the state at the top of the stack. Then we push the new state that we are about to enter and add a statement to echo a "<". The function print_element_name deals with the instructions to output the element name.
Once that is finished, the in node is pushed into the out statement list. This leaves the tree a little redundant: it contains both the original in node and the instructions that should have replaced it. That will be taken care of in the 'post' call, but the 'post' method will only be called if the node is still there, so we leave it as a placeholder to ensure the post method is called, and then we take it out.
Here is the post method.
void Convert_xml_elements::post_xml_element(AST_xml_element* in, AST_statement_list* out) {
    // if the statement list was empty, it is an empty element, such as <br /> or <hr />
    // notice that if the statement_list originally contained xml attributes, these have been output as echoes
    // and taken out of the statement list.
    if (in->statements->size() == 0) {
        // if it is still open, close it
        if (s.top() == START_TAG_OPEN) {
            out->push_back(echo("/>"));
            s.pop();
            // no need to do anything else
            return;
        }
    // if the statement list is not empty, then append the statements to the initial echo of the start tag.
    } else {
        out->push_back_all(in->statements);
    }
    // just to be sure, there shouldn't be any start tag open this far, meaning something like <html
    if (s.top() == START_TAG_OPEN) close_start_tag(out);
    // likewise, there should be a full tag to close, like <html>
    switch(s.top()) {
        case START_TAG_CLOSED:
            break; // This is fine, nothing to do
        case ANY_TAG:
            printf("\n<!-- %s[%d]: ***** Warning: there might not be a tag to close -->\n",
                   in->get_filename()->c_str(),in->get_line_number());
            break;
        default:
            printf("\n<!-- %s[%d]: ***** Error: no tag to close -->\n",
                   in->get_filename()->c_str(),in->get_line_number());
    }
    // So, just close the tag and pop whatever flags were there
    // TODO warning, if the tag name is given as a variable, the value might have changed in the statements in between
    // and the closing tag would not match the initial one
    out->push_back(echo("</"));
    print_element_name(in->xml_element_name,out);
    out->push_back(echo(">"));
    s.pop();
}
Remember that the < instruction is followed by a tag name and then a statement. The statement can be an empty statement (a single semicolon), a simple statement or a compound statement enclosed in curly braces. They are all represented within the AST as an AST_statement_list, which might be empty (size == 0), contain a single element, or contain more than one. So, the first thing is to check whether it was an empty statement. If it was and the start tag is still open, we close it with a "/>". That would be the case for a <br />. Remember that in the 'pre' method we left the in node in the tree. Since we have now learned that it contains no further statements, we need nothing else from it, so we pop the state of the element we just closed from the stack and return.
If the size of the statement list is not zero, we push the statements right behind the echos that we made in the 'pre' method. Those statements are in in->statements, a member of the AST_xml_element object created automatically from the phc.tea source. By now, any extended instructions within these statements have already been converted.
If the start tag is still open, it needs to be closed before the end tag (for reasons a little long to explain here, this never happens). On the other hand, if there is no start tag in the stack, it's an error.
The stack can also indicate a state called ANY_TAG. This is because a PHP function might have to close a tag that was opened elsewhere before the function was called. Every time the pre-compiler finds a function definition, it sets the stack to ANY_TAG, because we can't know what the stack might contain when that function is called, and whatever it contains when the function is defined does not matter at all. In this state the pre-compiler issues a warning rather than an error, although at the moment the only difference is the text of the message.
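The decision taken before closing a tag can be isolated in a few lines. This is only a sketch: check_before_closing returns a value instead of printing as the real method does, Report and enter_function_definition are hypothetical names, and whether the real code clears the stack or simply pushes ANY_TAG when it meets a function definition is an assumption on my part.

```cpp
#include <cassert>
#include <stack>

// State names as in Convert_xml_elements.
enum State { ANY_TAG, START_TAG_OPEN, START_TAG_CLOSED, TAG_CLOSED, PI_OPEN, PI_CLOSED };

enum class Report { Nothing, Warning, Error };

// Same decision as the switch in post_xml_element, minus the printf calls.
Report check_before_closing(State top) {
    switch (top) {
    case START_TAG_CLOSED: return Report::Nothing;  // a full start tag is there to close
    case ANY_TAG:          return Report::Warning;  // unknown context, e.g. inside a function
    default:               return Report::Error;    // no tag to close
    }
}

// Hypothetical helper: entering a function definition forgets the outer state.
// Clearing the stack (rather than just pushing ANY_TAG) is an assumption.
void enter_function_definition(std::stack<State>& s) {
    while (!s.empty()) s.pop();
    s.push(ANY_TAG);
}

// Usage: a tag open at the point of definition does not leak into the function.
void demo_any_tag() {
    std::stack<State> s;
    s.push(START_TAG_CLOSED);
    enter_function_definition(s);
    assert(s.size() == 1 && s.top() == ANY_TAG);
    assert(check_before_closing(s.top()) == Report::Warning);
}
```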
Finally, since we are done with this tag, we close the element by calling print_element_name surrounded by statements to echo "</" and ">", and then we pop the state from the stack.
Notice that we have pushed several echos and the statements under the tag statement, but not the node representing the tag statement itself. The node that the 'pre' method had left as a placeholder is thus effectively deleted from the tree.
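To see the net effect of the 'pre' and 'post' steps together, here is a toy walk over a tiny tree that produces the resulting PHP statements as strings. It is a sketch under several assumptions: Node and Converter are hypothetical, the "<" and the tag name are merged into one echo for brevity, attributes are left out, and closing a pending start tag before a plain statement is my guess at what the real transform does elsewhere.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <vector>

// Hypothetical toy node: either an XML element or a plain PHP statement.
struct Node {
    bool is_element;         // true: XML element, false: plain statement
    std::string text;        // tag name, or the statement itself
    std::vector<Node> body;  // statements under an element
};

enum State { ANY_TAG, START_TAG_OPEN, START_TAG_CLOSED };

struct Converter {
    std::stack<State> s;
    std::vector<std::string> out;  // generated PHP statements, in order

    Converter() { s.push(ANY_TAG); }  // assumed initial state

    void close_start_tag() {
        out.push_back("echo \">\";");
        s.top() = START_TAG_CLOSED;
    }

    void convert(const Node& n) {
        // Whatever comes next, a pending start tag has to be completed first.
        if (s.top() == START_TAG_OPEN) close_start_tag();

        if (!n.is_element) {
            out.push_back(n.text);
            return;
        }

        // 'pre' step: open the start tag and leave it open.
        s.push(START_TAG_OPEN);
        out.push_back("echo \"<" + n.text + "\";");

        // The nodes further down are converted before the 'post' step runs.
        for (const Node& child : n.body) convert(child);

        // 'post' step, empty element: close it as <name/>.
        if (n.body.empty() && s.top() == START_TAG_OPEN) {
            out.push_back("echo \"/>\";");
            s.pop();
            return;
        }

        // 'post' step, element with content: emit the end tag.
        if (s.top() == START_TAG_OPEN) close_start_tag();
        out.push_back("echo \"</" + n.text + ">\";");
        assert(s.size() > 1);
        s.pop();
    }
};
```

Converting an empty br element yields the two statements echo "<br"; and echo "/>";, while a p element holding one statement yields four: the opening echo, echo ">";, the statement itself and echo "</p>";. In neither case does the element node itself appear in the output.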
The private method print_element_name is used by both the element and attribute methods since they have the same naming conventions. If the xml_namespace member is NULL, it means there isn't a namespace. If there is one, the method checks the flag ns_is_var: if it is true, it assembles the tree nodes to echo a variable with the given name; otherwise it simply echoes the text of the name. In both cases a colon is echoed afterwards. For the tag name, it does basically the same.
void Convert_xml_elements::print_element_name(AST_xml_element_name* name,AST_statement_list* out) {
    if (name->xml_namespace) {
        if (name->ns_is_var) {
            out->push_back(echo(new AST_variable(new Token_variable_name(name->xml_namespace->value))));
        } else {
            out->push_back(echo(name->xml_namespace->value));
        }
        out->push_back(echo(":"));
    }
    if (name->is_var) {
        out->push_back(echo(new AST_variable(new Token_variable_name(name->tag_name->value))));
    } else {
        out->push_back(echo(name->tag_name->value));
    }
}
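The same branching can be tried out without PHC by producing the PHP statements as plain strings. In this sketch ElementName is a hypothetical flattened version of AST_xml_element_name (strings instead of token nodes, an empty namespace instead of NULL), and writing the variable case as echo $name; is my rendering of what the AST_variable node would print as.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flattened stand-in for AST_xml_element_name.
struct ElementName {
    std::string xml_namespace;  // empty means "no namespace"
    bool ns_is_var;             // namespace is given by a PHP variable
    std::string tag_name;
    bool is_var;                // tag name is given by a PHP variable
};

// Returns the PHP statements that would print the (possibly qualified) name.
std::vector<std::string> element_name_echoes(const ElementName& name) {
    std::vector<std::string> out;
    if (!name.xml_namespace.empty()) {
        out.push_back(name.ns_is_var ? "echo $" + name.xml_namespace + ";"
                                     : "echo \"" + name.xml_namespace + "\";");
        out.push_back("echo \":\";");  // separator between namespace and name
    }
    out.push_back(name.is_var ? "echo $" + name.tag_name + ";"
                              : "echo \"" + name.tag_name + "\";");
    return out;
}

// Usage: a literal namespace combined with a tag name held in a variable.
void demo_element_name() {
    std::vector<std::string> echoes =
        element_name_echoes(ElementName{"svg", false, "shape", true});
    assert(echoes.size() == 3);
    assert(echoes[2] == "echo $shape;");
}
```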