Building the tree

The PHC package contains a program (maketea) that automatically generates the Abstract Syntax Tree (AST) node classes from the definitions given in a phc.tea file.

To the definition of statement I added the last line of alternatives, the one starting with xml_element, to create the AST nodes for the new statements:

statement ::=
     if | while | do | for | foreach 
   | switch | break | continue | return
   | static_declaration
   | unset | declare | try | throw | eval_expr 
   | xml_element | xml_element_attribute | escaped_print | xml_processing_instruction
   ;

Notice that these definitions only state what data has to be stored in the tree: they indicate node dependencies and repetitions, not the actual syntax of the statements, which is given in the parser. Thus, in the following:

xml_element ::= xml_element_name statement*;
xml_element_attribute ::= xml_element_name  expr;
escaped_print ::= expr*;
xml_processing_instruction ::= xml_element_name statement*;
xml_element_name ::=   ns_is_var:"$"? xml_namespace:TAG_NAME?  is_var:"$"? TAG_NAME;

an xml_element is composed of an xml_element_name and a list (possibly empty) of statements. This shows what nodes or lists of nodes have to be available and how they relate to each other.

Most nodes require an xml_element_name, which is composed of two parts: an optional namespace and the actual name. Either can be a variable or a literal, so each one has a flag (ns_is_var and is_var) that records the presence of a "$" sign, which is the PHP prefix for variables. Both parts are of type TAG_NAME. The name has no label, so it is simply called after its type, while the namespace is of the same type but is labelled xml_namespace. The namespace is optional (the ? symbol indicates that it can take a NULL value) while the name is not. The maketea utility assumes that uppercase identifiers are tokens, so a token node of type TAG_NAME will be created.
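As a rough illustration of the data these rules describe, the two central ones can be modelled as plain records. This is only a Python sketch, not the code that maketea generates; the class names and the statements field are assumptions made for the example, while the remaining field names follow the labels in the grammar.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TagName:
    """Token node: uppercase identifiers such as TAG_NAME are treated as tokens."""
    value: str


@dataclass
class XmlElementName:
    """Models: ns_is_var:"$"? xml_namespace:TAG_NAME? is_var:"$"? TAG_NAME"""
    tag_name: TagName                        # required, the rule has no "?" here
    xml_namespace: Optional[TagName] = None  # optional, "?" allows a NULL value
    ns_is_var: bool = False                  # True when a "$" preceded the namespace
    is_var: bool = False                     # True when a "$" preceded the name


@dataclass
class XmlElement:
    """Models: xml_element ::= xml_element_name statement*"""
    xml_element_name: XmlElementName
    statements: List[object] = field(default_factory=list)  # possibly empty list


# A literal namespace combined with a variable name, and an element without children.
name = XmlElementName(tag_name=TagName("tag"), xml_namespace=TagName("ns"), is_var=True)
element = XmlElement(name)
```

The optional namespace becomes a field that may be empty, the "$" markers become booleans, and the starred statement becomes a list, which is all the grammar asks the tree to store.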

In order to operate on these nodes in a more generic way, they can be added to the lists of generic node types. Those that are full statements and can have full comments on separate lines are added to commented_node, those that are only a part of a full statement are simply nodes, and a TAG_NAME is, of course, an identifier. This is done in the following section; my additions are the last line of alternatives in each rule.

node ::= 
	  php_script | class_mod | signature 
	| method_mod | formal_parameter | type | attr_mod 
	| directive | list_element | variable_name | target
	| array_elem | method_name | actual_parameter | class_name 
	| commented_node | expr | identifier 
	| formal_parameter* | directive* | array_elem* | actual_parameter* 
	| INTERFACE_NAME* | list_element* | expr*
	| xml_element_name
	;

commented_node ::= 
	  member | statement | interface_def | class_def | switch_case | catch 
	| interface_def* | class_def* | member* | statement* | switch_case* | catch*   
	| xml_element | xml_element_attribute | escaped_print | xml_processing_instruction
	;

identifier ::=
	  INTERFACE_NAME | CLASS_NAME | METHOD_NAME | VARIABLE_NAME 
	| DIRECTIVE_NAME | CAST | OP | CONSTANT_NAME
	| TAG_NAME
	;  

Processing the phc.tea file requires Haskell, a functional programming language. It is not installed by default in popular Linux distributions, though it is available as an optional package, and the build process might give you an error if it cannot find it. This is understandable: the public interface to PHC is its plug-ins, which are meant to process standard PHP, not to add extended instructions to it.
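Before starting a build it can help to check whether a Haskell compiler is on the PATH. The snippet below is a small sketch that assumes the compiler is GHC and that its executable is called ghc; the text above only says that Haskell is required, so adjust the name if your build expects something else.

```python
import shutil


def haskell_compiler_available(executable="ghc"):
    """Return True if the named executable can be found on the PATH."""
    return shutil.which(executable) is not None


if not haskell_compiler_available():
    print("No Haskell compiler found; processing phc.tea is likely to fail.")
```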
