GSoC 2017 etetoolkit/treematcher community wiki
Treematcher is a side project of the ETE Toolkit. Its goal is to let the ETE Toolkit run relaxed pattern
searches on trees, using a syntax similar to regular expressions.
Contents:
About the code before gsoc
Milestone 1: implement regular expression functionality
Milestone 1: Approach
Milestone 1: Features
Milestone 1: Examples of usage
Milestone 1: Set of all implemented features
About the code before gsoc
The treematcher base code was written before GSoC. It contained the main TreePattern class and helper
classes. Treematcher transforms a regular expression written in Newick format into a tree and runs a
search against a target tree to find possible results.
The main TreePattern class extends the etetoolkit/Tree class. It contains a recursive function that
searches the target tree (the “match” function), a function that compares individual nodes for “equality”
(the “is_local_match” function) and a function that serves as the interface (the “find_match” function).
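The division of work between the three functions can be sketched as follows. This is only an illustration under stated assumptions: Node is a tiny stand-in written for this example and is not the ete Tree class, the helper _assign is hypothetical, and the rule that an empty pattern name matches any node is an assumption made here. The real TreePattern code may differ in all of these details.

```python
class Node:
    """Tiny stand-in for a tree node (hypothetical, not the ete Tree class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def traverse(self):
        """Yield this node and all of its descendants in preorder."""
        yield self
        for child in self.children:
            yield from child.traverse()


class PatternNode(Node):
    """Pattern node with the three-function layout described above."""

    def is_local_match(self, target):
        # Compare one pattern node with one target node.
        # Assumption: an empty pattern name matches any target node.
        return self.name == "" or self.name == target.name

    def match(self, target):
        # Recursive check: the node itself must match, and every pattern
        # child must be paired with a distinct child of the target node.
        if not self.is_local_match(target):
            return False
        return _assign(self.children, target.children)

    def find_match(self, tree):
        # Interface function: yield every target node where the pattern fits.
        for node in tree.traverse():
            if self.match(node):
                yield node


def _assign(pattern_children, target_children):
    """Backtracking search pairing each pattern child with a distinct target child."""
    if not pattern_children:
        return True
    first, rest = pattern_children[0], pattern_children[1:]
    for i, candidate in enumerate(target_children):
        if first.match(candidate):
            remaining = target_children[:i] + target_children[i + 1:]
            if _assign(rest, remaining):
                return True
    return False


target = Node("root", [Node("A", [Node("a1"), Node("a2")]), Node("B")])
pattern = PatternNode("A", [PatternNode("a2"), PatternNode("a1")])
print([n.name for n in pattern.find_match(target)])  # prints ['A']
```

In this sketch the order of the children does not matter, which is why the pattern children a2, a1 still match the target children a1, a2.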
The “match” function is treematcher’s main function. It can handle a completely described tree (every
TreePattern node has one corresponding node in the target tree) as well as the “one or more”
intermediate nodes symbol, which treats every node as a match until a match for the next level of
pattern nodes is found. These two features (one-to-one matches and relaxed matches) are treematcher’s
basic functionality and were used as a guide for all the features implemented later.
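A minimal sketch of the relaxed match may help here. It assumes a hypothetical marker "+" for “one or more” intermediate nodes (the symbol treematcher actually uses is not stated in this section) and a stand-in Node class instead of the ete Tree class. A target node that meets the marker is consumed as an intermediate node, and the search keeps descending until the next level of pattern nodes matches.

```python
class Node:
    """Tiny stand-in for a tree node (hypothetical, not the ete Tree class)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


ONE_OR_MORE = "+"  # hypothetical marker chosen for this sketch


def matches(pattern, target):
    """Return True if `pattern` fits the subtree rooted at `target`."""
    if pattern.name == ONE_OR_MORE:
        # `target` is consumed as an intermediate node. Either the next
        # pattern level starts right below it, or one of its children is
        # consumed as a further intermediate node.
        if children_match(pattern.children, target.children):
            return True
        return any(matches(pattern, child) for child in target.children)
    if pattern.name != target.name:
        return False
    return children_match(pattern.children, target.children)


def children_match(pattern_children, target_children):
    """Pair every pattern child with a distinct target child (backtracking)."""
    if not pattern_children:
        return True
    first, rest = pattern_children[0], pattern_children[1:]
    for i, candidate in enumerate(target_children):
        if matches(first, candidate):
            remaining = target_children[:i] + target_children[i + 1:]
            if children_match(rest, remaining):
                return True
    return False


# Pattern: root, then one or more intermediate nodes, then leaf.
pattern = Node("root", [Node(ONE_OR_MORE, [Node("leaf")])])
deep = Node("root", [Node("x", [Node("y", [Node("leaf")])])])  # two intermediates
flat = Node("root", [Node("leaf")])                            # no intermediates
print(matches(pattern, deep), matches(pattern, flat))  # prints True False
```

The second tree is rejected because “one or more” needs at least one node between root and leaf.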
Milestone 1
Most of the first milestone was completed during the first coding period of GSoC. Its goal was to let
treematcher describe more relaxed relationships between the TreePattern tree and the target tree.
The implemented features are the “one”, “zero or more”, “one or more”, “defined number” and
“defined numbers in a range” intermediate nodes between two nodes, plus the “is leaf” and “is root”
shortcuts. Later, logical comparison of a node’s attribute against a set of nodes (of a possible match)
or against the whole tree (used for the maximum or minimum of a value) was implemented.
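One way to read the five intermediate-node features is as lower and upper bounds on how many nodes may sit between two pattern nodes. The sketch below illustrates that reading only. The names ONE, ZERO_OR_MORE, ONE_OR_MORE, exactly, between, allowed and intermediates are hypothetical, and nothing here is taken from the treematcher code.

```python
from typing import Optional, Tuple

# (minimum, maximum) number of intermediate nodes; None means unbounded.
Bounds = Tuple[int, Optional[int]]

# Hypothetical names for the features listed above.
ONE: Bounds = (1, 1)
ZERO_OR_MORE: Bounds = (0, None)
ONE_OR_MORE: Bounds = (1, None)


def exactly(n: int) -> Bounds:
    """The "defined number" feature: exactly n intermediate nodes."""
    return (n, n)


def between(low: int, high: int) -> Bounds:
    """The "defined numbers in a range" feature: low to high, inclusive."""
    return (low, high)


def allowed(count: int, bounds: Bounds) -> bool:
    """Check whether `count` intermediate nodes satisfy `bounds`."""
    low, high = bounds
    return count >= low and (high is None or count <= high)


def intermediates(path) -> int:
    """Count the nodes strictly between the first and last node of a path."""
    return max(len(path) - 2, 0)


path = ["root", "x", "y", "leaf"]  # two nodes between root and leaf
n = intermediates(path)
print(allowed(n, ONE), allowed(n, ONE_OR_MORE), allowed(n, exactly(2)), allowed(n, between(3, 5)))
# prints False True True False
```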
Milestone 1: Approach
It was observed (not at the beginning, but soon enough) that most of the features, namely those that
represent the existence of intermediate nodes, are very similar. Most of the features have a
corresponding metacharacter; others are written in a periphrastic way.