Dec 152010
 

Introduction

I know this doesn’t come up nearly as often as other topics but writing an interpreter (a.k.a. a compiler) can be a very useful concept depending on what exactly you need to do with the input to your program. For example, what if we had a fairly strict data store but we for some reason wanted to access it using something like SQL? We’d have to parse the SQL statements and then find a way to mogrify it until we had something that would allow us to get the data we wanted. There is an easier way involving a little parsing theory. For the purposes of this discussion I’m assuming you are at least somewhat familiar with Context Free Grammars.

Creating the Grammar

The first step to creating a parser (especially when using a parser generator) is to find or craft a definition for the grammar. The grammar we’ll use as an example is the expression grammar from tiny basic. This simple grammar is safe for LR or LL parsing (which is important if you look at a common definition of a language like SQL).

Tiny Basic Expression Grammar

expression ::= (+|-|ε) term ((+|-) term)*
term ::= factor ((*|/) factor)*
factor ::= number | (expression)

Enter pyparsing

To create a very simple and extensible LL parser I’ve recently stumbled upon pyparsing. This simple four production grammar expands to the following pyparsing implementation:

[sourcecode language="python" wraplines="false"]
expr = Forward()
factor = ( Word(nums) | Group(Suppress(‘(‘) + expr + Suppress(‘)’)) )
term = Group(factor + ZeroOrMore((Literal(‘*’)|Literal(‘/’)) + factor))
expr << Group(Optional(Literal(‘-’)|Literal(‘+’)) + term + ZeroOrMore((Literal(‘-’)|Literal(‘+’)) + term))
[/sourcecode]

This allows us to turn sentences such as ‘5+5*6/3-(47+56)*34‘ into an easy to work with list such as: ‘[[['5'], ‘+’, ['5', '*', '6', '/', '3'], ‘-’, [[[['47'], ‘+’, ['56']]], ‘*’, ’34′]]]‘. There are probably improvements that could be made to this parser so that it auto-collapses expressions and other fun handlers but for the purposes of a simple grammar this will suffice.

Calling the grammar after defining it is a very simple process: `expr.parseString(’5+5*6/3-(47+56)*34′)`.

Testing Parsers an Easier Way

Sure unit testing should be done (and parsers lend themselves to unit tests very well) but there’s something satisfying about seeing your sentences get parsed out in real time. The obvious answer is, “create a mini-shell like environment.” Python also makes this process extremely simple and only requires a few lines of code to get a basically functional shell for your parser (complete with history):

[sourcecode language="python" wraplines="false"]
import rlcompleter
import readline
import so

if not os.access(".history", os.F_OK): open(".history", "w").close()
readline.read_history_file(".history")
buffer = ""

while True:
try: line = raw_input(pycolorize.light_blue("BASIC$ "))
except EOFError:
readline.write_history_file(".history")
print
break

if line.lower() == "exit" or line.lower() == "quit":
readline.write_history_file(".history")
break

buffer += line
result = ACTION_ON_BUFFER
buffer = ""
[/sourcecode]

Putting It Together

The complete script for reference purposes:

[sourcecode language="python" wraplines="false"]
import rlcompleter
import readline
import os

from pyparsing import *
import pprint

if not os.access(".history", os.F_OK): open(".history", "w").close()
readline.read_history_file(".history")

try:
import pycolorize
except:
sys.path.append(os.path.join(os.path.dirname(__file__), "vendor", "pycolorize"))
import pycolorize

class ExpressionParser(object):
def __init__(self):
self._expr = Forward()
factor = ( Word(nums) | Group(Suppress(‘(‘) + self._expr + Suppress(‘)’)) )
term = Group(factor + ZeroOrMore((Literal(‘*’)|Literal(‘/’)) + factor))
self._expr << Group(Optional(Literal(‘-’)|Literal(‘+’)) + term + ZeroOrMore((Literal(‘-’)|Literal(‘+’)) + term))

def _calculate(self, l):
while any([ isinstance(x, list) for x in l]):
for n,i in enumerate(l):
if isinstance(i, list): l[n] = self._calculate(i)
return str(eval(" ".join(l)))

def __call__(self, string):
return self._calculate(self._expr.parseString(string).asList())

buffer = ""

print pycolorize.green("Enter your SQL commands to tokenize:")
print pycolorize.green("Enter a blank line to exit.")

while True:
try: line = raw_input(pycolorize.light_blue("BASIC$ "))
except EOFError:
readline.write_history_file(".history")
print
break

if line.lower() == "exit" or line.lower() == "quit":
readline.write_history_file(".history")
break

buffer += line
result = None
try: result = ExpressionParser()(buffer)
except ParseBaseException, e:
buffer = ""
pycolorize.error(e.line)
pycolorize.error(" "*(e.col – 1) + "^")
pycolorize.error(str(e))
continue
pycolorize.status("Result: %s", result)
buffer = ""
[/sourcecode]

Conclusion

Writing LL parsers is a breeze with pyparsing but it must be kept in mind that any grammar that has any left recursion will cause errors that may take some time to find or remove. Other parser generators (for C and C++) include bison and lemon. These parser generators are LR parser generators.

By coupling the parser with a small CLI quick checks on new features to the grammar (and by extension the parser) can be a breeze. Putting this all together with unit tests and proper grammar analysis can lead to a well written and extensible language to be used for whatever purpose you may have in mind.

Dec 032010
 

Introduction

Sometimes dynamically loaded modules (plugins or extensions) are pretty convenient to provide extensible functionality from your applications. For example, you need to provide a command that provides known data sources to subcommands but want the subcommands to be easily written and added even after the application has been finalized. We could do this with a simply proper modular design but it seems more natural to allow for the subcommands to be defined elsewhere with a standard interface to allow for extensible behavior even after the initial application development cycle.

The Problem

How do we find and then load and then run code that we didn’t necessarily write?

The first step is fairly obvious we simply ask (via a parameter, config option, or other method) where the code that should be loaded is located. Once we have that the other steps are much easier. In more detail, we need to know a location for code that follows our plugin API resides. To do this we can use the following code (where d is the directory with our plugins):

[sourcecode language="python"]
sys.path.append(d)
files = itertools.chain(*[ [ os.path.join(x[0], fs) for fs in x[2] ] for x in os.walk(d) ] )
plugins = [ f.split('/')[-1].split(‘.’)[0] for f in files if f.endswith(‘.py’) ]
modules = [ __import__(p, globals(), locals(), [], -1) for p in plugins ]

for p,m in zip(plugins,modules):
matches = [ x for x in m.__dict__.keys() if x.lower() == p ]
if len(matches) == 1: # and issubclass(m.__dict__[matches[0]], CorkyCommand):
self._commands.append(m.__dict__[matches[0]]())
[/sourcecode]

Break Down

  1. Add our directory to the python module path so we can simply load them by name
  2. Get a list of the files in this directory
  3. Filter this down to the names of the python files to find the Class that we need to create an instance of
  4. Import the modules as module objects we can manipulate
  5. Loop through the correlated list of plugin names and module objects
  6. Look for an object in the module dictionary that matches the name of the file (case insensitive)
  7. Find a match we then add an instantiated object of the class we found

Quite a bit is going in this short snippet of code but the important thing is it takes a directory path and creates a list of instantiated plugin objects we can use just like any other object variable. Once we have the objects it’s simply a matter of calling functions on them: `self._commands[n].method()`.

Conclusion

Getting a modular design can be daunting and making that modular design as dynamic as possible can be even more daunting but the modern languages (this technique but not syntax works with ruby as well) make this process much easier than the compiled languages (More to come on that later I hope).

© 2011 Alunduil's Hosting Suffusion theme by Sayontan Sinha