Dec 15 2010
 

Introduction

I know this doesn’t come up nearly as often as other topics, but writing an interpreter (a close relative of a compiler) can be very useful, depending on what exactly you need to do with the input to your program. For example, what if we had a fairly strict data store but for some reason wanted to access it using something like SQL? We’d have to parse the SQL statements and then find a way to mogrify them until we had something that would let us get the data we wanted. There is an easier way, involving a little parsing theory. For the purposes of this discussion I’m assuming you are at least somewhat familiar with context-free grammars.

Creating the Grammar

The first step in creating a parser (especially when using a parser generator) is to find or craft a definition for the grammar. The grammar we’ll use as an example is the expression grammar from Tiny BASIC. This simple grammar is safe for both LR and LL parsing (which matters once you look at a common definition of a language like SQL, where that is not a given).

Tiny Basic Expression Grammar

expression ::= (+|-|ε) term ((+|-) term)*
term ::= factor ((*|/) factor)*
factor ::= number | (expression)
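
To make the shape of this grammar concrete before bringing in a library, here is a minimal hand-written recursive-descent sketch in plain Python 3, where each production becomes one function. It is only an illustration: the names `tokenize`, `expression`, `term`, `factor`, and `evaluate` are made up for this example, and it assumes unsigned integer literals and Python's true division.

```python
import re

# One token: a run of digits, an operator, or a parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into numbers, operators, and parentheses."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character at column %d" % (pos + 1))
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def expression(tokens, i):
    """expression ::= (+|-|ε) term ((+|-) term)*"""
    sign = 1
    if i < len(tokens) and tokens[i] in ("+", "-"):
        sign = -1 if tokens[i] == "-" else 1
        i += 1
    value, i = term(tokens, i)
    value *= sign
    while i < len(tokens) and tokens[i] in ("+", "-"):
        op = tokens[i]
        right, i = term(tokens, i + 1)
        value = value + right if op == "+" else value - right
    return value, i


def term(tokens, i):
    """term ::= factor ((*|/) factor)*"""
    value, i = factor(tokens, i)
    while i < len(tokens) and tokens[i] in ("*", "/"):
        op = tokens[i]
        right, i = factor(tokens, i + 1)
        value = value * right if op == "*" else value / right
    return value, i


def factor(tokens, i):
    """factor ::= number | (expression)"""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[i] == "(":
        value, i = expression(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("expected ')'")
        return value, i + 1
    if tokens[i].isdigit():
        return int(tokens[i]), i + 1
    raise SyntaxError("unexpected token %r" % tokens[i])


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing input."""
    tokens = tokenize(text)
    value, i = expression(tokens, 0)
    if i != len(tokens):
        raise SyntaxError("unexpected token %r" % tokens[i])
    return value
```

Each function consumes at least one token before it recurses, which is what keeps this top-down approach from looping forever.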

Enter pyparsing

While looking for a way to create a very simple and extensible LL parser, I recently stumbled upon pyparsing. This simple three-production grammar expands to the following pyparsing implementation:

[sourcecode language="python" wraplines="false"]
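from pyparsing import Forward, Group, Literal, Optional, Suppress, Word, ZeroOrMore, nums

expr = Forward()  # declared first because factor refers back to it
factor = Word(nums) | Group(Suppress('(') + expr + Suppress(')'))
term = Group(factor + ZeroOrMore((Literal('*') | Literal('/')) + factor))
expr << Group(Optional(Literal('-') | Literal('+')) + term + ZeroOrMore((Literal('-') | Literal('+')) + term))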
[/sourcecode]

This allows us to turn sentences such as `5+5*6/3-(47+56)*34` into an easy-to-work-with list such as `[[['5'], '+', ['5', '*', '6', '/', '3'], '-', [[[['47'], '+', ['56']]], '*', '34']]]`. There are probably improvements that could be made to this parser, such as auto-collapsing expressions and other fun handlers, but for the purposes of a simple grammar this will suffice.

Invoking the parser after defining it is a very simple process: `expr.parseString('5+5*6/3-(47+56)*34')`.

Testing Parsers an Easier Way

Sure, unit testing should be done (and parsers lend themselves to unit tests very well), but there’s something satisfying about seeing your sentences get parsed in real time. The obvious answer is to create a mini-shell-like environment. Python makes this extremely simple: a few lines of code give you a basically functional shell for your parser (complete with history):

[sourcecode language="python" wraplines="false"]
import os
import readline
import rlcompleter

import pycolorize  # colour helper; the complete script below shows how it is located

# readline cannot read a missing history file, so create an empty one first.
if not os.access(".history", os.F_OK):
    open(".history", "w").close()
readline.read_history_file(".history")
buffer = ""

while True:
    try:
        line = input(pycolorize.light_blue("BASIC$ "))
    except EOFError:  # Ctrl-D ends the session
        readline.write_history_file(".history")
        print()
        break

    if line.lower() in ("exit", "quit"):
        readline.write_history_file(".history")
        break

    buffer += line
    result = ACTION_ON_BUFFER  # placeholder: replace with a call to your parser
    buffer = ""
[/sourcecode]

Putting It Together

The complete script for reference purposes:

[sourcecode language="python" wraplines="false"]
import os
import readline
import rlcompleter
import sys

from pyparsing import (Forward, Group, Literal, Optional, ParseBaseException,
                       Suppress, Word, ZeroOrMore, nums)

# readline cannot read a missing history file, so create an empty one first.
if not os.access(".history", os.F_OK):
    open(".history", "w").close()
readline.read_history_file(".history")

try:
    import pycolorize
except ImportError:
    # Fall back to a copy bundled next to this script.
    sys.path.append(os.path.join(os.path.dirname(__file__), "vendor", "pycolorize"))
    import pycolorize


class ExpressionParser(object):
    def __init__(self):
        self._expr = Forward()
        factor = Word(nums) | Group(Suppress('(') + self._expr + Suppress(')'))
        term = Group(factor + ZeroOrMore((Literal('*') | Literal('/')) + factor))
        self._expr << Group(Optional(Literal('-') | Literal('+')) + term
                            + ZeroOrMore((Literal('-') | Literal('+')) + term))

    def _calculate(self, l):
        # Collapse nested groups depth-first until only strings remain.
        while any(isinstance(x, list) for x in l):
            for n, i in enumerate(l):
                if isinstance(i, list):
                    l[n] = self._calculate(i)
        # Only digits and the four operators survive parsing, so eval sees
        # nothing but plain arithmetic.
        return str(eval(" ".join(l)))

    def __call__(self, string):
        return self._calculate(self._expr.parseString(string).asList())


buffer = ""

print(pycolorize.green("Enter an expression to evaluate:"))
print(pycolorize.green("Type exit or quit to leave."))

while True:
    try:
        line = input(pycolorize.light_blue("BASIC$ "))
    except EOFError:  # Ctrl-D ends the session
        readline.write_history_file(".history")
        print()
        break

    if line.lower() in ("exit", "quit"):
        readline.write_history_file(".history")
        break

    buffer += line
    result = None
    try:
        result = ExpressionParser()(buffer)
    except ParseBaseException as e:
        # Show the offending line with a caret under the failing column.
        buffer = ""
        pycolorize.error(e.line)
        pycolorize.error(" " * (e.col - 1) + "^")
        pycolorize.error(str(e))
        continue
    pycolorize.status("Result: %s", result)
    buffer = ""
[/sourcecode]
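
The `_calculate` method in the script above flattens the parse result and hands it to `eval`. As an alternative sketch (not the approach used in the script), the nested list can be reduced directly. The function name `reduce_tree` is invented for this example, and the input is assumed to have the list shape shown earlier, that is, what `parseString(...).asList()` returns for this grammar.

```python
import operator

# Map each operator token to the function that applies it.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def reduce_tree(node):
    """Evaluate a nested list of number strings and operator strings."""
    if isinstance(node, str):
        return int(node)

    items = list(node)
    # An expression group may start with an optional sign.
    sign = 1
    if items[0] in ("+", "-"):
        sign = -1 if items[0] == "-" else 1
        items = items[1:]

    # The remaining items alternate: operand, operator, operand, ...
    value = sign * reduce_tree(items[0])
    for op, operand in zip(items[1::2], items[2::2]):
        value = OPS[op](value, reduce_tree(operand))
    return value


parsed = [[['5'], '+', ['5', '*', '6', '/', '3'], '-',
           [[[['47'], '+', ['56']]], '*', '34']]]
print(reduce_tree(parsed))
```

Because the grammar already groups each term, a plain left-to-right fold inside every list respects operator precedence without any extra bookkeeping.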

Conclusion

Writing LL parsers is a breeze with pyparsing, but keep in mind that any left recursion in the grammar will cause errors that may take some time to find and remove. Other parser generators (for C and C++) include bison and lemon. These are LR parser generators, which handle left-recursive rules without trouble.
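
To see why left recursion is a problem for a top-down parser, consider this small sketch in plain Python 3. It does not use pyparsing, and both function names are made up for illustration. A rule written as `expr ::= expr - number | number` asks the parser to parse an `expr` before consuming any input, so a direct translation recurses until Python gives up; rewriting the rule as `expr ::= number (- number)*` consumes a token first and then loops.

```python
def parse_left_recursive(tokens, pos=0):
    """expr ::= expr '-' number | number  (left recursive).

    The first alternative begins with expr itself, so the function calls
    itself at the same position without consuming a token.
    """
    left, pos = parse_left_recursive(tokens, pos)  # never returns
    if pos < len(tokens) and tokens[pos] == "-":
        return left - int(tokens[pos + 1]), pos + 2
    return left, pos


def parse_iterative(tokens, pos=0):
    """expr ::= number ('-' number)*  (left recursion removed)."""
    value = int(tokens[pos])
    pos += 1
    while pos < len(tokens) and tokens[pos] == "-":
        value -= int(tokens[pos + 1])
        pos += 2
    return value, pos


tokens = ["10", "-", "3", "-", "2"]

try:
    parse_left_recursive(tokens)
except RecursionError:
    print("left-recursive rule: recursion limit reached")

# Prints (5, 5): the value, and the position after the last token.
print(parse_iterative(tokens))
```

The loop also keeps subtraction left-associative, which is the usual reason the left-recursive form was written in the first place.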

By coupling the parser with a small CLI, quick checks of new grammar features (and, by extension, the parser) become a breeze. Putting this all together with unit tests and proper grammar analysis can lead to a well-written and extensible language for whatever purpose you may have in mind.

Dec 03 2010
 

Introduction

Sometimes dynamically loaded modules (plugins or extensions) are a convenient way to provide extensible functionality in your applications. For example, suppose you need a command that provides known data sources to subcommands, but you want the subcommands to be easy to write and add even after the application has been finalized. We could do this simply with a proper modular design, but it seems more natural to let the subcommands be defined elsewhere against a standard interface, so that behavior stays extensible even after the initial application development cycle.

The Problem

How do we find and then load and then run code that we didn’t necessarily write?

The first step is fairly obvious: we simply ask (via a parameter, config option, or other method) where the code to be loaded is located. Once we have that, the other steps are much easier. In more detail, we need to know the location where code that follows our plugin API resides. Given that, we can use the following code (where d is the directory with our plugins):

[sourcecode language="python"]
import importlib
import itertools
import os
import sys

sys.path.append(d)
# Every file under d (os.walk also descends into subdirectories).
files = itertools.chain(*[[os.path.join(x[0], fs) for fs in x[2]] for x in os.walk(d)])
# Strip the directory and extension to get importable module names.
plugins = [os.path.basename(f).split('.')[0] for f in files if f.endswith('.py')]
modules = [importlib.import_module(p) for p in plugins]

for p, m in zip(plugins, modules):
    # The plugin class shares the file's name, compared case-insensitively.
    matches = [x for x in m.__dict__.keys() if x.lower() == p]
    if len(matches) == 1:  # and issubclass(m.__dict__[matches[0]], CorkyCommand):
        self._commands.append(m.__dict__[matches[0]]())
[/sourcecode]

Break Down

  1. Add our directory to the Python module path so we can load the plugins simply by name
  2. Get a list of the files in this directory
  3. Filter this down to the names of the Python files; each name also tells us which class we need to instantiate
  4. Import the modules as module objects we can manipulate
  5. Loop through the paired lists of plugin names and module objects
  6. Look for an object in the module dictionary whose name matches the file name (case insensitive)
  7. If exactly one match is found, add an instance of that class to our list of commands

Quite a bit is going on in this short snippet of code, but the important thing is that it takes a directory path and creates a list of instantiated plugin objects we can use just like any other object. Once we have the objects, it’s simply a matter of calling methods on them: `self._commands[n].method()`.
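
As a self-contained illustration of the steps above, the following sketch (Python 3, standard library only) writes a throwaway plugin into a temporary directory and then loads it the same way. The plugin file `sampleplugin.py`, its class `SamplePlugin`, the `run` method, and the helper `load_plugins` are all invented for the example; only the convention that the class name matches the file name comes from the snippet above.

```python
import importlib
import os
import sys
import tempfile


def load_plugins(directory):
    """Import every .py file in directory and instantiate the class whose
    name matches the file name, compared case-insensitively."""
    if directory not in sys.path:
        sys.path.append(directory)
    # The files may be newer than the import system's directory cache.
    importlib.invalidate_caches()

    instances = []
    for filename in sorted(os.listdir(directory)):
        name, extension = os.path.splitext(filename)
        if extension != ".py":
            continue
        module = importlib.import_module(name)
        matches = [attr for attr in vars(module) if attr.lower() == name.lower()]
        # Require exactly one match, and require it to be a class.
        if len(matches) == 1 and isinstance(getattr(module, matches[0]), type):
            instances.append(getattr(module, matches[0])())
    return instances


# Create a hypothetical plugin on disk so the example runs on its own.
plugin_dir = tempfile.mkdtemp()
with open(os.path.join(plugin_dir, "sampleplugin.py"), "w") as handle:
    handle.write(
        "class SamplePlugin(object):\n"
        "    def run(self):\n"
        "        return 'hello from a plugin'\n"
    )

commands = load_plugins(plugin_dir)
print([command.run() for command in commands])
```

Unlike the snippet above, this version only looks at the top level of the directory, since modules in subdirectories would not be importable by bare name anyway.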

Conclusion

Getting a modular design right can be daunting, and making that design as dynamic as possible can be even more so, but modern dynamic languages (this technique, though not the syntax, works in Ruby as well) make the process much easier than compiled languages do (more to come on that later, I hope).

 

I’ve often gotten frustrated with my /etc/portage/package.* files when they become massive and full of crud that I don’t even have installed any longer. Because of this I have crafted a simple little utility that cleans out of these files any packages that are no longer installed and any USE flags that are no longer valid. This should help trim the cruft from the Gentoo configuration.

The utility, pclean, does all of this and has only one major problem (so far) standing between it and what I would call good enough for a 1.0 release. If you would like to try this little utility, it’s available in my overlay; if you notice any odd behavior, please report it to my bugzilla.

Holland on Gentoo

Aug 01 2010
 

Introduction

There is a new king of backups in town: holland. This little framework, written in Python, makes it easy to back up anything that needs to be converted to a flat-file form before being backed up. Right now there is support for mysql, sqlite, and postgresql, but with a little finesse it could potentially support directories as well as databases. This would make not only mysql backups a breeze but LDAP backups as well.

Progress Update

I have added a preliminary set of ebuilds to my overlay (which could use some code review if anyone is interested) that allows holland to easily be installed on Gentoo systems. So easy in fact that all it takes is `emerge holland`.

The ebuild accepts a set of USE flags that bring in the “providers” you want to be able to back up, and makes sure that the corresponding packages are installed on the system.

Examples

The holland ebuilds have three providers right now:

  • mysql
  • postgresql
  • sqlite

You can install these three in any combination you want; it doesn’t care. It installs the mysql provider by default, but can easily be told not to by placing -mysql in the USE flags for holland. Diego Pettenò (Flameeyes) mentioned to me that EAPI 4 will give us the cool option of requiring that one of a set of USE flags be set without forcing the choice, but until then we have this slick solution.

There is also LVM support for snapshotting the database directory before grabbing the database, and a myriad of other features I haven’t had a chance to explore yet.

To perform a rudimentary backup after installing holland, simply run `holland bk`. This will read the configurations in `/etc/holland` and back up the databases it finds.

Conclusion

The new kid on the block, holland, will make backups of complex databases and directories a breeze. Simply change that cronjob from using mysqldump to calling holland and you’re finished.

 

KDE 4 has some major improvements over older versions, but it also seems to have gone backwards in places. The new libraries probably contribute to this, and they are absolutely the way to go. One ability I’ve been looking for in powerdevil (the new power manager in KDE 4) is having the screensaver “disable” itself when entering presentation mode. This is the behavior I expected, but I found to my dismay, partway through a presentation, that the screensaver still kicked in.

After looking around for ways to “fix” this problem, I finally found some interesting information in the form of the D-Bus interface provided by the screensaver in KDE. Using qdbusviewer I was able to find an API for the screensaver that can be invoked at any point and from anywhere (assuming that you’re part of the session). Using this new ammunition for more Google searching, I found that I could write a daemon in Python that would keep the screensaver from displaying for as long as the daemon was running.
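
The core idea can be sketched in a few lines. This is a guess at the approach, not the code from the actual script: it assumes that the interface qdbusviewer shows on your session is the freedesktop.org one (service `org.freedesktop.ScreenSaver`, object `/ScreenSaver`, method `SimulateUserActivity`) and that the `qdbus` command-line tool is installed. The function names `poke_command` and `keep_awake` are made up for this example.

```python
import subprocess
import time

# Assumed D-Bus names; confirm them in qdbusviewer for your own session.
SERVICE = "org.freedesktop.ScreenSaver"
OBJECT_PATH = "/ScreenSaver"
METHOD = "SimulateUserActivity"


def poke_command():
    """Build the qdbus invocation that simulates user activity."""
    return ["qdbus", SERVICE, OBJECT_PATH, METHOD]


def keep_awake(interval, rounds, run=subprocess.call, sleep=time.sleep):
    """Simulate activity `rounds` times, waiting `interval` seconds each time.

    `run` and `sleep` are parameters so the loop can be exercised without a
    D-Bus session; a real daemon would loop until it is told to stop.
    """
    for _ in range(rounds):
        run(poke_command())
        sleep(interval)
```

Calling `keep_awake(60, 3)` would poke the screensaver three times, one minute apart; choosing an interval just under the screensaver timeout keeps the number of calls to a minimum.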

The result of this work can be found in my subversion repository as [stop_kscreensaver.py](http://www.alunduil.com/svn/stop_kscreensaver/trunk/stop_kscreensaver.py). The script has only 3 parameters and is very easy to use. When starting the daemon, you pass the time between activity simulations (setting this just shorter than your screensaver’s activation timeout is the most efficient choice) and, if desired, a different log level. To stop the daemon, you pass the kill parameter, which reads the PID from a standard file and makes sure the daemon dies.

The timing parameter for this script is fairly flexible in that you can pass the time in with various units and the conversion will be taken into account. For example, one could pass a time of 2h32.1m94.34s. Why anyone would is beyond me, but I figured with regular expressions it might be easy to do. If no units are passed, the script assumes that the number is in seconds. As always, if any bugs are found, please e-mail me, [Alex Brandt](mailto:alunduil@alunduil.com), with a description of the problem (or, if you’re ambitious, a patch file would be appreciated).
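
The unit handling described above might look something like the following sketch. It is not the code from stop_kscreensaver.py; the function name `parse_duration` and the exact regular expression are assumptions made for illustration.

```python
import re

SECONDS_PER_UNIT = {"h": 3600.0, "m": 60.0, "s": 1.0}

# A number (optionally with a decimal part) followed by an optional unit.
PART = re.compile(r"(\d+(?:\.\d+)?)([hms]?)")


def parse_duration(text):
    """Convert a string such as '2h32.1m94.34s' to seconds.

    A number without a unit is taken to be seconds.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty duration")

    total, pos = 0.0, 0
    while pos < len(text):
        match = PART.match(text, pos)
        if match is None:
            raise ValueError("cannot parse %r at offset %d" % (text, pos))
        number, unit = match.groups()
        total += float(number) * SECONDS_PER_UNIT[unit or "s"]
        pos = match.end()
    return total


print(parse_duration("2h32.1m94.34s"))
print(parse_duration("90"))
```

Matching one number-and-unit pair at a time, rather than one large pattern, means the units can appear in any order and any of them can be left out.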

Now the important part. How do we get this to work with powerdevil? That’s the easiest part of all, thanks to powerdevil’s “execute this script when switching to this profile” feature. We simply save the script somewhere, make it executable (chmod 755), and then set the path (or browse to it) in the powerdevil configuration interface.

Once that is in place, you can switch to the profile you set the daemon up to start in, and the screensaver, although active, will not start up until you switch profiles again. This lets you watch that movie you wanted to, just like our favorite comic [XKCD](http://xkcd.com/196/) tells us about.

© 2011 Alunduil's Hosting Suffusion theme by Sayontan Sinha