Dynamic Subclassing in Python

Recently, I was working on a Python project for processing GCMS data that required something like the following steps:

  1. Load data from an input file
  2. Add some referencing metadata specific to file contents
  3. Fit the data

I wanted to write this as one class, so that I could easily store and compare several different files processed in this way. One way to do this would be to separate the different components into their own classes, and then use these as base classes for unifying derived class. For example, one could write the following type of code:

Reading data.
Referencing data.
Fitting data.

The trouble, though, was that each of these steps could be done in several very different ways. For example, there are potentially many different file types that one might want to process, even though the imported data is of the same type. One approach to this problem would be to write a file loading class for every file type. However, you now have to write a derived class for every possible file type class. It gets worse when the other steps have multiple different parts as well. If you have 3 file types, 3 referencing types, and 3 fitting types, that’s (3*3*3=) 27 total derived types!

Python’s type function to the rescue

Python has a built-in function type, which I’ve always used in the past to determine the object type for a particular variable.

<class ‘str’>

However, this function has a second use as well, which takes three arguments. This form is type(name, bases, dict), and it returns a new object named name with bases as the superclasses. dict is a dictionary of class attributes you’d like to set for this new class.

This is very powerful, because now we can write a class generator function that takes some flags and outputs the appropriate class. Below is an example that is similar to that shown above except that it uses several different types of Read and Processing classes. (I’ve left out fitting, but you get the picture.)

First readtype=2, proctype=1
Reading Type 2
Process type: 1
Processed data: 2Second readtype=1, proctype=1
Reading Type 1
Process type: 1
Processed data: 1Third readtype=2, proctype=2
Reading Type 2
Process type: 2
Processed data: 4

Although very cool, I ultimately used a different solution for my project. However, I posted this here in the hope that this information is useful to others.