Instead of writing a far more involved post, spending a bunch of time justifying why I would want to talk about this, and providing examples of a more professional use of Python data model descriptors, I decided it would be way more fun to show a silly example of how this feature can be used.
NOTE: this was written with Python 3.6, but descriptors in Python 2.7 work the same way for new-style classes.
Enjoy!
Imagine a world of magical creatures. In this world, all of our magical creatures have been granted the ability to interact with other creatures:
class Creature:
    def __init__(self, name):
        self.name = name
    def interact(self, target):
        raise NotImplementedError()
class Cat(Creature):
    def interact(self, target):
        return f"{self.name} is a cat and scratches {target}!"
class Dog(Creature):
    def interact(self, target):
        return f"{self.name} is a dog and barks at {target}!"
It also has things like trees and flowers:
class Tree:
    def __repr__(self):
        return "a tree"
class Flower:
    def __repr__(self):
        return "a flower"
And, of course, humans and wizards:
class Human(Creature):
    def interact(self, target):
        return f"{self.name} is a normal human and waves at {target}!"
class Wizard(Human):
    def interact(self, target):
        return f"{self.name} is a powerful wizard and scoffs at {target}!"
Creatures can interact with other objects in the world:
>>> dog = Dog("Fido")
>>> dog.interact(Tree())
'Fido is a dog and barks at a tree!'
And wizards, as we know, have some spells. In particular, wizards can transform creatures into other creatures (don’t worry, we’ll talk about this at the end):
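import inspect

class Wizard(Human):
    ...
    def transform(self, target, into):
        if not isinstance(target, Creature):
            raise ValueError(f"{type(target)} is not a {Creature.__name__}!")
        if not inspect.isclass(into) or Creature not in into.__mro__:
            raise ValueError(f"{type(into)} is not a {Creature.__name__}!")
        attrs = [attr for attr in dir(into) if not attr.startswith('__')]
        for attr in attrs:
            value = getattr(into, attr)
            # In Python 3, class-level access returns the plain function;
            # classmethods (and Python 2 unbound methods) expose it as __func__.
            func = getattr(value, '__func__', value)
            if not inspect.isfunction(func):
                continue
            # Bind the function to this one instance and store it there.
            descriptor = func.__get__(target, type(target))
            setattr(target, attr, descriptor)
        # Announce the spell, matching the output in the sessions below.
        print(f"I, {self.name}, have transformed {target.name} into a {into.__name__}!")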
Enter Merlin and his apprentice, a young King Arthur:
>>> merlin = Wizard('Merlin')
>>> arthur = Human('Arthur')
As a human, Arthur interacts with things by waving at them:
>>> arthur.interact(merlin)
'Arthur is a normal human and waves at Merlin!'
Whereas Merlin, a mighty wizard, has a less benevolent reaction:
>>> merlin.interact(arthur)
'Merlin is a powerful wizard and scoffs at Arthur!'
As a young wizard’s apprentice, Arthur needs to understand the world around him by experiencing interactions in the form of various other magical creatures.
So to give Arthur this knowledge and experience, Merlin transforms him into a cat:
>>> merlin.transform(arthur, into=Cat)
I, Merlin, have transformed Arthur into a Cat!
>>> assert type(arthur) is Human
>>> # Still a human!
>>> arthur.interact(Tree())
'Arthur is a cat and scratches a tree!'
>>> arthur.interact(Flower())
'Arthur is a cat and scratches a flower!'
And then into a Dog:
>>> merlin.transform(arthur, into=Dog)
I, Merlin, have transformed Arthur into a Dog!
>>> assert type(arthur) is Human
>>> # Still a human!
>>> arthur.interact(Tree())
'Arthur is a dog and barks at a tree!'
>>> arthur.interact(Flower())
'Arthur is a dog and barks at a flower!'
After a sufficient amount of transforming Arthur from one animal to the next and interacting with the world, Merlin finally transforms Arthur back into a human and he’s back to his old hand-waving ways:
>>> merlin.transform(arthur, into=Human)
I, Merlin, have transformed Arthur into a Human!
>>> assert type(arthur) is Human
>>> # Still a human!
>>> arthur.interact(Tree())
'Arthur is a normal human and waves at a tree!'
>>> arthur.interact(Flower())
'Arthur is a normal human and waves at a flower!'
The end!
Once you understand Python descriptors, there isn’t actually too much to explain here. In short, descriptors are a way to take advantage of the machinery behind Python’s “New-Style” (default in 3.X) classes.
In order to understand descriptors better, though, we first need to understand a bit about Python’s data model. Descriptors are a very special part of the data model and they are defined by three magic methods: __get__, __set__, and __delete__ (not to be confused with the finalizer __del__). We’ll focus mostly on __get__; the other two are left up to the reader to research. :-)
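For readers who have not written one before, here is a minimal sketch of a class implementing all three methods. The names Positive and Potion are made up for illustration and are not part of the magical world above, and __set_name__ assumes Python 3.6 or newer.

```python
class Positive:
    """Hypothetical data descriptor that only accepts positive numbers."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created (Python 3.6+).
        self.storage_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: hand back the descriptor object.
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(instance, self.storage_name, value)

    def __delete__(self, instance):
        delattr(instance, self.storage_name)


class Potion:
    volume = Positive()

    def __init__(self, volume):
        self.volume = volume  # routed through Positive.__set__


potion = Potion(3)
print(potion.volume)  # routed through Positive.__get__, prints 3
del potion.volume     # routed through Positive.__delete__
```

Because Positive defines __set__, it is a "data descriptor" and takes priority over the instance's own __dict__. Plain functions only define __get__, which is what the rest of this post relies on.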
There are two types of “member” objects of a Python class: values and functions. In most object-oriented languages, member values are typically referred to as attributes, whereas member functions are referred to as methods, and both are usually reserved for use by an instance of a class, not the class itself.
With Python, class rules aren’t as strict and we can do some funny things at class definition time. For example, we can define a method for an instance of an object (i.e. using the implied first self parameter), or we can define the method using @classmethod in order to make it usable at the class level as well as the instance level.
The difference between these two things isn’t exactly subtle:
class Foo:
    @classmethod
    def bar(cls):
        print(f"{cls} called classmethod bar")
    def baz(self):
        print(f"{self} called instance method baz")
>>> Foo.bar()
<class '__main__.Foo'> called classmethod bar
>>> f = Foo()
>>> f.baz()
<__main__.Foo object at 0x101d28908> called instance method baz
>>> f.bar()
<class '__main__.Foo'> called classmethod bar
The first thing that almost every Python developer will immediately dismiss is that we can call both Foo.bar and Foo().bar. That is, we can call bar from both the class reference (Foo) and an instance of that class (Foo()). That’s little more than muscle memory at this stage, but descriptors are the key to understanding why this is possible.
__dict__
Most of you have seen it before; you might even have used it directly at some point while debugging object values.
Python’s __dict__ is a special dictionary that serves as another core component of “New-Style” classes and works together with descriptors to provide the user-defined class system we are familiar with today.
In short, when you define a function as part of a class definition (e.g. def foo(self): print(self)), Python stores that function in the class’s __dict__, which is consulted when attributes are looked up. Dot-style access then goes through the descriptor protocol, so what you get back may be a bound method rather than the raw object stored in __dict__.
To better explain, here is some code using our class Foo from above to demonstrate this. First, let’s illustrate how dot-style access and __dict__ are related [1]:
>>> Foo.bar
<bound method Foo.bar of <class '__main__.Foo'>>
>>> Foo.bar.__func__
<function Foo.bar at 0x101baaf28>
>>> Foo.__dict__['bar']
<classmethod object at 0x101d28cc0>
>>> Foo.__dict__['bar'].__func__
<function Foo.bar at 0x101baaf28>
>>> assert Foo.bar.__func__ == Foo.__dict__['bar'].__func__
>>>
That’s interesting. We have three different types of objects associated with the same bar attribute of the Foo class:
- function: the actual function definition of bar.
- classmethod: the object created by the @classmethod decorator, wrapping bar’s function definition. This is what is stored in Foo.__dict__.
- bound method: what the descriptor (hah!) hands back on dot-style access. It takes care of automagically passing the Foo class in as the cls parameter to bar upon invocation.
A function object is different from a classmethod, which is different from a bound method. An example of this behavior:
>>> Foo.bar()
<class '__main__.Foo'> called classmethod bar
>>> Foo.bar.__func__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bar() missing 1 required positional argument: 'cls'
>>> Foo.__dict__['bar']()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'classmethod' object is not callable
So it makes sense that classmethod isn’t directly callable, but the only problem with Foo.bar.__func__ was its argument. Let’s try with a positional argument:
>>> Foo.bar.__func__(Foo)
<class '__main__.Foo'> called classmethod bar
A step forward! But it would be nice if we didn’t have to pass in Foo to every call. It turns out that with descriptors, we can generate a bound method just like @classmethod did above:
>>> Foo.bar.__func__.__get__(Foo, Foo)
<bound method Foo.bar of <class '__main__.Foo'>>
Hey! That looks familiar! So how about not having to pass that argument at all?
>>> Foo.bar.__func__.__get__(Foo, Foo)()
<class '__main__.Foo'> called classmethod bar
Bingo!
Part of a function’s descriptor behavior is to bind itself to the object passed as the instance and to supply that object as the first argument when called. That is why we were not required to pass Foo into the bound method generated by __get__.
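To make that concrete, here is a rough pure-Python sketch of what a classmethod-like descriptor could look like. MyClassMethod and Familiar are hypothetical names, and the real built-in is implemented in C, so treat this as an approximation of the behavior seen above rather than the actual implementation.

```python
import types


class MyClassMethod:
    """Hypothetical stand-in for the built-in classmethod (illustration only)."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if owner is None:
            owner = type(instance)
        # Bind the wrapped function to the class, however it was accessed.
        return types.MethodType(self.func, owner)


class Familiar:
    @MyClassMethod
    def bar(cls):
        return f"{cls.__name__} called bar"


print(Familiar.bar())    # Familiar called bar
print(Familiar().bar())  # Familiar called bar
```

Both calls receive the class as the first argument because __get__ ignores the instance and always binds to the owner.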
Okay, so we have essentially just recreated @classmethod and how the underlying function bar works when accessed via the class Foo. But how does that help with instances of Foo? Read on!
In a more technical description, __get__ can be used to “bind” objects (almost arbitrarily) to an “owner” (and optional “instance”) [2]. The “instance” part is what will be provided as the implied first argument to the underlying function once the bound result is called.
As a general rule for functions: when you pass instance=None, __get__ hands back the plain function, so storing it on a class makes it behave as an ordinary instance method, and the instance is passed as the magic first argument on later dot-style access. If instance=Foo (or any other object), the result is a bound method that passes Foo as the magic first argument.
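A quick way to check this rule is to call __get__ on a plain function by hand and inspect what comes back. This is a small sketch with made-up names (Owl, greet), assuming Python 3 behavior:

```python
class Owl:  # hypothetical class, for illustration only
    pass


def greet(self):
    return f"greet got {self!r}"


owl = Owl()

unbound = greet.__get__(None, Owl)      # no instance: the plain function comes back
bound_to_owl = greet.__get__(owl, Owl)  # instance given: bound method around owl
bound_to_cls = greet.__get__(Owl, Owl)  # a class is just another object to bind

print(unbound is greet)              # True
print(bound_to_owl.__self__ is owl)  # True
print(bound_to_cls.__self__ is Owl)  # True
```

The __self__ attribute of a bound method is exactly the object that will be passed as the first argument when it is called.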
For example, here’s how we would use descriptors to create a new instance method for new instances of Foo:
>>> def fizz(self):
...     print(f"Called fizz from instance {self}")
...
>>> Foo.fizz = fizz.__get__(None, Foo)
>>>
>>> Foo.fizz
<function fizz at 0x101d30158>
>>> Foo.fizz()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: fizz() missing 1 required positional argument: 'self'
>>> f = Foo()
>>> f
<__main__.Foo object at 0x101d28e10>
>>> f.fizz()
Called fizz from instance <__main__.Foo object at 0x101d28e10>
And a similar strategy for a “class method”:
>>> def buzz(cls):
...     print(f"Called buzz from cls: {cls}")
...
>>> Foo.buzz = buzz.__get__(Foo, Foo)
>>> Foo.buzz()
Called buzz from cls: <class '__main__.Foo'>
>>> Foo().buzz()
Called buzz from cls: <class '__main__.Foo'>
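As an aside, the standard library’s types.MethodType builds the same kind of bound method explicitly, which can read a little clearer than calling __get__ by hand. A small sketch, using a hypothetical class Spell so it does not clash with Foo above:

```python
import types


class Spell:  # hypothetical class, for illustration only
    pass


def buzz(cls):
    return f"Called buzz from cls: {cls.__name__}"


via_get = buzz.__get__(Spell, Spell)      # descriptor protocol, as in the session above
via_type = types.MethodType(buzz, Spell)  # the same binding, spelled out explicitly

print(via_get())            # Called buzz from cls: Spell
print(via_type())           # Called buzz from cls: Spell
print(via_get == via_type)  # True: same function bound to the same object
```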
And that’s pretty much it.
So in our example at the top, we have a magical wizard who can transform things into other things, but how did it work?
This is the important snippet of code:
attrs = [attr for attr in dir(into) if not attr.startswith('__')]
for attr in attrs:
    value = getattr(into, attr)
    # Python 3 returns plain functions here; classmethods expose __func__.
    func = getattr(value, '__func__', value)
    if not inspect.isfunction(func):
        continue
    descriptor = func.__get__(target, type(target))
    setattr(target, attr, descriptor)
So the first thing we do is grab every attribute (filtering out scary stuff that .startswith("__")) from the class we’re going to turn our instance “into”.
Next, we get the underlying function for each attribute and use __get__ to generate a new bound method for the target instance, with type(target), target’s original class, as its owner. Assigning with setattr means we don’t have to know the attribute name (i.e. foo.bar =) ahead of time.
NOTE: This does not modify target’s class definition. That is, new instances of type(target) will still be initialized with all of its original class definition.
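A small self-contained sketch of that point, using trimmed-down versions of the classes above (the interact messages are shortened and the name Kay is just for illustration). It binds Cat’s function to one instance the same way transform does and then checks where the attribute actually lives:

```python
class Creature:
    def __init__(self, name):
        self.name = name


class Human(Creature):
    def interact(self, target):
        return f"{self.name} waves at {target}!"


class Cat(Creature):
    def interact(self, target):
        return f"{self.name} scratches {target}!"


arthur = Human("Arthur")
# Bind Cat's function to this one instance, as transform does.
arthur.interact = Cat.interact.__get__(arthur, type(arthur))

print(arthur.interact("a tree"))        # Arthur scratches a tree!
print("interact" in vars(arthur))       # True: stored on the instance
print("interact" in vars(Human("Kay"))) # False: other instances are untouched
print(Human("Kay").interact("a tree"))  # Kay waves at a tree!

# Removing the instance attribute lets the class-level method show through again.
del arthur.interact
print(arthur.interact("a tree"))        # Arthur waves at a tree!
```

This works because functions are non-data descriptors (they define only __get__), so an entry in the instance's own __dict__ wins over the method defined on the class.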
Lastly, we set an attribute of the same name onto the target. Now we have successfully copied the methods from into as instance-level bound methods on target.
Voila! No magic, just descriptors [3].
[1] I used equality (==) rather than identity (is) checks here to be consistent with the flow of the example. In fact, Foo.baz is Foo.__dict__['baz'], but due to the way instances work, f.baz is not Foo.baz.__get__(f, type(f)).
[2] The signature for the “get” descriptor method is __get__(self, instance, owner). “self” is the object __get__ is called from and is passed auto-magically.
[3] For what it’s worth, what I have blogged about here is a very small amount of descriptor usage. There’s a whole internet out there using and abusing it in fun new ways every day.