High level design#
What are hooks?#
Hook mechanism is a part of onETL which allows to inject some additional behavior into existing methods of (almost) any class.
Features#
Hooks mechanism allows to:
Inspect and validate input arguments and output results of method call
Access, modify or replace method call result (but NOT input arguments)
Wrap method calls with a context manager and catch raised exceptions
Hooks can be placed into Plugins, allowing to modify onETL behavior by installing some additional package.
Limitations#
Hooks can be bound to methods of a class only (not functions).
Only methods decorated with @slot decorator implement hooks mechanism. These class and methods are marked as .
Hooks can be bound to public methods only.
Terms#
@slot decorator - method of a class with a special decorator
Callback
- function which implements some additional logic which modifies slot behavior@hook decorator - wrapper around callback which stores hook state, priority and some useful methods
Hooks mechanism
- callingSlot()
will call all enabled hooks which are bound to the slot. Implemented by @support_hooks decorator.
How to implement hooks?#
TL;DR#
from onetl.hooks import support_hooks, slot, hook
@support_hooks # enabling hook mechanism for the class
class MyClass:
def __init__(self, data):
self.data = data
# this is slot
@slot
def method(self, arg):
pass
@MyClass.method.bind # bound hook to the slot
@hook # this is hook
def callback(obj, arg): # this is callback
print(obj.data, arg)
obj = MyClass(1)
obj.method(2) # will call callback(obj, 1)
# prints "1 2"
Define a slot#
Create a class with a method:
class MyClass:
def __init__(self, data):
self.data = data
def method(self, arg):
return self.data, arg
Add @slot decorator to the method:
from onetl.hooks import support_hooks, slot, hook
class MyClass:
@slot
def method(self, arg):
return self.data, arg
If method has other decorators like @classmethod
or @staticmethod
, @slot
should be placed on the top:
from onetl.hooks import support_hooks, slot, hook
class MyClass:
@slot
@classmethod
def class_method(cls, arg):
return cls, arg
@slot
@staticmethod
def static_method(arg):
return arg
Add @support_hooks decorator to the class:
from onetl.hooks import support_hooks, slot, hook
@support_hooks
class MyClass:
@slot
def method(self, arg):
return self.data, arg
Slot is created.
Define a callback#
Define some function (a.k.a callback):
def callback(self, arg):
print(self.data, arg)
It should have signature compatible with MyClass.method
. Compatible does not mean exactly the same -
for example, you can rename positional arguments:
def callback(obj, arg):
print(obj.data, arg)
Use *args
and **kwargs
to omit arguments you don’t care about:
def callback(obj, *args, **kwargs):
print(obj.data, args, kwargs)
There is also an argument method_name
which has a special meaning -
the method name which the callback is bound to is passed into this argument:
def callback(obj, *args, method_name: str, **kwargs):
print(obj.data, args, method_name, kwargs)
Note
method_name
should always be a keyword argument, NOT positional.
Warning
If callback signature is not compatible with slot signature, an exception will be raised, but ONLY while slot is called.
Define a hook#
Add @hook decorator to create a hook from your callback:
@hook
def callback(obj, arg):
print(obj.data, arg)
You can pass more options to the @hook
decorator, like state or priority.
See decorator documentation for more details.
Bind hook to the slot#
Use Slot.bind
method to bind hook to the slot:
@MyClass.method.bind
@hook
def callback(obj, arg):
print(obj, arg)
You can bind more than one hook to the same slot, and bind same hook to multiple slots:
@MyClass.method1.bind
@MyClass.method2.bind
@hook
def callback1(obj, arg):
"Will be called by both MyClass.method1 and MyClass.method2"
@MyClass.method1.bind
@hook
def callback2(obj, arg):
"Will be called by MyClass.method1 too"
How hooks are called?#
General#
Just call the method decorated by @slot
to trigger the hook:
obj = MyClass(1)
obj.method(2) # will call callback(obj, 2)
# prints "1 2"
There are some special callback types that has a slightly different behavior.
Context managers#
@hook
decorator can be placed on a context manager class:
@hook
class ContextManager:
def __init__(self, obj, arg):
self.obj = obj
self.arg = arg
def __enter__(self):
# do something on enter
print(obj.data, arg)
return self
def __exit__(self, exc_type, exc_value, traceback):
# do something on exit
return False
Context manager is entered while calling the Slot()
, and exited then the call is finished.
If present, method process_result
has a special meaning -
it can receive MyClass.method
call result, and also modify/replace it:
@hook
class ContextManager:
def __init__(self, obj, arg):
self.obj = obj
self.arg = arg
def __enter__(self):
# do something on enter
print(obj.data, arg)
return self
def __exit__(self, exc_type, exc_value, traceback):
# do something on exit
return False
def process_result(self, result):
# do something with method call result
return modified(result)
See examples below for more information.
Generator function#
@hook
decorator can be placed on a generator function:
@hook
def callback(obj, arg):
print(obj.data, arg)
# this is called before original method body
yield # method is called here
# this is called after original method body
It is converted to a context manager, in the same manner as contextlib.contextmanager.
Generator body can be wrapped with try..except..finally
to catch exceptions:
@hook
def callback(obj, arg):
print(obj.data, arg)
try:
# this is called before original method body
yield # method is called here
except Exception as e:
process_exception(a)
finally:
# this is called after original method body
finalizer()
There is also a special syntax which allows generator to access and modify/replace method call result:
@hook
def callback(obj, arg):
original_result = yield # method is called here
new_result = do_something(original_result)
yield new_result # modify/replace the result
Calling hooks in details#
The callback will be called with the same arguments as the original method.
If slot is a regular method:
callback_result = callback(self, *args, **kwargs)
Here
self
is a class instance (obj
).If slot is a class method:
callback_result = callback(cls, *args, **kwargs)
Here
cls
is the class itself (MyClass
).If slot is a static method:
callback_result = callback(*args, **kwargs)
Neither object not class are passed to the callback in this case.
If
callback_result
is a context manager, enter the context. Context manager can catch all the exceptions raised.If there are multiple hooks bound the the slot, every context manager will be entered.
Then call the original method wrapped by
@slot
:original_result = method(*args, **kwargs)
Process
original_result
:If
callback_result
object has methodprocess_result
, or is a generator wrapped with@hook
, call it:new_result = callback_result.process_result(original_result)
Otherwise set
new_result = callback_result
.If there are multiple hooks bound the the method, pass
new_result
through the chain:new_result = callback1_result.process_result(original_result) new_result = callback2_result.process_result(new_result or original_result) new_result = callback3_result.process_result(new_result or original_result)
Finally return:
return new_result or original_result
All
None
values are ignored on every step above.Exit all the context managers entered during the slot call.
Hooks priority#
Hooks are executed in the following order:
Parent class slot +
FIRST
Inherited class slot +
FIRST
Parent class slot +
NORMAL
Inherited class slot +
NORMAL
Parent class slot +
LAST
Inherited class slot +
LAST
Hooks with the same priority and inheritance will be executed in the same order they were registered (Slot.bind
call).
Note
Calls of super()
inside inherited class methods does not trigger hooks call.
Hooks are triggered only if method is called explicitly.
This allow to wrap with a hook the entire slot call without influencing its internal logic.
Hook types#
Here are several examples of using hooks. These types are not exceptional, they can be mixed - for example, hook can both modify method result and catch exceptions.
Before hook#
Can be used for inspecting or validating input args of the original function:
@hook
def before1(obj, arg):
print(obj, arg)
# original method is called after exiting this function
@hook
def before2(obj, arg):
if arg == 1:
raise ValueError("arg=1 is not allowed")
return None # return None is the same as no return statement
Executed before calling the original method wrapped by @slot
.
If hook raises an exception, method will not be called at all.
After hook#
Can be used for performing some actions after original method was successfully executed:
@hook
def after1(obj, arg):
yield # original method is called here
print(obj, arg)
@hook
def after2(obj, arg):
yield None # yielding None is the same as empty yield
if arg == 1:
raise ValueError("arg=1 is not allowed")
If original method raises an exception, the block of code after yield
will not be called.
Context hook#
Can be used for catching and handling some exceptions, or to determine that there was no exception during slot call:
# This is just the same as using @contextlib.contextmanager
@hook
def context_generator(obj, arg):
try:
yield # original method is called here
print(obj, arg) # <-- this line will not be called if method raised an exception
except SomeException as e:
magic(e)
finally:
finalizer()
@hook
class ContextManager:
def __init__(self, obj, args):
self.obj = obj
self.args = args
def __enter__(self):
return self
# original method is called between __enter__ and __exit__
def __exit__(self, exc_type, exc_value, traceback):
result = False
if exc_type is not None and isinstance(exc_value, SomeException):
magic(exc_value)
result = True # suppress exception
else:
print(self.obj, self.arg)
finalizer()
return result
Note
Contexts are exited in the reverse order of the hook calls. So if some hook raised an exception, it will be passed into the previous hook, not the next one.
It is recommended to specify the proper priority for the hook, e.g. FIRST
Replacing result hook#
Replaces the output result of the original method.
Can be used for delegating some implementation details for third-party extensions. See Hive and HDFS as an example.
@hook
def replace1(obj, arg):
result = arg + 10 # any non-None return result
# original method call result is ignored, output will always be arg + 10
return result
@hook
def replace2(obj, arg):
yield arg + 10 # same as above
Note
If there are multiple hooks bound to the same slot, the result of last hook will be used.
It is recommended to specify the proper priority for the hook, e.g. LAST
Accessing result hook#
Can access output result of the original method and inspect or validate it:
@hook
def access_result(obj, arg):
result = yield # original method is called here, and result can be used in the hook
print(result)
yield # does not modify result
@hook
class ModifiesResult:
def __init__(self, obj, args):
self.obj = obj
self.args = args
def __enter__(self):
return self
# original method is called between __enter__ and __exit__
# result is passed into process_result method of context manager, if present
def process_result(self, result):
print(result) # result can be used in the hook
return None # does not modify result. same as no return statement in the method
def __exit__(self, exc_type, exc_value, traceback):
return False
Modifying result hook#
Can access output result of the original method, and return the modified one:
@hook
def modifies_result(obj, arg):
result = yield # original method is called here, and result can be used in the hook
yield result + 10 # modify output result. None values are ignored
@hook
class ModifiesResult:
def __init__(self, obj, args):
self.obj = obj
self.args = args
def __enter__(self):
return self
# original method is called between __enter__ and __exit__
# result is passed into process_result method of context manager, if present
def process_result(self, result):
print(result) # result can be used in the hook
return result + 10 # modify output result. None values are ignored
def __exit__(self, exc_type, exc_value, traceback):
return False
Note
If there are multiple hooks bound to the same slot, the result of last hook will be used.
It is recommended to specify the proper priority for the hook, e.g. LAST
How to enable/disable hooks?#
You can enable/disable/temporary disable hooks on 4 different levels:
Manage global hooks state (level 1):
Manage all hooks bound to a specific class (level 2):
Manage all hooks bound to a specific slot (level 3):
Manage state of a specific hook (level 4):
More details in the documentation above.
Note
All of these levels are independent.
Calling stop
on the level 1 has higher priority than level 2, and so on.
But calling resume
on the level 1 does not automatically resume hooks stopped in the level 2,
they should be resumed explicitly.
How to see logs of the hook mechanism?#
Hooks registration emits logs with DEBUG
level:
from onetl.logs import setup_logging
setup_logging()
DEBUG |onETL| Registered hook 'mymodule.callback1' for 'MyClass.method' (enabled=True, priority=HookPriority.NORMAL)
DEBUG |onETL| Registered hook 'mymodule.callback2' for 'MyClass.method' (enabled=True, priority=HookPriority.NORMAL)
DEBUG |onETL| Registered hook 'mymodule.callback3' for 'MyClass.method' (enabled=False, priority=HookPriority.NORMAL)
But most of logs are emitted with even lower level NOTICE
, to make output less verbose:
from onetl.logs import NOTICE, setup_logging
setup_logging(level=NOTICE)
NOTICE |Hooks| 2 hooks registered for 'MyClass.method'
NOTICE |Hooks| Calling hook 'mymodule.callback1' (1/2)
NOTICE |Hooks| Hook is finished with returning non-None result
NOTICE |Hooks| Calling hook 'mymodule.callback2' (2/2)
NOTICE |Hooks| This is a context manager, entering ...
NOTICE |Hooks| Calling original method 'MyClass.method'
NOTICE |Hooks| Method call is finished
NOTICE |Hooks| Method call result (*NOT* None) will be replaced with result of hook 'mymodule.callback1'
NOTICE |Hooks| Passing result to 'process_result' method of context manager 'mymodule.callback2'
NOTICE |Hooks| Method call result (*NOT* None) is modified by hook!