About lifetime models#
When a model is composed of several parametric parts, which can
themselves be other ParametricModel objects, it can be challenging
to create a new model without disrupting the overall logic of the ReLife
implementations. To assist with the implementation of new statistical
models, we have implemented framework objects called Parameters and
ParametricComponent. These objects provide helpful properties and
operations to simplify the creation of new parametric models.
The Parameters object is used to encode the parameters of a whole
model while preserving the composition structure of models. It can be
seen as a tree of parameter sets that follows the tree of model
composition. Additionally, it provides a set of helper properties and
operations that make it easier to define and work with the parameters of
a parametric model.
The ParametricComponent object, on the other hand, is used to
represent a single parametric part of a model. It is composed of a
Parameters instance and provides a unified interface for adding new
components to existing ones, similar to function composition. It can be
seen as a tree of parametric functions.
By using these framework objects, it is possible to more easily create new parametric models that are composed of multiple parts, without having to worry about disrupting the overall logic of the ReLife implementations.
Parameters#
In previous versions of ReLife, model parameters were encapsulated in a
list of floats whose size depended on the number of parameters. The
Parameters object is a more powerful alternative to this simple
list, and it has been implemented to encapsulate the structure and
values of the parameters. This object is responsible for automatically
adapting the storage of parameters in a ParametricComponent and
ensuring that parameters follow the structure of model composition. We
have adapted the composite pattern to encode parameters in a tree
structure that follows the composition of the model. Our Parameters
objects can be seen as a tree of dictionaries.
Each node in the tree has:
- A dictionary of parameter names and values
- All parameter names and values stored in lists (including those from the current node and all leaf nodes)
This allows for multiple nodes to have the same parameter names. Separating the dictionary of current node parameters and lists of all parameters is mainly due to computation constraints. When operations modify all parameters of a model, it avoids the need to read the entire parameter tree each time.
Each node can answer several requests, including:
- Getting/setting all parameter values from a node (updating the current node’s parameter values and those of its leaves)
- Getting/setting all parameter names from a node (updating the current node’s parameter names and those of its leaves)
- Getting/setting a parameter by name at the node level (parameters of other nodes cannot be reached by name, which prevents naming conflicts)
- Modifying the entire set of parameter names and values, including their number (updating the current node’s dictionary of parameters)
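The node layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name ParamNode and its members (local, all_names, all_values, add_child, set_all_values) are hypothetical and are not taken from ReLife's code.

```python
from __future__ import annotations


class ParamNode:
    """Hypothetical stand-in for one node of a parameter tree."""

    def __init__(self, **params: float) -> None:
        self.local = dict(params)              # parameters owned by this node only
        self.children: list[ParamNode] = []
        self.parent: ParamNode | None = None
        self._refresh()

    def _refresh(self) -> None:
        # Rebuild the flat caches from the children's caches, then notify the parent.
        self.all_names = list(self.local)
        self.all_values = list(self.local.values())
        for child in self.children:
            self.all_names += child.all_names
            self.all_values += child.all_values
        if self.parent is not None:
            self.parent._refresh()

    def add_child(self, child: ParamNode) -> None:
        child.parent = self
        self.children.append(child)
        self._refresh()

    def _assign(self, values: list[float]) -> None:
        # Split a flat list of values between this node and its descendants.
        n = len(self.local)
        self.local = dict(zip(self.local, values[:n]))
        offset = n
        for child in self.children:
            k = len(child.all_values)
            child._assign(values[offset:offset + k])
            offset += k
        self.all_values = list(values)

    def set_all_values(self, values: list[float]) -> None:
        if len(values) != len(self.all_values):
            raise ValueError("expected one value per parameter")
        self._assign(list(values))
        if self.parent is not None:
            self.parent._refresh()


# Two nodes may reuse the same names because each keeps its own dictionary.
weibull = ParamNode(shape=2.0, rate=0.1)
regression = ParamNode(coef_1=0.5)
regression.add_child(weibull)
regression.set_all_values([1.0, 3.0, 0.2])
```

Reading regression.all_values after the update does not walk the tree again, which is the computational motivation given above for keeping the flat lists next to the per-node dictionary.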
Improvements:
- Replace the composite with a namedtuple and do not store all params and all names (not really used).
ParametricComponent#
The ParametricComponent object also follows a tree-like structure,
as it stores other ParametricComponent objects in its leaves
attribute. It relies on the __getattr__ and __setattr__ Python magic
methods, which act as a bridge to the parameters data so that
parameters can be accessed by name inside methods.
A ParametricComponent object has the following attributes:
- params: This is composed of a Composite object and contains the parameters for the current node.
- leaves: This is a list of other ParametricComponent objects.
The ParametricComponent object has the following methods:
- compose_with: This method adds a ParametricComponent object as a leaf that can be called from the current node.
- new_params: This method changes the local parameters structure at the node level.
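To make the attribute bridge more concrete, here is a minimal sketch of how __getattr__ and __setattr__ can expose parameters by name. The class Component is a hypothetical simplification: it stores parameters in a plain dict rather than in ReLife's own parameter object, and only the names params, leaves, compose_with and new_params come from the description above.

```python
class Component:
    """Hypothetical, simplified analogue of a parametric component."""

    def __init__(self, **params: float) -> None:
        # Bypass our own __setattr__ while the instance is being built.
        object.__setattr__(self, "params", dict(params))
        object.__setattr__(self, "leaves", [])

    def __getattr__(self, name: str) -> float:
        # Only called when the normal attribute lookup fails.
        params = self.__dict__.get("params", {})
        if name in params:
            return params[name]
        raise AttributeError(name)

    def __setattr__(self, name: str, value) -> None:
        if name in self.params:
            self.params[name] = value          # write through to the parameter store
        else:
            object.__setattr__(self, name, value)

    def compose_with(self, other: "Component") -> None:
        # Register another component as a leaf of this node.
        self.leaves.append(other)

    def new_params(self, **params: float) -> None:
        # Replace the local parameter structure of this node.
        object.__setattr__(self, "params", dict(params))


weibull = Component(shape=2.0, rate=0.1)
weibull.shape = 3.0                  # updates params["shape"], not an instance attribute
model = Component(coef=0.5)
model.compose_with(weibull)
```

With this bridge, a method body can refer to self.shape or self.rate directly while the values remain stored in a single place.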
LifetimeModel abstract base class: implementation control#
The LifetimeModel class provides an abstract base interface consisting of survival probability functions.
At this level, the methods sf, hf, chf, and pdf are marked as abstract methods, meaning
they must be implemented in any derived class. It is worth noting that these abstract methods have a default implementation
that is conditionally based on the existence of other methods. For example, if a derived class implements hf and pdf,
it can use the super mechanism to call the default implementation of sf instead of providing a concrete formula.
For those who are new to Python, it may seem cumbersome to rewrite a method that already has a default implementation and
simply call super within it. This is partially true. However, it also means that a new contributor only needs
to read the derived class to understand the entire interface of its instances. In object-oriented programming (OOP), this explicitness
is particularly valuable. Additionally, in this case, LifetimeModel is a variadic generic abstract class,
where concrete methods can have variadic parameterization. Therefore, the docstring used to document the object interface
must be specified in each case.
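The pattern of an abstract method that still carries a default implementation can be sketched as follows. This is a scalar toy under stated assumptions: the classes Lifetime and Exponential are hypothetical, only sf and chf are shown, and the defaults use the standard relation sf(t) = exp(-chf(t)).

```python
import math
from abc import ABC, abstractmethod


class Lifetime(ABC):
    """Hypothetical, scalar-only stand-in for an abstract lifetime interface."""

    @abstractmethod
    def chf(self, time: float) -> float:
        # Default: derive the cumulative hazard from the survival function.
        return -math.log(self.sf(time))

    @abstractmethod
    def sf(self, time: float) -> float:
        # Default: derive the survival function from the cumulative hazard.
        return math.exp(-self.chf(time))


class Exponential(Lifetime):
    def __init__(self, rate: float) -> None:
        self.rate = rate

    def chf(self, time: float) -> float:
        """Cumulative hazard of an exponential lifetime."""
        return self.rate * time              # concrete formula

    def sf(self, time: float) -> float:
        """Survival function, obtained from chf."""
        return super().sf(time)              # explicit reuse of the default


survival = Exponential(rate=0.5).sf(2.0)
```

Reading Exponential alone is enough to see that both sf and chf are available and how each one is obtained, which is the explicitness argued for above.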
A potential workaround for implementing a derived class of LifetimeModel could involve calling super for each abstract method.
This approach passes the abc check, since all abstract methods are implemented in the code and the class can be instantiated; however, it will result
in a RecursionError at runtime if any of these methods are called, because the default implementations depend on one another.
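The failure mode can be reproduced with a small toy example. The classes Lifetime and Lazy are hypothetical and only mimic the situation described above with two mutually dependent defaults.

```python
import math
from abc import ABC, abstractmethod


class Lifetime(ABC):
    @abstractmethod
    def chf(self, time: float) -> float:
        return -math.log(self.sf(time))      # default relies on sf

    @abstractmethod
    def sf(self, time: float) -> float:
        return math.exp(-self.chf(time))     # default relies on chf


class Lazy(Lifetime):
    # Every abstract method is overridden, so instantiation succeeds...
    def chf(self, time: float) -> float:
        return super().chf(time)

    def sf(self, time: float) -> float:
        return super().sf(time)


model = Lazy()                # no TypeError: the abc check is satisfied
try:
    model.sf(1.0)             # ...but the defaults keep calling each other
    outcome = "ok"
except RecursionError:
    outcome = "RecursionError"
```

At least one method in each cycle of defaults therefore needs a concrete formula.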
One might consider using another Python feature, such as metaclasses, to exert more control over these classes.
However, we believe that using metaclasses would complicate the code significantly, while the abc module is a well-known
and established solution.
Finally, when examining a derived class of LifetimeModel, one may notice that some methods are tagged with the
override decorator. This decorator is used solely by static type checking tools (such as mypy) to check that the decorated
method really overrides a method of the base class, keeping the base signature and simply providing an alternative implementation.
Thus, it is applied whenever a non-abstract method is overridden, either for documentation purposes
or to offer a more suitable or straightforward implementation.
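A minimal sketch of the decorator in use is given below. It assumes the decorator is typing.override (available from Python 3.12); the no-op fallback only keeps the sketch runnable on older interpreters, and the classes Base and Derived are hypothetical.

```python
try:
    from typing import override              # Python 3.12+
except ImportError:                          # older interpreters
    def override(method):
        # No-op fallback so the sketch still runs; type checkers need the real one.
        return method


class Base:
    def mean(self) -> float:
        return 0.0


class Derived(Base):
    @override
    def mean(self) -> float:
        # Alternative implementation with the same signature as Base.mean.
        return 1.0


value = Derived().mean()
```

The decorator has no effect on the returned value at runtime. Its benefit appears when a type checker is run: if Base.mean were renamed or removed, the checker would flag Derived.mean as no longer overriding anything.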
Improvements:
- With __init_subclass__, read method signatures recursively in order to parse the *args names and fill args_names and nb_args.
Variadic model args: LifetimeModel is Generic#
In previous versions of ReLife, the unpacking operator * was used to
accept an arbitrary number of arguments in a function. This allowed the
LifetimeModel interface to be responsive to a variadic number of extra
arguments in method signatures when the model was composed of other
models. The following piece of code illustrates this idea in the case
of a regression model:
import numpy as np
from numpy.typing import NDArray

class LifetimeModel:
    ...
    def hf(self, time: NDArray[np.float64], *args: NDArray[np.float64]): ...

class ProportionalHazard(LifetimeModel):
    baseline: LifetimeModel
    ...
    def hf(self, time: NDArray[np.float64], covar: NDArray[np.float64], *args: NDArray[np.float64]):
        # Scale the baseline hazard by the covariate effect; remaining args go to the baseline.
        return self.covar_effect.g(covar) * self.baseline.hf(time, *args)
In this example, ProportionalHazard objects are composed of any
other LifetimeModel instance and inherit the LifetimeModel
interface in order to reuse the base implementation of probability
functions if needed. However, ProportionalHazard extends the hf
signature with one extra argument named covar to explicitly tell
users that in its case *args must have at least one covar
object. The *args parameter also allows
model = ProportionalHazard(AFT(AFT(...(Weibull())
to run, because if one wants to request model.hf, the number of
arguments that must be passed varies and is spread recursively in the
chain of baseline composition.
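The recursive spreading of arguments can be illustrated with a scalar toy example. The classes below are hypothetical simplifications that reuse the names Weibull and ProportionalHazard: they work on plain floats, take an assumed exp(coef * covar) covariate effect, and nest two proportional-hazard layers in place of the AFT layers mentioned above.

```python
import math


class Weibull:
    def __init__(self, shape: float, rate: float) -> None:
        self.shape, self.rate = shape, rate

    def hf(self, time: float, *args: float) -> float:
        # A leaf model needs no extra argument.
        return self.shape * self.rate * (self.rate * time) ** (self.shape - 1)


class ProportionalHazard:
    def __init__(self, baseline, coef: float) -> None:
        self.baseline, self.coef = baseline, coef

    def hf(self, time: float, covar: float, *args: float) -> float:
        # Consume one covariate, then forward the remaining args to the baseline.
        return math.exp(self.coef * covar) * self.baseline.hf(time, *args)


# Two nested regression layers: hf needs two extra arguments, one per layer.
model = ProportionalHazard(ProportionalHazard(Weibull(1.0, 0.1), coef=2.0), coef=1.0)
hazard = model.hf(2.0, 0.5, 0.25)

try:
    model.hf(2.0, 0.5)        # the inner layer is missing its covariate
    missing = None
except TypeError as error:
    missing = error
```

Each layer strips its own argument from the front of *args, so the number of arguments expected by model.hf grows with the depth of the composition.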
However, typing rules can easily be circumvented or violated if one is not
careful. In the previous example, strictly speaking, ProportionalHazard
overrides the hf signature and violates the Liskov Substitution
Principle (LSP): hf expects [float, tuple[float, ...]] in LifetimeModel,
but [float, float, tuple[float, ...]] in ProportionalHazard.
To handle correct type hinting and avoid issues related to the problem
explained above, ReLife uses TypeVarTuple, introduced in Python 3.11.
This allows LifetimeModel to act as a template, enabling parametric
polymorphism and variadic args. Here is an example of how this can be
implemented using TypeVarTuple:
from typing import Generic, TypeVarTuple

import numpy as np
from numpy.typing import NDArray

VariadicArgs = TypeVarTuple("VariadicArgs")

class LifetimeModel(Generic[*VariadicArgs]):
    ...
    def hf(self, time: NDArray[np.float64], *args: *VariadicArgs): ...

ModelArgs = tuple[NDArray[np.float64], ...]

class ProportionalHazard(LifetimeModel[NDArray[np.float64], *ModelArgs]):
    baseline: LifetimeModel[*ModelArgs]
    ...
    def hf(self, time: NDArray[np.float64], covar: NDArray[np.float64], *args: *ModelArgs):
        # Scale the baseline hazard by the covariate effect; remaining args go to the baseline.
        return self.covar_effect.g(covar) * self.baseline.hf(time, *args)
In this example, VariadicArgs is a type variable that can be any
tuple of types. Concrete implementations, such as ProportionalHazard,
can specify the expected tuple of types while still maintaining
correct type hinting. Here, ProportionalHazard expects this tuple of
types as extra arguments:
tuple[NDArray[np.float64], *ModelArgs] = tuple[NDArray[np.float64], *tuple[NDArray[np.float64], ...]]
meaning a tuple consisting of at least one NDArray[np.float64] as
first element followed by zero or more NDArray[np.float64]. Note
that tuple[NDArray[np.float64], *tuple[NDArray[np.float64], ...]]
cannot be rewritten as tuple[NDArray[np.float64], ...], as it would
mean a tuple consisting of zero or more NDArray[np.float64].
LifetimeData factory#
The ParametricLifetimeModel fitting process uses a Likelihood
object to estimate model parameters. In survival analysis, the
contribution of each observation to the likelihood depends on the type
of lifetime observation (complete, right censored, etc.) and any
truncations. Therefore, it is necessary to parse the data provided by
users and categorize each observation.
To accomplish this task, we use LifetimeReader objects, which are
responsible for parsing lifetime data. These objects are then used in a
factory object called lifetime_data_factory to construct a
LifetimeData object. This object encapsulates each group of lifetime
data in an IndexedData object, which keeps track of the index of the
original data.
IndexedData can be thought of as a simplified version of
pandas.Series that only allows for the intersection or union of data
based on their index. For example, you can use:
- intersection(*others) to get observations that are left truncated and complete.
- union(*others) to get observations that are complete or right censored.
Additionally, all values of lifetime data are stored as 2D arrays, which makes probability computations more homogeneous in cases where there are covariates.
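The index-based operations can be sketched with a small stand-in class. Everything here is an assumption made for illustration: the class name Indexed, the choice to keep the caller's values on intersection, and the use of nested tuples in place of the 2D arrays mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Indexed:
    """Hypothetical stand-in: rows of values tagged with their original index."""

    index: tuple[int, ...]
    values: tuple[tuple[float, ...], ...]    # one row per observation (2D layout)

    def _rows(self) -> dict[int, tuple[float, ...]]:
        return dict(zip(self.index, self.values))

    def intersection(self, *others: Indexed) -> Indexed:
        # Keep the observations whose index appears in every operand.
        keep = set(self.index).intersection(*(o.index for o in others))
        idx = tuple(i for i in self.index if i in keep)
        rows = self._rows()
        return Indexed(idx, tuple(rows[i] for i in idx))

    def union(self, *others: Indexed) -> Indexed:
        # Keep the observations whose index appears in at least one operand.
        rows: dict[int, tuple[float, ...]] = {}
        for data in (self, *others):
            rows.update(data._rows())
        idx = tuple(sorted(rows))
        return Indexed(idx, tuple(rows[i] for i in idx))


complete = Indexed((0, 2, 3), ((10.0,), (7.0,), (4.0,)))
left_truncated = Indexed((2, 3, 5), ((1.0,), (2.0,), (3.0,)))
right_censored = Indexed((1, 5), ((12.0,), (9.0,)))

both = complete.intersection(left_truncated)     # complete and left truncated
either = complete.union(right_censored)          # complete or right censored
```

Because every group keeps the original row index, the likelihood contributions of different observation types can be combined without losing track of which input row they came from.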
Why a factory? The advantage of using a factory is that it
decouples the process of reading data from the creation of LifetimeData
objects. This makes it much easier to create variations of the reader
process if needed and to isolate code in a cleaner way.
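The decoupling can be sketched as a factory function that depends only on a reader interface. The names Reader, Reader1D and make_lifetime_data are hypothetical, as is the simplified output (a dict of index lists restricted to complete and right-censored observations); they do not describe the actual signatures of LifetimeReader or lifetime_data_factory.

```python
from __future__ import annotations

from typing import Protocol, Sequence


class Reader(Protocol):
    # Hypothetical reader interface: each method returns observation indices.
    def complete(self) -> list[int]: ...
    def right_censored(self) -> list[int]: ...


class Reader1D:
    """Reads durations plus an event flag (True means the failure was observed)."""

    def __init__(self, time: Sequence[float], event: Sequence[bool]) -> None:
        self.time, self.event = list(time), list(event)

    def complete(self) -> list[int]:
        return [i for i, observed in enumerate(self.event) if observed]

    def right_censored(self) -> list[int]:
        return [i for i, observed in enumerate(self.event) if not observed]


def make_lifetime_data(reader: Reader) -> dict[str, list[int]]:
    # The factory only talks to the reader interface, never to the raw input.
    return {
        "complete": reader.complete(),
        "right_censored": reader.right_censored(),
    }


data = make_lifetime_data(Reader1D([10.0, 12.0, 7.0], [True, False, True]))
```

Supporting another input layout then only requires a new reader class; the factory and everything downstream of it stay unchanged.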
Other considerations#
There are a few constraints that must be followed when using the
ParametricModel object:
- At the model level, a user cannot request methods of a model if one of the params values is np.nan. All parameter values must be passed at instantiation, or the empty model must be fit before any requests are made.
- At the model level, params cannot be set individually or by name. The user can only set all param values at once using a single setter. If a user wants to inspect the params names, they can use the params_names getter or the string representation of the instance.
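Both constraints can be sketched with a toy model. The class Model, its Weibull-like sf formula, and the ValueError it raises are assumptions made for illustration; only the names params and params_names come from the text above.

```python
import math


class Model:
    """Hypothetical model enforcing the two constraints described above."""

    def __init__(self, shape: float = math.nan, rate: float = math.nan) -> None:
        # Parameters left unspecified stay at nan until they are set or fitted.
        self._params = {"shape": shape, "rate": rate}

    @property
    def params(self) -> list[float]:
        return list(self._params.values())

    @params.setter
    def params(self, values: list[float]) -> None:
        # Single setter: all values at once, in the order of params_names.
        if len(values) != len(self._params):
            raise ValueError("expected one value per parameter")
        self._params = dict(zip(self._params, values))

    @property
    def params_names(self) -> list[str]:
        return list(self._params)

    def sf(self, time: float) -> float:
        if any(math.isnan(value) for value in self.params):
            raise ValueError("set or fit all params before calling sf")
        return math.exp(-((self._params["rate"] * time) ** self._params["shape"]))


empty = Model()
try:
    empty.sf(1.0)                 # refused: params still contain nan
    refused = False
except ValueError:
    refused = True

empty.params = [1.0, 0.5]         # values follow the order given by params_names
survival = empty.sf(2.0)
```

Setting all values through one setter keeps the flat order of params_names as the single convention a user has to know.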