::install_github("Abraham1011/degradr") devtools
degradr
Introduction
Accurately estimating the Remaining Useful Life (RUL) of degrading systems is essential for predictive maintenance, safety assurance, and optimal resource allocation across various engineering domains.
linear mixed effects models offer a principled way to account for both population-level degradation trends and unit-specific deviations. When combined with Bayesian updating, they enable real-time prediction of RUL as new sensor readings are collected. Furthermore, in multivariate settings where multiple sensors monitor the same component, health index construction via data fusion provides a powerful reduction strategy that preserves prognostic value while simplifying the modeling process.
This post introduces the R package degradr
, a toolbox designed to estimate RUL from univariate or multivariate degradation signals using linear mixed effects models. The package supports signal fusion, Bayesian posterior computation, and visualization, making it suitable for both exploratory and operational use.
Motivation and Scope
The statistical methodology implemented in degradr
is not new: it is based on the degradation modeling framework of Lu and Meeker (Lu and Meeker 1993), extended with multivariate signal fusion and Bayesian prediction techniques as proposed by Liu and Huang (Liu and Huang 2016). Despite its theoretical maturity and demonstrated performance in prognostic tasks, this methodology had not yet been made available in a reusable R library.
The main motivation behind degradr
is thus reproducibility and accessibility. By encapsulating the full workflow—from signal fusion to posterior predictive distributions—in a single R package, practitioners and researchers can now apply these techniques without having to reimplement complex routines from scratch. Additionally, the package serves as a didactic tool for understanding degradation modeling via mixed effects models and Bayesian updating.
This implementation aims to bridge the gap between advanced degradation modeling techniques and their practical application, particularly in industrial and reliability engineering settings.
Methodology
We follow the mixed effects degradation modeling framework originally proposed by Lu and Meeker (1993), extended by Liu and Huang (2016) to include multivariate sensor fusion and Bayesian updating.
A general degradation model with random effects can be written as:
\[ L_t = \eta(\phi, \theta, t) + \varepsilon_t \]
where:
\(\eta(\cdot)\) is a parametric degradation function,
\(\phi\) is a vector of fixed effects (common to all units),
\(\theta\) is a vector of random effects (unit-specific deviations),
and \(\varepsilon\_t\) is an i.i.d. error term representing measurement noise.
To operationalize this, we consider a polynomial model of degree ( p ):
\[ L_{i,j,t} = \sum_{k = 0}^{p} \theta_{i,j}^{(k)} t^k + \varepsilon_{i,j,t} \]
where:
\(L\_{i,j,t}\) is the degradation signal of unit \(i\), \(j\) at time \(t\),
\(p\) is the order of the polynomial model,
\(\Gamma_{i,j} = (\theta_{i,j}^{(0)},\dots,\theta_{i,j}^{(p)})\) is the vector of random effects parameters that is often assumed to follow a multivariate normal distribution: \(\Gamma_{i,j} \sim \mathcal{N}_{p+1}(u_j^0,\Sigma_j^0)\)
For exponentially degrading systems and \(p = 2\), the model is expressed as:
\[ x_{i,j,t} = \phi_j + \alpha_{i,j} \exp\left(\theta^{(1)}_{i,j} t + \theta^{(2)}_{i,j} t^2 + \varepsilon_{i,j,t} - \frac{\sigma_j^2}{2}\right) \]
Taking logarithms and subtracting the baseline \(\phi_j\), we obtain a linearized version:
\[ L_{i,j,t} = \log(x_{i,j,t} - \phi_j) = \theta^{(0)}_{i,j} + \theta^{(1)}_{i,j} t + \theta^{(2)}_{i,j} t^2 + \varepsilon_{i,j,t} \]
with \(\theta^{(0)}_{i,j}=ln\alpha_{i,j}-\sigma_j^2/2\).
Posterior Distribution of Random Effects
Let \(L_{i,j} = [L_{i,j,1},\dots,L_{i,j,n_i}]^\top\) be the observed degradation signal for unit \(i\), sensor \(j\), up to time \(n_i\). The posterior distribution of \(\Gamma\_{i,j}\) is:
\[ \Gamma_{i,j} \mid L_{i,j} \sim \mathcal{N}_{p+1}(u^1_j, \Sigma^1_j) \]
where:
\[ u^1_j = \left( \frac{\Psi_i^\top \Psi_i}{\sigma_j^2} + (\Sigma^0_j)^{-1} \right)^{-1} \left( \frac{\Psi_i^\top L_{i,j}}{\sigma_j^2} + (\Sigma^0_j)^{-1} u^0_j \right) \]
\[ \Sigma^1_j = \left( \frac{\Psi_i^\top \Psi_i}{\sigma_j^2} + (\Sigma^0_j)^{-1} \right)^{-1} \]
with:
\[ \Psi_i = \begin{bmatrix} 1 & t_1 & \cdots & t_1^p \\ 1 & t_2 & \cdots & t_2^p \\ \vdots & \vdots & \ddots & \vdots \\ 1 & t_{n_i} & \cdots & t_{n_i}^p \end{bmatrix} \in \mathbb{R}^{n_i \times (p+1)} \]
Remaining Life Distribution
Let \(\tilde{T}\_{i,j}\) be the estimated Remaining Useful Life (RUL) for unit \(i\), sensor \(j\), based on the observed degradation signal.
Assume the failure threshold \(D_j\sim\mathcal{N}(u_j^d,v_j^d)\) is a random variable estimated from historical failure times. The degradation trajectory projected forward is:
\[ L_{i,j,n_i + t} = \theta^{(0)}_{i,j} + \cdots + \theta^{(p)}_{i,j} (n_i + t)^p + \varepsilon_{i,j,n_i+t} \]
which is normally distributed with:
- Mean:
\[ \tilde{u}_{i,j,n_i+t} = [1, (n_i + t), \dots, (n_i + t)^p] u^1_j \]
- Variance:
\[ \tilde{\sigma}^2_{i,j,n_i+t} = [1, (n_i + t), \dots, (n_i + t)^p] \, \Sigma^1_j \, [1, (n_i + t), \dots, (n_i + t)^p]^\top \]
Then the conditional cumulative distribution function (cdf) of \(\tilde{T}_{i,j}\), given \(L_{i,j}\), is:
\[ F_{\tilde{T}_{i,j} \mid L_{i,j}}(t) = P(\tilde{T}_{i,j} \le t \mid L_{i,j}) = \Phi\left( \frac{ \tilde{u}_{i,j,n_i+t} - u^d_j }{ \sqrt{ \tilde{\sigma}^2_{i,j,n_i+t} + v^d_j } } \right) = \Phi(g(t)) \]
Truncating to non-negative RUL:
\[ P(\tilde{T}_{i,j} \le t \mid L_{i,j}, \tilde{T}_{i,j} \ge 0) = \frac{ \Phi(g(t)) - \Phi(g(0)) }{ 1 - \Phi(g(0)) } \]
If the failure threshold \(D_j\) is fixed:
\[ F_{\tilde{T}_{i,j} \mid L_{i,j}}(t) = 1 - \Phi\left( \frac{ D_j - \tilde{u}_{i,j,n_i+t} }{ \sqrt{ \tilde{\sigma}^2_{i,j,n_i+t} } } \right) =1 - \Phi(g_2(t)) \]
and the conditional distribution becomes:
\[ P(\tilde{T}_{i,j} \le t \mid L_{i,j}, \tilde{T}_{i,j} \ge 0) = \frac{ \Phi(g_2(0)) - \Phi(g_2(t)) }{ \Phi(g_2(0)) } \]
Data-Level Fusion Based on Selected Degradation Models
When multiple sensor signals are available and correlated with degradation, we adopt a data-level fusion approach to construct a univariate health index. This health index serves as a surrogate for the multivariate degradation process and simplifies the downstream modeling task. This index seeks to jointly minimize the error of the model fit and the variance of the failure threshold \(D_j\)
The health index is obtained by solving the following optimization problem:
\[ \text{obj} = \min_{\boldsymbol{w}} \ \boldsymbol{w}^\top \mathbf{A} \boldsymbol{w} \]
subject to:
\[ \boldsymbol{w}^\top \mathbf{M}^\top \mathbf{1} = 1, \quad \mathbf{M} \boldsymbol{w} \ge 0 \]
For more details see (Liu and Huang 2016).
Features of degradr
The degradr
package provides a compact and flexible toolbox to model degradation processes and predict Remaining Useful Life (RUL). It is designed to accommodate both univariate and multivariate degradation signals, and supports linear and exponential trajectories. Its main features include:
1. Support for Univariate and Multivariate Signals
- Univariate modeling is performed directly using raw degradation measurements from a single sensor.
- Multivariate modeling leverages a data fusion procedure to construct a health index from multiple correlated sensor readings. This fusion balances model fit and failure threshold variance, as proposed in (Liu and Huang 2016).
2. Linear and Exponential Mixed Effects Models
- Both linear and exponential degradation paths are supported.
- Models are estimated via:
nlme::lme()
for hierarchical mixed-effects modeling, or- an alternative method based on per-unit fits (when
method = "lm"
).
3. Bayesian Updating and Posterior RUL Estimation
- Once a model is trained on historical units,
degradr
uses Bayesian updating to compute the posterior distribution of model parameters for new units. - Remaining Life Distributions (RLDs) are then derived via forward simulation using these posteriors.
4. Core Functions
Function | Description |
---|---|
fit_model() |
Fits a linear or exponential mixed-effects model to univariate signals. |
fit_healthindex() |
Constructs and models a health index from multivariate signals. |
predict_rul() |
Estimates the Remaining Useful Life (RUL) for one or more partially observed degradation signals based on a previously fitted linear or exponential mixed-effects model. |
plot_degradr() |
Visualizes degradation trajectories and thresholds across units. |
5. Built-in Example Datasets
The package includes several datasets to facilitate testing and demonstration:
filter_train
andfilter_test
: Gas filter degradation data with censored RULs.train_FD001
andtest_FD001
: Subsets of the NASA C-MAPSS turbofan engine dataset.RUL_FD001
: Ground-truth RULs for evaluation.
These datasets allow users to experiment with real-world degradation scenarios and validate model performance.
Together, these features make degradr
a practical and educational tool for studying degradation phenomena and predictive maintenance workflows.
Examples
Install the package from GitHub
Example for a degradation signal
We begin with a simple example using a univariate degradation signal. The dataset filter_train
contains sensor readings from 49 gas filters. Failure is defined when the differential pressure exceeds 600 Pa. Our goal is to estimate the Remaining Useful Life (RUL) for new units using a parametric mixed-effects model.
1. Load and visualize the training data
library(degradr)
library(ggplot2)
data(filter_train)
data(filter_test)
# Rename columns as expected
colnames(filter_train) <- c("t", "x", "unit")
colnames(filter_test) <- c("t", "x", "unit", "RUL")
# Visualize degradation trajectories
<- ggplot(filter_train, aes(x = t, y = x, group = factor(unit), color = factor(unit))) +
g geom_line(size = 0.7, alpha = 0.8) +
geom_hline(yintercept = 600, linetype = "dashed", color = "red", linewidth = 1) +
labs(
title = "Degradation Trajectories of Filter Units",
subtitle = "Dashed red line indicates failure threshold (600 Pa)",
x = "Time",
y = "Differential Pressure (Pa)",
color = "Unit ID"
+
) theme_minimal(base_size = 13) +
theme(
legend.position = "none",
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(margin = margin(b = 10)),
axis.title = element_text(face = "bold"),
panel.grid.minor = element_blank()
+
) scale_color_viridis_d(option = "D")
print(g)
2. Fit a mixed-effects degradation model
<- fit_model(
model data = filter_train,
type = "exponential",
degree = 1
)
3. Predict Remaining Useful Life (RUL)
<- predict_rul(
rul_estimates data = filter_test,
model = model,
D = 600 # Failure threshold
)
head(rul_estimates)
unit RUL
1 1 39.884159
2 2 57.679633
3 3 7.585585
4 4 2.264131
5 5 16.828236
6 6 81.782255
Example with multiple signals
In this example, we use data from the C-MAPSS dataset (train_FD001 and test_FD001) which contains multiple sensor signals from simulated aircraft engine degradation. We aim to construct a health index from these signals and estimate RUL accordingly.
1. Load and preprocess data
library(dplyr)
data(train_FD001)
data(test_FD001)
# Select relevant sensor variables and adjust sign (all signals should trend upward)
<- train_FD001 %>%
train select(unit, t, T24, T50, P30, Nf, Ps30, phi, NRf, BPR, htBleed, W31, W32) %>%
mutate(across(c(P30, phi, W31, W32), ~ . * -1))
<- test_FD001 %>%
test select(unit, t, T24, T50, P30, Nf, Ps30, phi, NRf, BPR, htBleed, W31, W32) %>%
mutate(across(c(P30, phi, W31, W32), ~ . * -1))
2. Fit a health index model
<- fit_healthindex(
hi_model data = train,
type = "exponential",
degree = 2,
r = 0.8
)
3. Predict RUL using the health index model
<- predict_rul(
rul_hi data = test,
model = hi_model
)
head(rul_hi)
unit RUL
1 1 166.61559
2 2 124.49630
3 3 82.04446
4 4 80.91050
5 5 88.60125
6 6 119.89087