MLCOPULA

R
Copulas
Statistics
MLCOPULA package for supervised classification.
Author

Pedro Abraham Montoya Calzada

Published

October 23, 2024

Overview

MLCOPULA is an R package that provides several classifiers based on probabilistic models. These classifiers model the dependence structure of continuous features through bivariate copula functions and graphical models; see Salinas-Gutiérrez et al. (2014).

The package is published on CRAN, the official R package repository: https://CRAN.R-project.org/package=MLCOPULA

Methodology

This package implements seven copulas for supervised classification: frank, gaussian, clayton, joe, gumbel, AMH (Ali–Mikhail–Haq) and grid. The classification model is based on Bayes' theorem, like the naive Bayes classifier, but it does not assume that the features are independent.

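The posterior probability of a class \(A\) given a set of features (predictor variables) satisfies:

\[P(A \mid x_1,\dots,x_d) \propto P(A)\, c(u_1,\dots,u_d) \prod_{i = 1}^{d} f_{X_i|A}(x_i)\]

where \(P(A)\) is the prior probability of the class, \(f_{X_i|A}\) and \(F_{X_i|A}\) are the marginal density and distribution function of feature \(X_i\) within class \(A\), and \(u_i = F_{X_i|A}(x_i)\) for \(i = 1, 2, \dots, d\).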

The copula density \(c(u_1,\dots,u_d)\) is modeled with bivariate copula functions arranged in graphical models (trees and chains).
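The scoring rule can be sketched in base R for two features. This is only an illustration under stated assumptions, not how MLCOPULA estimates anything internally: the marginals are taken to be normal, the dependence is a Gaussian copula with a fixed parameter, a class prior is included as Bayes' theorem requires, and `class_score`, `class_a` and `class_b` are hypothetical names with made-up numbers.

```r
# Gaussian copula density c(u1, u2; rho), written out in closed form.
gauss_copula_density <- function(u1, u2, rho) {
  a <- qnorm(u1)
  b <- qnorm(u2)
  exp(-(rho^2 * (a^2 + b^2) - 2 * rho * a * b) / (2 * (1 - rho^2))) /
    sqrt(1 - rho^2)
}

# Unnormalised posterior score: prior * copula density * product of marginals.
# `params` is a hypothetical list holding the per-class estimates.
class_score <- function(x, params) {
  u <- pnorm(x, mean = params$mean, sd = params$sd)       # u_i = F_{X_i|A}(x_i)
  marginals <- dnorm(x, mean = params$mean, sd = params$sd)
  params$prior * gauss_copula_density(u[1], u[2], params$rho) * prod(marginals)
}

# Two made-up classes (all numbers are illustrative only).
class_a <- list(prior = 0.5, mean = c(0, 0), sd = c(1, 1), rho = 0.8)
class_b <- list(prior = 0.5, mean = c(2, 2), sd = c(1, 1), rho = -0.3)

x_new <- c(0.4, 0.1)
scores <- c(A = class_score(x_new, class_a), B = class_score(x_new, class_b))
posterior <- scores / sum(scores)   # normalise so the class probabilities sum to 1
predicted <- names(which.max(posterior))
print(posterior)
print(predicted)
```

With normal marginals and a Gaussian copula, the product of copula density and marginals is the bivariate normal density, which gives an easy way to check the sketch. Setting `rho = 0` makes the copula density equal to 1 and recovers the naive Bayes score.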

Copulas

Frank copula:

\[C(u_1,u_2;\theta) = -\frac{1}{\theta} \ln \left[ 1 + \frac{(e^{-\theta u_1} - 1) (e^{-\theta u_2} - 1) } {e^{-\theta} - 1} \right]\]

with \(\theta \in (-\infty,\infty) \setminus \{0\}\).

This copula has neither upper nor lower tail dependence.
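A short base R sketch of the Frank copula CDF helps to see how the parameter behaves. The function name `frank_cdf` and the evaluation points are illustrative choices, not part of the package.

```r
# Frank copula CDF; theta may be any real number except 0.
frank_cdf <- function(u1, u2, theta) {
  stopifnot(theta != 0)
  -log(1 + (exp(-theta * u1) - 1) * (exp(-theta * u2) - 1) /
         (exp(-theta) - 1)) / theta
}

u <- 0.3
v <- 0.7

margin_check <- frank_cdf(u, 1, theta = 5)      # uniform margin: C(u, 1) = u
near_indep   <- frank_cdf(u, v, theta = 1e-4)   # approaches u * v as theta -> 0
pos_dep      <- frank_cdf(u, v, theta = 5)      # above u * v: positive dependence
neg_dep      <- frank_cdf(u, v, theta = -5)     # below u * v: negative dependence

print(c(margin = margin_check, near_indep = near_indep,
        positive = pos_dep, negative = neg_dep, independence = u * v))
```

Because the sign of \(\theta\) switches between positive and negative dependence, the Frank copula can represent both directions of association.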

Clayton copula:

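\[C(u_1,u_2;\theta) = \left[ \max \left( u_1^{-\theta} + u_2^{-\theta} - 1,\; 0 \right) \right]^{-1/\theta}\]

with \(\theta \in [-1,\infty) \setminus \{0\}\). The \(\max\) only matters for \(\theta < 0\); for \(\theta > 0\) the inner expression is always positive.

When \(\theta > 0\), this copula has lower tail dependence equal to \(\lambda_L = 2^{-1/\theta}\).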
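The lower tail dependence coefficient is defined as the limit of \(C(u,u)/u\) as \(u \to 0\), so the closed form can be checked numerically. The sketch below covers only \(\theta > 0\); `clayton_cdf` and the choice \(u = 10^{-4}\) are illustrative.

```r
# Clayton copula CDF, restricted here to theta > 0.
clayton_cdf <- function(u1, u2, theta) {
  stopifnot(theta > 0)
  (u1^(-theta) + u2^(-theta) - 1)^(-1 / theta)
}

theta <- 2
u <- 1e-4   # a point close to the lower-left corner of the unit square

lambda_L_numeric <- clayton_cdf(u, u, theta) / u   # finite-u approximation of the limit
lambda_L_formula <- 2^(-1 / theta)                 # closed form

print(c(numeric = lambda_L_numeric, formula = lambda_L_formula))
```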

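Gaussian (normal) copula:

\[C(u_1,u_2;\theta) = \Phi_G (\Phi^{-1} (u_1) , \Phi^{-1} (u_2) )\]

with \(\theta \in (-1,1)\), where \(\Phi_G\) is the CDF of the bivariate standard normal distribution with correlation \(\theta\) and \(\Phi^{-1}\) is the standard normal quantile function.

This copula has neither upper nor lower tail dependence.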
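The Gaussian copula CDF has no closed form, but it can be evaluated with one-dimensional numerical integration. The sketch below avoids extra packages by conditioning on the first normal score and integrating with `integrate()`; the function name `gaussian_copula_cdf` is an illustrative choice.

```r
# Gaussian copula CDF via P(Z1 <= z1, Z2 <= z2), where Z2 | Z1 = x is
# normal with mean theta * x and variance 1 - theta^2.
gaussian_copula_cdf <- function(u1, u2, theta) {
  stopifnot(abs(theta) < 1)
  z1 <- qnorm(u1)
  z2 <- qnorm(u2)
  integrand <- function(x) {
    dnorm(x) * pnorm((z2 - theta * x) / sqrt(1 - theta^2))
  }
  integrate(integrand, lower = -Inf, upper = z1)$value
}

indep_value  <- gaussian_copula_cdf(0.3, 0.7, theta = 0)    # should be close to 0.3 * 0.7
median_value <- gaussian_copula_cdf(0.5, 0.5, theta = 0.5)  # known: 1/4 + asin(theta) / (2 * pi)

print(c(indep = indep_value, median = median_value,
        median_exact = 0.25 + asin(0.5) / (2 * pi)))
```

The value at the medians provides a convenient sanity check, since \(C(1/2, 1/2; \theta) = 1/4 + \arcsin(\theta) / (2\pi)\) for the bivariate normal distribution.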

Joe copula:

\[C(u_1,u_2;\theta) = 1 - \left[ (1 - u_1)^\theta + (1 - u_2)^\theta - (1 - u_1)^\theta (1 - u_2)^\theta \right ] ^ {1/\theta}\]

with \(\theta \in [1,\infty)\).

This copula has upper tail dependence equal to \(\lambda_U = 2 - 2^{1/\theta}\).

Gumbel copula:

\[C(u_1,u_2;\theta) = \exp \left[ - \left[ ( -\ln(u_1) )^\theta + ( -\ln(u_2) )^\theta \right]^{1/\theta} \right]\]

with \(\theta \in [1,\infty)\).

This copula has upper tail dependence equal to \(\lambda_U = 2 - 2^{1/\theta}\).
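The Joe and Gumbel copulas share the same upper tail dependence formula. The coefficient is the limit of \((1 - 2u + C(u,u)) / (1 - u)\) as \(u \to 1\), which the sketch below approximates at a point close to 1. The helper names and the choice \(u = 1 - 10^{-4}\) are illustrative.

```r
# Joe copula CDF, theta >= 1.
joe_cdf <- function(u1, u2, theta) {
  a <- (1 - u1)^theta
  b <- (1 - u2)^theta
  1 - (a + b - a * b)^(1 / theta)
}

# Gumbel copula CDF, theta >= 1.
gumbel_cdf <- function(u1, u2, theta) {
  exp(-((-log(u1))^theta + (-log(u2))^theta)^(1 / theta))
}

# Finite-u approximation of the upper tail dependence coefficient.
upper_tail <- function(cdf, theta, u = 1 - 1e-4) {
  (1 - 2 * u + cdf(u, u, theta)) / (1 - u)
}

theta <- 2
lambda_U_joe     <- upper_tail(joe_cdf, theta)
lambda_U_gumbel  <- upper_tail(gumbel_cdf, theta)
lambda_U_formula <- 2 - 2^(1 / theta)

print(c(joe = lambda_U_joe, gumbel = lambda_U_gumbel, formula = lambda_U_formula))
```

At \(\theta = 1\) both copulas reduce to independence and the coefficient is 0; it increases toward 1 as \(\theta\) grows.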

Ali–Mikhail–Haq copula:

\[C(u_1,u_2;\theta) = \frac{u_1 u_2}{1 - \theta (1 - u_1)(1- u_2)}\]

with \(\theta \in [-1,1)\).

This copula has neither upper nor lower tail dependence.
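A small base R sketch shows two properties of the Ali–Mikhail–Haq copula: it reduces to independence at \(\theta = 0\), and both tail ratios shrink toward zero near the corners of the unit square. The name `amh_cdf` and the evaluation points are illustrative.

```r
# Ali-Mikhail-Haq copula CDF, theta in [-1, 1).
amh_cdf <- function(u1, u2, theta) {
  stopifnot(theta >= -1, theta < 1)
  u1 * u2 / (1 - theta * (1 - u1) * (1 - u2))
}

# theta = 0 gives the independence copula u1 * u2.
indep_gap <- abs(amh_cdf(0.3, 0.7, theta = 0) - 0.3 * 0.7)

theta  <- 0.9
u_low  <- 1e-4
u_high <- 1 - 1e-4

# Finite-u approximations of the lower and upper tail dependence coefficients.
lower_ratio <- amh_cdf(u_low, u_low, theta) / u_low
upper_ratio <- (1 - 2 * u_high + amh_cdf(u_high, u_high, theta)) / (1 - u_high)

print(c(indep_gap = indep_gap, lower = lower_ratio, upper = upper_ratio))
```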

Installation

# Install from CRAN
install.packages("MLCOPULA")

Quick Example

library(MLCOPULA)

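# Fit the copula-based classifier on the four numeric iris features
model <- copulaClassifier(X = iris[, 1:4],
                          y = iris$Species)
# Predict on the training data itself, so the scores below are optimistic
y_pred <- copulaPredict(X = iris[, 1:4], model = model)
# Per-class metrics, confusion matrix, accuracy and mutual information
classification_report(iris$Species, y_pred$class)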
$metrics
           precision recall f1-score
setosa          1.00   1.00     1.00
versicolor      0.98   0.98     0.98
virginica       0.98   0.98     0.98

$confusion_matrix
            y_pred
y_true       setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         49         1
  virginica       0          1        49

$accuracy
[1] 0.9866667

$mutual_information
[1] 1.033253
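The quick example scores the model on the same rows it was fitted on. A hold-out split usually gives a less optimistic estimate. The sketch below uses only the three package functions shown above with their default arguments; the 70/30 stratified split and the seed are arbitrary choices, and the `requireNamespace()` guard is there only so the snippet runs whether or not the package is installed.

```r
set.seed(1)  # arbitrary seed so the split is reproducible

# Stratified 70/30 split: sample 35 of the 50 rows of each species for training.
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)
train_idx <- unlist(lapply(rows_by_class, function(rows) sample(rows, 35)),
                    use.names = FALSE)

X_train <- iris[train_idx, 1:4]
y_train <- iris$Species[train_idx]
X_test  <- iris[-train_idx, 1:4]
y_test  <- iris$Species[-train_idx]

# Fit on the training rows and evaluate on the held-out rows.
if (requireNamespace("MLCOPULA", quietly = TRUE)) {
  library(MLCOPULA)
  model  <- copulaClassifier(X = X_train, y = y_train)
  y_pred <- copulaPredict(X = X_test, model = model)
  print(classification_report(y_test, y_pred$class))
}
```

No output is shown here because the metrics depend on the random split.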