Version: 0.1.5
The utiml package is a framework to support multi-label processing, like Mulan on Weka. It is simple to use and extend. This tutorial explains the main topics related to the utiml package. More details and examples are available in the utiml repository.
The general purpose of utiml is to be an alternative for multi-label processing in R. The main methods available in this package are organized into groups covering pre-processing, sampling, classification, thresholding, and evaluation.
The utiml package requires the mldr package to handle multi-label datasets. It will be installed together with utiml1.
The installation process is similar to other packages available on CRAN:
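# Install utiml from CRAN
install.packages("utiml")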
Once installed, you can load the utiml package (the mldr package will also be loaded):
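# Load the package
library(utiml)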
## Loading required package: mldr
## Loading required package: parallel
## Loading required package: ROCR
The utiml package brings two multi-label datasets: a synthetic toy dataset called toyml and a real-world dataset called foodtruck. To understand how to load your own dataset, we suggest reading the mldr documentation. The toyml dataset contains 100 instances, 10 features and 5 labels; its purpose is to be used for small tests and examples.
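The first instances can be inspected directly from the mldr object; a minimal sketch using the attribute and label indexes stored by mldr:
head(toyml$dataset[, c(toyml$attributesIndexes, toyml$labels$index)])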
## att1 att2 att3 att4 att5 att6 att7
## 1 -0.150258 0.000461 0.237302 0.004333 0.086273 0.611953 -0.040632
## 2 0.219093 0.023877 0.038309 -0.041287 0.013978 0.277978 0.147673
## 3 0.137491 0.042125 0.011613 0.066545 0.388947 -0.312591 -0.163133
## 4 -0.318716 -0.054081 0.005198 0.085436 0.660657 0.011783 0.096005
## 5 0.004815 0.659007 0.023343 -0.135839 0.063470 -0.207688 0.091519
## 6 0.336280 -0.140629 -0.032099 -0.365930 0.004982 0.124665 -0.133950
## iatt8 iatt9 ratt10 y1 y2 y3 y4 y5
## 1 -0.215861 0.447483 0.611953 1 1 0 1 0
## 2 -0.592199 -0.164926 0.277978 1 1 0 1 0
## 3 -0.426994 -0.564884 -0.312591 1 1 0 1 0
## 4 -0.526278 0.505936 0.011783 1 1 0 0 1
## 5 0.170262 0.389038 -0.207688 1 1 0 0 0
## 6 0.652938 0.961077 0.124665 1 1 0 0 0
The foodtruck dataset contains different types of cuisines to be predicted from user preferences and habits. The dataset has 12 labels:
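# Label summary stored in the mldr object
foodtruck$labels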
## index count freq IRLbl SCUMBLE SCUMBLE.CV
## street_food 22 295 0.72481572 1.000000 0.1249889 1.0276904
## gourmet 23 120 0.29484029 2.458333 0.1396873 0.7994104
## italian_food 24 43 0.10565111 6.860465 0.2059097 0.4101859
## brazilian_food 25 72 0.17690418 4.097222 0.1463292 0.7305315
## mexican_food 26 41 0.10073710 7.195122 0.2491880 0.2759161
## chinese_food 27 16 0.03931204 18.437500 0.2831969 0.3981316
## japanese_food 28 36 0.08845209 8.194444 0.2113363 0.5936371
## arabic_food 29 25 0.06142506 11.800000 0.2840999 0.3441985
## snacks 30 67 0.16461916 4.402985 0.1526898 0.5780729
## healthy_food 31 33 0.08108108 8.939394 0.2138170 0.5414302
## fitness_food 32 30 0.07371007 9.833333 0.2268195 0.5120827
## sweets_desserts 33 154 0.37837838 1.915584 0.1730439 0.5959228
The following section gives an overview of how to conduct a multi-label experiment. After that, we explore each group of methods and its particularities.
After loading the multi-label dataset, some data processing may be necessary. The pre-processing methods are utilities that manipulate mldr datasets. Suppose that we want to normalize the attribute values (between 0 and 1); we can do:
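# Rescale the attribute values of toyml to the [0, 1] interval
mytoy <- normalize_mldata(toyml)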
Next, we want to stratify the dataset into two partitions (train and test), containing 65% and 35% of the instances, respectively; we can do:
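# One possible call; the "iterative" label-stratification method is an illustrative choice
ds <- create_holdout_partition(mytoy, c(train=0.65, test=0.35), "iterative")
names(ds)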
## [1] "train" "test"
Now, the ds object has two elements, ds$train and ds$test, where the first will be used to create a model and the second to test the model. For example, using the Binary Relevance multi-label method with the base algorithm Random Forest2, we can do:
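# Train a Binary Relevance model using Random Forest and predict the test partition
brmodel <- br(ds$train, "RF")
prediction <- predict(brmodel, ds$test)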
The prediction is an object of class mlresult that contains the probabilities (also called confidences or scores) and the bipartition values:
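# Extract the bipartitions, probabilities and rankings of the first predictions
head(as.bipartition(prediction))
head(as.probability(prediction))
head(as.ranking(prediction))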
## y1 y2 y3 y4 y5
## 8 0 1 0 1 0
## 10 1 1 0 1 0
## 12 0 1 0 1 0
## 14 0 1 0 1 0
## 18 0 1 0 1 0
## 19 0 1 0 0 0
## y1 y2 y3 y4 y5
## 8 0.196 0.948 0.046 0.904 0.216
## 10 0.588 0.956 0.006 0.610 0.104
## 12 0.188 0.942 0.090 0.698 0.248
## 14 0.384 0.850 0.040 0.798 0.280
## 18 0.292 0.838 0.416 0.640 0.296
## 19 0.088 0.844 0.316 0.472 0.086
## y1 y2 y3 y4 y5
## 8 4 1 5 2 3
## 10 3 1 5 2 4
## 12 4 1 5 2 3
## 14 3 1 5 2 4
## 18 5 1 3 2 4
## 19 4 1 3 2 5
A threshold strategy can be applied:
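# For example, the RCut strategy; k=2 here is inferred from the output below (two labels per instance)
newpred <- rcut_threshold(prediction, k=2)
head(as.bipartition(newpred))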
## y1 y2 y3 y4 y5
## 8 0 1 0 1 0
## 10 0 1 0 1 0
## 12 0 1 0 1 0
## 14 0 1 0 1 0
## 18 0 1 0 1 0
## 19 0 1 0 1 0
Now we can evaluate the models and compare whether the use of the RCut threshold improved the results:
result <- multilabel_evaluate(ds$test, prediction, "bipartition")
thresres <- multilabel_evaluate(ds$test, newpred, "bipartition")
round(cbind(Default=result, RCUT=thresres), 3)
## Default RCUT
## F1 0.681 0.718
## accuracy 0.557 0.581
## hamming-loss 0.223 0.211
## macro-AUC 0.565 0.565
## macro-F1 0.480 0.434
## macro-precision 0.587 0.490
## macro-recall 0.463 0.448
## micro-AUC 0.814 0.814
## micro-F1 0.711 0.730
## micro-precision 0.706 0.714
## micro-recall 0.716 0.746
## precision 0.710 0.714
## recall 0.752 0.810
## subset-accuracy 0.143 0.114
Details of the evaluation for each label can be obtained using:
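# A sketch assuming the labels=TRUE argument of multilabel_evaluate, which also returns per-label measures
labelres <- multilabel_evaluate(ds$test, prediction, labels=TRUE)
labelres$labels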
## AUC F1 accuracy balacc precision recall TP TN FP FN
## y1 0.7133333 0.2857143 0.8571429 0.5833333 0.5000000 0.2000000 1 29 1 4
## y2 0.6620370 0.8852459 0.8000000 0.5625000 0.7941176 1.0000000 27 1 7 0
## y3 0.6494253 0.2500000 0.8285714 0.5660920 0.5000000 0.1666667 1 28 1 5
## y4 0.2859848 0.6923077 0.5428571 0.4204545 0.6428571 0.7500000 18 1 10 6
## y5 0.5166667 0.2857143 0.8571429 0.5833333 0.5000000 0.2000000 1 29 1 4
The pre-processing methods were developed to facilitate some operations with the multi-label data. Each pre-processing method receives an mldr dataset and returns another mldr dataset. You can use them as needed. Here is an overview of the pre-processing methods:
# Fill sparse data
mdata <- fill_sparse_mldata(toyml)
# Remove unique attributes
mdata <- remove_unique_attributes(toyml)
# Remove the attributes "iatt8", "iatt9" and "ratt10"
mdata <- remove_attributes(toyml, c("iatt8", "iatt9", "ratt10"))
# Remove labels with less than 10 positive or negative examples
mdata <- remove_skewness_labels(toyml, 10)
# Remove the labels "y2" and "y3"
mdata <- remove_labels(toyml, c("y2", "y3"))
# Remove the examples without any labels
mdata <- remove_unlabeled_instances(toyml)
# Replace nominal attributes
mdata <- replace_nominal_attributes(toyml)
# Normalize the predictive attributes between 0 and 1
mdata <- normalize_mldata(mdata)
If you want to create a specific or a random subset of a dataset, you can use the methods create_subset and create_random_subset, respectively. In the first case, you specify which rows and, optionally, which attributes you want. In the second case, you just define the number of instances and, optionally, the number of attributes.
# Create a subset of the toyml dataset with the odd-indexed instances and the first five attributes
mdata <- create_subset(toyml, seq(1, 100, 2), 1:5)
# Create a subset of the toyml dataset with the first ten instances and all attributes
mdata <- create_subset(toyml, 1:10)
# Create a random subset of toyml dataset with 30 instances and 6 attributes
mdata <- create_random_subset(toyml, 30, 6)
# Create a random subset of toyml dataset with 7 instances and all attributes
mdata <- create_random_subset(toyml, 7)
To create two or more partitions of the dataset, we use the method create_holdout_partition. The first argument is an mldr dataset, the second is the size of the partitions and the third is the partitioning method. The options are: random, iterative and stratified. The iterative method performs a stratification by label and the stratified method performs a stratification by labelset. The method returns a list whose names are defined by the second parameter. See some examples:
# Create two equal partitions using the 'iterative' method
toy <- create_holdout_partition(toyml, c(train=0.5, test=0.5), "iterative")
## toy$train and toy$test are mldr objects
# Create three partitions using the 'random' method
toy <- create_holdout_partition(toyml, c(a=0.4, b=0.3, c=0.3))
## Use toy$a, toy$b and toy$c
# Create two partitions using the 'stratified' method
toy <- create_holdout_partition(toyml, c(0.6, 0.4), "stratified")
## Use toy[[1]] and toy[[2]]
The simplest way to run a k-fold cross-validation is by using the method cv:
results <- cv(foodtruck, br, base.algorithm="SVM", cv.folds=5,
cv.sampling="stratified", cv.measures="example-based",
cv.seed=123)
round(results, 4)
## F1 accuracy hamming-loss precision recall
## 0.5191 0.4408 0.1519 0.6982 0.4810
## subset-accuracy
## 0.2580
To obtain detailed results of the folds, use the parameter cv.results, such that:
results <- cv(toyml, "rakel", base.algorithm="RF", cv.folds=10, cv.results=TRUE,
cv.sampling="random", cv.measures="example-based")
#Multi-label results
round(results$multilabel, 4)
## F1 accuracy hamming-loss precision recall subset-accuracy
## [1,] 0.6500 0.5500 0.22 0.7500 0.6667 0.3
## [2,] 0.6633 0.5583 0.26 0.7000 0.6833 0.2
## [3,] 0.6233 0.5000 0.28 0.6833 0.6833 0.1
## [4,] 0.7800 0.6833 0.16 0.8000 0.8167 0.4
## [5,] 0.6467 0.5083 0.30 0.6500 0.7833 0.1
## [6,] 0.6433 0.5167 0.24 0.6500 0.7333 0.1
## [7,] 0.6400 0.5167 0.26 0.7000 0.6500 0.1
## [8,] 0.7200 0.6083 0.24 0.7833 0.7500 0.3
## [9,] 0.5233 0.4000 0.32 0.6667 0.5000 0.0
## [10,] 0.8067 0.7000 0.14 0.8500 0.8500 0.3
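The per-label results of the folds can also be inspected (a sketch, assuming they are returned in the labels element when cv.results=TRUE):
round(results$labels, 4)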
## y1 y2 y3 y4 y5
## accuracy 0.82 0.76 0.78 0.6000 0.83
## balacc NaN NaN NaN 0.4355 0.50
## TP 0.10 7.30 0.10 6.0000 0.00
## TN 8.10 0.30 7.70 0.0000 8.30
## FP 0.20 1.90 0.40 3.1000 0.00
## FN 1.60 0.50 1.80 0.9000 1.70
Finally, to manually run a k-fold cross-validation, you can use create_kfold_partition. This method returns an object of type kFoldPartition, which is used with the method partition_fold to create the datasets:
# Create 3-fold object
kfcv <- create_kfold_partition(toyml, k=3, "iterative")
result <- lapply(1:3, function (k) {
toy <- partition_fold(kfcv, k)
model <- br(toy$train, "RF")
predict(model, toy$test)
})
# Create 5-fold object and use a validation set
kfcv <- create_kfold_partition(toyml, 5, "stratified")
result <- lapply(1:5, function (k) {
toy <- partition_fold(kfcv, k, has.validation=TRUE)
model <- br(toy$train, "RF")
list(
validation = predict(model, toy$validation),
test = predict(model, toy$test)
)
})
Multi-label classification is a supervised learning task that seeks to learn and predict one or more labels together. The approaches can be grouped into problem transformation and algorithm adaptation. Next, we provide more details about the methods and their specificities.
The transformation methods require a base algorithm (binary or multi-class) and use its predictions to compose the multi-label result. The utiml package accepts several base algorithms by default.
Each base algorithm requires a specific package, which you need to install manually, because they are not installed together with utiml. The following learners are supported:
Use | Name | Package | Call |
---|---|---|---|
CART | Classification and regression trees | rpart | rpart::rpart(…) |
C5.0 | C5.0 Decision Trees and Rule-Based Models | C50 | C50::C5.0(…) |
KNN | K Nearest Neighbor | kknn | kknn::kknn(…) |
MAJORITY | Majority class prediction | - | - |
NB | Naive Bayes | e1071 | e1071::naiveBayes(…) |
RANDOM | Random prediction | - | - |
RF | Random Forest | randomForest | randomForest::randomForest(…) |
SVM | Support Vector Machine | e1071 | e1071::svm(…) |
XGB | eXtreme Gradient Boosting | xgboost | xgboost::xgboost(…) |
To perform a classification, it is first necessary to create a multi-label model. The available methods are:
Method | Name | Approach |
---|---|---|
br | Binary Relevance (BR) | one-against-all |
brplus | BR+ | one-against-all; stacking |
cc | Classifier Chains | one-against-all; chaining |
clr | Calibrated Label Ranking (CLR) | one-versus-one |
dbr | Dependent Binary Relevance (DBR) | one-against-all; stacking |
ebr | Ensemble of Binary Relevance (EBR) | one-against-all; ensemble |
ecc | Ensemble of Classifier Chains (ECC) | one-against-all; ensemble |
eps | Ensemble of Pruned Set (EPS) | powerset |
homer | Hierarchy Of Multi-label classifiER (HOMER) | hierarchy |
lift | Learning with Label specIfic FeaTures (LIFT) | one-against-all |
lp | Label Powerset (LP) | powerset |
mbr | Meta-Binary Relevance (MBR or 2BR) | one-against-all; stacking |
ns | Nested Stacking (NS) | one-against-all; chaining |
ppt | Pruned Problem Transformation (PPT) | powerset |
prudent | Pruned and Confident Stacking Approach (Prudent) | one-against-all; stacking |
ps | Pruned Set (PS) | powerset |
rakel | Random k-labelsets (RAkEL) | powerset |
rdbr | Recursive Dependent Binary Relevance (RDBR) | one-against-all; stacking |
rpc | Ranking by Pairwise Comparison (RPC) | one-versus-one |
The first and second parameters of each multi-label method are always the same: the multi-label dataset and the base algorithm, respectively. However, the methods may have specific parameters. Examples:
#Classifier chain with a specific chain
ccmodel <- cc(toyml, "RF", chain = c("y5", "y4", "y3", "y2", "y1"))
# Ensemble with 5 models using 60% of sampling and 75% of attributes
ebrmodel <- ebr(toyml, "C5.0", m = 5, subsample=0.6, attr = 0.75)
Besides the parameters of each multi-label method, you can also define parameters for the base algorithm, like this:
# Specific parameters for SVM
brmodel <- br(toyml, "SVM", gamma = 0.1, scale=FALSE)
# Specific parameters for KNN
ccmodel <- cc(toyml, "KNN", c("y5", "y4", "y3", "y2", "y1"), k=5)
# Specific parameters for Random Forest
ebrmodel <- ebr(toyml, "RF", 5, 0.6, 0.75, proximity=TRUE, ntree=100)
After building the model, use the predict method to predict new data. Some predict methods require specific arguments, and you can also pass arguments to the base method. By default, all base learners predict the probability of the prediction, so do not use their own probability-related parameters. Instead, use the probability parameter defined by the multi-label prediction method.
# Predict the BR model
result <- predict(brmodel, toyml)
# Specific parameters for KNN
result <- predict(ccmodel, toyml, kernel="triangular", probability = FALSE)
The predict method returns an object of type mlresult, which always contains both the bipartition and the probability values. You can use as.bipartition, as.probability and as.ranking to extract the specific values.
Until now, only a single algorithm-adaptation method is available: mlknn.
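A minimal usage sketch (the k value below is only illustrative; see ?mlknn for the actual defaults):
model <- mlknn(toyml, k=3)
pred <- predict(model, toyml)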
Almost all multi-label methods can run in parallel. The train and prediction methods receive a parameter called cores that specifies the number of cores used to run the method. Some multi-label methods cannot run in multi-core mode, so read the documentation of each method for more details.
# Running Binary Relevance method using 2 cores
brmodel <- br(toyml, "SVM", cores=2)
prediction <- predict(brmodel, toyml, cores=2)
If you need reproducibility, you can set a specific seed:
# Running Binary Relevance method using 2 cores
brmodel <- br(toyml, "SVM", cores=2, seed=1984)
prediction <- predict(brmodel, toyml, seed=1984, cores=2)
The cv method also supports multiple cores:
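# A sketch assuming the cv.cores parameter (see ?cv); the method and seed are illustrative
results <- cv(toyml, "ebr", base.algorithm="RF", cv.folds=5, cv.cores=2, cv.seed=1984)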
The threshold methods receive an mlresult object and return a new mlresult, except for scut, which returns the threshold values. These methods mainly change the bipartition values using the probability values.
# Use a fixed threshold for all labels
newpred <- fixed_threshold(prediction, 0.4)
# Use a specific threshold for each label
newpred <- fixed_threshold(prediction, c(0.4, 0.5, 0.6, 0.7, 0.8))
# Use the MCut approach to define the threshold
newpred <- mcut_threshold(prediction)
# Use the PCut threshold
newpred <- pcut_threshold(prediction, ratio=0.65)
# Use the RCut threshold
newpred <- rcut_threshold(prediction, k=3)
# Choose the best threshold values based on a Mean Squared Error
thresholds <- scut_threshold(prediction, toyml, cores = 2)
newpred <- fixed_threshold(prediction, thresholds)
#Predict only the labelsets present in the train data
newpred <- subset_correction(prediction, toyml)
To evaluate multi-label models you can use the method multilabel_evaluate. There are two ways of calling this method:
toy <- create_holdout_partition(toyml)
brmodel <- br(toy$train, "SVM")
prediction <- predict(brmodel, toy$test)
# Using the test dataset and the prediction
result <- multilabel_evaluate(toy$test, prediction)
print(round(result, 3))
## F1 accuracy average-precision clp
## 0.737 0.631 0.816 0.400
## coverage hamming-loss macro-AUC macro-F1
## 2.133 0.207 0.510 0.347
## macro-precision macro-recall margin-loss micro-AUC
## 0.307 0.400 1.200 0.756
## micro-F1 micro-precision micro-recall mlp
## 0.748 0.767 0.730 0.600
## one-error precision ranking-loss recall
## 0.267 0.767 0.219 0.761
## subset-accuracy wlp
## 0.267 0.600
# Build a confusion matrix
confmat <- multilabel_confusion_matrix(toy$test, prediction)
result <- multilabel_evaluate(confmat)
print(confmat)
## Multi-label Confusion Matrix
##
## Absolute Matrix:
## -------------------------------------
## Expected_1 Expected_0 TOTAL
## Prediction_1 46 14 60
## Predicion_0 17 73 90
## TOTAL 63 87 150
##
## Proportinal Matrix:
## -------------------------------------
## Expected_1 Expected_0 TOTAL
## Prediction_1 0.307 0.093 0.4
## Predicion_0 0.113 0.487 0.6
## TOTAL 0.420 0.580 1.0
##
## Label Matrix
## -------------------------------------
## TP FP FN TN Correct Wrong %TP %FP %FN %TN %Correct %Wrong MeanRanking
## y1 0 0 3 27 27 3 0.00 0.00 0.10 0.90 0.90 0.10 3.70
## y2 22 8 0 0 22 8 0.73 0.27 0.00 0.00 0.73 0.27 1.00
## y3 0 0 6 24 24 6 0.00 0.00 0.20 0.80 0.80 0.20 3.43
## y4 24 6 0 0 24 6 0.80 0.20 0.00 0.00 0.80 0.20 2.00
## y5 0 0 8 22 22 8 0.00 0.00 0.27 0.73 0.73 0.27 4.87
## MeanScore
## y1 0.19
## y2 0.80
## y3 0.20
## y4 0.64
## y5 0.13
The confusion matrix summarizes a lot of data, and confusion matrices can be merged. For example, using a k-fold experiment:
kfcv <- create_kfold_partition(toyml, k=3)
confmats <- lapply(1:3, function (k) {
toy <- partition_fold(kfcv, k)
model <- br(toy$train, "RF")
multilabel_confusion_matrix(toy$test, predict(model, toy$test))
})
result <- multilabel_evaluate(merge_mlconfmat(confmats))
It is possible to choose which measures will be computed:
# Example-based measures
result <- multilabel_evaluate(confmat, "example-based")
print(names(result))
## [1] "F1" "accuracy" "hamming-loss" "precision"
## [5] "recall" "subset-accuracy"
# Subset accuracy, F1 measure and hamming-loss
result <- multilabel_evaluate(confmat, c("subset-accuracy", "F1", "hamming-loss"))
print(names(result))
## [1] "F1" "hamming-loss" "subset-accuracy"
# Ranking and label-based measures
result <- multilabel_evaluate(confmat, c("label-based", "ranking"))
print(names(result))
## [1] "average-precision" "coverage" "macro-AUC"
## [4] "macro-F1" "macro-precision" "macro-recall"
## [7] "margin-loss" "micro-AUC" "micro-F1"
## [10] "micro-precision" "micro-recall" "one-error"
## [13] "ranking-loss"
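The names of all supported measures and measure groups can also be listed (assuming the multilabel_measures helper provided by utiml):
multilabel_measures()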
## [1] "F1" "accuracy" "all"
## [4] "average-precision" "bipartition" "clp"
## [7] "coverage" "example-based" "hamming-loss"
## [10] "label-based" "label-problem" "macro-AUC"
## [13] "macro-F1" "macro-based" "macro-precision"
## [16] "macro-recall" "margin-loss" "micro-AUC"
## [19] "micro-F1" "micro-based" "micro-precision"
## [22] "micro-recall" "mlp" "one-error"
## [25] "precision" "ranking" "ranking-loss"
## [28] "recall" "subset-accuracy" "wlp"
The utiml repository is available at https://github.com/rivolli/utiml. If you want to contribute to the development of this package, contact us and you will be very welcome.
Please report any bugs or suggestions via the CRAN maintainer e-mail or the GitHub page.
You may also be interested in the mldr.datasets package.↩︎
Requires the randomForest package.↩︎