Publications:Exploiting statistical energy test for comparison of multiple groups in morphometric and chemometric data

From ISLAB/CAISR
Revision as of 21:05, 6 May 2015 by Slawek (talk | contribs) (Created page with "<div style='display: none'> == Do not edit this section == </div> {{PublicationSetupTemplate|Author=Evaldas Vaiciukynas, Antanas Verikas, Adas Gelzinis, Marija Bacauskiene, Ir...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

Do not edit this section

Property "Publisher" has a restricted application area and cannot be used as annotation property by a user. Property "Author" has a restricted application area and cannot be used as annotation property by a user. Property "Author" has a restricted application area and cannot be used as annotation property by a user. Property "Author" has a restricted application area and cannot be used as annotation property by a user. Property "Author" has a restricted application area and cannot be used as annotation property by a user. Property "Author" has a restricted application area and cannot be used as annotation property by a user.

Keep all hand-made modifications below

Title Exploiting statistical energy test for comparison of multiple groups in morphometric and chemometric data
Author
Year 2015
PublicationType Journal Paper
Journal Chemometrics and Intelligent Laboratory Systems
HostPublication
Conference
DOI http://dx.doi.org/10.1016/j.chemolab.2015.04.018
Diva url http://hh.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:809939
Abstract

Multivariate permutation-based energy test of equal distributions is considered here. Approach is attributable to the emerging field of ε-statistics and uses natural logarithm of Euclidean distance for within-sample and between-sample components. Result from permutations is enhanced by a tail approximation through generalized Pareto distribution to boost precision of obtained p-values. Generalization from two-sample case to multiple samples is achieved by combining p-values through meta-analysis. Several strategies of varied statistical power are possible, while a maximum of all pairwise p-values is chosen here. Proposed approach is tested on several morphometric and chemometric data sets. Each data set is additionally transformed by principal component analysis for the purpose of dimensionality reduction and visualization in 2D space. Variable selection, namely, sequential search and multi-cluster feature selection, is applied to reveal in what aspects the groups differ most.

Morphometric data sets used: 1) survival data of house sparrows Passer domesticus; 2) orange and blue varieties of rock crabs Leptograpsus variegatus; 3) ontogenetic stages of trilobite species Trimerocephalus lelievrei; 4) marine phytoplankton species Prorocentrum minimum.

Chemometric data sets used: 1) essential oils composition of medicinal plant Hyptis suaveolensspecimens; 2) chemical information of olive oil samples; 3) elemental composition of biomass ash; 4) exchangeable cations of earth metals in forest soil samples.

Statistically significant differences between groups were successfully indicated, but the selection of variables had a profound effect on the result. Permutation-based energy test and it’s multi-sample generalization through meta-analysis proved useful as an unbalanced non-parametric MANOVA approach. Introduced solution is simple, yet flexible and powerful, and by no means is confined to morphometrics or chemometrics alone, but has a wide range of potential applications. Copyright © 2015 Elsevier B.V.