C.V. Cannistraci, T. Ravasi, F.M. Montevecchi, T. Ideker, M. Alessio
Bioinformatics, 26(18), pp. i531-i539 (2010)
Small datasets, which are characterized by low numbers of samples and
very high numbers of measures, occur frequently in computational
biology, and pose problems in their investigation. Unsupervised
hybrid-two-phase (H2P) procedures—specifically dimension reduction (DR),
coupled with clustering—provide valuable assistance, not only for
unsupervised data classification, but also for visualization of the
patterns hidden in high-dimensional feature space.
‘Minimum Curvilinearity’ (MC) is a principle that—for small
datasets—suggests the approximation of curvilinear sample distances in
the feature space by pair-wise distances over their minimum spanning
tree (MST), and thus avoids the introduction of any tuning parameter. MC
is used to design two novel forms of nonlinear machine learning (NML):
Minimum Curvilinear embedding (MCE) for DR, and Minimum Curvilinear
affinity propagation (MCAP) for clustering.
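
The MC distance described above can be illustrated with a minimal sketch. It assumes Euclidean distance between samples as the base metric and uses SciPy's graph routines; the function name `mc_distances` is hypothetical and this is not the authors' implementation. It covers only the distance estimate, not the MCE embedding or MCAP clustering steps built on top of it.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform


def mc_distances(X):
    """Pairwise distances measured along the minimum spanning tree of the samples.

    X is an (n_samples, n_features) array. The Euclidean base metric is an
    assumption made for this sketch.
    """
    # Dense pairwise distances between samples (rows of X).
    D = squareform(pdist(X, metric="euclidean"))
    # MST of the complete distance graph. SciPy treats zero entries as
    # missing edges, so exact duplicate samples would need special handling.
    mst = minimum_spanning_tree(D)
    # Sum of edge weights along the unique tree path between every pair.
    return shortest_path(mst, method="D", directed=False)


# Three samples at the corners of a right angle.
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
mc = mc_distances(X)
```

In this toy case the tree path between the first and last samples passes through the middle one, so their MC distance is 2, whereas their straight-line distance is about 1.41. No tuning parameter is involved because the MST is fully determined by the data and the chosen base metric.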
Compared with several other unsupervised and supervised algorithms, MCE
and MCAP, whether individually or combined in H2P, overcome the limits
of classical approaches. High performance was attained in the
visualization and classification of: (i) pain patients (proteomic
measurements) in peripheral neuropathy; (ii) human organ tissues
(genomic transcription factor measurements) on the basis of their
embryological origin. MC
provides a valuable framework to estimate nonlinear distances in small
datasets. Its extension to large datasets is prefigured for novel NMLs.
Classification of neuropathic pain by proteomic profiles offers new
insights for future molecular and systems biology characterization of
pain. Improvements in tissue embryological classification refine results
obtained in an earlier study, and suggest a possible reinterpretation
of skin attribution as mesodermal.