Diagnostic with incomplete nominal/discrete data

Herbert F. Jelinek, Andrew Yatsko, Andrew Stranieri, Sitalakshmi Venkatraman, Adil Bagirov


Missing values may be present in data without undermining its use for diagnostic or classification purposes, but they compromise the application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; in particular, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they are intended to cause the least hindrance for classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking the Cartesian product of class values of the underlying sub-problems allows narrowing down the selection of missing value substitutes. Real-world data examples, some publicly available, are used for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.
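As a rough illustration of the idea of narrowing substitutes by combined class, the following Python sketch pairs the labels of two related sub-problems into one class (an element of the Cartesian product) and fills each missing nominal value with the most frequent value among records of the same combined class. This is not the authors' algorithm: the class-conditional mode stands in for the classifier-based completion methods named in the abstract, and the function names, the `None` missing marker, and the toy records are all assumptions made for the example.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing nominal value


def combined_class(labels):
    """Combine the labels of related sub-problems into one class,
    i.e. one element of the Cartesian product of their class values."""
    return tuple(labels)


def impute_by_class_mode(rows, classes):
    """Fill each missing value with the most frequent value of that
    attribute among rows of the same combined class, falling back to
    the overall mode when the class has no observed value."""
    n_attrs = len(rows[0])
    per_class = {}
    overall = [Counter() for _ in range(n_attrs)]

    # Count observed values per attribute, per class and overall.
    for row, cls in zip(rows, classes):
        counters = per_class.setdefault(
            cls, [Counter() for _ in range(n_attrs)]
        )
        for j, value in enumerate(row):
            if value is not MISSING:
                counters[j][value] += 1
                overall[j][value] += 1

    # Substitute missing entries; leave them missing if nothing was observed.
    completed = []
    for row, cls in zip(rows, classes):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is MISSING:
                counts = per_class[cls][j] or overall[j]
                if counts:
                    new_row[j] = counts.most_common(1)[0][0]
        completed.append(new_row)
    return completed


# Toy records with two discretized attributes (hypothetical data).
rows = [
    ["high", "yes"],
    ["high", MISSING],
    ["low", "no"],
    [MISSING, "no"],
]
# Labels of two related sub-problems (hypothetical conditions A and B).
condition_a = ["pos", "pos", "neg", "neg"]
condition_b = ["pos", "pos", "neg", "neg"]
classes = [combined_class(pair) for pair in zip(condition_a, condition_b)]

completed = impute_by_class_mode(rows, classes)
```

Because substitutes are drawn only from records sharing the same combined class, the filled values favour class separability over fidelity to the unknown true values, which is in the spirit of the goal stated in the abstract.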



DOI: https://doi.org/10.5430/air.v4n1p22



Artificial Intelligence Research

ISSN 1927-6974 (Print)   ISSN 1927-6982 (Online)

Copyright © Sciedu Press 