Learning Imbalanced Data Sets with Noisy Replication (Thesis)



  • Thesis (M.S., Statistical Sciences) -- University of Idaho, 2017 | Previous research has shown the noisy replication method to be an effective approach to learning from imbalanced binary data sets. This thesis extends the concept and tests its effectiveness in broader scenarios: several levels of noise (sigma), a wide range of imbalance ratios (IR), eight commonly used machine learning models, both binary and multi-class data sets, the addition of both noise and anti-noise, and more than 60 simulated and real data sets. The thesis finds that, for some models (for instance KNN, neural networks, and C5.0), adding a relatively small amount of noise improves the performance of the noisy replication method significantly as IR increases. It further shows that the noisy replication method is an ideal model-free approach to learning from both binary and multi-class imbalanced data sets in terms of ROC area and Kullback-Leibler distance.
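
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of oversampling by noisy replication, not the thesis's implementation. It assumes that IR is the majority-class count divided by the smallest class count, that each replicated row receives zero-mean Gaussian noise, and that the noise standard deviation is sigma times the per-feature standard deviation. The function names `imbalance_ratio` and `noisy_replicate` are hypothetical, and anti-noise is not covered.

```python
import random
import statistics
from collections import Counter


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (assumed IR definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def noisy_replicate(rows, labels, sigma=0.1, seed=0):
    """Oversample every smaller class up to the largest class size.

    Each new row is a copy of a randomly chosen row of the same class plus
    Gaussian noise. Assumption: the noise standard deviation for feature j
    is sigma times the population standard deviation of feature j.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    scales = [
        sigma * statistics.pstdev([row[j] for row in rows])
        for j in range(n_features)
    ]

    counts = Counter(labels)
    target = max(counts.values())

    # Keep the original data untouched and append the noisy replicates.
    new_rows, new_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            base = rng.choice(pool)
            new_rows.append([x + rng.gauss(0.0, s) for x, s in zip(base, scales)])
            new_labels.append(label)
    return new_rows, new_labels


if __name__ == "__main__":
    # Toy binary data set: 20 majority rows, 2 minority rows (IR = 10).
    rows = [[float(i), float(i % 3)] for i in range(22)]
    labels = [0] * 20 + [1] * 2
    balanced_rows, balanced_labels = noisy_replicate(rows, labels, sigma=0.05)
    print(imbalance_ratio(labels), imbalance_ratio(balanced_labels))
```

Because the loop runs over every class, the same sketch applies to multi-class labels. With sigma set to 0 it reduces to plain random oversampling with exact duplicates.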

publication date

  • June 1, 2017