A New Big Data Model Using Distributed Cluster-Based Resampling for Class-Imbalance Problem

Duygu Sinanc Terzi; Seref Sagiroglu

2019

The class imbalance problem, a common data irregularity, leads to models that under-represent the minority class. To address this issue, the present study proposes a new cluster-based MapReduce design, entitled Distributed Cluster-based Resampling for Imbalanced Big Data (DIBID). The design modifies the existing dataset in order to increase classification success. DIBID has been implemented on public datasets under two strategies. The first strategy was designed to show the success of the model on datasets with different imbalance ratios. The second strategy was designed to compare the success of the model with other imbalanced big data solutions in the literature. According to the results, DIBID outperformed the other imbalanced big data solutions in the literature and increased area under the curve (AUC) values by 10 % to 24 % in the case study.
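The abstract does not spell out DIBID's internal steps. The sketch below is only a rough illustration of the general idea of cluster-based resampling expressed in map/reduce form. It is not the authors' published procedure. Several choices are assumptions made for illustration: the centroids are taken as given (for example from an earlier k-means pass), records are labeled `(point, label)` pairs, the minority class is rebalanced by random duplication inside each cluster, and the function names (`map_partition`, `reduce_cluster`, `cluster_resample`) are hypothetical. The "map" step tags each record with its nearest cluster. The "shuffle" groups records by cluster id. The "reduce" step balances the classes locally within each cluster.

```python
import random
from collections import defaultdict
from itertools import chain


def nearest(point, centroids):
    # Index of the centroid with the smallest squared Euclidean distance to `point`.
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_partition(partition, centroids):
    # Hypothetical "map" step: emit (cluster_id, record) for every (point, label) record.
    for point, label in partition:
        yield nearest(point, centroids), (point, label)


def reduce_cluster(records, rng):
    # Hypothetical "reduce" step: inside one cluster, randomly duplicate records of the
    # smaller classes until every class present matches the largest class in that cluster.
    by_label = defaultdict(list)
    for point, label in records:
        by_label[label].append((point, label))
    target = max(len(recs) for recs in by_label.values())
    out = []
    for recs in by_label.values():
        out.extend(recs)  # keep every original record
        out.extend(rng.choice(recs) for _ in range(target - len(recs)))
    return out


def cluster_resample(partitions, centroids, seed=0):
    rng = random.Random(seed)
    grouped = defaultdict(list)  # "shuffle": group mapped records by cluster id
    mapped = chain.from_iterable(map_partition(p, centroids) for p in partitions)
    for cid, rec in mapped:
        grouped[cid].append(rec)
    return [rec for cid in sorted(grouped) for rec in reduce_cluster(grouped[cid], rng)]


# Usage: two data partitions, two assumed centroids, binary labels 0/1.
partitions = [
    [((0.0, 0.0), 0), ((0.1, 0.2), 0), ((0.2, 0.1), 1)],
    [((5.0, 5.0), 0), ((5.1, 4.9), 1), ((4.9, 5.2), 1), ((5.2, 5.1), 1)],
]
centroids = [(0.0, 0.0), (5.0, 5.0)]
balanced = cluster_resample(partitions, centroids)
```

Balancing inside each cluster, instead of over the whole dataset at once, is one common motivation for cluster-based resampling. Synthetic or duplicated minority records then stay close to the region where the original minority examples actually occur. In a real MapReduce job, the grouping done here with a dictionary would be carried out by the framework's shuffle.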

Keywords
Big data, cluster-based resampling, imbalanced big data classification, imbalanced data.
DOI
10.2478/acss-2019-0013
Hyperlink
https://doi.org/10.2478/acss-2019-0013

Terzi, D., Sagiroglu, S. A New Big Data Model Using Distributed Cluster-Based Resampling for Class-Imbalance Problem. Applied Computer Systems, 2019, Vol. 24, No. 2, pp. 104-110. ISSN 2255-8683. e-ISSN 2255-8691. Available from: doi:10.2478/acss-2019-0013

Publication language
English (en)

Publication Type
Scientific article indexed in SCOPUS or WOS database
Funding for basic activity
Research project
Field of research
2. Engineering and technology
Sub-field of research
2.2 Electrical engineering, Electronic engineering, Information and communication engineering
ID: 30725