Online data preprocessing: a case study approach
Abstract
Alongside web search and e-mail, social networking is now one of the three
most popular uses of the Internet. Every day, a tremendous number of
volunteers write articles and share photos, videos, and links at a scope and
scale never imagined before. However, because social network data are huge
and come from heterogeneous sources, they are highly susceptible to
inconsistency, redundancy, noise, and loss. For data scientists, preparing the
data and getting them into a standard format is critical because data quality
directly affects the performance of the mining algorithms applied afterward.
Low-quality data will certainly limit the analysis and lower the quality of
the mining results. To this end, the goal of this study is to provide an
overview of the different phases involved in data preprocessing, with a focus
on social network data. As a case study, we show how we applied preprocessing
to the data that we collected about Malaysia Airlines Flight MH370, which
disappeared in 2014.
Journal/Conference Information
International Journal of Electrical and Computer Engineering (IJECE), DOI: 10.11591/ijece.v9i4.pp2620-2626, ISSN: 2088-8708, Volume: 9, Issue: 4, Pages: 2620-2626