Recently, big data has become one of the important new factors in the business field. It calls for strategies to manage large volumes of structured, unstructured and semi-structured data. It is challenging to analyze data at such a large scale, to extract its meaning and to handle uncertain outcomes. The data set may contain inaccuracies, missing data, miscoding and other issues that affect the strength of big data analytics. One of the biggest challenges in big data analytics is to discover and repair dirty data; failure to do so can lead to inaccurate analytics and unpredictable conclusions. Data cleaning is an essential part of managing and analyzing data. This survey paper first examines the data quality problems that may occur in big data processing, to make clear why an organization requires data cleaning, and then the data quality criteria (the dimensions used to indicate data quality). Then, the cleaning tools available on the market are summarized. The challenges faced in cleaning big data that arise from the nature of the data are also discussed. Machine learning algorithms can be used to analyze data, make predictions and finally clean data automatically. © 2018 Institute of Advanced Engineering and Science.

With the evolution of new technologies, the production of digital data is constantly growing. It is thus necessary to develop data management strategies in order to handle large-scale datasets. The data gathered from different sources, such as sensor networks, social media, business transactions, etc., is inherently uncertain due to noise, missing values, inconsistencies and other problems that impact the quality of big data analytics. One of the key challenges in this context is to detect and repair dirty data, i.e. data cleansing, and various techniques have been presented to solve this issue. However, to the best of our knowledge, there has not been any comprehensive review of data cleansing techniques for big data analytics. As such, this survey presents a comprehensive and systematic study of the state-of-the-art mechanisms within the scope of big data cleansing. To review these mechanisms, five categories are considered: machine learning-based, sample-based, expert-based, rule-based, and framework-based mechanisms. A number of articles are reviewed in each category. Furthermore, this paper describes the advantages and disadvantages of the chosen data cleansing techniques and discusses the related parameters, comparing them in terms of scalability, efficiency, accuracy, and usability. Finally, some suggestions for further work are provided to improve big data cleansing mechanisms in the future.
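To make the idea of detecting and repairing dirty data concrete, the following is a minimal rule-based sketch in plain Python. It is not taken from either surveyed paper: the sensor records, the field names, the plausibility range and the median-imputation repair are all assumptions chosen for illustration only.

```python
from statistics import median

# Hypothetical sensor records (illustrative data, not from the source).
# None marks a missing value.
records = [
    {"sensor": "s1", "temp": 21.5, "unit": "C"},
    {"sensor": "s2", "temp": None, "unit": "C"},   # missing value
    {"sensor": "s3", "temp": 70.7, "unit": "F"},   # inconsistent unit
    {"sensor": "s4", "temp": 999.0, "unit": "C"},  # implausible reading (noise)
    {"sensor": "s5", "temp": 22.5, "unit": "C"},
]

# Assumed plausibility rule for a Celsius reading.
VALID_RANGE = (-50.0, 60.0)


def in_range(value):
    """True if a Celsius value satisfies the assumed plausibility rule."""
    return VALID_RANGE[0] <= value <= VALID_RANGE[1]


def detect(record):
    """Return the list of rule violations found in one record."""
    if record["temp"] is None:
        return ["missing"]
    if record["unit"] != "C":
        return ["inconsistent_unit"]
    if not in_range(record["temp"]):
        return ["out_of_range"]
    return []


def repair(rows):
    """Return cleaned copies: normalize units, then impute bad values."""
    cleaned = [dict(r) for r in rows]  # leave the input untouched
    # Rule 1: convert Fahrenheit readings to Celsius.
    for r in cleaned:
        if r["temp"] is not None and r["unit"] == "F":
            r["temp"] = round((r["temp"] - 32) * 5 / 9, 1)
            r["unit"] = "C"
    # Rule 2: replace missing or implausible readings with the median
    # of the remaining plausible readings.
    good = [r["temp"] for r in cleaned
            if r["temp"] is not None and in_range(r["temp"])]
    fill = median(good)
    for r in cleaned:
        if r["temp"] is None or not in_range(r["temp"]):
            r["temp"] = fill
    return cleaned


dirty = {r["sensor"]: detect(r) for r in records if detect(r)}
cleaned = repair(records)
```

Here `dirty` maps each offending sensor to the rule it violates, and `cleaned` holds the repaired copies. Median imputation is only one possible repair; a real pipeline might instead drop such records or flag them for expert review.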