The VADA Architecture for Cost-Effective Data Wrangling

Authors: N. Konstantinou, M. Koehler, E. Abel, C. Civili, B. Neumayr, E. Sallinger, A. A. Fernandes, G. Gottlob, J. A. Keane, L. Libkin, N. W. Paton
Paper: Neum17c (2017)
Citation: Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD 2017), May 14-19, 2017, Chicago, Illinois, USA, ACM Press, ISBN 978-1-4503-4197-4, available at: https://dl.acm.org/citation.cfm?doid=3035918.3058730, peer reviewed, pp. 1599-1602, 2017.
Resources: Copy (to receive this copy, send an email with Neum17c as the subject to dke.win@jku.at)

Abstract (English)

Data wrangling, the multi-faceted process by which the data required by an application is identified, extracted, cleaned and integrated, is often cumbersome and labor intensive. In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user’s priorities, and supports data scientists with diverse skill sets. The architecture is demonstrated in practice for wrangling property sales and open government data.

Keywords: Data Wrangling