Data Collection and Quality Challenges for Deep Learning

DC Field | Value | Language
dc.contributor.author | Whang, Steven Euijong | ko
dc.contributor.author | Lee, Jae-Gil | ko
dc.date.accessioned | 2021-01-28T05:57:28Z | -
dc.date.available | 2021-01-28T05:57:28Z | -
dc.date.created | 2021-01-21 | -
dc.date.issued | 2020-08 | -
dc.identifier.citation | PROCEEDINGS OF THE VLDB ENDOWMENT, v.13, no.12, pp.3429 - 3432 | -
dc.identifier.issn | 2150-8097 | -
dc.identifier.uri | http://hdl.handle.net/10203/280093 | -
dc.description.abstract | Software 2.0 refers to the fundamental shift in software engineering where using machine learning becomes the new norm in software with the availability of big data and computing infrastructure. As a result, many software engineering practices need to be rethought from scratch where data becomes a first-class citizen, on par with code. It is well known that 80-90% of the time for machine learning development is spent on data preparation. Also, even the best machine learning algorithms cannot perform well without good data or at least handling biased and dirty data during model training. In this tutorial, we focus on data collection and quality challenges that frequently occur in deep learning applications. Compared to traditional machine learning, there is less need for feature engineering, but more need for significant amounts of data. We thus go through state-of-the-art data collection techniques for machine learning. Then, we cover data validation and cleaning techniques for improving data quality. Even if the data is still problematic, hope is not lost, and we cover fair and robust training techniques for handling data bias and errors. We believe that the data management community is well poised to lead the research in these directions. The presenters have extensive experience in developing machine learning platforms and publishing papers in top-tier database, data mining, and machine learning venues. | -
dc.language | English | -
dc.publisher | ASSOC COMPUTING MACHINERY | -
dc.title | Data Collection and Quality Challenges for Deep Learning | -
dc.type | Article | -
dc.identifier.wosid | 000597303100084 | -
dc.type.rims | ART | -
dc.citation.volume | 13 | -
dc.citation.issue | 12 | -
dc.citation.beginningpage | 3429 | -
dc.citation.endingpage | 3432 | -
dc.citation.publicationname | PROCEEDINGS OF THE VLDB ENDOWMENT | -
dc.identifier.doi | 10.14778/3415478.3415562 | -
dc.contributor.localauthor | Whang, Steven Euijong | -
dc.contributor.localauthor | Lee, Jae-Gil | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
Appears in Collection
EE-Journal Papers, CS-Journal Papers