Web Table Extraction, Retrieval, and Augmentation: A Survey Journal Article Shuo Zhang; Krisztian Balog In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 11, no. 2, pp. 1-35, 2020, (Pre SFI).
@article{Zhang2020,
title = {Web Table Extraction, Retrieval, and Augmentation: A Survey},
author = {Shuo Zhang and Krisztian Balog},
url = {https://arxiv.org/pdf/2002.00207.pdf},
year = {2020},
date = {2020-02-01},
journal = {ACM Transactions on Intelligent Systems and Technology (TIST)},
volume = {11},
number = {2},
pages = {1--35},
abstract = {Tables are a powerful and popular tool for organizing and manipulating data. A vast number of tables can be found on the Web, which represents a valuable knowledge resource. The objective of this survey is to synthesize and present two decades of research on web tables. In particular, we organize existing literature into six main categories of information access tasks: table extraction, table interpretation, table search, question answering, knowledge base augmentation, and table augmentation. For each of these tasks, we identify and describe seminal approaches, present relevant resources, and point out interdependencies among the different tasks.},
note = {Pre SFI},
keywords = {Table augmentation, Table extraction, Table interpretation, Table mining, Table retrieval, Table search, WP2: User Modeling Personalization and Engagement},
pubstate = {published},
tppubtype = {article}
}
Ad Hoc Table Retrieval using Semantic Similarity Conference Shuo Zhang; Krisztian Balog Proceedings of The Web Conference 2018 (WWW’18), 2018, (Pre SFI).
@conference{Zhang2018,
title = {Ad Hoc Table Retrieval using Semantic Similarity},
author = {Shuo Zhang and Krisztian Balog},
url = {https://arxiv.org/pdf/1802.06159.pdf},
doi = {10.1145/3178876.3186067},
year = {2018},
date = {2018-02-16},
booktitle = {Proceedings of The Web Conference 2018 (WWW’18)},
pages = {1553--1562},
abstract = {We introduce and address the problem of ad hoc table retrieval: answering a keyword query with a ranked list of tables. This task is not only interesting on its own account, but is also being used as a core component in many other table-based information access scenarios, such as table completion or table mining. The main novel contribution of this work is a method for performing semantic matching between queries and tables. Specifically, we (i) represent queries and tables in multiple semantic spaces (both discrete sparse and continuous dense vector representations) and (ii) introduce various similarity measures for matching those semantic representations. We consider all possible combinations of semantic representations and similarity measures and use these as features in a supervised learning model. Using a purpose-built test collection based on Wikipedia tables, we demonstrate significant and substantial improvements over a state-of-the-art baseline.},
note = {Pre SFI},
keywords = {Semantic matching, Semantic representations, Semantic similarity, Table retrieval, Table search, WP2: User Modeling Personalization and Engagement},
pubstate = {published},
tppubtype = {conference}
}