
Building and Exploring Web Corpora (WAC3 - 2007)

Proceedings of the 3rd Web as Corpus Workshop, incorporating Cleaneval

Edited by Cédrick Fairon, Hubert Naets, Adam Kilgarriff, Gilles-Maurice de Schryver

Presses universitaires de Louvain, Cahiers du CENTAL

Paperback - In English
Price: 19.70 €
 
Adobe PDF - In English
Price: 9.00 €
 

WAC

More and more people are using Web data for linguistic and NLP research. The Web as Corpus workshop (WAC) provides a venue for exploring how we can use it effectively and the advancements to which this could lead. This book is a collection of the talks presented at the 3rd WAC in Louvain-la-Neuve (Belgium). The focus is on the description of Web corpus collection projects, the exploration of Web data characteristics from a linguistics/NLP perspective, and on the use of crawled Web data for NLP purposes.

CLEANEVAL

Any use of Web data requires that it be cleaned in order to get rid of unwanted material including, for example, HTML markup, navigation bars, and advertisements. To date there has been no sharing of resources or expertise in this particular domain, and the cleaning has often been done minimally. Cleaneval was an exercise aimed at promoting collaboration and improving our understanding of the issues. Results and perspectives are presented in this book.


Details

Issue: 4
Type: Monograph
Language: English
Publisher: Presses universitaires de Louvain
Format: Paperback
ISBN-10: 2874630829
ISBN-13: 9782874630828
Publication Date: Jan 2007
Nb of pages: 182

Format: Adobe PDF
ISBN-10: 2-87463-504-9
ISBN-13: 978-2-87463-504-5
Publication Date: Jan 2007


