Nullification test collections for web spam and SEO

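Research in the area of adversarial information retrieval has been facilitated by the availability of the UK-2006/UK-2007 collections, comprising crawl data, link graph, and spam labels. However, research into nullifying the negative effect of spam or excessive search engine optimisation (SEO) on the ranking of non-spam pages is not well supported by these resources. Nor is the study of cloaking techniques or of click spam. Finally, the domain-restricted nature of a .uk crawl means that only parts of link-farm icebergs may be visible in these crawls.

We introduce the term nullification which we define as "preventing problem pages from negatively affecting search results". We show some important differences between properties of current .uk-restricted crawls and those previously reported for the Web as a whole. We identify a need for an adversarial IR collection which is not domain-restricted and which is supported by a set of appropriate query sets and (optimistically) user-behaviour data. The billion-page unrestricted crawl being conducted by CMU (web09-bst) and which will be used in the 2009 TREC Web Track is assessed as a possible basis for a new AIR test collection. We discuss the pros and cons of its scale, and the feasibility of adding resources such as query lists to enhance the utility of the collection for AIR research.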

Bibliographic Details
Published in: Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web - AIRWeb '09
Main Authors: Jones, Timothy, Sankaranarayana, Ramesh S, Hawking, David, Craswell, Nick
Format: Conference Object
Language: unknown
Published: Association for Computing Machinery Inc (ACM)
Subjects:
Online Access:http://hdl.handle.net/1885/56223
https://doi.org/10.1145/1531914.1531927
https://openresearch-repository.anu.edu.au/bitstream/1885/56223/5/AIRWeb_nullification.pdf.jpg
https://openresearch-repository.anu.edu.au/bitstream/1885/56223/7/01_Jones_Nullification_test_collections_2009.pdf.jpg
id ftanucanberra:oai:openresearch-repository.anu.edu.au:1885/56223
record_format openpolar
institution Open Polar
collection Australian National University: ANU Digital Collections
op_collection_id ftanucanberra
language unknown
topic Keywords: Air research
Evaluation test
Optimisations
Query lists
Search results
Test Collection
Information retrieval
Research
Sea ice
Search engines
Internet evaluation
test collections
web spam
description Research in the area of adversarial information retrieval has been facilitated by the availability of the UK-2006/UK-2007 collections, comprising crawl data, link graph, and spam labels. However, research into nullifying the negative effect of spam or excessive search engine optimisation (SEO) on the ranking of non-spam pages is not well supported by these resources. Nor is the study of cloaking techniques or of click spam. Finally, the domain-restricted nature of a .uk crawl means that only parts of link-farm icebergs may be visible in these crawls. We introduce the term nullification which we define as "preventing problem pages from negatively affecting search results". We show some important differences between properties of current .uk-restricted crawls and those previously reported for the Web as a whole. We identify a need for an adversarial IR collection which is not domain-restricted and which is supported by a set of appropriate query sets and (optimistically) user-behaviour data. The billion-page unrestricted crawl being conducted by CMU (web09-bst) and which will be used in the 2009 TREC Web Track is assessed as a possible basis for a new AIR test collection. We discuss the pros and cons of its scale, and the feasibility of adding resources such as query lists to enhance the utility of the collection for AIR research.
format Conference Object
author Jones, Timothy
Sankaranarayana, Ramesh S
Hawking, David
Craswell, Nick
title Nullification test collections for web spam and SEO
publisher Association for Computing Machinery Inc (ACM)
op_coverage Madrid Spain
genre Sea ice
op_source Proceedings of The 5th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb 2009)
http://doi.acm.org/10.1145/1531914.1531927
op_relation International Workshop on Adversarial Information Retrieval on the Web (AIRWeb 2009)
9781605584386
op_doi https://doi.org/10.1145/1531914.1531927
container_title Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web - AIRWeb '09
container_start_page 53