CountryInfo.txt: Country names, codes, places and leaders

Bibliographic Details
Main Author: Schrodt, Philip
Format: Dataset
Language: unknown
Published: 2015
Online Access: https://search.dataone.org/view/sha256:d6d15fe697a6bf3c214f4570fc5e4fde0873fd931bcc9221fafc24b640dfcf1f
Description
Summary: CountryInfo.txt is a general purpose file intended to facilitate natural language processing of news reports and political texts. It was originally developed to identify states for the text filtering system used in the development of MID4, then extended to incorporate CIA World Factbook and WordNet information for the development of TABARI dictionaries.

The file contains about 32,000 lines, covering about 240 countries and administrative units (e.g. American Samoa, Christmas Island, Hong Kong, Greenland). It is internally documented and almost but not quite XML: the major fields are delimited with tags of the form ... but elements inside are delimited with line feeds. Converting this to strict XML would be a relatively simple programming exercise for anyone who should be working with the file in the first place. The file is UTF-8 with Unix line feeds and will need to be converted if used on a Windows system.

Fields include:
- Country name in English
- Adjectival forms and synonyms of the country name, including some non-English versions of the name
- ISO-3166 numeric, alpha2 and alpha3 codes, FIPS-10 code, IMF code, COW alpha and numeric codes
- Capital city
- Cities with populations over 1 million
- Regions and geographical features (WordNet meronyms)
- Leaders, 1960-2008 (rulers.org)
- Members of government, 2003-2010 (CIA World Leaders)

The beginning of the file has fairly extensive documentation on the formats used.
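The summary notes that the file is UTF-8 with Unix line feeds and needs conversion for Windows use. A minimal sketch of one way to do that in Python follows. The function names `lf_to_crlf` and `convert_file` and the output file name are hypothetical, not part of the dataset. The sketch works on raw bytes, so the UTF-8 content itself is left untouched.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Return data with every line ending written as CRLF.

    Existing CRLF pairs are first collapsed to LF so that running the
    function twice does not produce doubled carriage returns.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file(src: str, dst: str) -> None:
    """Copy src to dst, rewriting Unix line feeds as Windows CRLF."""
    # Binary mode avoids any decoding step, so multi-byte UTF-8
    # sequences (e.g. in non-English country names) pass through as-is.
    with open(src, "rb") as infile:
        data = infile.read()
    with open(dst, "wb") as outfile:
        outfile.write(lf_to_crlf(data))


if __name__ == "__main__":
    # Hypothetical output name; the source name is taken from the record title.
    convert_file("CountryInfo.txt", "CountryInfo.windows.txt")
```

Many Windows editors and Python's own text mode already handle bare line feeds, so the conversion may only be needed for tools that insist on CRLF.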