Cogprints

Frequency Value Grammar and Information Theory

Stepak, Asa M (2004) Frequency Value Grammar and Information Theory. [Preprint]

This is the latest version of this eprint.

Abstract

I previously laid the groundwork for Frequency Value Grammar (FVG) in papers I submitted to the proceedings of the 4th International Conference on Cognitive Science (2003), Sydney, Australia, and the Corpus Linguistics Conference (2003), Lancaster, UK. FVG is a formal syntax based in large part on the principles of Information Theory. FVG relies on dynamic physical principles external to the corpus, which shape and mould the corpus, whereas generative grammar and other formal syntactic theories are based exclusively on patterns (fractals) found within the well-formed portion of the corpus. However, FVG should not be confused with Probability Syntax (PS), as described by Manning (2003). PS is a corpus-based approach that yields the probability distribution of possible syntactic constructions over a fixed corpus. PS makes no distinction between well-formed and ill-formed sentence constructions and assumes that everything found in the corpus is well formed. In contrast, FVG's primary objective is to distinguish between well-formed and ill-formed sentence constructions and, in so doing, it relies on corpus-based parameters that determine sentence competency. In PS, a syntax of high probability will not necessarily yield a well-formed sentence. In FVG, however, a syntax or sentence construction of high 'frequency value' will yield a well-formed sentence at least 95% of the time, which satisfies most empirical standards. Moreover, in FVG, a sentence construction of high 'frequency value' could very well be represented by an underlying syntactic construction of low probability as determined by PS. The characteristic 'frequency values' calculated in FVG are not measures of probability; rather, they are fundamentally determined values derived from exogenous principles which impact and determine the corpus-based parameters that serve as an index of sentence competency. The theoretical framework of FVG has broad applications beyond formal syntax and NLP.
In this paper, I will demonstrate how FVG can be used as a model for improving the upper-bound calculation of the entropy of written English. Generally speaking, when a function word precedes an open-class word, the backward n-gram analysis will be homomorphic with the information source and will result in frequency values more representative of co-occurrences in the information source.
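As a rough illustration of the kind of calculation referred to above, the following Python sketch estimates an upper bound on per-word entropy as the cross-entropy of a smoothed bigram model, run either forward (each word predicted from the word before it) or backward (each word predicted from the word after it). It is not the FVG method described in the paper: the function name bigram_cross_entropy, the add-alpha smoothing, the closed vocabulary built from both texts, and the toy sentences are all assumptions made for illustration only.

```python
import math
from collections import Counter


def bigram_cross_entropy(train, test, backward=False, alpha=1.0):
    """Per-token cross-entropy (in bits) of an add-alpha smoothed bigram model.

    The cross-entropy of a model on held-out text serves as an estimate of
    an upper bound on the entropy of the source. With backward=True both
    token lists are reversed, so each token is predicted from the token
    that follows it in the original order.
    """
    if backward:
        train, test = train[::-1], test[::-1]
    # Assumption: a closed vocabulary drawn from both texts, so that
    # smoothing gives every test bigram a non-zero probability.
    vocab_size = len(set(train) | set(test))
    context_counts = Counter(train[:-1])
    pair_counts = Counter(zip(train[:-1], train[1:]))
    total_bits = 0.0
    for context, token in zip(test[:-1], test[1:]):
        numerator = pair_counts[(context, token)] + alpha
        denominator = context_counts[context] + alpha * vocab_size
        total_bits -= math.log2(numerator / denominator)
    return total_bits / (len(test) - 1)


# Hypothetical toy data; a real estimate needs a large corpus.
train_tokens = "the cat sat on the mat and the dog sat on the rug".split()
test_tokens = "the dog sat on the mat".split()

forward_bits = bigram_cross_entropy(train_tokens, test_tokens)
backward_bits = bigram_cross_entropy(train_tokens, test_tokens, backward=True)
print(f"forward:  {forward_bits:.3f} bits per word")
print(f"backward: {backward_bits:.3f} bits per word")
```

Comparing the two figures on a realistic corpus shows which direction of analysis gives the tighter bound. On data this small the numbers carry no linguistic meaning.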

Item Type: Preprint
Keywords: Information theory, n-grams, Natural Language, entropy, probability syntax, well-formedness, frequency value, corpus, iconicity, formal syntax, cognitive science.
Subjects: Neuroscience > Neurolinguistics
Computer Science > Statistical Models
Computer Science > Language
Linguistics > Computational Linguistics
Psychology > Psycholinguistics
Psychology > Cognitive Psychology
Linguistics > Syntax
ID Code: 3657
Deposited By: Stepak, Asa M.
Deposited On: 05 Jun 2004
Last Modified: 11 Mar 2011 08:55
