Author: Steven Sullivan Date: Jul 14, 2008 22:16
Doug Wedel earthlink.net> wrote:
> Using Claude Shannon's formulas for measuring the redundancy of symbol
> tokens in message strings, and given a large enough text to work with, it
> is possible to identify the language of a text simply from the statistical
> analysis of token use alone, since all languages have unique "signatures" of
> redundancy in symbol token use. It strikes me as possible that different
> organisms (or species or genuses) may also have characteristic redundancy
> levels in their genome, and I was wondering if anyone knows of statistical
> studies of this kind.
Look up 'codon bias' for one level of redundancy.
Also look up 'sequence logos', primarily Tom Schneider's work, which have been used for years
to represent DNA/protein sequences in terms of Shannon entropy.
http://www-lmmb.ncifcrf.gov/~toms/
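
For a rough idea of what a sequence logo measures, here is a minimal Python sketch that computes the per-position information content (in bits) of a set of aligned DNA sequences as log2(4) minus the Shannon entropy of each column. The function name `column_information` and the example sites are made up for illustration, and the sketch leaves out the small-sample correction that Schneider's method applies, so treat it as a simplification rather than his actual procedure.

```python
import math
from collections import Counter


def column_information(seqs, alphabet_size=4):
    """Return the information content (bits) of each column of aligned sequences.

    Hypothetical helper: information = log2(alphabet_size) - Shannon entropy.
    No small-sample correction is applied (a simplifying assumption).
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same aligned length")

    max_bits = math.log2(alphabet_size)  # 2 bits for the DNA alphabet A, C, G, T
    info = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)  # symbol counts in column i
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(max_bits - entropy)
    return info


if __name__ == "__main__":
    # Made-up aligned sites, purely for illustration.
    sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
    for pos, bits in enumerate(column_information(sites), start=1):
        print(f"position {pos}: {bits:.2f} bits")
```

A fully conserved column scores the maximum of 2 bits, and a column where all four bases are equally frequent scores 0. Passing `alphabet_size=20` would give the analogous measure for protein alignments.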
--
-S
A wise man, therefore, proportions his belief to the evidence. -- David Hume, "On Miracles"
(1748)