Question about the Shannon "entropy" of genomes
sci.bio.evolution
Question about the Shannon "entropy" of genomes         


Author: Doug Wedel
Date: Jul 13, 2008 23:05

Using Claude Shannon's formulas for measuring the redundancy of symbol
tokens in message strings, and given a large enough text to work with, it
is possible to identify the language of a text simply from the statistical
analysis of token use alone, since each language has a characteristic
"signature" of redundancy in symbol token use. It strikes me as possible that
different organisms (or species or genera) may also have characteristic
redundancy levels in their genomes, and I was wondering if anyone knows of
statistical studies of this kind.
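
To make the question concrete, here is a rough sketch of the kind of measurement meant above: Shannon entropy of the observed k-letter "words" in a sequence, and redundancy taken as 1 - H/H_max. The function names, the toy sequence, and the choice of overlapping k-mers as tokens are assumptions for illustration only, not an established genome-analysis method.

```python
from collections import Counter
from math import log2

def shannon_entropy(tokens):
    """Entropy, in bits per token, of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

def kmers(seq, k):
    """Overlapping k-letter words of a sequence (hypothetical tokenization)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def redundancy(seq, k=1, alphabet_size=4):
    """1 - H/H_max, with H_max = k * log2(alphabet_size) bits per k-mer."""
    h = shannon_entropy(kmers(seq, k))
    return 1 - h / (k * log2(alphabet_size))

# Made-up toy sequence, not real genome data.
toy = "ATATATATGCGCATATATAT"
print(shannon_entropy(kmers(toy, 1)))
print(redundancy(toy, k=2))
```

On short sequences the estimate is badly biased for larger k, since most possible k-mers never appear, so any comparison between organisms would need long sequences and matching k.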
Re: Question about the Shannon "entropy" of genomes         


Author: Steven Sullivan
Date: Jul 14, 2008 22:16

Doug Wedel wrote:
> Using Claude Shannon's formulas for measuring the redundancy of symbol
> tokens in message strings , and given a large enough text to work with, it
> is possible to identify the language of a text simply from the statistical
> analysis of token use alone, since all languages have unique "signatures" of
> redundancy in symbol token use. It strikes me as possible that different
> organisms (or species or genuses) may also have characteristic redundancy
> levels in their genome, and I was wondering if anyone knows of statistical
> studies of this kind.

Look up 'codon bias' for one level of redundancy.

Also look up 'sequence logos', primarily Tom Schneider's work, which have been used for years
to represent DNA and protein sequences in terms of Shannon entropy.

http://www-lmmb.ncifcrf.gov/~toms/
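
As a rough illustration of the idea behind a sequence logo: for aligned DNA sites, the information at each position is the maximum possible entropy (2 bits for four bases) minus the observed entropy of that column. The sketch below assumes equal-length aligned strings and leaves out the small-sample correction used in the published method; the function name and the example sites are invented.

```python
from collections import Counter
from math import log2

def column_information(aligned_seqs):
    """Bits of information per alignment column: 2 - H(column) for DNA.

    Simplified: no small-sample correction is applied.
    """
    length = len(aligned_seqs[0])
    info = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned_seqs)
        total = sum(counts.values())
        h = -sum((n / total) * log2(n / total) for n in counts.values())
        info.append(2.0 - h)
    return info

# Invented example sites, not taken from any database.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
for pos, bits in enumerate(column_information(sites), start=1):
    print(pos, round(bits, 3))
```

Fully conserved columns come out at 2 bits and columns where all four bases are equally common come out at 0 bits.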

--
-S
A wise man, therefore, proportions his belief to the evidence. -- David Hume, "On Miracles"
(1748)
Re: Question about the Shannon "entropy" of genomes         


Date: Jul 15, 2008 13:03

"Doug Wedel" wrote in message
news:g5eqau$1oak$1@darwin.ediacara.org...
> Using Claude Shannon's formulas for measuring the redundancy of symbol
> tokens in message strings , and given a large enough text to work with, it
> is possible to identify the language of a text simply from the statistical
> analysis of token use alone, since all languages have unique "signatures" of
> redundancy in symbol token use. It strikes me as possible that different
> organisms (or species or genuses) may also have characteristic redundancy
> levels in their genome, and I was wondering if anyone knows of statistical
> studies of this kind.
>

Three search terms you may find useful:

Codon usage bias
GC-content
puffer-fish junk-dna
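
For the first two terms, a minimal sketch of what gets measured: GC-content is the fraction of bases that are G or C, and codon usage is a tally of the triplets in a coding sequence read in frame. The function names and the toy sequence are assumptions for illustration; real analyses work from annotated coding regions.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def codon_counts(coding_seq):
    """Tally non-overlapping triplets, reading from position 0.

    Trailing bases that do not fill a whole codon are ignored.
    """
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

# Made-up coding fragment: CTG and TTA both encode leucine.
toy = "CTGCTGTTACTG"
print(gc_content(toy))
print(codon_counts(toy))
```

Codon usage bias shows up when synonymous codons, such as CTG and TTA here, are used at clearly unequal rates.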

Graham