[Chimera-users] R-factor
Tom Goddard
goddard at sonic.net
Wed May 11 09:39:21 PDT 2011
Hi Ryo,
Chimera does not calculate the real-space R-factor. The real-space
R-factor defined for crystallographic maps in
Branden C. and Jones A., Nature 343 687-689 (1990)
is
RSRF = sum(|d_o - d_c|) / sum(|d_o + d_c|)
where
d_o is the observed (experimental) density
d_c is the calculated density from the atomic model,
and the sum is over grid points in the d_c map, probably using d_o
values interpolated at those same points. They also compute RSRF
per residue, and I'm not clear what grid points they use in that case --
maybe just the atom center positions.
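As a minimal sketch of the formula above, assuming both maps have already been sampled at the same grid points and flattened into plain lists (the function name rsrf and the sample values are made up for illustration, not anything Chimera provides):

```python
def rsrf(d_o, d_c):
    """Real-space R-factor: sum(|d_o - d_c|) / sum(|d_o + d_c|)."""
    if len(d_o) != len(d_c):
        raise ValueError("maps must be sampled at the same grid points")
    num = sum(abs(o - c) for o, c in zip(d_o, d_c))
    den = sum(abs(o + c) for o, c in zip(d_o, d_c))
    return num / den

# Hypothetical density values at four grid points (not real data).
d_obs = [1.0, 2.0, 3.0, 4.0]
d_calc = [1.0, 2.0, 2.0, 5.0]

# Numerator is 0 + 0 + 1 + 1 = 2, denominator is 2 + 4 + 5 + 9 = 20.
print(rsrf(d_obs, d_calc))
```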
This has some immediate problems when applied to EM maps and fit
models. First you need the observed and calculated density maps to have
the same normalization. If the experimental density values range from
-5000 to 10000 and the calculated ones from 0.001 to 0.01 then obviously
you get nonsense. The next problem is that the experimental density
values from single-particle EM reconstructions are often negative in
parts of the map. You can see from the formula above that this can cause
havoc: if the experimental density at a grid point is close to the
negative of the calculated density, that point adds about 2*|d_c| to the
numerator and almost nothing to the denominator, which inflates RSRF.
X-ray maps also have many negative density values, but their
magnitudes seem to be smaller.
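Both problems are easy to reproduce with toy numbers. The sketch below uses a hypothetical helper rsrf and invented density values, purely to illustrate the behavior of the formula:

```python
def rsrf(d_o, d_c):
    """Real-space R-factor: sum(|d_o - d_c|) / sum(|d_o + d_c|)."""
    num = sum(abs(o - c) for o, c in zip(d_o, d_c))
    den = sum(abs(o + c) for o, c in zip(d_o, d_c))
    return num / den

# Hypothetical calculated density at four grid points (not real data).
d_calc = [1.0, 2.0, 3.0, 4.0]

# Problem 1: the observed map has exactly the same shape but a
# different normalization. RSRF comes out as 999/1001, close to 1,
# even though the maps differ only by a scale factor.
d_obs_scaled = [1000.0 * c for c in d_calc]
r_scaled = rsrf(d_obs_scaled, d_calc)

# Problem 2: at three grid points the observed density is the negative
# of the calculated density. Those points add to the numerator but not
# to the denominator, giving 12 / 8 = 1.5.
d_obs_neg = [-1.0, -2.0, -3.0, 4.0]
r_neg = rsrf(d_obs_neg, d_calc)

print(r_scaled, r_neg)
```

Note that the second case gives a value above 1, so RSRF is not bounded once negative densities are present.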
The idea behind RSRF is to judge the fit by looking at the size of
difference map values d_o - d_c relative to the size of the values in
the observed and calculated maps. The standard cross-correlation
coefficient does something very similar and, I think, does it better.
Here's how. Consider the sum of the squares of the residuals over all
the grid points, normalized by the square roots of the sums of squares
of the densities in the experimental and calculated maps:
E = sum((d_o - d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum(d_c**2)))
This has the same problem described above: the maps may have
different normalizations. So put a scale factor f in front of the
calculated map d_c:
E = sum((d_o - f*d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum((f*d_c)**2)))
and choose the scale factor f so that E is minimized. In other words,
we scale the calculated map to minimize the error between experimental
and calculated maps. It is easy to show that
f = sqrt(sum(d_o**2)) / sqrt(sum(d_c**2))
and then
E = 2 * ( 1 - CCC )
where
CCC = sum(d_o * d_c) / (sqrt(sum(d_o**2)) * sqrt(sum(d_c**2)))
is just the normal cross-correlation coefficient (without mean values
being subtracted).
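To see why, substitute g = f * sqrt(sum(d_c**2)) / sqrt(sum(d_o**2)) into the scaled expression for E. Expanding the square gives E = g + 1/g - 2*CCC, which for positive g is smallest at g = 1, that is, at the f given above. The following sketch checks this numerically; the function names and density values are invented for illustration and are not part of Chimera:

```python
import math

def ccc(d_o, d_c):
    """Cross-correlation coefficient without mean subtraction."""
    dot = sum(o * c for o, c in zip(d_o, d_c))
    norm_o = math.sqrt(sum(o * o for o in d_o))
    norm_c = math.sqrt(sum(c * c for c in d_c))
    return dot / (norm_o * norm_c)

def scaled_error(d_o, d_c, f):
    """E = sum((d_o - f*d_c)**2) / (sqrt(sum(d_o**2)) * sqrt(sum((f*d_c)**2)))."""
    resid = sum((o - f * c) ** 2 for o, c in zip(d_o, d_c))
    norm_o = math.sqrt(sum(o * o for o in d_o))
    norm_fc = math.sqrt(sum((f * c) ** 2 for c in d_c))
    return resid / (norm_o * norm_fc)

def best_scale(d_o, d_c):
    """f = sqrt(sum(d_o**2)) / sqrt(sum(d_c**2))."""
    return math.sqrt(sum(o * o for o in d_o)) / math.sqrt(sum(c * c for c in d_c))

# Hypothetical maps with very different normalizations (not real data).
d_obs = [-50.0, 120.0, 300.0, 80.0]
d_calc = [0.001, 0.004, 0.009, 0.002]

f = best_scale(d_obs, d_calc)
e_min = scaled_error(d_obs, d_calc, f)

# The two printed values should agree up to rounding error.
print(e_min, 2 * (1 - ccc(d_obs, d_calc)))

# Moving f away from the optimum in either direction increases E.
print(scaled_error(d_obs, d_calc, 0.9 * f) > e_min)
print(scaled_error(d_obs, d_calc, 1.1 * f) > e_min)
```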
So the standard cross-correlation coefficient is a direct and
sensible measure of residual error.
If you can give me a sound reason why another measure of residual
error is useful, I'll be happy to add it to Chimera.
Tom
> Hi Chimera staff,
>
> To evaluate the validity of fitting of the atomic model with the EM map, can I calculate the real-space R factor with chimera?
>
> Ryo