Saturday, April 16, 2011
23andMe - Relative Finder improvements
For those of you who may be interested, I found this which was just posted on the 23andMe site:
Improvements to Relative Finder's Relationship Predictions
Today we're making an improvement to the way predicted relationships are estimated in Relative Finder. This should resolve an issue a number of you have reported, in which some RF matches, typically at about the estimated third cousin level and farther out, switched to a more distant prediction. What we've done is to replace the procedure Relative Finder uses to infer that an individual’s ancestors are predominantly Ashkenazi Jewish.
Why would Relative Finder care whether your ancestors were Ashkenazi or not? Relative Finder classifies individuals as having Ashkenazi ancestry or not in order to adjust the relationships and relationship ranges it reports. Since pairs of Ashkenazi individuals have higher background levels of DNA sharing, Ashkenazi cousins of a given degree of relatedness will tend to share more DNA than non-Ashkenazi cousins of the same degree of relatedness; conversely, for the same amount of shared DNA, Ashkenazi individuals tend to be less closely related than non-Ashkenazi individuals. This led us to define separate models for relationship prediction for non-Ashkenazi and Ashkenazi populations, and that is what creates the need for a way to identify who has Ashkenazi ancestry.
Previously we used a very simple method to determine whether someone was genetically Ashkenazi -- whether their number of RF matches exceeded some fixed threshold. If either individual in the pair is inferred to have a high level of Ashkenazi ancestry, the Ashkenazi tables are used. This number-of-matches predictor was effective, but it depended on the size of the database, so as the database got larger, customers' results could change. Recently, as customers without Ashkenazi ancestry have begun to cross the number-of-matches threshold, their distant relationships have shifted to slightly more distant predictions.
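To make the database-size problem concrete, here is a minimal Python sketch of a threshold rule like the one described above. The threshold of 500, the match rate, and the database sizes are made-up illustrative numbers, and the function names are hypothetical; none of them come from 23andMe.

```python
# Illustrative sketch only: all numbers and names below are assumptions,
# not values or code used by Relative Finder.

MATCH_THRESHOLD = 500  # hypothetical fixed cutoff on the number of RF matches


def expected_matches(database_size: int, matches_per_thousand: int) -> int:
    """Match count for a customer who matches a fixed share of the database."""
    return database_size * matches_per_thousand // 1000


def exceeds_match_threshold(n_matches: int, threshold: int = MATCH_THRESHOLD) -> bool:
    """Old-style rule: treat the customer as Ashkenazi if they have many matches."""
    return n_matches > threshold


def choose_table(a_is_ashkenazi: bool, b_is_ashkenazi: bool) -> str:
    """Use the Ashkenazi tables if either individual in the pair is flagged."""
    return "ashkenazi" if (a_is_ashkenazi or b_is_ashkenazi) else "general"


# The same customer, matching 2 in every 1000 database members,
# before and after the database triples in size.
before = exceeds_match_threshold(expected_matches(100_000, 2))
after = exceeds_match_threshold(expected_matches(300_000, 2))

print(before, choose_table(before, False))
print(after, choose_table(after, False))
```

Nothing about the customer's own DNA changes between the two calls; only the database grows, yet the pair switches from the general tables to the Ashkenazi tables. That is the instability the announcement describes.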
We're replacing the old method with a new one that does not depend on database composition, but on the individual's genotype. This will stabilize the relationship predictions. (Some details on the method follow below.)
The main effect of the change is: if your ancestry isn’t predominantly Ashkenazi, and you saw your results shift recently, they should go back to what they were and stay there. If you’re a new customer without much Ashkenazi ancestry, who started out above the matches threshold, you would never have seen your results shift to more distant in the first place, but they would shift to less distant now. Also, if you've seen individual matches of yours move around, they should also go back to what they were.
This change has no effect on the method used to identify shared segments of DNA, so there should be no changes to percent DNA shared or the number of segments shared. The relationship prediction occurs downstream of the DNA segment identification.
[Notes on the new method] The new method uses a special case of the model underlying STRUCTURE/ADMIXTURE. First we defined several reference populations, by looking at clusters in PCA plots of customer genotype data, including full Ashkenazi, (non-Ashkenazi) Northern European and Southern European, and several non-European populations. From these we computed reference allele frequencies for each population. Then we chose the k markers that best differentiated the reference populations from one another (using Noah Rosenberg's I_n statistic; k ~ 20,000 after evaluating several different choices). With these allele frequencies in hand, we could then compute the probability that a customer genotype arises entirely from one population, or from some specified mixture of multiple populations (e.g., both parents Ashkenazi, one parent Northern European and one parent Ashkenazi, both parents South Asian, etc.). Finally, Relative Finder makes use of these probabilities to classify customers as predominantly Ashkenazi, or not.
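For readers curious how such a probability could be computed, the following Python sketch scores a genotype under a few "parental population" hypotheses. It is a guess at the general shape of the calculation, not 23andMe's implementation: it assumes markers are independent, that each parent passes on one allele drawn from their population's allele frequency, and it uses three toy markers with invented frequencies instead of roughly 20,000 real ones. The population labels, hypothesis names, and function names are all hypothetical.

```python
import math

# Toy reference allele frequencies per population (invented numbers).
REF_FREQS = {
    "ASH": [0.9, 0.8, 0.1],
    "NEU": [0.2, 0.3, 0.7],
}

# Each hypothesis names the population of each parent.
HYPOTHESES = {
    "both_ASH": ("ASH", "ASH"),
    "ASH_NEU": ("ASH", "NEU"),
    "both_NEU": ("NEU", "NEU"),
}


def genotype_prob(g: int, p_a: float, p_b: float) -> float:
    """P(g copies of the reference allele), one allele drawn from each parent."""
    if g == 2:
        return p_a * p_b
    if g == 1:
        return p_a * (1 - p_b) + (1 - p_a) * p_b
    if g == 0:
        return (1 - p_a) * (1 - p_b)
    raise ValueError("genotype must be 0, 1, or 2")


def log_likelihood(genotypes, freqs_a, freqs_b, eps=1e-6):
    """Sum of per-marker log probabilities, treating markers as independent."""
    total = 0.0
    for g, p_a, p_b in zip(genotypes, freqs_a, freqs_b):
        # Clamp frequencies away from 0 and 1 so the log is always defined.
        p_a = min(max(p_a, eps), 1 - eps)
        p_b = min(max(p_b, eps), 1 - eps)
        total += math.log(genotype_prob(g, p_a, p_b))
    return total


def classify(genotypes, ref_freqs=REF_FREQS, hypotheses=HYPOTHESES):
    """Return the best-scoring hypothesis and all log-likelihood scores."""
    scores = {
        name: log_likelihood(genotypes, ref_freqs[a], ref_freqs[b])
        for name, (a, b) in hypotheses.items()
    }
    return max(scores, key=scores.get), scores


best, scores = classify([2, 2, 0])
print(best, scores)
```

Because the score depends only on the customer's own genotype and the fixed reference frequencies, it does not move as the customer database grows, which is the property the announcement is after. How Relative Finder turns these probabilities into a final "predominantly Ashkenazi" call is not specified in the post; picking the single highest-scoring hypothesis is just one simple choice.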