Posted: 2/12/2015 8:23:50 PM EDT
Is there a statistical test to determine if a value is an outlier for a nonparametric data set?

I don't think I can use the Wilcoxon matched-pairs signed-rank test or the Kruskal-Wallis test, since they compare populations with each other rather than a single point against a population.
Link Posted: 2/12/2015 9:07:11 PM EDT
[#1]
 In my graduate work in stats we had just conquered analysis of covariance and were venturing into the realm of multivariate analysis....  Sorry man...
Link Posted: 2/14/2015 1:18:15 AM EDT
[#2]
Check here: http://www.eng.tau.ac.il/~bengal/outlier.pdf

There's a pretty simple technique described here which relies on quartiles...so it'll work as long as your data aren't that complicated: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Nonparametric/BS704_Nonparametric2.html#whenthereareoutliers
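
For anyone who wants to try the quartile approach, here is a minimal sketch in Python. It assumes the usual Tukey fences (anything below Q1 - 1.5*IQR or above Q3 + 1.5*IQR is flagged). The function name `iqr_outliers`, the multiplier `k`, and the sample numbers are made up for illustration, and different quartile conventions will move the fences slightly.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" treats the data as the whole population when interpolating quartiles
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample with one suspicious value
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]
print(iqr_outliers(sample))
```

Because it only uses quartiles, the rule does not assume the data are normal, which is why it suits a skewed set.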
Link Posted: 2/14/2015 2:04:01 PM EDT
[#3]
Do you know what distribution your data set follows?
Link Posted: 2/18/2015 3:53:54 PM EDT
[#4]
Whoops, lost track of this thread.

The data set is most often left-skewed, but it can also be bimodal or even roughly normal.

Anyway, apparently what people sometimes do is log-transform the data set to bring it closer to normal, and then run an ANOVA. (A log transform pulls in a right tail, so a left-skewed set would need to be reflected first.)

Or, as the links in the second reply suggest, use the IQR to detect outliers.

... now back to debugging my R script ...

ETA, for anyone interested: bootstrapping.
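
The post doesn't say how bootstrapping would be applied, so the following is only one possible reading: a percentile bootstrap interval for the median, which makes no assumption about the shape of the distribution. The function name `bootstrap_ci`, its defaults, and the sample data are all hypothetical.

```python
import random
from statistics import median

def bootstrap_ci(values, stat=median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat`, resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Recompute the statistic on each resample, then sort the results
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]

sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]
print(bootstrap_ci(sample))
```

The median is used here because it is less sensitive to a single extreme point than the mean, but any statistic can be passed in as `stat`.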
Link Posted: 2/18/2015 9:50:36 PM EDT
[#5]
When we were analyzing microarray data, we dumped the points that fell outside ±2 SD... I think. It's been a year or two since I had to do QC.
Otherwise, I just plot the points and the odd ones tend to jump out and yell "me, me... I'm the weirdo!!"

Enjoy the R...
Link Posted: 2/19/2015 2:50:08 PM EDT
[#6]


That's very naughty of you. Generally it is verboten to dump data points without a good reason. If the data are roughly normal, a ±2 SD cutoff will get rid of about 5% of your data points that are not actually outliers.
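
The ~5% figure can be checked with a quick calculation, assuming the data really are normal (which the thread's data set often isn't). The helper name `fraction_outside` is made up for this sketch.

```python
from statistics import NormalDist

def fraction_outside(k):
    """Fraction of a normal population more than k standard deviations from the mean."""
    return 2 * (1 - NormalDist().cdf(k))

# Share of perfectly ordinary points that a +/- 2 SD cutoff would discard
print(f"{fraction_outside(2):.4f}")
```

This comes out at roughly 4.6%, so about 1 point in 22 gets dropped even when nothing is wrong with it.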
Link Posted: 2/22/2015 10:33:17 PM EDT
[#7]
The data "dump" was done because we were dealing with microarray data... which was/is a bit of a P.I.T.A.
The method we used was a generally accepted protocol at the time.
Nowadays, the procedures have changed a bit and I wouldn't dump as much data.
I was told by a nationally renowned statistician when we first started doing the arrays that if we gave him a set of array data, he could tell if they were created on the same day, if we used the same samples, if different technicians did them, if it was under a full moon..... Basically, they're VERY tricky to replicate and sensitive to all sorts of stuff...

But yeah, I know dumping data is generally not the best thing to do. I'll dump 2 or 3 points out of 1000, but more than that gives me the heebie-jeebies nowadays.
Link Posted: 3/11/2015 6:32:59 PM EDT
[#8]
Anyone here know about hierarchical clustering methods?

I think I've arrived at the right distance/dissimilarity matrix to use (Bray-Curtis), but I'm not sure which clustering method I should choose.

If someone can point me in the right direction for reference material, that'd be great.

Otherwise: I have a 2D matrix. The rows are individual bacterial species and the columns are different pools of antibodies. Each cell represents how much of a particular bacterial species an antibody clone enriches for. I expect most clones to be different, and I want to visualize that with a heatmap.

That's what I have so far, but I'm not satisfied with the clustering algorithm (I've tried Ward and average so far).
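
For reference, Bray-Curtis dissimilarity between two abundance profiles is the sum of absolute differences divided by the sum of all abundances. A minimal sketch, assuming each antibody pool is one column of non-negative enrichment values; the names `bray_curtis` and `distance_matrix` and the numbers are hypothetical, not taken from the actual data.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length non-negative profiles."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    if den == 0:
        raise ValueError("both profiles are all zeros")
    return num / den

def distance_matrix(profiles):
    """Square matrix of pairwise Bray-Curtis dissimilarities."""
    return [[bray_curtis(p, q) for q in profiles] for p in profiles]

# Hypothetical enrichment values: one list per antibody pool, one entry per species
pools = [[6, 7, 4], [10, 0, 6], [6, 7, 4]]
for row in distance_matrix(pools):
    print([round(x, 3) for x in row])
```

Identical profiles give 0 and profiles with no species in common give 1.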
Link Posted: 4/8/2015 7:52:48 PM EDT
[#9]
Usually when I do clustering, it's on 2D or 3D presumably metric spaces.

If you've got a metric space, you can use clustering based on the Euclidean distances among the points. If you've got an ultrametric space, you would want to use hierarchical clustering.


If the similarity matrix is based on features (which it sounds like it is), then the space is likely ultrametric, but there are ways of checking this. In that case, at least in my field, we'd use traditional hierarchical clustering methods (R's hclust function), or an additive similarity tree (see Sattath & Tversky 1977)
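
One way of checking is the strong triangle inequality: a distance matrix is ultrametric when d(x, z) <= max(d(x, y), d(y, z)) holds for every triple of points. A brute-force sketch of that check follows; the function name `is_ultrametric` and the tolerance are assumptions for illustration, and real data will rarely pass exactly.

```python
from itertools import permutations

def is_ultrametric(d, tol=1e-9):
    """True if d[i][k] <= max(d[i][j], d[j][k]) for all distinct i, j, k."""
    n = len(d)
    return all(
        d[i][k] <= max(d[i][j], d[j][k]) + tol
        for i, j, k in permutations(range(n), 3)
    )

tree_like = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]  # two largest sides of the triangle are equal
line_like = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]  # points spaced along a line
print(is_ultrametric(tree_like), is_ultrametric(line_like))
```

The check is cubic in the number of points, which is fine for a few hundred rows.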
Link Posted: 4/9/2015 8:37:08 PM EDT
[#10]

Hmm, I seem to be missing something:

When I do hierarchical clustering, it seems I still have to input a distance matrix.

For my data set, I use Bray-Curtis dissimilarity or UniFrac distances. From there, I ask R to give me a cluster tree based on one of several clustering methods: average, point, complete, or Ward.

My trouble right now is that I don't know which of those methods is most biologically relevant. (I've determined that average/point doesn't make sense, and I'm stuck between complete and Ward.)
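
To see what the linkage choice actually changes, here is a naive agglomerative clustering sketch that works directly on a precomputed distance matrix. It is not what R does internally; the function name `agglomerate` and the toy matrix are made up, and "single" is included only as an assumed extra for comparison. Ward is left out because its criterion is defined through within-cluster variance rather than plain pairwise distances.

```python
def agglomerate(dist, linkage="complete"):
    """Merge clusters bottom-up; return a list of (members_a, members_b, height)."""
    link = {
        "single": min,                               # closest pair between clusters
        "complete": max,                             # farthest pair between clusters
        "average": lambda ds: sum(ds) / len(ds),     # mean over all cross pairs
    }[linkage]
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ds = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                h = link(ds)
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], h))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges

# Toy dissimilarity matrix for four hypothetical antibody pools
d = [[0.0, 0.2, 0.6, 0.9],
     [0.2, 0.0, 0.5, 0.8],
     [0.6, 0.5, 0.0, 0.3],
     [0.9, 0.8, 0.3, 0.0]]
for method in ("single", "average", "complete"):
    print(method, [round(h, 3) for _, _, h in agglomerate(d, method)])
```

On this toy matrix all three methods merge the same pairs and differ only in the height of the final merge; on real data the tree shape itself can change, which is where the biological judgment comes in.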
Link Posted: 4/10/2015 12:04:53 AM EDT
[#11]

100% - you still generate a distance matrix from that space. The issue is what kind of space it is. If the distances are ultrametric, then hierarchical clustering makes sense. If they're not ultrametric, then you might try an additive-trees type of approach.

So this is outside of your field of study but see this article:

http://www.cs.technion.ac.il/~moran/COURSES/papers/SaTv77.pdf
