i have 2 density curves based on n=700,000 points plotted on top of each other. these density curves differ in particular location shown below.
the code splits x-axis 10,000 bins varying interval lengths, equal number of points in each bin (in example, there 70 points in each bin). within each bin, count how many points belong red curve (x1) , how many belong black curve (x0), , compute log-ratio.
i include barplot below graph illustrates ratio of "red" points "black" points across entire range of x-axis. example points falling in bin [-0.044821,0.011243] have light color on barplot due negative log-ratio (i.e. low proportion of red points relative black points), points falling in bin (0.9114,0.91396] have dark color on barplot. intervals in between have gradient of light/dark.
> head(df.new) index x lab bin ratio 1 1 -0.018646573 1 [-0.044821,0.011243] -0.5877867 2 1 -0.044820839 0 [-0.044821,0.011243] -0.5877867 3 1 -0.030075158 0 [-0.044821,0.011243] -0.5877867 4 1 0.005859939 0 [-0.044821,0.011243] -0.5877867 5 1 0.009374059 0 [-0.044821,0.011243] -0.5877867 6 1 -0.015107514 1 [-0.044821,0.011243] -0.5877867 > tail(df.new) index x lab bin ratio 699995 9999 0.9120896 1 (0.9114,0.91396] 2.564949 699996 9999 0.9116115 1 (0.9114,0.91396] 2.564949 699997 9999 0.9115013 1 (0.9114,0.91396] 2.564949 699998 9999 0.9123502 0 (0.9114,0.91396] 2.564949 699999 9999 0.9123377 1 (0.9114,0.91396] 2.564949 700000 9999 0.9127925 1 (0.9114,0.91396] 2.564949
something code , image below had in mind. slow process. there way speed up?
require(ks) require(ggplot2) require(gridextra) require(grid) #data set.seed(1234) n1 = 300000 n0 = 400000 x1 = rnorm.mixt(n=n1,mus=c(0.2,0.89),sigmas=c(0.05,0.01),props=c(0.97,0.03)) x0 = rnorm.mixt(n=n0,mus=c(0.2,0.88),sigmas=c(0.05,0.01),props=c(0.97,0.03)) #bin data , convert data frame numbins = 10000 #label points x1 1 , points x0 0 df = data.frame(x=c(x1,x0),lab=c(rep(1,n1),rep(0,n0))) df$bin = with(df, cut_number(df$x, n = numbins)) df$index = factor(df$bin,labels=seq(numbins)) #calculate ratio of counts 1's 0's in each bin (can inf) ns= table(df$index) #number of observations in each bin ones = aggregate(lab~index,df,sum)$lab #number of 1's in each bin ratios = as.numeric((log(ones)-log(ns-ones))) #calculate bin length tmp1 = unique(strsplit(gsub( "[][(]" , "", df$bin) , ",")) tmp2 = lapply(tmp1,as.numeric) tmp3 = matrix(sort(unlist(tmp2)),ncol=2,byrow=true) tmp.df = data.frame(index=factor(seq(numbins)),ratio=ratios,bin.len=tmp3[,2]-tmp3[,1]) df.new = merge(df,tmp.df) #plots #main plot p1 = ggplot(df.new,aes(x=x)) + geom_line(aes(group=lab, colour=factor(lab)),stat="density",show.legend = false) + scale_color_manual(values=c("black","red")) + theme_void() #"gradient" barplot p2 = ggplot(df.new,aes(x=factor(1),y=bin.len,fill=ratio)) + geom_bar(stat="identity") + scale_fill_continuous(low = "black", high = "red",guide = false) + coord_flip() + theme_void() grid.arrange(p1, p2, ncol=1, nrow=2, heights=c(6,1)) #extract legend g_legend<-function(a.gplot){ tmp <- ggplot_gtable(ggplot_build(a.gplot)) leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box") legend <- tmp$grobs[[leg]] return(legend)} p3 = ggplot(df.new,aes(x=factor(1),y=bin.len,fill=ratio)) + geom_bar(stat="identity") + scale_fill_continuous(low = "black", high = "red",name="log ratio") legend <- g_legend(p3) vpleg <- viewport(width = 0.25, height = 0.5, x = 0.85, y = 0.75) upviewport(0) pushviewport(vpleg) grid.draw(legend)
Comments
Post a Comment