nullranges::matchranges()
with BentoBox
vignettes/BioC2021_lightning_talk.Rmd
BioC2021_lightning_talk.Rmd
This lightning talk will attempt to use a toy dataset to generate covariate-matched, null-hypothesis GRanges with nullranges::matchRanges()
and then visualize the results using the BentoBox
framework.
For this example lets use a utility function to create a simulated dataset:
set.seed(123)
x <- Bioc2021::makeExampleMatchedDataSet(type = "GRanges")
x
## GRanges object with 136 ranges and 3 metadata columns:
## seqnames ranges strand | feature1 color length
## <Rle> <IRanges> <Rle> | <logical> <character> <integer>
## [1] chr1 92-200 * | TRUE #4fbe9b 109
## [2] chr1 71-178 * | TRUE #d098d7 108
## [3] chr1 747-902 * | TRUE #e19995 156
## [4] chr1 362-502 * | TRUE #d098d7 141
## [5] chr1 339-406 * | TRUE #d098d7 68
## ... ... ... ... . ... ... ...
## [132] chr1 511-606 * | FALSE #d098d7 96
## [133] chr1 724-847 * | FALSE #4fbe9b 124
## [134] chr1 72-172 * | FALSE #4fbe9b 101
## [135] chr1 341-480 * | FALSE #4fbe9b 140
## [136] chr1 674-734 * | FALSE #d098d7 61
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
This creates a GRanges object with random ranges and 3 metadata colums (feature1
, color
, and length
) that describe “features” of each range.
Let’s demonstrate generating a set of matched ranges by finding ranges that are FALSE
for feature1
but have the same distributions of color
and length
.
Our focal
set of interest is where feature1 is TRUE
, and pool
is all other ranges (where feature1 is FALSE
):
focal <- x[x$feature1]
pool <- x[!x$feature1]
We can use nullranges::matchRanges()
to select a subset from pool that is distributionally matched in color and range length:
library(nullranges)
mgr <- matchRanges(focal = focal,
pool = pool,
covar = ~color + length,
method = 'stratified',
replace = FALSE)
mgr
## MatchedGRanges object with 16 ranges and 3 metadata columns:
## seqnames ranges strand | feature1 color length
## <Rle> <IRanges> <Rle> | <logical> <character> <integer>
## [1] chr1 54-164 * | FALSE #4fbe9b 111
## [2] chr1 624-730 * | FALSE #d098d7 107
## [3] chr1 357-508 * | FALSE #e19995 152
## [4] chr1 755-893 * | FALSE #d098d7 139
## [5] chr1 179-230 * | FALSE #d098d7 52
## ... ... ... ... . ... ... ...
## [12] chr1 394-445 * | FALSE #6eb3d9 52
## [13] chr1 599-668 * | FALSE #4fbe9b 70
## [14] chr1 700-763 * | FALSE #4fbe9b 64
## [15] chr1 514-660 * | FALSE #d098d7 147
## [16] chr1 448-478 * | FALSE #adaf64 31
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
overview
can show how successful our matching was:
overview(mgr)
## MatchedGRanges object:
## set N color.#4fbe9b color.#6eb3d9 color.#adaf64 color.#d098d7
## focal 16 4 2 3 6
## matched 16 5 2 2 6
## pool 120 30 24 22 22
## unmatched 104 25 22 20 16
## color.#e19995 length.mean length.sd ps.mean ps.sd
## 1 110 56 0.15 0.059
## 1 100 58 0.15 0.061
## 22 110 49 0.11 0.057
## 21 110 48 0.11 0.055
## --------
## focal - matched:
## color.#4fbe9b color.#6eb3d9 color.#adaf64 color.#d098d7 color.#e19995
## -1 0 1 0 0
## length.mean length.sd ps.mean ps.sd
## 3.4 -2.1 -0.00098 -0.0017
Defining the region to plot:
We can visualize these ranges by creating a BentoBox
page:
bbPageCreate(width = 8.5, height = 6.5, default.units = 'inches')
Visualize the pool
GRanges:
poolSet <- bbPlotRanges(data = pool,
params = region,
x = 1,
y = 1,
width = 2.5,
height = 2.5,
fill = pool$color)
bbAnnoGenomeLabel(plot = poolSet,
x = 1,
y = 3.55)
bbPlotText(label = "Pool Set",
x = 2.25,
y = 0.9,
just = c("center", "bottom"),
fontcolor = "#33A02C",
fontface = "bold",
fontfamily = 'mono')
Visualize the focal
GRanges of interest:
focalSet <- bbPlotRanges(data = focal,
params = region,
x = 5,
y = 1,
width = 2.5,
height = 1,
fill = focal$color)
bbAnnoGenomeLabel(plot = focalSet,
x = 5,
y = 2.05)
bbPlotText(label = "Focal Set",
x = 6.25,
y = 0.9,
just = c("center", "bottom"),
fontcolor = "#1F78B4",
fontface = "bold",
fontfamily = 'mono')
And finally the matched
GRanges:
## Matched set
matchedSet <- bbPlotRanges(data = mgr,
params = region,
x = 5,
y = 2.5,
width = 2.5,
height = 1,
fill = mgr$color)
bbAnnoGenomeLabel(plot = matchedSet,
x = 5,
y = 3.55)
bbPlotText(label = "Matched Set",
x = 6.25,
y = 2.75,
just = c("center", "bottom"),
fontcolor = "#A6CEE3",
fontface = "bold",
fontfamily = 'mono')
Below our GRanges, we can use BentoBox
to place ggplots created from the nullranges
plotPropensity()
and plotCovariate()
functions.
First lets plot propensity scores:
library(ggplot2)
smallText <- theme(legend.title = element_text(size=8),
legend.text=element_text(size=8),
title = element_text(size=8),
axis.title.x = element_text(size=8),
axis.title.y = element_text(size=8))
## Plot propensity scores ggplot in BentoBox
propensityPlot <-
plotPropensity(mgr, sets=c('f','m','p')) +
smallText +
theme(legend.key.size = unit(0.5, 'lines'),
title = element_blank())
bbPlotGG(plot = propensityPlot,
x = 1, y = 4.5, width = 2.5, height = 1.5,
just = c("left", "top"))
## Text labels
bbPlotText(label = "plotPropensity()",
x = 1.10, y = 4.24,
just = c("left", "bottom"),
fontface = "bold",
fontfamily = 'mono')
bbPlotText(label = "~color + length",
x = 1.25, y = 4.5,
just = c("left", "bottom"),
fontsize = 10,
fontfamily = "mono")
plotting color
and length
covariates:
## Plot "color" covariate
covColor <-
plotCovariate(x=mgr, covar=covariates(mgr)[1], sets=c('f','m','p')) +
smallText +
theme(legend.text = element_blank(),
legend.position = 'none')
bbPlotGG(plot = covColor,
x = 3.50, y = 4.5, width = 1.8, height = 1.5,
just = c("left", "top"))
## Plot "length" covariate
covLength <-
plotCovariate(x=mgr, covar=covariates(mgr)[2], sets=c('f','m','p'))+
smallText +
theme(legend.key.size = unit(0.5, 'lines'))
bbPlotGG(plot = covLength,
x = 5.30, y = 4.5, width = 2.75, height = 1.5,
just = c("left", "top"))
## Text labels
bbPlotText(label = "plotCovariate()",
x = 3.75,
y = 4.24,
just = c("left", "bottom"),
fontface = "bold",
fontfamily = "mono")
bbPlotText(label = covariates(mgr),
x = c(4, 5.9),
y = 4.5,
just = c("left", "bottom"),
fontsize = 10,
fontfamily = "mono")
When the plot is finished you can remove the page guides:
And now your plot is publication-ready!