The dimensions of the pulled submatrix is defined by dividing the widths of anchors in `x` by the `binSize`. When the anchor widths are the same for each interaction, an InteractionArray is returned. However, if the anchor widths differ in `x`, an InteractionJaggedArray is returned instead.
Usage
pullHicMatrices(
x,
files,
binSize,
...,
h5File = tempfile(fileext = ".h5"),
half = "both",
norm = "NONE",
matrix = "observed",
blockSize = 248956422,
onDisk = TRUE,
compressionLevel = 0,
chunkSize = 1
)
# S4 method for GInteractions,character,numeric
pullHicMatrices(
x,
files,
binSize,
h5File,
half,
norm,
matrix,
blockSize,
onDisk,
compressionLevel,
chunkSize
)
Arguments
- x
GInteractions object containing interactions to extract from Hi-C files.
- files
Character file paths to `.hic` files.
- binSize
Integer (numeric) describing the resolution (range widths) of the paired data.
- ...
Additional arguments.
- h5File
Character file path to save `.h5` file.
- half
String (character vector of length one) indicating whether to keep values for the upper triangular (`half="upper"`) where `start1 < start2`, lower triangular (`half="lower"`) where `start1 > start2`, or both (`half="both"`, default). When `half="upper"` all lower triangular values are `NA`. When `half="lower"` all upper triangular values are `NA`. When `half="both"` there are no `NA` values. For interchromosomal interactions there is no inherent directionality between chromosomes, so data is returned regardless of specified order.
- norm
String (length one character vector) describing the Hi-C normalization to apply. Use `strawr::readHicNormTypes()` to see accepted values for each file in `files`.
- matrix
String (length one character vector) Type of matrix to extract. Must be one of "observed", "oe", or "expected". "observed" is observed counts, "oe" is observed/expected counts, "expected" is expected counts.
- blockSize
Number (length one numeric vector) describing the size in base-pairs to pull from each `.hic` file. Default is 248956422 (the length of the longest chromosome in the human hg38 genome). For large `.hic` files `blockSize` can be reduced to conserve the amount of data read in at a time. Larger `blockSize` values speed up performance, but use more memory.
- onDisk
Boolean (length one logical vector that is not NA) indicating whether extracted data should be stored on disk in an HDF5 file. Default is TRUE.
- compressionLevel
Number (length one numeric vector) between 0 (Default) and 9 indicating the compression level used on HDF5 file.
- chunkSize
Number (length one numeric vector) indicating how many values of `x` to chunk for each write to HDF5 stored data. This has downstream implications for accessing subsets later. For small `compressionLevel` values use smaller `chunkSize` values and for large `compressionLevel` values use large (i.e. `length(x)`) values to improve performance.
Value
InteractionSet object with a 4-dimensional array of Hi-C submatrices, rownames, and colnames. Array is stored with the following dimensions: Interactions in `x`, Hi-C `files`, rows of submatrix, columns of submatrix. The submatrices returned have rows cooresponding to anchor1 of `x` and columns correspond to anchor2 of `x`.
Examples
## Load marinerData
if (!require("marinerData", quietly = TRUE))
BiocManager::install("marinerData")
## Read .hic file paths
hicFiles <- c(
marinerData::LEUK_HEK_PJA27_inter_30.hic(),
marinerData::LEUK_HEK_PJA30_inter_30.hic()
)
#> see ?marinerData and browseVignettes('marinerData') for documentation
#> loading from cache
#> see ?marinerData and browseVignettes('marinerData') for documentation
#> loading from cache
names(hicFiles) <- c("FS", "WT")
## Read in loop pixels as GInteractions object
pixels <-
WT_5kbLoops.txt() |>
setNames("WT") |>
read.table(header=TRUE) |>
as_ginteractions(keep.extra.columns=FALSE) |>
assignToBins(binSize=100e3)
#> see ?marinerData and browseVignettes('marinerData') for documentation
#> loading from cache
## Removes the "chr" prefix for compatibility
## with the preprocessed hic files
GenomeInfoDb::seqlevelsStyle(pixels) <- 'ENSEMBL'
## Expand pixels to regions for pulling
## Hi-C submatrices
regions <- pixelsToMatrices(x=pixels, buffer=5)
## Extract 11x11 count matrices from the
## first 100 regions and 2 Hi-C files
iarr <- pullHicMatrices(x=regions[1:100],
files=hicFiles,
binSize=100e3)
iarr
#> class: InteractionArray
#> dim: 100 interaction(s), 2 file(s), 11x11 count matrix(es)
#> metadata(3): binSize norm matrix
#> assays(3): counts rownames colnames
#> rownames: NULL
#> rowData names(0):
#> colnames(2): FS WT
#> colData names(2): files fileNames
#> type: GInteractions
#> regions: 12176
## Access count matrices
counts(iarr)
#> <11 x 11 x 100 x 2> DelayedArray object of type "double":
#> ,,1,FS
#> [,1] [,2] [,3] ... [,10] [,11]
#> [1,] 4 3 4 . 0 0
#> [2,] 6 9 3 . 2 1
#> ... . . . . . .
#> [10,] 1 1 4 . 2 2
#> [11,] 1 1 0 . 5 2
#>
#> ...
#>
#> ,,100,WT
#> [,1] [,2] [,3] ... [,10] [,11]
#> [1,] 0 0 1 . 0 0
#> [2,] 2 0 0 . 1 1
#> ... . . . . . .
#> [10,] 5 8 32 . 0 5
#> [11,] 2 4 8 . 0 1
#>
## Display the start bin of each
## interaction in the count
## matrices
counts(iarr, showDimnames=TRUE)
#> <11 x 11 x 100 x 2> DelayedArray object of type "double":
#> ,,1,FS
#> 14200000 14300000 14400000 ... 15100000 15200000
#> 13900000 4 3 4 . 0 0
#> 14000000 6 9 3 . 2 1
#> 14100000 11 8 4 . 1 1
#> 14200000 38 8 4 . 0 0
#> 14300000 8 31 11 . 0 1
#> 14400000 4 11 35 . 3 2
#> 14500000 4 1 12 . 1 0
#> 14600000 2 0 4 . 3 2
#> 14700000 1 2 6 . 3 3
#> 14800000 1 1 4 . 2 2
#> 14900000 1 1 0 . 5 2
#>
#> ...
#>
#> ,,100,WT
#> 16300000 16400000 16500000 ... 17200000 17300000
#> 15600000 0 0 1 . 0 0
#> 15700000 2 0 0 . 1 1
#> 15800000 1 2 1 . 0 1
#> 15900000 2 3 2 . 1 1
#> 16000000 1 2 0 . 1 1
#> 16100000 6 3 2 . 0 0
#> 16200000 4 4 3 . 1 0
#> 16300000 25 8 5 . 0 1
#> 16400000 8 19 8 . 0 0
#> 16500000 5 8 32 . 0 5
#> 16600000 2 4 8 . 0 1
## InteractionJaggedArray example
gi <- read.table(text="
1 51000000 51300000 1 51000000 51500000
2 52000000 52300000 3 52000000 52500000
1 150000000 150500000 1 150000000 150300000
2 52000000 52300000 2 52000000 52800000") |>
as_ginteractions()
iarr <- pullHicMatrices(gi, hicFiles, 100e03, half="both")
iarr
#> class: InteractionJaggedArray
#> dim: 4 interaction(s), 2 file(s), variable count matrix(es)
#> metadata(3): binSize, norm, matrix
#> colData: FS, WT
#> colData names(2): files, fileNames
#> HDF5: /tmp/RtmpwSTrvs/file17dd3f346bc6.h5
counts(iarr)
#> <n x m x 4 x 2> JaggedArray:
#> ,,1,1
#> <3 x 5> matrix
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 53 15 5 1 4
#> [2,] 15 68 19 8 5
#> [3,] 5 19 69 12 2
#>
#> ...
#>
#> ,,4,2
#> <3 x 8> matrix
#> [,1] [,2] [,3] ... [,7] [,8]
#> [1,] 31 7 2 . 0 0
#> [2,] 7 22 5 . 1 1
#> [3,] 2 5 26 . 0 2
#>