Pull contact frequency from `.hic` files
Source:R/AllGenerics.R
, R/methods-pullHic.R
pullHicPixels.Rd
Pull contact frequency from `.hic` files
Usage
pullHicPixels(
x,
files,
binSize,
...,
h5File = tempfile(fileext = ".h5"),
half = "both",
norm = "NONE",
matrix = "observed",
blockSize = 248956422,
onDisk = TRUE,
compressionLevel = 0,
chunkSize = 1
)
# S4 method for GInteractions,character,numeric
pullHicPixels(
x,
files,
binSize,
h5File,
half,
norm,
matrix,
blockSize,
onDisk,
compressionLevel,
chunkSize
)
Arguments
- x
GInteractions object containing interactions to extract from Hi-C files.
- files
Character file paths to `.hic` files.
- binSize
Integer (numeric) describing the resolution (range widths) of the paired data.
- ...
Additional arguments.
- h5File
Character file path to save `.h5` file.
- half
String (character vector of length one) indicating whether to keep values for the upper triangular (`half="upper"`) where `start1 < start2`, lower triangular (`half="lower"`) where `start1 > start2`, or both (`half="both"`, default). When `half="upper"` all lower triangular values are `NA`. When `half="lower"` all upper triangular values are `NA`. When `half="both"` there are no `NA` values. For interchromosomal interactions there is no inherent directionality between chromosomes, so data is returned regardless of specified order.
- norm
String (length one character vector) describing the Hi-C normalization to apply. Use `strawr::readHicNormTypes()` to see accepted values for each file in `files`.
- matrix
String (length one character vector) Type of matrix to extract. Must be one of "observed", "oe", or "expected". "observed" is observed counts, "oe" is observed/expected counts, "expected" is expected counts.
- blockSize
Number (length one numeric vector) describing the size in base-pairs to pull from each `.hic` file. Default is 248956422 (the length of the longest chromosome in the human hg38 genome). For large `.hic` files `blockSize` can be reduced to conserve the amount of data read in at a time. Larger `blockSize` values speed up performance, but use more memory.
- onDisk
Boolean (length one logical vector that is not NA) indicating whether extracted data should be stored on disk in an HDF5 file. Default is TRUE.
- compressionLevel
Number (length one numeric vector) between 0 (Default) and 9 indicating the compression level used on HDF5 file.
- chunkSize
Number (length one numeric vector) indicating how many values of `x` to chunk for each write to HDF5 stored data. This has downstream implications for accessing subsets later. For small `compressionLevel` values use smaller `chunkSize` values and for large `compressionLevel` values use large (i.e. `length(x)`) values to improve performance.
Value
InteractionSet object with a 2-dimensional array of Hi-C interactions (rows) and Hi-C sample (columns).
Examples
## Load marinerData
if (!require("marinerData", quietly = TRUE))
BiocManager::install("marinerData")
## Read .hic file paths
hicFiles <- c(
marinerData::LEUK_HEK_PJA27_inter_30.hic(),
marinerData::LEUK_HEK_PJA30_inter_30.hic()
)
#> see ?marinerData and browseVignettes('marinerData') for documentation
#> loading from cache
#> see ?marinerData and browseVignettes('marinerData') for documentation
#> loading from cache
names(hicFiles) <- c("FS", "WT")
## Read in loop pixels as GInteractions object
pixels <-
WT_5kbLoops.txt() |>
setNames("WT") |>
read.table(header=TRUE) |>
as_ginteractions(keep.extra.columns=FALSE) |>
assignToBins(binSize=100e3)
#> see ?marinerData and browseVignettes('marinerData') for documentation
#> loading from cache
## Removes the "chr" prefix for compatibility
## with the preprocessed hic files
GenomeInfoDb::seqlevelsStyle(pixels) <- 'ENSEMBL'
## Extract the first 100 pixels
imat <- pullHicPixels(x=pixels[1:100],
files=hicFiles,
binSize=100e3)
imat
#> class: InteractionMatrix
#> dim: count matrix with 100 interactions and 2 file(s)
#> metadata(3): binSize norm matrix
#> assays(1): counts
#> rownames: NULL
#> rowData names(0):
#> colnames(2): FS WT
#> colData names(2): files fileNames
#> type: GInteractions
#> regions: 12176
## Access count matrix
counts(imat)
#> <100 x 2> DelayedMatrix object of type "double":
#> FS WT
#> [1,] 6 4
#> [2,] 6 6
#> [3,] 33 30
#> [4,] 3 2
#> [5,] 6 6
#> ... . .
#> [96,] 3 1
#> [97,] 2 1
#> [98,] 2 1
#> [99,] 10 7
#> [100,] 3 5