Skip to content

BiCroLab/nextflow-gpseq

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

70 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Nextflow pipeline for processing of GPSeq data

Genomic loci positioning by sequencing (GPSeq): genome-wide method for inferring distances to the nuclear lamina all along the nuclear radius.


Getting Started

The whole pipeline can be cloned in your current working directory with:

git clone https://github.com/BiCroLab/nextflow-gpseq
cd nextflow-gpseq

Requirements

conda create -n nextflow -c conda-forge -c bioconda nextflow=23.10.0 singularity=3.8*

  • nextflow (tested onversion 23.10 or higher)
  • singularity (tested on version 3.8.6)

To test the pipeline with default settings and a test dataset: nextflow run main.nf -profile test


Running the pipeline with your own dataset


  1. Make a samplesheet.csv containing the following columns: sample,fastq,barcode,condition.
    The file should be comma separated and sorted by digestion time-point in seconds.

  1. Check and adjust required settings in nextflow.config and igenomes.config.

  1. Run the pipeline with the following command:

    nextflow run main.nf --samplesheet /path/to/samplesheet.csv --outdir /path/to/results --fasta /path/to/reference.fa --bwt2index /path/to/bowtie2/index/folder -resume
    


General Settings:

Setting Description
enzyme / cutsite enzyme name (e.g. DpnII) and recognition motif (e.g. GATC)
binsizes list of comma-separated binsizes for calculating GPSeq scores;
formula: window:smoothing: "1e+05:1e+05,1e+05:1e+04,5e+04:5e+04"
normalization data normalization by library / chromosome ("lib" / "chrom")

"lib": each experiment is normalized based on its total read count
"chrom": normalization is applied independently for each chromosome
site_domain param affecting which restriction sites are considered for computing score
default value is "universe"; also see: "separate"/"union"

for a detailed explanation about all existing site_domain settings,
please refer to the supplementary information of our GPSeq paper.
score_outlier_tag setting to filter out outliers (default: "iqr:1.5")
bed_outlier_tag setting to filter out outliers (default: "chisq:0.01")

Genome Settings:

Genome-related settings can be modified from igenomes.config:

Setting Description
fasta path to reference genome.fa genome sequence
fasta_index path to corresponding genome.fa.fai genome index
bowtie2 path to directory containing bowtie2 index
mask_bed path to bed file containing regions to be masked


Outputs

About

Nextflow pipeline for processing of GPSeq data

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • R 67.5%
  • Nextflow 16.2%
  • Python 15.9%
  • Groovy 0.4%