CHANGES


1.9  - ANOVA and excess variation tests in Shiny test app.
       Loading counts.csv in R now generates an RDS file for faster reloading.
       Options to clip tails at a certain length or require adaptor bases.
       Options to clip reads where the density of high quality bases drops below a certain level. Gs are ignored in this calculation: in two-color sequencing G is dark on both channels, so a read may end with a run of apparently high quality Gs that actually represents running off the end of the fragment.
       Tail length weight calibration now uses "well knotted" splines.
       Option to do differential tails on detrended samples.
       Color palette can be changed for tail distribution.
       /1 stripped from read names during clipping, as STAR removes /1 from read names if present.
       Option to change number of As to call a tail from default of 4.
       Option to clip reads based on UMI and barcode (encoded in read name) rather than fixed adaptor sequence.
       UMI counting (assuming UMI encoded in read name).
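
       Illustration of UMI counting (a minimal sketch, not the tail-tools code).
       The read name layout "<name>_<barcode>_<umi>" and the function names
       parse_umi and count_umis are assumptions made for this example; the
       actual encoding may differ.

```python
from collections import defaultdict


def parse_umi(read_name):
    """Return the UMI from a read name.

    Assumed (hypothetical) layout: "<original name>_<barcode>_<umi>".
    """
    return read_name.rsplit("_", 2)[-1]


def count_umis(assignments):
    """Count distinct UMIs per feature.

    `assignments` is an iterable of (feature, read_name) pairs.
    Reads sharing both a feature and a UMI are counted once.
    """
    seen = defaultdict(set)
    for feature, read_name in assignments:
        seen[feature].add(parse_umi(read_name))
    return {feature: len(umis) for feature, umis in seen.items()}
```

       Two reads on the same feature with the same UMI contribute a count of
       one, so PCR duplicates are not counted twice.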

1.8  - End-shift test extra filtering to ensure no NA coefficients.
       Fix bug in bigwig production when pysam present.

1.7  - R namespace bugfix.

1.6  - tidyr::unnest_ disappeared, use tidyr::unnest.
       Shiny test app allows y limits to be set.
       Add sample option min_match.
       Bigwigs now omit introns (cover not span).
       Some clean up of tidyverse code (_-suffix functions are now deprecated).
       Updated install instructions.
       weitrix-based tests.

1.5  - Shiny app supports differential tail length and differential expression.
       R code to produce matrix and weights for shift, tail length. 
       Option to produce peak pairs including non-utr peaks.
       Reduced memory usage in call-peaks.
       do_fragile option for pipeline, to avoid crashes on weird datasets.

1.4  - Use STAR aligner, only annotated introns are used. (bowtie2 still supported also.)

1.3  - Long standing bug in extend-sam-basespace did not extend over lower case letters in reference, fixed.
       Allow up to 60% mismatch on A extension in extend-sam-basespace.
       Filter out peaks with short average tail length.
       Use --sensitive-local mode in bowtie2.

1.2  - Allow raw poly(A) length and templated width download from mPAT shiny report.
       Fix bug setting three prime bounds for cumulative read distribution plots (missing dplyr:: for lead and lag).

1.1  - Remove deduplication.
       poly(A) calling in reads no longer uses quality information,
           a sequence of 10 "A"s and/or adaptor sequence is called as poly(A) tail.

0.43 - Clip Illumina reads to a region containing 90% bases with quality at
       least 20. Poly(A) tails are only called within this region, so we see
       fewer long poly(A) tails, which changes the tail length statistics
       quite a lot.
       Add shiny reports, end shift test in R.
       Bugfixes to plot-comparison, plot-pooled.
       make-ensembl-reference skips genes with transcripts on inconsistent strands.
       Control of clipping in pipeline.
       peak-polya option in pipeline.
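
       Illustration of the quality clipping described above (a minimal sketch
       under stated assumptions, not the tail-tools code). It assumes "region"
       means the longest contiguous stretch of the read in which at least 90%
       of bases have quality 20 or more; the function name
       clip_to_quality_region and its parameters are made up for this example.

```python
def clip_to_quality_region(quals, min_qual=20, min_fraction=0.9):
    """Return (start, end) of the longest slice quals[start:end] in which
    at least `min_fraction` of bases have quality >= `min_qual`.

    Returns (0, 0) if no slice qualifies. Ties go to the earliest start.
    Simple O(n^2) scan, which is adequate for short reads.
    """
    n = len(quals)

    # good[i] = number of high quality bases in quals[:i]
    good = [0]
    for q in quals:
        good.append(good[-1] + (1 if q >= min_qual else 0))

    best = (0, 0)
    for start in range(n):
        # Try the longest candidate first, stop once it cannot beat `best`.
        for end in range(n, start, -1):
            if end - start <= best[1] - best[0]:
                break
            if good[end] - good[start] >= min_fraction * (end - start):
                best = (start, end)
                break
    return best
```

       Poly(A) tails would then only be called within quals[start:end], which
       is why long tails running into low quality sequence are seen less often.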

0.42 - When extended, don't extend over another exon on the same strand 
       (previously: another CDS on the same strand).
       Another excess memory usage bug removed from aggregate-tail-counts.

0.41 - Add Ensembl genome downloader.
       Read counting can be restricted to exons of genes.
       Relation between peaks and genes includes more information.
       Add "primer-gff" tool.
       Use Varistran for moderated log transformation.
       analyse-polya-sample deletes some unneeded files after running.
       Re-use pickles for dedup counts.
       call-utrs and compare-peaks use output of relate-peaks-to-genes when determining which are in 3'UTR

0.40 - Hack to reduce parallelism on tail_count, reduce memory usage.
       Fix very silly overuse of memory.
       Other memory improvements in aggregate-tail-counts.

0.39 - Bugfix with groups.
       More documentation of pipeline output.
       Include design matrix in report from "test:".

0.38 - Allow more parallelism for tail-count, memory usage doesn't seem to be severe.
       Check for duplicate sample name or duplicate read filename in "analyse-polya-batch:".

0.37 - Use Fitnoise 2 for testing.

0.36 - Prevent aggregate-tail-counts from running in parallel in pipeline.
       Call primary peaks as part of pipeline.
       Output a JSON file for Andrew and Jack's survival curve plotter.

0.35 - Options for sample weighting and empirical controls in "test:".

0.34 - .json files accidentally omitted from report.

0.33 - Add average expression / average tail length column to "test:" degusts.

0.32 - Add "AD" property to BAM files, number of adaptor sequence bases observed.
       Don't put a lower limit on number of reads required to call proportion, tail.
       Improved tail length testing.
       Dropped Chi-Sq testing from "compare-peaks:", "test:" hopefully makes this obsolete.
       "compare-peaks:" outputs "NA"s rather than "None"s in peak-pair file.
       Cleaned up reports.

0.31 - Remove collapse-features from call-peaks, not needed and causes later stage to fail.

0.30 - Bugfix in report generation for "analyse-polya-batch:",
       correctly link to genewise-dedup/peakwise-dedup

0.29 - Bugfix to "make-tt-reference:".

0.28 - Bugfix to "make-tt-reference:".

0.27 - Peaks were being incorrectly extended in pipeline, fixed.
       Added facility to perform a set of tests to analyse-polya-batch.
       "compare-peaks:" option to output longer pre-peak and inter-peak sequences.

0.26 - Show p.value in "test:".
0.25 - Bugfix in "test:" tool.
0.24 - Add "test:" tool.

0.23 - Add --peak-min-depth option to workflow.
       Default raised from 10 to 50.
       compare-peaks output table lists relevant peaks.
       Gene viewer shows relevant peaks.
       Gene viewer shows 3' UTR position.
       Don't nuke pooled.csv.

0.22 - "compare-peaks:" outputs utr before first peak as well as inter-peak sequence.
       Fix bugs in workflow.

0.21 - Major refactoring
       - define reference annotation to be GFF3
       - UCSC genome downloader

0.20 - "nodedup" -> "dup"
       Changes to reports.

0.19 - Fix setup.py

0.18 - Basespace read support.
       "call-peaks:" pipeline.
       "compare-peaks:" tool.
       "geneview-webapp:" gene expression viewer.

0.17 - Assignment of reads to features made more deterministic in "tail-lengths:"

0.16 - Thread tail statistics through outputs.
       "tail_length:" now only ever counts reads once, 
           uses only the feature with the greatest overlap of a read 
           rather than all features with overlaps.
       "compare-peaks:" now ranks by "interestingness".

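       Illustration of the 0.16 read assignment rule, where a read is counted
       once against the feature it overlaps most (a minimal sketch, not the
       tail-tools code). The function name assign_read, the tuple layout,
       half-open coordinates and the tie-breaking rule are assumptions made
       for this example.

```python
def assign_read(read_start, read_end, features):
    """Pick the single feature with the greatest overlap with a read.

    `features` is a list of (name, start, end) tuples using half-open
    coordinates. Returns None if nothing overlaps. Ties go to the first
    feature in the list (an arbitrary choice made for this sketch).
    """
    best_name, best_overlap = None, 0
    for name, start, end in features:
        # Length of the intersection; zero or negative means no overlap.
        overlap = min(read_end, end) - max(read_start, start)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name
```

       A read spanning 80..120 with features a (0..100) and b (90..200)
       overlaps a by 20 bases and b by 30, so it is counted for b only.
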
0.15 - Added "compare-peaks:" tool.

0.14 - Extend_sam now requires 4 bases of tail rather than 3 to be marked AA,
       to be consistent with tail length tools (and observation that this is a better cutoff).

0.13 - Don't call file igv-plots.tar.gz.tar.gz,
       a single .tar.gz is sufficient.

0.12 - Bugfixes

0.11 - Update to use newer nesoni features, such as sample tags

0.10 - use distribute

0.9 - R+ doesn't like trailing newlines in text in plots sometimes, apparently

0.8 - use report from correct log files when using consensus=False

0.7 - don't re-run everything if number of cores changes

0.6 - analysis is now performed both with and without deduplication

0.5 - add "tail-stats:" tool
      extend_sam: fix out by one error for AA:i: attribute on reverse strand

0.4 - plot-pooled now produces a spreadsheet