Difference between revisions of "HLab:Rscripts"

 
(7 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
== Miscellaneous R Scripts ==
 
== Miscellaneous R Scripts ==
  
Quantile normalization
+
'''Quantile normalization'''
*This can be used directly or as an example.  It takes input and output file names (both tab-delimited files with a header and row names).  It is set up for RNA data and adds 1.1 to the expression levels to avoid zeros before taking the log2 of the normalized expression. [[Media:normalize.r.txt]]
+
*This can be used directly or as an example.  It takes input and output file names (both tab-delimited files with a header and row names).  It is set up for RNA data and adds 1.1 to the expression levels to avoid zeros before taking the log2 of the normalized expression. [[Media:normalize.r.txt|normalize.r]]
  
Scatter plots for replicates
+
'''Scatter plots for replicates'''
 
*This is an example.  File names and the number of columns are hard-coded and will need to be changed to match your usage.  The first script plots all points in the same color.  For very dense graphs, the second script colors each point according to the density of points in its vicinity (red is dense).
 
*This is an example.  File names and the number of columns are hard-coded and will need to be changed to match your usage.  The first script plots all points in the same color.  For very dense graphs, the second script colors each point according to the density of points in its vicinity (red is dense).
 
** [[Media:plotPairs.r.txt|plotPairs.r]]
 
** [[Media:plotPairs.r.txt|plotPairs.r]]
 
** [[Media:plotPairsDensity.r.txt|plotPairsDensity.r]]
 
** [[Media:plotPairsDensity.r.txt|plotPairsDensity.r]]
  
PCA
+
'''PCA'''
 +
*This can be used directly or as an example.  It takes an input file name; the file should be tab-separated with row names and a header of column names.  It plots the first few principal components and the variance, writing to R's default Rplots.pdf.
 +
** [[Media:pca.r.txt|pca.r]]
  
GEDI
+
'''Heatmaps'''
 +
*K-means clustering and heatmap for differential expression.  This script takes four inputs: the input table, the output file name, the column names, and k.  The input table is expected to have row names (genes).  The column names are given as comma-separated text.  The output includes a PDF named after the output file and geneClusters.txt.
 +
** [[Media:kmeans.r.txt|kmeans.r]]
 +
*This is an example script for doing a heatmap of a data matrix with many data types.  This is for DNase peaks data.  This script would need to be copied and edited for each different data matrix you wish to plot.
 +
** [[Media:peakHeatmap.r.txt|peakHeatmap.r]]
 +
 
 +
'''GEDI'''
 +
*The GEDI maps are low resolution.  You can use the text output (mapCentroids.txt) of the values in each of the boxes and then plot the map with R.  Inputs are the text file, number of rows, cols, plus optional parameters described in the script.  The output is files named GEDI<colname>.pdf in the current working directory. 
 +
** [[Media:gedi.r.txt|gedi.r]]
  
 
Return to [[HLab:Main|main]]
 
Return to [[HLab:Main|main]]

Latest revision as of 10:53, 7 January 2015

Miscellaneous R Scripts

Quantile normalization

  • This can be used directly or as an example. It takes input and output file names (both tab-delimited files with a header and row names). It is set up for RNA data and adds 1.1 to the expression levels to avoid zeros before taking the log2 of the normalized expression. Script: normalize.r.
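
The steps described above can be sketched in base R roughly as follows. This is a minimal illustration, not the contents of normalize.r: the function names `quantile_normalize` and `normalize_file` are hypothetical, the offset is assumed to be added after normalization and before the log2, ties are broken by order of appearance, and the input is assumed to have at least two rows and no missing values.

```r
# Hypothetical helper: quantile-normalize the columns of a numeric matrix.
# Assumes no missing values; ties are broken by order of appearance.
quantile_normalize <- function(x) {
  x <- as.matrix(x)
  ranks <- apply(x, 2, rank, ties.method = "first")
  sorted <- apply(x, 2, sort)
  ref <- rowMeans(sorted)                     # mean of each quantile across columns
  out <- apply(ranks, 2, function(r) ref[r])  # give each value its quantile mean
  dimnames(out) <- dimnames(x)
  out
}

# Hypothetical wrapper: read a tab-delimited table with a header and row names,
# normalize, add the offset (assumed order), take log2, and write the result.
normalize_file <- function(infile, outfile, offset = 1.1) {
  expr <- read.delim(infile, row.names = 1, check.names = FALSE)
  norm <- log2(quantile_normalize(expr) + offset)
  write.table(norm, outfile, sep = "\t", quote = FALSE, col.names = NA)
}
```

After normalization every column holds the same set of values, only in a different order, which is a quick way to check the result.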

Scatter plots for replicates

  • This is an example. File names and the number of columns are hard-coded and will need to be changed to match your usage. The first script plots all points in the same color. For very dense graphs, the second script colors each point according to the density of points in its vicinity (red is dense). Scripts: plotPairs.r and plotPairsDensity.r.
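
As a rough illustration of the two plotting styles, the sketch below uses base R only. It is not the code in plotPairs.r or plotPairsDensity.r: the names `density_colors` and `plot_replicates`, the 50 x 50 binning, and the blue-to-red color ramp are all assumptions made for the example, and each point is colored by the count in its own bin rather than by neighboring bins.

```r
# Hypothetical helper: color each point by how many points share its 2-D bin.
density_colors <- function(x, y, nbin = 50,
                           ramp = colorRampPalette(c("blue", "green", "yellow", "red"))) {
  bx <- cut(x, breaks = nbin, labels = FALSE)
  by <- cut(y, breaks = nbin, labels = FALSE)
  key <- (bx - 1) * nbin + by                       # one integer id per bin
  counts <- tabulate(key, nbins = nbin * nbin)[key]
  shades <- ramp(64)
  shades[cut(counts, breaks = 64, labels = FALSE)]  # densest bins map to red
}

# Hypothetical wrapper: pairwise scatter plots of replicate columns.
# dense = FALSE plots every point in one color; dense = TRUE colors by density.
plot_replicates <- function(expr, dense = FALSE, ...) {
  panel <- function(x, y, ...) {
    col <- if (dense) density_colors(x, y) else "black"
    points(x, y, pch = 20, cex = 0.4, col = col)
  }
  pairs(expr, panel = panel, ...)
}
```

Calling `plot_replicates(expr, dense = TRUE)` on a numeric matrix or data frame with one column per replicate draws the density-colored version.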

PCA

  • This can be used directly or as an example. It takes an input file name; the file should be tab-separated with row names and a header of column names. It plots the first few principal components and the variance, writing to R's default Rplots.pdf. Script: pca.r.
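
A minimal sketch of this kind of PCA script, using `prcomp` from base R, might look like the following. It is not the contents of pca.r: the function name `run_pca` is hypothetical, and the sketch assumes that rows are genes and columns are samples (so the table is transposed), that there are no missing values, and that scaling each gene to unit variance is wanted.

```r
# Hypothetical function: PCA on a tab-separated table with row names and a header.
# Assumes rows are genes and columns are samples, with no missing values.
run_pca <- function(infile, n_pc = 3) {
  dat <- as.matrix(read.delim(infile, row.names = 1, check.names = FALSE))
  dat <- dat[apply(dat, 1, var) > 0, , drop = FALSE]  # constant rows cannot be scaled
  pca <- prcomp(t(dat), center = TRUE, scale. = TRUE)

  # Pairwise plots of the first few principal components.
  n_pc <- min(n_pc, ncol(pca$x))
  pairs(pca$x[, seq_len(n_pc)], labels = paste0("PC", seq_len(n_pc)))

  # Proportion of variance explained by each component.
  var_explained <- pca$sdev^2 / sum(pca$sdev^2)
  barplot(var_explained, names.arg = paste0("PC", seq_along(var_explained)),
          ylab = "Proportion of variance")
  invisible(pca)
}
```

When such a function is run non-interactively through Rscript without opening a graphics device, the plots go to Rplots.pdf in the working directory.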

Heatmaps

  • K-means clustering and heatmap for differential expression. This script takes four inputs: the input table, the output file name, the column names, and k. The input table is expected to have row names (genes). The column names are given as comma-separated text. The output includes a PDF named after the output file and geneClusters.txt. Script: kmeans.r.
  • This is an example script for drawing a heatmap of a data matrix with many data types; it was written for DNase peak data. The script needs to be copied and edited for each data matrix you wish to plot. Script: peakHeatmap.r.
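
The k-means workflow in the first item can be sketched in base R as below. This is an illustration under stated assumptions rather than the code in kmeans.r: the function name `kmeans_heatmap` and its argument names are hypothetical, clustering on row-wise z-scores is an assumption, and the layout of geneClusters.txt (a gene column and a cluster column) is a guess.

```r
# Hypothetical function: k-means cluster the genes and draw a heatmap ordered
# by cluster. 'columns' is comma-separated text naming the columns to use.
kmeans_heatmap <- function(infile, outfile, columns, k,
                           cluster_file = "geneClusters.txt") {
  dat <- read.delim(infile, row.names = 1, check.names = FALSE)
  cols <- strsplit(columns, ",")[[1]]
  mat <- as.matrix(dat[, cols, drop = FALSE])

  z <- t(scale(t(mat)))   # row-wise z-scores (an assumption, not from the script)
  z[is.na(z)] <- 0        # constant rows give NaN after scaling

  km <- kmeans(z, centers = k, nstart = 10)
  ord <- order(km$cluster)

  pdf(outfile)
  heatmap(z[ord, , drop = FALSE], Rowv = NA, Colv = NA, scale = "none",
          labRow = NA, RowSideColors = rainbow(k)[km$cluster[ord]])
  dev.off()

  write.table(data.frame(gene = rownames(z), cluster = km$cluster),
              cluster_file, sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(km)
}
```

Because `kmeans` starts from random centers, calling `set.seed()` beforehand makes the cluster assignments reproducible.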

GEDI

  • The GEDI maps are low resolution. You can instead take the text output (mapCentroids.txt), which holds the values in each of the boxes, and plot the map with R. Inputs are the text file, the number of rows and columns, plus optional parameters described in the script. The output is a set of files named GEDI<colname>.pdf in the current working directory. Script: gedi.r.
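
A rough sketch of replotting such a map with base R follows. It is not the code in gedi.r, and the layout of mapCentroids.txt is assumed here, not taken from the GEDI documentation: the sketch expects a tab-delimited file with a header, one column per map, and one line per box listed row by row. The function name `plot_gedi`, the `outdir` argument, and the color ramp are hypothetical.

```r
# Hypothetical function: draw one PDF per column of a centroid table.
# Assumed layout: n_row * n_col lines (boxes in row-major order), one column per map.
plot_gedi <- function(infile, n_row, n_col, outdir = ".") {
  cent <- read.delim(infile, check.names = FALSE)
  stopifnot(nrow(cent) == n_row * n_col)
  zlim <- range(as.matrix(cent), na.rm = TRUE)   # shared color scale across maps
  shades <- colorRampPalette(c("blue", "green", "yellow", "red"))(64)

  for (nm in colnames(cent)) {
    grid <- matrix(cent[[nm]], nrow = n_row, ncol = n_col, byrow = TRUE)
    pdf(file.path(outdir, paste0("GEDI", nm, ".pdf")))
    # image() draws z[i, j] at (x[i], y[j]); transpose and flip so that
    # the first row of the grid appears at the top of the plot.
    image(seq_len(n_col), seq_len(n_row), t(grid)[, n_row:1, drop = FALSE],
          col = shades, zlim = zlim, axes = FALSE, xlab = "", ylab = "", main = nm)
    dev.off()
  }
  invisible(colnames(cent))
}
```

Using one shared `zlim` keeps colors comparable between maps; computing it per column instead would stretch each map to its own range.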

Return to main