Package 'bayesbio' reference manual

Title:	Miscellaneous Functions for Bioinformatics and Bayesian Statistics
Description:	A hodgepodge of hopefully helpful functions. Two of these perform shrinkage estimation: one using a simple weighted method where the user can specify the degree of shrinkage required, and one using James-Stein shrinkage estimation for the case of unequal variances.
Authors:	Andrew McKenzie [aut, cre]
Maintainer:	Andrew McKenzie <[email protected]>
License:	GPL-3
Version:	1.0.1.9000
Built:	2025-03-14 04:14:00 UTC
Source:	https://github.com/andymckenzie/bayesbio

Likelihood function of the James-Stein shrinkage factor.

Description

To be used in MLE computation of the James-Stein shrinkage factor.

Usage

a_hat_mle(stat, vars, a_hat)

Arguments

`stat`	Input statistics to be shrinkage estimated.
`vars`	Corresponding variances of equal length.
`a_hat`	Shrinkage intensity to be estimated.

Value

The value of the likelihood function for the given parameters.

References

http://projecteuclid.org/euclid.ss/1331729986
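
As a hedged illustration of what such a likelihood can look like, the sketch below assumes a normal model in which each statistic has marginal distribution N(mu, a_hat + vars_i), with mu replaced by its precision-weighted estimate. The name `a_hat_loglik_sketch` is hypothetical, and the model is an assumption rather than the package's documented formula.

```r
# Hypothetical helper (not the package's code): marginal log-likelihood of the
# shrinkage intensity a_hat, assuming stat_i ~ N(mu, a_hat + vars_i) with the
# common mean mu replaced by its precision-weighted estimate.
# Constant terms (-n/2 * log(2 * pi)) are dropped.
a_hat_loglik_sketch <- function(stat, vars, a_hat) {
  w <- 1 / (a_hat + vars)            # precision of each statistic's marginal
  mu <- sum(w * stat) / sum(w)       # precision-weighted mean
  -0.5 * sum(log(a_hat + vars) + w * (stat - mu)^2)
}

# Usage: maximize over a_hat >= 0 with base R's optimize()
stat <- c(1.2, -0.4, 2.5, 0.3)
vars <- c(0.5, 1.0, 2.0, 0.8)
optimize(function(a) a_hat_loglik_sketch(stat, vars, a), c(0, 10), maximum = TRUE)$maximum
```

Profiling out mu keeps the search one-dimensional, so base R's `optimize()` is enough to find the maximizing a_hat.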

Identify all duplicate values in a vector.

Description

By default, the base R function duplicated marks only the second and later occurrences of a value as TRUE. This function marks every occurrence of a duplicated value, including the first, as TRUE.

Usage

allDups(x)

Arguments

`x`	The input vector.

Value

A logical vector.
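
A minimal sketch of the idea, assuming base R's `duplicated()` and its `fromLast` argument. Scanning from both ends flags the first occurrence as well as the later ones. `allDups_sketch` is a hypothetical name, not necessarily how the package implements it.

```r
# Hypothetical re-implementation for illustration: flag every element whose
# value occurs more than once by combining forward and backward duplicated().
allDups_sketch <- function(x) {
  duplicated(x) | duplicated(x, fromLast = TRUE)
}

allDups_sketch(c("a", "b", "a", "c", "b"))  # returns TRUE TRUE TRUE FALSE TRUE
```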

bayesbio: Miscellaneous functions useful in bioinformatics and Bayesian statistics

Description

A hodgepodge of hopefully helpful functions. Two of these perform shrinkage estimation: one using a simple weighted method where the user can specify the degree of shrinkage required, and one using James-Stein shrinkage estimation for the case of unequal variances.

cbind while converting missing entries to NA.

Description

Unlike cbind, which recycles the shorter vectors when given vectors of unequal lengths (with a warning if the lengths are not multiples of one another), this function combines vectors of unequal length and fills the missing entries with NA.

Usage

cbindFill(...)

Arguments

`...`	A set of vectors separated by commas.

Value

A matrix with one column per input vector, padded with NA to the length of the longest vector.

References

http://r.789695.n4.nabble.com/How-to-join-matrices-of-different-row-length-from-a-list-td3177212.html; http://stackoverflow.com/a/7962286/560791
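
The padding step can be sketched in a few lines. This version swaps in base R's `length<-`, which fills new slots with NA, in place of the rbind-based approach in the linked answers. `cbindFill_sketch` is a hypothetical name.

```r
# Hypothetical sketch, not the package's code: pad every vector with NA up to
# the longest length, then bind them as columns.
cbindFill_sketch <- function(...) {
  vecs <- list(...)
  n <- max(lengths(vecs))
  padded <- lapply(vecs, function(v) {
    length(v) <- n   # extending a vector's length fills the new slots with NA
    v
  })
  do.call(cbind, padded)
}

cbindFill_sketch(a = 1:3, b = 1:2)  # column b ends in NA instead of recycling
```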

Create a table summarizing covariates segregated by levels of a diagnosis.

Description

Takes a data frame with a diagnosis column and a number of covariate columns and, for each diagnosis group, reports the percentage of samples with specified levels of categorical covariates and/or the mean +/- sd of quantitative covariates. Although it was designed for generating sample summary tables for bioinformatics experiments, and its terminology reflects this, it can be used more generally.

Usage

covariatesTable(df, dg_col, percent_cols = NULL, quant_cols = NULL,
  percent_col_cats = NULL, group_names = NULL, row_names = NULL)

Arguments

`df`	The data frame containing the columns to be extracted, both diagnosis and covariates.
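`dg_col`	Name of the diagnosis column, which is used to split the table; each level of this column defines one group.
`percent_cols`	Character vector of column names specifying the categorical covariates for which percentages should be calculated.
`quant_cols`	Character vector of column names specifying the quantitative covariates for which the mean +/- sd should be calculated.
`percent_col_cats`	Character vector specifying, for each percent column, the value whose percentage should be calculated.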
`group_names`	Optional character vector specifying the groups within the dg_col, which will be used to order the resulting table.
`row_names`	Optional character vector specifying what the rownames of the resulting table should be.

Value

A table summarizing the covariates.
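
The per-group computation can be sketched for a single categorical and a single quantitative covariate. `covariate_summary_sketch`, `percent_col`, `percent_cat`, and `quant_col` are hypothetical names, and the output layout (one column per diagnosis group) is an assumption rather than the package's format.

```r
# Hypothetical, simplified sketch (one categorical and one quantitative
# covariate); names and output layout are assumptions, not the package's API.
covariate_summary_sketch <- function(df, dg_col, percent_col, percent_cat, quant_col) {
  groups <- split(df, df[[dg_col]])   # one data frame per diagnosis level
  sapply(groups, function(g) {
    pct <- 100 * mean(g[[percent_col]] == percent_cat, na.rm = TRUE)
    q <- g[[quant_col]]
    c(percent = sprintf("%.1f%%", pct),
      mean_sd = sprintf("%.1f +/- %.1f", mean(q, na.rm = TRUE), sd(q, na.rm = TRUE)))
  })
}

samples <- data.frame(dg = c("AD", "AD", "Ctrl", "Ctrl"),
                      sex = c("F", "M", "F", "F"),
                      age = c(70, 80, 60, 62))
covariate_summary_sketch(samples, "dg", "sex", "F", "age")
```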

Creates random, unique character strings.

Description

Generates the character strings randomly, which usually makes them unique; if any duplicates do occur, numbers are appended to their ends using make.unique.

Usage

createStrings(number, length, upper = FALSE)

Arguments

`number`	Specifies the number of character strings that should be created.
`length`	Specifies the length of each character string in letters.
`upper`	Logical specifying whether the character strings should be uppercase. Default = FALSE, so the character strings are all lowercase.

References

http://stackoverflow.com/a/1439541/560791
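
A minimal sketch of the approach described above, assuming the strings are drawn from base R's `letters`/`LETTERS` constants. `createStrings_sketch` is a hypothetical name. Note that `make.unique()` appends ".1", ".2", and so on, so de-duplicated strings can be longer than `length`.

```r
# Hypothetical re-implementation: draw `length` random letters per string,
# then let make.unique() append ".1", ".2", ... to any duplicates.
createStrings_sketch <- function(number, length, upper = FALSE) {
  alphabet <- if (upper) LETTERS else letters
  strs <- vapply(seq_len(number),
                 function(i) paste(sample(alphabet, length, replace = TRUE), collapse = ""),
                 character(1))
  make.unique(strs)
}

set.seed(1)
createStrings_sketch(number = 5, length = 6, upper = TRUE)
```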

Create a color-labeled horizontal bar plot in ggplot2.

Description

This function takes a data frame and creates a horizontal (by default) bar plot from it, with the bars ordered by value.

Usage

ggHorizBar(data_df, dataCol, namesCol, labelsCol, decreasing = TRUE)

Arguments

`data_df`	Data frame with columns specifying the data values, the bar names, and the fill groups of the bars.
`dataCol`	The name of the column containing the values to be plotted.
`namesCol`	The name of the column containing the name of each bar.
`labelsCol`	The name of the column containing the group of each bar, which determines its fill color.
`decreasing`	Logical specifying whether the values in dataCol should be in decreasing order.

Value

A ggplot2 object, which can be plotted via the plot() function or saved via the ggsave() function.
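
One way to get ordered horizontal bars in ggplot2 is to set the factor levels of the name column by value and then flip the coordinates. The sketch below assumes ggplot2 (>= 3.0, for the `.data` pronoun) is installed and that bar names are unique. `ggHorizBar_sketch` is a hypothetical name, not the package's implementation.

```r
# Hypothetical sketch, not the package's code. Requires ggplot2 (>= 3.0 for
# the .data pronoun); column names are passed as strings.
ggHorizBar_sketch <- function(data_df, dataCol, namesCol, labelsCol, decreasing = TRUE) {
  ord <- order(data_df[[dataCol]], decreasing = decreasing)
  # coord_flip() draws the first factor level at the bottom, so reverse the
  # order to put the first-ranked bar at the top (bar names assumed unique)
  data_df[[namesCol]] <- factor(data_df[[namesCol]],
                                levels = rev(data_df[[namesCol]][ord]))
  ggplot2::ggplot(data_df, ggplot2::aes(x = .data[[namesCol]],
                                        y = .data[[dataCol]],
                                        fill = .data[[labelsCol]])) +
    ggplot2::geom_col() +
    ggplot2::coord_flip()
}
```

The returned object can then be printed or passed to `ggplot2::ggsave()`, as with the package function.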

Jaccard index of two character vectors.

Description

This function compares the elements in two character vectors to find the Jaccard index, i.e., the number of elements in the intersection of the two sets divided by the number of elements in their union.

Usage

jaccardSets(set1, set2)

Arguments

`set1`	Character vector.
`set2`	Character vector.

Value

A number (one-element numeric vector) specifying the Jaccard index from comparing the two sets.

References

https://en.wikipedia.org/wiki/Jaccard_index
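
The definition J(A, B) = |A ∩ B| / |A ∪ B| maps directly onto base R's set functions. `jaccard_sketch` is a hypothetical name. `intersect()` and `union()` ignore repeated elements, so each vector is treated as a set. Two empty inputs give NaN (0/0).

```r
# Hypothetical re-implementation: |A intersect B| / |A union B|, treating
# each character vector as a set (duplicates ignored by intersect/union).
jaccard_sketch <- function(set1, set2) {
  length(intersect(set1, set2)) / length(union(set1, set2))
}

jaccard_sketch(c("TP53", "BRCA1", "EGFR"), c("EGFR", "TP53", "MYC"))  # 2 / 4 = 0.5
```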

Replace the upper or lower triangle of a matrix with the other to make it symmetric.

Description

The values in the replaced triangle are overwritten and cannot be recovered from the result.

Usage

makeMatSym(mat, replaceUpper = TRUE)

Arguments

`mat`	The matrix to be made symmetric.
`replaceUpper`	Whether the upper triangle of the matrix should be replaced by the lower triangle. Default = TRUE; if FALSE, the lower triangle of the matrix is replaced by the upper triangle.

Value

A matrix that has been made symmetric.
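
A minimal sketch, assuming a square matrix. It uses base R's `upper.tri()`/`lower.tri()` masks together with the transpose, so that entry [i, j] receives the value of entry [j, i]. `makeMatSym_sketch` is a hypothetical name.

```r
# Hypothetical re-implementation: copy one triangle onto the other via the
# transpose, so entry [i, j] takes the value of entry [j, i].
makeMatSym_sketch <- function(mat, replaceUpper = TRUE) {
  if (replaceUpper) {
    mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
  } else {
    mat[lower.tri(mat)] <- t(mat)[lower.tri(mat)]
  }
  mat
}

m <- matrix(1:9, nrow = 3)
makeMatSym_sketch(m)                        # upper triangle replaced by lower
makeMatSym_sketch(m, replaceUpper = FALSE)  # lower triangle replaced by upper
```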

Multiple pattern gsub.

Description

An extension to gsub that handles vectors of patterns and replacements, avoiding the problems that arise when patterns overlap (for example, when the output of one replacement is matched by a later pattern), at the expense of computation time.

Usage

mgsub(pattern, replacement, x, ...)

Arguments

`pattern`	Character vector of patterns to match.
`replacement`	Character vector of replacements for each pattern.
`x`	Character vector in which the gsub should be performed.
`...`	Additional arguments to grep.

References

http://stackoverflow.com/a/15254254/560791
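
Chaining plain gsub calls lets a later pattern re-match text that an earlier replacement produced. The sketch below shows one overlap-safe strategy using two passes through placeholders. It is not necessarily the package's method, and unlike the package function it forwards `...` to gsub rather than grep. `mgsub_sketch` is a hypothetical name. The sketch assumes that no pattern matches the control characters used in the placeholders and that replacements contain no backreferences.

```r
# Hypothetical sketch of one overlap-safe strategy (not necessarily the
# package's method): first swap every match for a unique placeholder, then
# swap placeholders for replacements, so replacement text is never re-matched.
# Assumes no pattern matches the control characters used in the placeholders;
# backreferences such as "\\1" in `replacement` are not supported.
mgsub_sketch <- function(pattern, replacement, x, ...) {
  stopifnot(length(pattern) == length(replacement))
  tokens <- vapply(seq_along(pattern),
                   function(i) paste0("\001", strrep("\003", i), "\002"),
                   character(1))
  for (i in seq_along(pattern)) x <- gsub(pattern[i], tokens[i], x, ...)
  for (i in seq_along(pattern)) x <- gsub(tokens[i], replacement[i], x, fixed = TRUE)
  x
}

gsub("b", "c", gsub("a", "b", "ab"))          # sequential gsub: "cc"
mgsub_sketch(c("a", "b"), c("b", "c"), "ab")  # "bc"
```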

Merge data frames based on the nearest datetime differences.

Description

Takes two data frames, each with a time/date column in date-time or date format (i.e., able to be compared using the function difftime), finds for each row of df1 the row of df2 that minimizes the absolute datetime difference, and merges the corresponding rows of df2 into df1 for downstream processing.

Usage

nearestTime(df1, df2, timeCol1, timeCol2)

Arguments

`df1`	Data frame containing the dates for which the differences between the other data frame's date column should be minimized for each row.
`df2`	Data frame containing the dates which should be compared to, as well as other values that should be merged to df1 per minimized date time.
`timeCol1`	Character vector specifying the date/time column in df1.
`timeCol2`	Character vector specifying the date/time column in df2.

Value

A merged data frame that minimizes datetime differences.
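
The matching step can be sketched with base R's `difftime()` and `which.min()`. This assumes both time columns have the same class (Date or POSIXct). `nearestTime_sketch` is a hypothetical name, and ties go to the first df2 row.

```r
# Hypothetical sketch, not the package's code: for each row of df1, pick the
# df2 row with the smallest absolute time difference and bind it alongside.
nearestTime_sketch <- function(df1, df2, timeCol1, timeCol2) {
  idx <- vapply(seq_len(nrow(df1)), function(i) {
    d <- abs(as.numeric(difftime(df1[[timeCol1]][i], df2[[timeCol2]], units = "secs")))
    which.min(d)   # first row wins on ties
  }, integer(1))
  out <- cbind(df1, df2[idx, , drop = FALSE])
  rownames(out) <- NULL
  out
}

visits <- data.frame(t = as.Date(c("2020-01-01", "2020-03-01")), a = 1:2)
labs <- data.frame(t2 = as.Date(c("2019-12-30", "2020-02-25", "2020-06-01")),
                   b = c("x", "y", "z"))
nearestTime_sketch(visits, labs, "t", "t2")
```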

Merge data frames based on the nearest datetime differences and an ID column. Also removes duplicate column names from the result.

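Description

Like nearestTime, but for each row of df1 only the rows of df2 with the same value in IDcol are considered when minimizing the datetime difference.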

Usage

nearestTimeandID(df1, df2, timeCol1, timeCol2, IDcol)

Arguments

`df1`	Data frame containing the dates for which the differences between the other data frame's date column should be minimized for each row.
`df2`	Data frame containing the dates which should be compared to, as well as other values that should be merged to df1 per minimized date time.
`timeCol1`	Character vector specifying the date/time column in df1.
`timeCol2`	Character vector specifying the date/time column in df2.
`IDcol`	Name of the ID column shared by df1 and df2. IDs must be unique across the rows of df1; repeated IDs are allowed in df2 (and expected for at least some IDs, as that is the point of the function).

Value

A merged data frame that minimizes datetime differences.
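
A minimal sketch of the ID-restricted version. It filters df2 to the current ID, keeps the nearest row in time, and drops df2 columns whose names already appear in df1. `nearestTimeandID_sketch` is a hypothetical name. Dropping df1 rows that have no matching ID, and resolving duplicate names in favor of df1, are both assumptions.

```r
# Hypothetical sketch, not the package's code: restrict df2 to rows sharing
# the df1 row's ID, take the nearest in time, and drop df2 columns whose
# names already exist in df1 (including IDcol). df1 rows with no matching ID
# are dropped here, which is an assumption.
nearestTimeandID_sketch <- function(df1, df2, timeCol1, timeCol2, IDcol) {
  rows <- lapply(seq_len(nrow(df1)), function(i) {
    cand <- df2[df2[[IDcol]] == df1[[IDcol]][i], , drop = FALSE]
    if (nrow(cand) == 0) return(NULL)
    d <- abs(as.numeric(difftime(df1[[timeCol1]][i], cand[[timeCol2]], units = "secs")))
    keep <- setdiff(names(cand), names(df1))   # avoid duplicate column names
    cbind(df1[i, , drop = FALSE], cand[which.min(d), keep, drop = FALSE])
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}
```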

Adjust p-values where n is less than p.

Description

This function recapitulates p.adjust but allows the number of hypothesis tests n to be less than the number of p-values p. Statistical properties of the p-value adjustments may not hold.

Usage

p.adjust.nlp(p, method = p.adjust.methods, n = length(p))

Arguments

`p`	Numeric vector of p-values.
`method`	Correction method; one of p.adjust.methods (e.g., "bonferroni", "BH").
`n`	Number of comparisons to be made.

References

http://stackoverflow.com/a/30110186/560791
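
To show what relaxing the constraint means, the sketch below covers only two methods. It uses the Bonferroni and Benjamini-Hochberg formulas as written in `stats::p.adjust` but omits that function's check that n is at least length(p). `p_adjust_nlp_sketch` is a hypothetical name, and p is assumed to contain no NAs.

```r
# Hypothetical sketch covering two methods only: the Bonferroni and
# Benjamini-Hochberg formulas as written in stats::p.adjust, without its check
# that n >= length(p). Assumes p contains no NAs.
p_adjust_nlp_sketch <- function(p, method = c("bonferroni", "BH"), n = length(p)) {
  method <- match.arg(method)
  if (method == "bonferroni") return(pmin(1, n * p))
  i <- length(p):1                       # ranks, largest p-value first
  o <- order(p, decreasing = TRUE)
  ro <- order(o)                         # maps back to the input order
  pmin(1, cummin(n / i * p[o]))[ro]
}

# stats::p.adjust() refuses n < length(p); the sketch accepts it
p_adjust_nlp_sketch(c(0.01, 0.02, 0.03), "BH", n = 2)
```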

Perform PubMed queries on 2x2 combinations of term vectors.

Description

Perform PubMed queries on the intersection of each term in one character vector with each term in another. This function is a wrapper to RISmed::EUtilsSummary with type = 'esearch', db = 'pubmed'.

Usage

pubmedQuery(rowTerms, colTerms, sleepTime = 0.01, ...)

Arguments

`rowTerms`	Character vector of terms that should make up the rows of the resulting mention count data frame.
`colTerms`	Character vector of terms for the columns.
`sleepTime`	How long (in seconds) to sleep between successive PubMed queries. If this is set too low, NCBI may block your connection to avoid overloading its servers; NCBI asks users without an API key to send no more than three requests per second.
`...`	Additional arguments to RISmed::EUtilsSummary.

Value

A data frame of the number of mentions for each combination of terms.
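
The row-by-column query loop can be sketched without network access by passing in a counting function. `pubmedQuery_sketch` and its `countFun` argument are hypothetical. For live queries, one could pass something like `function(q) RISmed::QueryCount(RISmed::EUtilsSummary(q, type = "esearch", db = "pubmed"))`, assuming RISmed's `QueryCount()` accessor. The "AND" query syntax is also an assumption about how intersections are formed.

```r
# Hypothetical sketch: `countFun` is an assumed argument that takes a query
# string and returns a hit count, so the loop can run without network access.
pubmedQuery_sketch <- function(rowTerms, colTerms, countFun, sleepTime = 0.01) {
  counts <- matrix(NA_real_, nrow = length(rowTerms), ncol = length(colTerms),
                   dimnames = list(rowTerms, colTerms))
  for (r in rowTerms) {
    for (cc in colTerms) {
      counts[r, cc] <- countFun(paste(r, "AND", cc))  # intersection query
      Sys.sleep(sleepTime)                            # pause between queries
    }
  }
  as.data.frame(counts)
}

# Offline usage with a stand-in counter:
pubmedQuery_sketch(c("TP53", "BRCA1"), c("cancer", "aging"),
                   countFun = function(q) nchar(q), sleepTime = 0)
```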

Find the standard error of the sampling distribution of a statistic.

Description

Finds the standard error of a numeric vector (i.e., the standard deviation divided by the square root of the sample size); by default, removes NAs prior to calculation.

Usage

std_error(x, na.rm = TRUE)

Arguments

`x`	The numeric vector whose standard error should be calculated.
`na.rm`	Logical; if TRUE, NAs are removed from the vector before the standard error is calculated; if FALSE, they are kept, so any NA makes the result NA.

Value

A one-element numeric vector giving the standard error.
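
The formula sd(x) / sqrt(n) takes one line in base R. The sketch below makes sure n counts only the values left after NA removal. `std_error_sketch` is a hypothetical name.

```r
# Hypothetical re-implementation: sd / sqrt(n), where n counts only the
# values that remain after optional NA removal.
std_error_sketch <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

std_error_sketch(c(2, 4, 4, 4, 5, 5, 7, 9, NA))
```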

Add values to the super- and sub-diagonals of a matrix.

Description

Takes a matrix and sets the values one above the diagonal (i.e., the superdiagonal) and one below the diagonal (i.e., the subdiagonal) to the values in x.

Usage

subsupDiag(matrix, x)

Arguments

`matrix`	Matrix whose super- and sub-diagonal values should be replaced.
`x`	Numeric vector used to replace values in the matrix. If x is shorter than the super- and sub-diagonals, it is recycled (e.g., x can be a single value that replaces every super- and sub-diagonal entry).

Value

The original matrix with its super- and sub-diagonal values replaced by x.

References

http://stackoverflow.com/a/9885186/560791
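
Base R's `row()` and `col()` return each entry's indices, which makes both off-diagonals easy to address. `subsupDiag_sketch` is a hypothetical name. As with ordinary subassignment, x is recycled along each diagonal.

```r
# Hypothetical re-implementation: row() and col() give each entry's indices,
# so the superdiagonal is where col == row + 1 and the subdiagonal where
# row == col + 1. x is recycled along each diagonal.
subsupDiag_sketch <- function(matrix, x) {
  matrix[col(matrix) == row(matrix) + 1] <- x   # superdiagonal
  matrix[row(matrix) == col(matrix) + 1] <- x   # subdiagonal
  matrix
}

subsupDiag_sketch(diag(2, 4), -1)   # tridiagonal matrix with 2 on the diagonal
```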

Perform James-Stein shrinkage estimation using unequal variances

Description

Traditional JS shrinkage estimation assumes equal variances for each of the data points, while this algorithm extends JS shrinkage estimation to entries with different variances.

Usage

unequalVarShrink(stat, vars, verbose = TRUE)

Arguments

`stat`	Input statistics to be shrinkage estimated.
`vars`	Corresponding variances of equal length.
`verbose`	Whether information about the algorithm should be reported.

Value

A data frame containing the shrinkage estimated statistics.

References

http://projecteuclid.org/euclid.ss/1331729986
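
One standard empirical-Bayes way to shrink with unequal variances works as follows. Assume stat_i ~ N(theta_i, vars_i) and theta_i ~ N(mu, A), estimate A by maximum marginal likelihood, then shrink each statistic towards mu by the factor vars_i / (A + vars_i). The sketch below follows that recipe. It is a hedged illustration, not the package's algorithm. `unequalVarShrink_sketch` is a hypothetical name, `verbose` is omitted, and the search interval for A is an assumption.

```r
# Hypothetical sketch of one standard empirical-Bayes variant, not the
# package's algorithm: assume stat_i ~ N(theta_i, vars_i) and
# theta_i ~ N(mu, A); estimate A by maximizing the marginal likelihood of
# stat_i ~ N(mu, A + vars_i), then shrink each statistic towards mu.
unequalVarShrink_sketch <- function(stat, vars) {
  weighted_mean <- function(A) {
    w <- 1 / (A + vars)
    sum(w * stat) / sum(w)
  }
  negloglik <- function(A) {
    mu <- weighted_mean(A)
    0.5 * sum(log(A + vars) + (stat - mu)^2 / (A + vars))
  }
  # Search interval for A is an assumption: [0, 10 * var(stat)]
  A <- optimize(negloglik, c(0, 10 * var(stat) + 1e-8))$minimum
  mu <- weighted_mean(A)
  B <- vars / (A + vars)   # per-entry shrinkage factor: noisier entries shrink more
  data.frame(stat = stat, vars = vars, shrink_factor = B,
             shrunk = mu + (1 - B) * (stat - mu))
}

unequalVarShrink_sketch(stat = c(2, -1, 4, 0.5, 3), vars = c(0.5, 1, 2, 0.5, 4))
```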

Weighted shrinkage estimation.

Description

Shrinks each value towards a mean (of the sample or of the overall cohort) by an amount inversely related to the confidence assigned to that observation.

Usage

weightedShrink(x, n, m = NULL, meanVal = NULL)

Arguments

`x`	Numeric vector of values to be shrunk towards the mean.
`n`	Numeric vector with corresponding entries to x, specifying the number of observations used to calculate x, or some other confidence weight to associate with x.
`m`	Number specifying the weight of the shrinkage estimation, relative to the number of observations in the input vector n. Defaults to the minimum of n, but this is an arbitrary value and should be explored to find an optimal value for your use case.
`meanVal`	Number specifying the overall mean towards which the values should be shrunk. Defaults to NULL, in which case it is calculated as the (non-weighted) arithmetic mean of the values in the input vector x.

Value

A numeric vector with shrunken data values.

References

http://math.stackexchange.com/a/41513
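
A common weighted-average form consistent with the arguments above is (n * x + m * meanVal) / (n + m). With this form, an observation backed by few counts (small n) moves further towards meanVal. The sketch assumes that form. `weightedShrink_sketch` is a hypothetical name, and the exact formula used by the package may differ.

```r
# Hypothetical sketch assuming the weighted-average form
# (n * x + m * meanVal) / (n + m); each value moves towards meanVal by a
# fraction m / (n + m), so low-n values shrink the most.
weightedShrink_sketch <- function(x, n, m = NULL, meanVal = NULL) {
  if (is.null(m)) m <- min(n)               # default weight, as described above
  if (is.null(meanVal)) meanVal <- mean(x)  # unweighted mean of x
  (n * x + m * meanVal) / (n + m)
}

weightedShrink_sketch(x = c(10, 0), n = c(1, 3))  # 7.5 1.25
```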

Write a data frame to a file with delimiter style.

Description

A wrapper around write.table whose defaults (tab separation, a header row, no quotes, no row names) produce files that can be read back with read.delim.

Usage

write.delim(df, file, row.names = FALSE, col.names = TRUE, sep = "\t",
  quote = FALSE, ...)

Arguments

`df`	Data frame to be written.
`file`	Full or relative path to file to be written.
`row.names`	Logical indicating whether to include row names.
`col.names`	Logical indicating whether to include column names.
`sep`	Delimiter used to separate fields in the resulting file. Default is tab separation.
`quote`	Logical indicating whether to put quotes around the resulting values.
`...`	Additional arguments to write.table.

Value

None; side-effect is to write to a file.
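
A minimal sketch of the wrapper, showing a round trip through `read.delim()`. `write_delim_sketch` is a hypothetical name. It simply forwards its arguments to base R's `write.table()`.

```r
# Hypothetical re-implementation: forward everything to write.table with
# defaults that read.delim (tab-separated, header = TRUE) reads back cleanly.
write_delim_sketch <- function(df, file, row.names = FALSE, col.names = TRUE,
                               sep = "\t", quote = FALSE, ...) {
  write.table(df, file = file, row.names = row.names, col.names = col.names,
              sep = sep, quote = quote, ...)
}

tmp <- tempfile(fileext = ".tsv")
write_delim_sketch(data.frame(gene = c("TP53", "MYC"), count = c(10L, 3L)), tmp)
read.delim(tmp)
```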
