Title: | Descriptive Statistics, Transition Plots, and More |
---|---|
Description: | Tools for making the descriptive "Table 1" used in medical articles, a transition plot for showing changes between categories (also known as a Sankey diagram), flow charts by extending the grid package, a method for variable selection based on the SVD, Bézier lines with arrows complementing the ones in the 'grid' package, and more. |
Authors: | Max Gordon <[email protected]> |
Maintainer: | Max Gordon <[email protected]> |
License: | GPL (>= 3) |
Version: | 3.0.3 |
Built: | 2024-12-31 04:02:52 UTC |
Source: | https://github.com/gforge/gmisc |
This is a collection of functions that I've found useful in my research. The package is inspired by Frank Harrell's Hmisc package. The main focus is on tables, plots, and knitr-integration.
The getDescriptionStatsBy
is a straight forward function that
aims at helping you to generate descriptive table stratified by different
variables. In other words, the function returns everything you need for generating
a Table 1 ready for publication. This function is accompanied by the
describeMean
, describeMedian
, describeProp
,
and describeFactors
functions.
The mergeDesc
allows you to merge a set of outputs getDescriptionStatsBy
into a htmlTable
with the rgroup arguments automatically
generated, see vignette("descriptives", package = "Gmisc")
for a
detailed workflow description.
The transition plot function, transitionPlot
, is for
descriptive purposes. It tries to illustrate the size of change
between one state and the next, i.e. a transition. This is
basically a graph of based upon table(var1, var2)
.
The Singular value decomposition
is a common method for reducing the number of variables. Unfortunately
this compression can reduce the interpretability of the model. The getSvdMostInfluential
function tries to remedy that by identifying the most influential
elements from the V
-matrix.
The insertRowAndKeepAttr
simply adds a row while remembering
all the attributes previously set by using the copyAllNewAttributes
.
The mergeLists
tries to merge lists that do not have identical
elements.
Aligns a set of boxGrob
/boxPropGrob
according to the first positional argument.
alignVertical(reference, ..., .position = c("center", "top", "bottom")) alignHorizontal( reference, ..., .position = c("center", "left", "right"), .sub_position = c("none", "left", "right") )
alignVertical(reference, ..., .position = c("center", "top", "bottom")) alignHorizontal( reference, ..., .position = c("center", "left", "right"), .sub_position = c("none", "left", "right") )
reference |
A |
... |
A set of boxes. |
.position |
How to align the boxes, differs slightly for vertical and horizontal alignment see the accepted arguments |
.sub_position |
When the box is a |
list
with the boxes that are to be aligned
Other flowchart components:
boxGrob()
,
boxPropGrob()
,
connectGrob()
,
coords()
,
distance()
,
moveBox()
,
spread
library(grid) grid.newpage() box <- boxGrob("A cool\nreference\nbox", x = .5, y = .8, box_gp = gpar(fill = "#ADB5C7")) another_box <- boxGrob("A horizontal box", x = .1, y = .5) yet_another_box <- boxGrob("Another horizontal box", x = .8, y = .3) alignedBoxes <- alignHorizontal(box, another_box, yet_another_box, .position = "right") box for (b in alignedBoxes) { print(b) } vert_box <- boxGrob("Vert", x = .8, y = .3, box_gp = gpar(fill = "darkgreen"), txt_gp = gpar(col = "white")) another_vert_box <- boxGrob("Another vertical", x = .1, y = .5, box_gp = gpar(fill = "darkgreen"), txt_gp = gpar(col = "white")) alignedBoxes <- alignVertical(box, vert_box, another_vert_box, .position = "bottom") for (b in alignedBoxes) { print(b) }
library(grid) grid.newpage() box <- boxGrob("A cool\nreference\nbox", x = .5, y = .8, box_gp = gpar(fill = "#ADB5C7")) another_box <- boxGrob("A horizontal box", x = .1, y = .5) yet_another_box <- boxGrob("Another horizontal box", x = .8, y = .3) alignedBoxes <- alignHorizontal(box, another_box, yet_another_box, .position = "right") box for (b in alignedBoxes) { print(b) } vert_box <- boxGrob("Vert", x = .8, y = .3, box_gp = gpar(fill = "darkgreen"), txt_gp = gpar(col = "white")) another_vert_box <- boxGrob("Another vertical", x = .1, y = .5, box_gp = gpar(fill = "darkgreen"), txt_gp = gpar(col = "white")) alignedBoxes <- alignVertical(box, vert_box, another_vert_box, .position = "bottom") for (b in alignedBoxes) { print(b) }
This is an experimental addition to the original bezierArrowSmpl
with the addition of a gradient in the center of the arrow that fades.
bezierArrowGradient( x = c(0.2, 0.7, 0.3, 0.9), y = c(0.2, 0.2, 0.9, 0.9), width = 0.05, clr = "#000000", default.units = "npc", align_2_axis = TRUE, grdt_type = c("triangle", "rectangle"), grdt_prop = 0.8, grdt_decrease_prop = 0.5, grdt_clr_prop = 0.7, grdt_line_width, grdt_clr = "#2F4F2F", vp = NULL, gp = gpar(), rm_intersect = 3L, ... )
bezierArrowGradient( x = c(0.2, 0.7, 0.3, 0.9), y = c(0.2, 0.2, 0.9, 0.9), width = 0.05, clr = "#000000", default.units = "npc", align_2_axis = TRUE, grdt_type = c("triangle", "rectangle"), grdt_prop = 0.8, grdt_decrease_prop = 0.5, grdt_clr_prop = 0.7, grdt_line_width, grdt_clr = "#2F4F2F", vp = NULL, gp = gpar(), rm_intersect = 3L, ... )
x |
A numeric vector or unit object specifying x-locations of spline control points. |
y |
A numeric vector or unit object specifying y-locations of spline control points. |
width |
The width of the arrow, either a numeric single number or a unit. Note:
The arrow does not rely on |
clr |
The color of the arrow. This is the main color of the arrow and not the gradient color. |
default.units |
A string indicating the default units to use
if |
align_2_axis |
Indicates if the arrow should be vertically/horizontally aligned. This is useful for instance if the arrow attaches to a box. |
grdt_type |
The type of growth and gradient that is to be used, currently it only supports triangle (I'm considering adding bezier curves but currently I'm a little tired of coding) |
grdt_prop |
The proportion of the full length that should be a the gradient. The gradient consists of three things: (1) the central band, (2) the slimming of the central band, (3) the color shift into the arrow color. Note that the the slimming and color proportions can be overlapping. |
grdt_decrease_prop |
The proportion of the gradient that should be decreasing,
i.e. narrowing according to the |
grdt_clr_prop |
The proportion of the gradient that should be converging to the arrow color. |
grdt_line_width |
The width of the border line. If not specified it defaults to 5 % of the original width, note the gradient's width is thus 90 %. |
grdt_clr |
The color of the gradient. |
vp |
A Grid viewport object (or NULL). |
gp |
An object of class |
rm_intersect |
Set to 0 if you want to skip intersection removal, 1 only to remove left or 2 to only remove right. See details for why. @section Remove intersections: When the line is wide and the arrow has a narrow curve there may appear an empty triangle due to polygon cancellation (two polygons within the same are cancel out). This behaviour may be ugly and the function therefor tries to remove these. Note: it is expensive to check if there are the lineas may intersect at one point, remove those unexpected, and then adjust the line to the new situation so that the top and bottom lines match. It can also cause some unexpected behaviour why you may want to remove this feature if the arrow behaves erratically. |
... |
Passed on to |
A grob of gList
-type
The triangle section of the arrow is not currently included in the gradient.
library(grid) grid.newpage() arrowGrob <- bezierArrowGradient( x = c(.1, .3, .6, .9), y = c(0.2, 0.2, 0.9, 0.9) ) grid.draw(arrowGrob)
library(grid) grid.newpage() arrowGrob <- bezierArrowGradient( x = c(.1, .3, .6, .9), y = c(0.2, 0.2, 0.9, 0.9) ) grid.draw(arrowGrob)
This is an alternative to the grid packages bezierGrob
with the advantage that it allows you to draw an arrow with a specific
unit width. Note, it has only a end-arrow at this point.
bezierArrowSmpl( x = c(0.2, 0.7, 0.3, 0.9), y = c(0.2, 0.2, 0.9, 0.9), width = 0.05, clr = "#000000", default.units = "npc", arrow = list(), rez = 200, align_2_axis = TRUE, name = NULL, rm_intersect = 3L, gp = gpar(), vp = NULL )
bezierArrowSmpl( x = c(0.2, 0.7, 0.3, 0.9), y = c(0.2, 0.2, 0.9, 0.9), width = 0.05, clr = "#000000", default.units = "npc", arrow = list(), rez = 200, align_2_axis = TRUE, name = NULL, rm_intersect = 3L, gp = gpar(), vp = NULL )
x |
A numeric vector or unit object specifying x-locations of spline control points. |
y |
A numeric vector or unit object specifying y-locations of spline control points. |
width |
The width of the arrow, either a numeric single number or a unit. Note:
The arrow does not rely on |
clr |
The color of the arrow. |
default.units |
A string indicating the default units to use
if |
arrow |
This is a list with all the base (width) and the desired
length for the arrow. Note: This differs from the original
|
rez |
The resolution of the arrow. This specifies how many points to retrieve from the
|
align_2_axis |
Indicates if the arrow should be vertically/horizontally aligned. This is useful for instance if the arrow attaches to a box. |
name |
A character identifier. |
rm_intersect |
Set to 0 if you want to skip intersection removal, 1 only to remove left or 2 to only remove right. See details for why. @section Remove intersections: When the line is wide and the arrow has a narrow curve there may appear an empty triangle due to polygon cancellation (two polygons within the same are cancel out). This behaviour may be ugly and the function therefor tries to remove these. Note: it is expensive to check if there are the lineas may intersect at one point, remove those unexpected, and then adjust the line to the new situation so that the top and bottom lines match. It can also cause some unexpected behaviour why you may want to remove this feature if the arrow behaves erratically. |
gp |
An object of class |
vp |
A Grid viewport object (or NULL). |
grid::grob
A grob of the class polygonGrob with attributes that
correspond to the bezier points.
library(grid) grid.newpage() arrowGrob <- bezierArrowSmpl( x = c(.1, .3, .6, .9), y = c(0.2, 0.2, 0.9, 0.9) ) grid.draw(arrowGrob)
library(grid) grid.newpage() arrowGrob <- bezierArrowSmpl( x = c(.1, .3, .6, .9), y = c(0.2, 0.2, 0.9, 0.9) ) grid.draw(arrowGrob)
Creates a grob
box with text inside it.
boxGrob( label, y = unit(0.5, "npc"), x = unit(0.5, "npc"), width, height, just = "center", bjust = "center", txt_gp = getOption("boxGrobTxt", default = gpar(color = "black", cex = 1)), box_gp = getOption("boxGrob", default = gpar(fill = "white")), box_fn = roundrectGrob, name = NULL ) ## S3 method for class 'box' print(x, ...) ## S3 method for class 'box' plot(x, ...) ## S3 method for class 'box' widthDetails(x) ## S3 method for class 'box' heightDetails(x)
boxGrob( label, y = unit(0.5, "npc"), x = unit(0.5, "npc"), width, height, just = "center", bjust = "center", txt_gp = getOption("boxGrobTxt", default = gpar(color = "black", cex = 1)), box_gp = getOption("boxGrob", default = gpar(fill = "white")), box_fn = roundrectGrob, name = NULL ) ## S3 method for class 'box' print(x, ...) ## S3 method for class 'box' plot(x, ...) ## S3 method for class 'box' widthDetails(x) ## S3 method for class 'box' heightDetails(x)
label |
The label to print - should be a number, text or expression. |
y |
The y position to put the box at. Can be either in |
x |
The x position to put the box at. Can be either in |
width |
The box automatically adapts the size but you can force by specifying the width |
height |
The box automatically adapts the size but you can force by specifying the height |
just |
The justification for the text: left, center or right. |
bjust |
The justification for the box: left, center, right, top or bottom.
See the |
txt_gp |
The |
box_gp |
The |
box_fn |
Function to create box for the text. Parameters of 'x=0.5', 'y=0.5' and 'box_gp' will
be passed to this function and return a |
name |
a character identifier for the |
... |
Passed to |
A grob
plot
/print
To output the grob
objects to the plot either call plot
on the object or print
it. Note that R automatically prints any object that is outputted
to the console. The function calls in turn the grid.draw
function on the object.
Width and height functions address the coords
attribute for the corresponding information.
The widthDetails
and heightDetails
that provide information on an object.
Other flowchart components:
align
,
boxPropGrob()
,
connectGrob()
,
coords()
,
distance()
,
moveBox()
,
spread
library(grid) grid.newpage() boxGrob("My box")
library(grid) grid.newpage() boxGrob("My box")
Creates a grob box with text inside it and a color split in the horizontal axes that allow indicating different proportions. The box can also have a title that spanse the two color areas and that has its own background.
boxPropGrob( label, label_left, label_right, prop, y = unit(0.5, "npc"), x = unit(0.5, "npc"), width, height, just = "center", bjust = "center", txt_gp = getOption("boxPropGrobTxt", default = gpar(color = "black")), txt_left_gp = getOption("boxPropGrobLeftTxt", default = gpar(col = "black")), txt_right_gp = getOption("boxPropGrobRightTxt", default = gpar(col = "black")), box_left_gp = getOption("boxPropGrobLeft", default = gpar(fill = "#E6E8EF")), box_right_gp = getOption("boxPropGrobRight", default = gpar(fill = "#FFFDF6")), box_highlight_gp = getOption("boxPropGrobHighlight", default = gpar(fill = "#ffffff55", col = NA)), name = NULL )
boxPropGrob( label, label_left, label_right, prop, y = unit(0.5, "npc"), x = unit(0.5, "npc"), width, height, just = "center", bjust = "center", txt_gp = getOption("boxPropGrobTxt", default = gpar(color = "black")), txt_left_gp = getOption("boxPropGrobLeftTxt", default = gpar(col = "black")), txt_right_gp = getOption("boxPropGrobRightTxt", default = gpar(col = "black")), box_left_gp = getOption("boxPropGrobLeft", default = gpar(fill = "#E6E8EF")), box_right_gp = getOption("boxPropGrobRight", default = gpar(fill = "#FFFDF6")), box_highlight_gp = getOption("boxPropGrobHighlight", default = gpar(fill = "#ffffff55", col = NA)), name = NULL )
label |
The label to print - should be a number, text or expression. |
label_left |
The label for the left area |
label_right |
The label for the right area |
prop |
The proportion to split along |
y |
The y position to put the box at. Can be either in |
x |
The x position to put the box at. Can be either in |
width |
The box automatically adapts the size but you can force by specifying the width |
height |
The box automatically adapts the size but you can force by specifying the height |
just |
The justification for the text: left, center or right. |
bjust |
The justification for the box: left, center, right, top or bottom.
See the |
txt_gp |
The |
txt_left_gp |
The |
txt_right_gp |
The |
box_left_gp |
The |
box_right_gp |
The |
box_highlight_gp |
The |
name |
a character identifier for the |
A box grob
Other flowchart components:
align
,
boxGrob()
,
connectGrob()
,
coords()
,
distance()
,
moveBox()
,
spread
library(grid) grid.newpage() boxPropGrob("Main label", "Left text", "Right text", prop = .3)
library(grid) grid.newpage() boxPropGrob("Main label", "Left text", "Right text", prop = .3)
The function calculates new points according to the offset that lie to the left/right of the provided line.
calculateLinesAndArrow( x, y, offset, end_x = -1, end_y = -1, arrow_offset = -1, rm_intersect = 3L )
calculateLinesAndArrow( x, y, offset, end_x = -1, end_y = -1, arrow_offset = -1, rm_intersect = 3L )
x |
A numeric vector containing all the x-elements |
y |
A numeric vector containing all the y-elements |
offset |
The offset to add to the line, can be a vector if you want to use different offsets. |
end_x |
The x end of the line where the arrow occurrs (if < 0 arrow is skipped) |
end_y |
The y end of the line where the arrow occurrs (if < 0 arrow is skipped) |
arrow_offset |
The offset to add to the arrow section if any (if <= 0 arrow is skipped) |
rm_intersect |
Set to 0 if you want to skip intersection removal, 1 only to remove left or 2 to only remove right. See details for why. @section Remove intersections: When the line is wide and the arrow has a narrow curve there may appear an empty triangle due to polygon cancellation (two polygons within the same are cancel out). This behaviour may be ugly and the function therefor tries to remove these. Note: it is expensive to check if there are the lineas may intersect at one point, remove those unexpected, and then adjust the line to the new situation so that the top and bottom lines match. It can also cause some unexpected behaviour why you may want to remove this feature if the arrow behaves erratically. |
list(list(x = ..., y = ...))
Returns a list with the right/left
lines that in turn lists with x and y elements
The function creates a grob that links two boxes together. It looks for which side it should attach the arrow, e.g. if the start is on top of the bottom it should attach to the bottom edge of ther start box and then to the top at the end.
connectGrob( start, end, type = c("vertical", "horizontal", "L", "-", "Z", "N"), subelmnt = c("right", "left"), lty_gp = getOption("connectGrob", default = gpar(fill = "black")), arrow_obj = getOption("connectGrobArrow", default = arrow(ends = "last", type = "closed")) ) ## S3 method for class 'connect_boxes' print(x, ...) ## S3 method for class 'connect_boxes' plot(x, ...)
connectGrob( start, end, type = c("vertical", "horizontal", "L", "-", "Z", "N"), subelmnt = c("right", "left"), lty_gp = getOption("connectGrob", default = gpar(fill = "black")), arrow_obj = getOption("connectGrobArrow", default = arrow(ends = "last", type = "closed")) ) ## S3 method for class 'connect_boxes' print(x, ...) ## S3 method for class 'connect_boxes' plot(x, ...)
start |
The start box |
end |
The end box |
type |
How the boxes are stacked. The |
subelmnt |
If we have a split box we can specify the right/left x as the connector point. |
lty_gp |
The |
arrow_obj |
The arrow spec according to |
x |
The grob to print/plot |
... |
Passed to |
The exact positions of the line is stored at the attr(..., "line")
.
If you want to draw your own custom line all you need to do is check which
attr(my_line, "line")$x
and attr(my_line, "line")$y
you want
to attach to and then create your own custom linesGrob
.
grob with an arrow
Other flowchart components:
align
,
boxGrob()
,
boxPropGrob()
,
coords()
,
distance()
,
moveBox()
,
spread
library(grid) grid.newpage() # Initiate the boxes that we want to connect start <- boxGrob("Top", x = .5, y = .8) end <- boxGrob("Bottom", x = .5, y = .2) side <- boxPropGrob("Side", "Left", "Right", prop = .3, x = .2, y = .8) sub_side_left <- boxGrob("Left", x = attr(side, "coords")$left_x, y = .5) sub_side_right <- boxGrob("Right", x = attr(side, "coords")$right_x, y = .5) exclude <- boxGrob("Exclude:\n - Too sick\n - Prev. surgery", x = .8, y = .5, just = "left") # Connect the boxes and print/plot them connectGrob(start, end, "vertical") connectGrob(start, side, "horizontal") connectGrob(side, sub_side_left, "v", "l") connectGrob(side, sub_side_right, "v", "r") connectGrob(start, exclude, "L") # Print the grobs start end side exclude sub_side_left sub_side_right
library(grid) grid.newpage() # Initiate the boxes that we want to connect start <- boxGrob("Top", x = .5, y = .8) end <- boxGrob("Bottom", x = .5, y = .2) side <- boxPropGrob("Side", "Left", "Right", prop = .3, x = .2, y = .8) sub_side_left <- boxGrob("Left", x = attr(side, "coords")$left_x, y = .5) sub_side_right <- boxGrob("Right", x = attr(side, "coords")$right_x, y = .5) exclude <- boxGrob("Exclude:\n - Too sick\n - Prev. surgery", x = .8, y = .5, just = "left") # Connect the boxes and print/plot them connectGrob(start, end, "vertical") connectGrob(start, side, "horizontal") connectGrob(side, sub_side_left, "v", "l") connectGrob(side, sub_side_right, "v", "r") connectGrob(start, exclude, "L") # Print the grobs start end side exclude sub_side_left sub_side_right
The variable is supposed to be directly compatible with
table
(..., useNA = show_missing). It throws an error
if not compatible. It is mostly useful for custom describe functions.
convertShowMissing(show_missing)
convertShowMissing(show_missing)
show_missing |
Boolean or "no", "ifany", "always" |
Deprecated: This function will be deprecated as all functions now use the useNA style in order to comply with standard R naming.
string
Retrieves the boxes "coords"
attribute.
coords(box)
coords(box)
box |
The |
A list with the coordinates
Other flowchart components:
align
,
boxGrob()
,
boxPropGrob()
,
connectGrob()
,
distance()
,
moveBox()
,
spread
box <- boxGrob("A test box") coords(box)
box <- boxGrob("A test box") coords(box)
Skips the attributes that the to object already has to avoid overwriting dim and other important attributes
copyAllNewAttributes(from, to, attr2skip = c(), attr2force = c())
copyAllNewAttributes(from, to, attr2skip = c(), attr2force = c())
from |
The from object |
to |
The to object |
attr2skip |
An optional lists of attributes that you may want to avoid having copied |
attr2force |
An optional lists of attributes that you may want to force copy even if they already exist in the new object |
object
The to
argument object
a <- "test" attr(a, 'wow') <- 1000 b <- a b <- copyAllNewAttributes(a, b) print(attr(b, 'wow'))
a <- "test" attr(a, 'wow') <- 1000 b <- a b <- copyAllNewAttributes(a, b) print(attr(b, 'wow'))
This function calculates the amount of missing per row for describeMean
,
describeMedian
and custom description functions.
It will return invisibly when no missing values are present.
descGetMissing( x, html = TRUE, number_first = TRUE, percentage_sign = TRUE, language = "en", useNA.digits = 1, ... )
descGetMissing( x, html = TRUE, number_first = TRUE, percentage_sign = TRUE, language = "en", useNA.digits = 1, ... )
x |
The variable that you want the statistics for |
html |
If HTML compatible output should be used. If |
number_first |
If the number should be given or if the percentage should be presented first. The second is encapsulated in parentheses (). This is only used together with the useNA variable. |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. Note, this is only used when combined with the missing information. |
language |
The ISO-639-1 two-letter code for the language of
interest. Currently only english is distinguished from the ISO
format using a ',' as the separator in the |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
... |
Passed on to |
vector
A vector with the missing estimate
A function that returns a description of proportions in a factor that contains the number of times a level occurs and the percentage
describeFactors( x, html = TRUE, digits = 1, digits.nonzero = NA, number_first = TRUE, useNA = c("ifany", "no", "always"), useNA.digits = digits, horizontal_proportions, percentage_sign = TRUE, language = "en", ... )
describeFactors( x, html = TRUE, digits = 1, digits.nonzero = NA, number_first = TRUE, useNA = c("ifany", "no", "always"), useNA.digits = digits, horizontal_proportions, percentage_sign = TRUE, language = "en", ... )
x |
The variable that you want the statistics for |
html |
If HTML compatible output should be used. If |
digits |
The number of decimals used |
digits.nonzero |
The number of decimals used for values that are close to zero |
number_first |
If the number should be given or if the percentage should be presented first. The second is encapsulated in parentheses (). This is only used together with the useNA variable. |
useNA |
This indicates if missing should be added as a separate
row below all other. See |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
horizontal_proportions |
Is only active if useNA since this is
the only case of a proportion among continuous variables. This is default NULL and indicates
that the proportions are to be interpreted in a vertical manner.
If we want the data to be horizontal, i.e. the total should be shown
and then how these differ in the different groups then supply the
function with the total number in each group, i.e. if done in a by
manner as in |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. Note, this is only used when combined with the missing information. |
language |
The ISO-639-1 two-letter code for the language of
interest. Currently only english is distinguished from the ISO
format using a ',' as the separator in the |
... |
Passed on to |
A string formatted for printing either latex by HTML
Other descriptive functions:
describeMean()
,
describeMedian()
,
describeProp()
,
getDescriptionStatsBy()
,
getPvalWilcox()
set.seed(1) describeFactors(sample(50, x = c("A", "B", "C"), replace = TRUE)) n <- 500 my_var <- factor(sample(size = n, x = c("A", "B", "C", NA), replace = TRUE)) my_exp <- rbinom(n = n, size = 1, prob = 0.2) total <- table(my_var, useNA = "ifany") by(my_var, INDICES = my_exp, FUN = describeFactors, useNA = "ifany", horizontal_proportions = total )
set.seed(1) describeFactors(sample(50, x = c("A", "B", "C"), replace = TRUE)) n <- 500 my_var <- factor(sample(size = n, x = c("A", "B", "C", NA), replace = TRUE)) my_exp <- rbinom(n = n, size = 1, prob = 0.2) total <- table(my_var, useNA = "ifany") by(my_var, INDICES = my_exp, FUN = describeFactors, useNA = "ifany", horizontal_proportions = total )
A function that returns a description of a continuous variable using the mean together with the standard deviation. The standard deviation is used as it is "industry standard" to use mean with standard deviation and not because it's the only option.
describeMean( x, html = TRUE, digits = 1, digits.nonzero = NA, number_first = TRUE, useNA = c("ifany", "no", "always"), useNA.digits = digits, percentage_sign = TRUE, plusmin_str, language = "en", ... )
describeMean( x, html = TRUE, digits = 1, digits.nonzero = NA, number_first = TRUE, useNA = c("ifany", "no", "always"), useNA.digits = digits, percentage_sign = TRUE, plusmin_str, language = "en", ... )
x |
The variable that you want the statistics for |
html |
If HTML compatible output should be used. If |
digits |
The number of decimals used |
digits.nonzero |
The number of decimals used for values that are close to zero |
number_first |
If the number should be given or if the percentage should be presented first. The second is encapsulated in parentheses (). This is only used together with the useNA variable. |
useNA |
This indicates if missing should be added as a separate
row below all other. See |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. Note, this is only used when combined with the missing information. |
plusmin_str |
Provide if you want anything other than the plus minus sign suited for the given output format. |
language |
The ISO-639-1 two-letter code for the language of
interest. Currently only english is distinguished from the ISO
format using a ',' as the separator in the |
... |
Passed on to |
string
Returns a string formatted for either LaTeX or HTML
Other descriptive functions:
describeFactors()
,
describeMedian()
,
describeProp()
,
getDescriptionStatsBy()
,
getPvalWilcox()
describeMean(1:10) describeMean(c(1:10, NA), useNA = "always") describeMean(c(1:10, NA), useNA = "no")
describeMean(1:10) describeMean(c(1:10, NA), useNA = "always") describeMean(c(1:10, NA), useNA = "no")
A function that returns a description median that contains the interquartile range or the full range
describeMedian( x, iqr = TRUE, html = TRUE, digits = 1, digits.nonzero = NA, number_first = TRUE, useNA = c("ifany", "no", "always"), useNA.digits = digits, percentage_sign = TRUE, language = "en", ... )
describeMedian( x, iqr = TRUE, html = TRUE, digits = 1, digits.nonzero = NA, number_first = TRUE, useNA = c("ifany", "no", "always"), useNA.digits = digits, percentage_sign = TRUE, language = "en", ... )
x |
The variable that you want the statistics for |
iqr |
If interquartile range should be used |
html |
If HTML compatible output should be used. If |
digits |
The number of decimals used |
digits.nonzero |
The number of decimals used for values that are close to zero |
number_first |
If the number should be given or if the percentage should be presented first. The second is encapsulated in parentheses (). This is only used together with the useNA variable. |
useNA |
This indicates if missing should be added as a separate
row below all other. See |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. Note, this is only used when combined with the missing information. |
language |
The ISO-639-1 two-letter code for the language of
interest. Currently only english is distinguished from the ISO
format using a ',' as the separator in the |
... |
Passed on to |
string
A string formatted for either LaTeX or HTML
Other descriptive functions:
describeFactors()
,
describeMean()
,
describeProp()
,
getDescriptionStatsBy()
,
getPvalWilcox()
describeMedian(1:10) describeMedian(c(1:10, NA), useNA = "ifany")
describeMedian(1:10) describeMedian(c(1:10, NA), useNA = "ifany")
A function that returns a description proportion that contains the number and the percentage
describeProp( x, html = TRUE, digits = 1, digits.nonzero = NA, number_first = TRUE, useNA = c("ifany", "no", "always"), useNA.digits = digits, default_ref = NULL, percentage_sign = TRUE, language = "en", ... )
describeProp( x, html = TRUE, digits = 1, digits.nonzero = NA, number_first = TRUE, useNA = c("ifany", "no", "always"), useNA.digits = digits, default_ref = NULL, percentage_sign = TRUE, language = "en", ... )
x |
The variable that you want the statistics for |
html |
If HTML compatible output should be used. If |
digits |
The number of decimals used |
digits.nonzero |
The number of decimals used for values that are close to zero |
number_first |
If the number should be given or if the percentage should be presented first. The second is encapsulated in parentheses (). This is only used together with the useNA variable. |
useNA |
This indicates if missing should be added as a separate
row below all other. See |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
default_ref |
The default reference, either first, the level name or a number within the levels. If left out it defaults to the first value. |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. Note, this is only used when combined with the missing information. |
language |
The ISO-639-1 two-letter code for the language of
interest. Currently only english is distinguished from the ISO
format using a ',' as the separator in the |
... |
Passed on to |
string
A string formatted for either LaTeX or HTML
Other descriptive functions:
describeFactors()
,
describeMean()
,
describeMedian()
,
getDescriptionStatsBy()
,
getPvalWilcox()
describeProp(factor(sample(50, x = c("A", "B", NA), replace = TRUE)))
describeProp(factor(sample(50, x = c("A", "B", NA), replace = TRUE)))
Retrieves the distance between two boxes as absolute "mm"
units. The function also
accepts coords
objects as well as a unit
or a numeric input.
distance( box1, box2, type = c("vertical", "horizontal", "euclidean"), half = FALSE, center = FALSE ) ## S3 method for class 'Gmisc_unit' print(x, ...)
distance( box1, box2, type = c("vertical", "horizontal", "euclidean"), half = FALSE, center = FALSE ) ## S3 method for class 'Gmisc_unit' print(x, ...)
box1 |
The first |
box2 |
The second object to calculate the distance to. Same type as for |
type |
Whether we should retrieve the vertical, horizontal or euclidean distance |
half |
If set to true it returns half the distance. This is convenient when positioning boxes between each other. |
center |
Calculate the distance from the center of each object |
x |
A unit with from the |
... |
Passed on to print |
A unit
in "mm"
with an absolute value. The attribute
positive
indicates the direction of the value, i.e. if it is TRUE
the
distance was calculated from the first to the second, otherwise it is FALSE
.
For euclidean
distance the positive
attribute is NA
. There is also the
from
and to
attributes that has the coordinates that were used for the
calculations, for euclidean
distance this is NA
.
Other flowchart components:
align
,
boxGrob()
,
boxPropGrob()
,
connectGrob()
,
coords()
,
moveBox()
,
spread
box1 <- boxGrob("A test box", y = .8) box2 <- boxGrob("Another test box", y = .2) distance(box1, box2, "v")
box1 <- boxGrob("A test box", y = .8) box2 <- boxGrob("Another test box", y = .2) distance(box1, box2, "v")
html_document
, facilitates easier porting to docxThis function adds the option of having adaptations needed for seemless integration with
MS Word for importing html-documents in the .docx-format. The advantage of html documents is
the ability to create advanced formatting frequently needed in medical publications
and that is available in the htmlTable
function. You can view
the series
for more details regarding how to achieve
fast-track-publishing (ftp) together with knitr.
docx_document( ..., self_contained = FALSE, mathjax = NULL, theme = NULL, highlight = NULL, css = "rmarkdown/docx.css", h1_style = "margin: 24pt 0pt 0pt 0pt;", other_h_style = "margin: 10pt 0pt 0pt 0pt;", remove_scripts = TRUE, force_captions = FALSE, css_max_width )
docx_document( ..., self_contained = FALSE, mathjax = NULL, theme = NULL, highlight = NULL, css = "rmarkdown/docx.css", h1_style = "margin: 24pt 0pt 0pt 0pt;", other_h_style = "margin: 10pt 0pt 0pt 0pt;", remove_scripts = TRUE, force_captions = FALSE, css_max_width )
... |
Passed onto |
self_contained |
Overrides the default |
mathjax |
The advanced mathjax does not work with with Word/LibreOffice. |
theme |
No theme should be used for the output as the custom CSS should take care of everything. |
highlight |
By default turn off highlighting as scripts are difficult to import. This does though work somewhat OK when copy-pasting from the web-browser. |
css |
The CSS if other that the default within the package |
h1_style |
You can choose any css style formatting here that you
want to be applied to all h1 elements. Note: this is only applied
if LibreOffice_adapt is |
other_h_style |
This is the formatting applied to any other
h elements not included to the first. Note: this is only applied
if LibreOffice_adapt is |
remove_scripts |
|
force_captions |
Since out.width and out.height remove the option of having captions this allows a workaround through some processing via the XML-package |
css_max_width |
The max width of the body element. Defaults to "40em" if not specified. Any CSS-compliant width format works. |
If you want to get equations into Word the currently best way is to
use the word_document
format.
R Markdown output format to pass to render
Max Gordon
# Possible yaml configuration at the top of the Rmd doc ## Not run: --- title: "Test" author: "Max Gordon" output: Gmisc::docx_document --- ## End(Not run)
# Possible yaml configuration at the top of the Rmd doc ## Not run: --- title: "Test" author: "Max Gordon" output: Gmisc::docx_document --- ## End(Not run)
do.call
The do.call
can be somewhat slow, especially when working
with large objects. This function is based upon the suggestions from Hadley
Wickham on the R mailing list.
Also thanks to Tommy at StackOverflow for
suggesting
how to handle double and triple colon operators, ::
, further
enhancing the function.
fastDoCall(what, args, quote = FALSE, envir = parent.frame())
fastDoCall(what, args, quote = FALSE, envir = parent.frame())
what |
either a function or a non-empty character string naming the function to be called. |
args |
a list of arguments to the function call. The
|
quote |
a logical value indicating whether to quote the arguments. |
envir |
an environment within which to evaluate the call. This
will be most useful if |
While the function attempts to do most of what do.call
can it has limitations. It can currently not parse the example code from the
original function: do.call(paste, list(as.name("A"), as.name("B")), quote = TRUE)
and the functionality of quote
has not been thoroughly tested.
fastDoCall("complex", list(imaginary = 1:3)) ## if we already have a list (e.g. a data frame) ## we need c() to add further arguments tmp <- expand.grid(letters[1:2], 1:3, c("+", "-")) fastDoCall("paste", c(tmp, sep = "")) ## examples of where objects will be found. A <- 2 f <- function(x) print(x^2) env <- new.env() assign("A", 10, envir = env) assign("f", f, envir = env) f <- function(x) print(x) f(A) # 2 fastDoCall("f", list(A)) # 2 fastDoCall("f", list(A), envir = env) # 4 fastDoCall(f, list(A), envir = env) # 2 fastDoCall("f", list(quote(A)), envir = env) # 100 fastDoCall(f, list(quote(A)), envir = env) # 10 fastDoCall("f", list(as.name("A")), envir = env) # 100 eval(call("f", A)) # 2 eval(call("f", quote(A))) # 2 eval(call("f", A), envir = env) # 4 eval(call("f", quote(A)), envir = env) # 100
fastDoCall("complex", list(imaginary = 1:3)) ## if we already have a list (e.g. a data frame) ## we need c() to add further arguments tmp <- expand.grid(letters[1:2], 1:3, c("+", "-")) fastDoCall("paste", c(tmp, sep = "")) ## examples of where objects will be found. A <- 2 f <- function(x) print(x^2) env <- new.env() assign("A", 10, envir = env) assign("f", f, envir = env) f <- function(x) print(x) f(A) # 2 fastDoCall("f", list(A)) # 2 fastDoCall("f", list(A), envir = env) # 4 fastDoCall(f, list(A), envir = env) # 2 fastDoCall("f", list(quote(A)), envir = env) # 100 fastDoCall(f, list(quote(A)), envir = env) # 10 fastDoCall("f", list(as.name("A")), envir = env) # 100 eval(call("f", A)) # 2 eval(call("f", quote(A))) # 2 eval(call("f", A), envir = env) # 4 eval(call("f", quote(A)), envir = env) # 100
The function relies on options("fig_caption_no")
in order to keep track of the last number. If
you want to force the caption function to skip
captions while still using it in the knitr fig.cap option
then simply set options(fig_caption_no = FALSE)
figCapNo( str, roman = getOption("fig_caption_no_roman", default = FALSE), sprintf_str = getOption("fig_caption_no_sprintf", default = "Fig. %s: %s") )
figCapNo( str, roman = getOption("fig_caption_no_roman", default = FALSE), sprintf_str = getOption("fig_caption_no_sprintf", default = "Fig. %s: %s") )
str |
The string that is to be prepended with string |
roman |
Whether or not to use roman numbers instead
of Arabic. Can also be set through |
sprintf_str |
An |
Other figure caption functions:
figCapNoLast()
,
figCapNoNext()
## Not run: ```{r, fig.cap = pigCapNo("My nice plot")} plot(1:10 + rnorm(10), 1:10) ``` ## End(Not run) org_opts <- options(fig_caption_no = 2, fig_caption_no_sprintf = "Figure %s: %s") figCapNo("A plot with caption number = 3") org_opts <- options(fig_caption_no = TRUE) figCapNo("A plot with caption number = 1") # Use default setting options(fig_caption_no_sprintf = NULL) figCapNo("A plot with caption number = 2") # Return the original settings options(org_opts)
## Not run: ```{r, fig.cap = pigCapNo("My nice plot")} plot(1:10 + rnorm(10), 1:10) ``` ## End(Not run) org_opts <- options(fig_caption_no = 2, fig_caption_no_sprintf = "Figure %s: %s") figCapNo("A plot with caption number = 3") org_opts <- options(fig_caption_no = TRUE) figCapNo("A plot with caption number = 1") # Use default setting options(fig_caption_no_sprintf = NULL) figCapNo("A plot with caption number = 2") # Return the original settings options(org_opts)
The function relies on options("fig_caption_no")
in order to keep track of the last number.
figCapNoLast(roman = getOption("fig_caption_no_roman", FALSE))
figCapNoLast(roman = getOption("fig_caption_no_roman", FALSE))
roman |
Whether or not to use roman numbers instead
of Arabic. Can also be set through |
Other figure caption functions:
figCapNoNext()
,
figCapNo()
org_opts <- options(fig_caption_no = 1) figCapNoLast() options(org_opts)
org_opts <- options(fig_caption_no = 1) figCapNoLast() options(org_opts)
The function relies on options("fig_caption_no")
in order to keep track of the last number.
figCapNoNext(roman = getOption("fig_caption_no_roman", default = FALSE))
figCapNoNext(roman = getOption("fig_caption_no_roman", default = FALSE))
roman |
Whether or not to use roman numbers instead
of Arabic. Can also be set through |
Other figure caption functions:
figCapNoLast()
,
figCapNo()
org_opts <- options(fig_caption_no = 1) figCapNoNext() options(org_opts)
org_opts <- options(fig_caption_no = 1) figCapNoNext() options(org_opts)
Gets the bezier points adjusted for an arrow
getBezierAdj4Arrw(x, y, arrow_length, length_out = 100)
getBezierAdj4Arrw(x, y, arrow_length, length_out = 100)
x |
The x start and end points |
y |
The spline control points |
arrow_length |
The desired length of the arrow |
length_out |
Increases the resolution for the final bezier points, i.e. generating more fine-grained intervals |
list
A function that returns a description statistic that can be used for creating a publication "table 1" when you want it by groups. The function identifies if the variable is a continuous, binary or a factored variable. The format is inspired by NEJM, Lancet & BMJ.
getDescriptionStatsBy( x, ..., by, digits = 1, digits.nonzero = NA, html = TRUE, numbers_first = TRUE, statistics = FALSE, statistics.sig_lim = 10^-4, statistics.two_dec_lim = 10^-2, statistics.suppress_warnings = TRUE, useNA = c("ifany", "no", "always"), useNA.digits = digits, continuous_fn = describeMean, prop_fn = describeProp, factor_fn = describeFactors, show_all_values = FALSE, hrzl_prop = FALSE, add_total_col, total_col_show_perc = TRUE, use_units = FALSE, units_column_name = "Units", default_ref = NULL, NEJMstyle = FALSE, percentage_sign = TRUE, header_count = NULL, missing_value = "-", names_of_missing = NULL ) ## S3 method for class 'Gmisc_getDescriptionStatsBy' htmlTable(x, ...) ## S3 method for class 'Gmisc_getDescriptionStatsBy' print(x, ...) ## S3 method for class 'Gmisc_getDescriptionStatsBy' knit_print(x, ...) ## S3 method for class 'Gmisc_getDescriptionStatsBy' length(x)
getDescriptionStatsBy( x, ..., by, digits = 1, digits.nonzero = NA, html = TRUE, numbers_first = TRUE, statistics = FALSE, statistics.sig_lim = 10^-4, statistics.two_dec_lim = 10^-2, statistics.suppress_warnings = TRUE, useNA = c("ifany", "no", "always"), useNA.digits = digits, continuous_fn = describeMean, prop_fn = describeProp, factor_fn = describeFactors, show_all_values = FALSE, hrzl_prop = FALSE, add_total_col, total_col_show_perc = TRUE, use_units = FALSE, units_column_name = "Units", default_ref = NULL, NEJMstyle = FALSE, percentage_sign = TRUE, header_count = NULL, missing_value = "-", names_of_missing = NULL ) ## S3 method for class 'Gmisc_getDescriptionStatsBy' htmlTable(x, ...) ## S3 method for class 'Gmisc_getDescriptionStatsBy' print(x, ...) ## S3 method for class 'Gmisc_getDescriptionStatsBy' knit_print(x, ...) ## S3 method for class 'Gmisc_getDescriptionStatsBy' length(x)
x |
If a data.frame it will be used as the data source for the variables in the |
... |
The variables that you want you statistic for. In the print all thes parameters are passed on as [htmlTable::htmlTable] arguments. |
by |
The variable that you want to split into different columns |
digits |
The number of decimals used |
digits.nonzero |
The number of decimals used for values that are close to zero |
html |
If HTML compatible output should be used. If |
numbers_first |
If the number should be given or if the percentage should be presented first. The second is encapsulated in parentheses (). |
statistics |
Add statistics, fisher test for proportions and Wilcoxon for continuous variables. See details below for more customization. |
statistics.sig_lim |
The significance limit for < sign, i.e. p-value 0.0000312 should be < 0.0001 with the default setting. |
statistics.two_dec_lim |
The limit for showing two decimals. E.g.
the p-value may be 0.056 and we may want to keep the two decimals in order
to emphasize the proximity to the all-mighty 0.05 p-value and set this to
|
statistics.suppress_warnings |
Hide warnings from the statistics function. |
useNA |
This indicates if missing should be added as a separate
row below all other. See |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
continuous_fn |
The method to describe continuous variables. The
default is |
prop_fn |
The method used to describe proportions, see |
factor_fn |
The method used to describe factors, see |
show_all_values |
Show all values in proportions. For factors with only two values
it is most sane to only show one option as the other one will just be a complement
to the first, i.e. we want to convey a proportion. For instance sex - if you know
gender then automatically you know the distribution of the other sex as it's 100 % - other %.
To choose which one you want to show then set the |
hrzl_prop |
This is default FALSE and indicates that the proportions are to be interpreted in a vertical manner. If we want the data to be horizontal, i.e. the total should be shown and then how these differ in the different groups then set this to TRUE. |
add_total_col |
This adds a total column to the resulting table. You can also specify if you want the total column "first" or "last" in the column order. |
total_col_show_perc |
This is by default true but if requested the percentages are suppressed as this sometimes may be confusing. |
use_units |
If the Hmisc package's units() function has been employed
it may be interesting to have a column at the far right that indicates the
unit measurement. If this column is specified then the total column will
appear before the units (if specified as last). You can also set the value to
|
units_column_name |
The name of the units column. Used if use_units = TRUE |
default_ref |
The default reference when dealing with proportions. When using 'dplyr' syntax ('tidyselect') you can specify a named vector/list for each column name. |
NEJMstyle |
Adds - no (%) at the end to proportions |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. |
header_count |
Set to |
missing_value |
Value that is substituted for empty cells. Defaults to "-" |
names_of_missing |
Optional character vector containing the names of returned statistics,
in case all returned values for a given |
Returns matrix
if a single value was provided, otherwise a list
of matrices with the class "Gmisc_getDescriptionStatsBy"
.
You can specify what function that you want for statistic by providing a function
that takes two arguments x
and by
and returns a p-value. There are
a few functions already prepared for this see getPvalAnova
,
getPvalChiSq
getPvalFisher
getPvalKruskal
getPvalWilcox
.
The default functions used are getPvalFisher
and getPvalWilcox
(unless the by
argument has more than three unique levels where it defaults to getPvalAnova
).
If you want the function to select functions depending on the type of input
you can provide a list with the names 'continuous'
, 'proportion'
, 'factor'
and
the function will choose accordingly. If you fail to define a certain category
it will default to the above.
You can also use a custom function that returns a string with the attribute 'colname'
set that will be appended to the results instead of the p-value column.
Other descriptive functions:
describeFactors()
,
describeMean()
,
describeMedian()
,
describeProp()
,
getPvalWilcox()
library(magrittr) library(dplyr) library(htmlTable) data(mtcars) mtcars %<>% mutate(am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")), vs = factor(vs, levels = 0:1, labels = c("V-shaped", "straight")), drat_prop = drat > median(drat), drat_prop = factor(drat_prop, levels = c(FALSE, TRUE), labels = c("High ratio", "Low ratio")), carb_prop = carb > 2, carb_prop = factor(carb_prop, levels = c(FALSE, TRUE), labels = c("≤ 2", "> 2")), across(c(gear, carb, cyl), factor)) # A simple bare-bone example mtcars %>% getDescriptionStatsBy(`Miles per gallon` = mpg, Weight = wt, `Carborators ≤ 2` = carb_prop, by = am) %>% htmlTable(caption = "Basic continuous stats from the mtcars dataset") invisible(readline(prompt = "Press [enter] to continue")) # For labeling & units we use set_column_labels/set_column_unit that use # the Hmisc package annotation functions mtcars %<>% set_column_labels(am = "Transmission", mpg = "Gas", wt = "Weight", gear = "Gears", disp = "Displacement", vs = "Engine type", drat_prop = "Rear axel ratio", carb_prop = "Carburetors") %>% set_column_units(mpg = "Miles/(US) gallon", wt = "10<sup>3</sup> lbs", disp = "cu.in.") mtcars %>% getDescriptionStatsBy(mpg, wt, `Gear†` = gear, drat_prop, carb_prop, vs, by = am, header_count = TRUE, use_units = TRUE, show_all_values = TRUE) %>% addHtmlTableStyle(pos.caption = "bottom") %>% htmlTable(caption = "Stats from the mtcars dataset", tfoot = "† Number of forward gears") invisible(readline(prompt = "Press [enter] to continue")) # Using the default parameter we can mtcars %>% getDescriptionStatsBy(mpg, wt, `Gear†` = gear, drat_prop, carb_prop, vs, by = am, header_count = TRUE, use_units = TRUE, default_ref = c(drat_prop = "Low ratio", carb_prop = "> 2")) %>% addHtmlTableStyle(pos.caption = "bottom") %>% htmlTable(caption = "Stats from the mtcars dataset", tfoot = "† Number of forward gears") invisible(readline(prompt = "Press [enter] to continue")) # We can also use lists tll <- list() tll[["Gear (3 to 5)"]] <- getDescriptionStatsBy(mtcars$gear, mtcars$am) tll <- c(tll, list(getDescriptionStatsBy(mtcars$disp, mtcars$am))) mergeDesc(tll, htmlTable_args = list(caption = "Factored variables")) %>% htmlTable::addHtmlTableStyle(css.rgroup = "") invisible(readline(prompt = "Press [enter] to continue")) tl_no_units <- list() tl_no_units[["Gas (mile/gallons)"]] <- getDescriptionStatsBy(mtcars$mpg, mtcars$am, header_count = TRUE) tl_no_units[["Weight (10<sup>3</sup> kg)"]] <- getDescriptionStatsBy(mtcars$wt, mtcars$am, header_count = TRUE) mergeDesc(tl_no_units, tll) %>% htmlTable::addHtmlTableStyle(css.rgroup = "") invisible(readline(prompt = "Press [enter] to continue")) # Other settings mtcars$mpg[sample(1:NROW(mtcars), size = 5)] <- NA getDescriptionStatsBy(mtcars$mpg, mtcars$am, statistics = TRUE) invisible(readline(prompt = "Press [enter] to continue")) # Do the horizontal version getDescriptionStatsBy(mtcars$gear, mtcars$am, statistics = TRUE, hrzl_prop = TRUE) invisible(readline(prompt = "Press [enter] to continue")) mtcars$wt_with_missing <- mtcars$wt mtcars$wt_with_missing[sample(1:NROW(mtcars), size = 8)] <- NA getDescriptionStatsBy(mtcars$wt_with_missing, mtcars$am, statistics = TRUE, hrzl_prop = TRUE, total_col_show_perc = FALSE) invisible(readline(prompt = "Press [enter] to continue")) ## Not run: ## There is also a LaTeX wrapper tll <- list( getDescriptionStatsBy(mtcars$gear, mtcars$am), getDescriptionStatsBy(mtcars$col, mtcars$am)) latex(mergeDesc(tll), caption = "Factored variables", file = "") ## End(Not run)
library(magrittr) library(dplyr) library(htmlTable) data(mtcars) mtcars %<>% mutate(am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")), vs = factor(vs, levels = 0:1, labels = c("V-shaped", "straight")), drat_prop = drat > median(drat), drat_prop = factor(drat_prop, levels = c(FALSE, TRUE), labels = c("High ratio", "Low ratio")), carb_prop = carb > 2, carb_prop = factor(carb_prop, levels = c(FALSE, TRUE), labels = c("≤ 2", "> 2")), across(c(gear, carb, cyl), factor)) # A simple bare-bone example mtcars %>% getDescriptionStatsBy(`Miles per gallon` = mpg, Weight = wt, `Carborators ≤ 2` = carb_prop, by = am) %>% htmlTable(caption = "Basic continuous stats from the mtcars dataset") invisible(readline(prompt = "Press [enter] to continue")) # For labeling & units we use set_column_labels/set_column_unit that use # the Hmisc package annotation functions mtcars %<>% set_column_labels(am = "Transmission", mpg = "Gas", wt = "Weight", gear = "Gears", disp = "Displacement", vs = "Engine type", drat_prop = "Rear axel ratio", carb_prop = "Carburetors") %>% set_column_units(mpg = "Miles/(US) gallon", wt = "10<sup>3</sup> lbs", disp = "cu.in.") mtcars %>% getDescriptionStatsBy(mpg, wt, `Gear†` = gear, drat_prop, carb_prop, vs, by = am, header_count = TRUE, use_units = TRUE, show_all_values = TRUE) %>% addHtmlTableStyle(pos.caption = "bottom") %>% htmlTable(caption = "Stats from the mtcars dataset", tfoot = "† Number of forward gears") invisible(readline(prompt = "Press [enter] to continue")) # Using the default parameter we can mtcars %>% getDescriptionStatsBy(mpg, wt, `Gear†` = gear, drat_prop, carb_prop, vs, by = am, header_count = TRUE, use_units = TRUE, default_ref = c(drat_prop = "Low ratio", carb_prop = "> 2")) %>% addHtmlTableStyle(pos.caption = "bottom") %>% htmlTable(caption = "Stats from the mtcars dataset", tfoot = "† Number of forward gears") invisible(readline(prompt = "Press [enter] to continue")) # We can also use lists tll <- list() tll[["Gear (3 to 5)"]] <- getDescriptionStatsBy(mtcars$gear, mtcars$am) tll <- c(tll, list(getDescriptionStatsBy(mtcars$disp, mtcars$am))) mergeDesc(tll, htmlTable_args = list(caption = "Factored variables")) %>% htmlTable::addHtmlTableStyle(css.rgroup = "") invisible(readline(prompt = "Press [enter] to continue")) tl_no_units <- list() tl_no_units[["Gas (mile/gallons)"]] <- getDescriptionStatsBy(mtcars$mpg, mtcars$am, header_count = TRUE) tl_no_units[["Weight (10<sup>3</sup> kg)"]] <- getDescriptionStatsBy(mtcars$wt, mtcars$am, header_count = TRUE) mergeDesc(tl_no_units, tll) %>% htmlTable::addHtmlTableStyle(css.rgroup = "") invisible(readline(prompt = "Press [enter] to continue")) # Other settings mtcars$mpg[sample(1:NROW(mtcars), size = 5)] <- NA getDescriptionStatsBy(mtcars$mpg, mtcars$am, statistics = TRUE) invisible(readline(prompt = "Press [enter] to continue")) # Do the horizontal version getDescriptionStatsBy(mtcars$gear, mtcars$am, statistics = TRUE, hrzl_prop = TRUE) invisible(readline(prompt = "Press [enter] to continue")) mtcars$wt_with_missing <- mtcars$wt mtcars$wt_with_missing[sample(1:NROW(mtcars), size = 8)] <- NA getDescriptionStatsBy(mtcars$wt_with_missing, mtcars$am, statistics = TRUE, hrzl_prop = TRUE, total_col_show_perc = FALSE) invisible(readline(prompt = "Press [enter] to continue")) ## Not run: ## There is also a LaTeX wrapper tll <- list( getDescriptionStatsBy(mtcars$gear, mtcars$am), getDescriptionStatsBy(mtcars$col, mtcars$am)) latex(mergeDesc(tll), caption = "Factored variables", file = "") ## End(Not run)
getDescriptionStatsBy
These functions are the base functions for getting the description p-values. You can provide your own functions but all functions should take two arguments and return a p-value (numeric, non-formatted)
getPvalWilcox(x, by) getPvalAnova(x, by) getPvalFisher(x, by) getPvalChiSq(x, by) getPvalKruskal(x, by)
getPvalWilcox(x, by) getPvalAnova(x, by) getPvalFisher(x, by) getPvalChiSq(x, by) getPvalKruskal(x, by)
x |
The main variable of interest |
by |
The variable for the stratification |
numeric
Returns the p-value from that particular test
Performs a two-sample two-sided Wilcoxon test (also known as the Mann-Whitney test),
see wilcox.test
.
Performs a standard Analysis of Variance model
through anova(lm(x ~ by))
Performs Fisher's exact test through the fisher.test
.
Performs a standard Chi-Squares analysis
through chisq.test
Performs a Kruskal-Wallis rank sum test
through kruskal.test
Other descriptive functions:
describeFactors()
,
describeMean()
,
describeMedian()
,
describeProp()
,
getDescriptionStatsBy()
set.seed(123) getPvalFisher( sample(letters[1:3], size = 100, replace = TRUE), sample(LETTERS[1:3], size = 100, replace = TRUE) ) getPvalWilcox( rnorm(100), sample(LETTERS[1:2], size = 100, replace = TRUE) )
set.seed(123) getPvalFisher( sample(letters[1:3], size = 100, replace = TRUE), sample(LETTERS[1:3], size = 100, replace = TRUE) ) getPvalWilcox( rnorm(100), sample(LETTERS[1:2], size = 100, replace = TRUE) )
This function is inspired by Jeff Leeks Data Analysis course where
he suggests that one way to use the svd
is to look
at the most influential rows for first columns in the V matrix.
getSvdMostInfluential( mtrx, quantile, similarity_threshold, plot_selection = TRUE, plot_threshold = 0.05, varnames = NULL )
getSvdMostInfluential( mtrx, quantile, similarity_threshold, plot_selection = TRUE, plot_threshold = 0.05, varnames = NULL )
mtrx |
A matrix or data frame with the variables. Note: if it contains missing
variables make sure to impute prior to this function as the |
quantile |
The SVD D-matrix gives an estimate for the amount that is explained. This parameter is used for selecting the columns that have that quantile of explanation. |
similarity_threshold |
A quantile for how close other variables have to be in value to maximum contributor of that particular column. If you only want the maximum value then set this value to 1. |
plot_selection |
As this is all about variable exploring it is often interesting to see how the variables were distributed among the vectors |
plot_threshold |
The threshold of the plotted bars, measured as percent explained by the D-matrix. By default it is set to 0.05. |
varnames |
A vector with alternative names to the colnames |
This function expands on that idea and adds the option of choosing more than just the most contributing variable for each row. For instance two variables may have a major impact on a certain component where the second variable has 95 important in that particular component it makes sense to include it in the selection.
It is of course useful when you have many continuous variables and you want to determine a subgroup to look at, i.e. finding the needle in the haystack.
Returns a list with vector with the column numbers that were picked in the "most_influential" variable and the svd caluclation in the "svd"
org_par <- par(ask = TRUE) set.seed(1345) # Simulate data with a pattern dataMatrix <- matrix(rnorm(15 * 160), ncol = 15) colnames(dataMatrix) <- c( paste("Pos.3:", 1:3, sep = " #"), paste("Neg.Decr:", 4:6, sep = " #"), paste("No pattern:", 7:8, sep = " #"), paste("Pos.Incr:", 9:11, sep = " #"), paste("No pattern:", 12:15, sep = " #")) for (i in 1:nrow(dataMatrix)) { # flip a coin coinFlip1 <- rbinom(1, size = 1, prob = 0.5) coinFlip2 <- rbinom(1, size = 1, prob = 0.5) coinFlip3 <- rbinom(1, size = 1, prob = 0.5) # if coin is heads add a common pattern to that row if (coinFlip1) { cols <- grep("Pos.3", colnames(dataMatrix)) dataMatrix[i, cols] <- dataMatrix[i, cols] + 3 } if (coinFlip2) { cols <- grep("Neg.Decr", colnames(dataMatrix)) dataMatrix[i, cols] <- dataMatrix[i, cols] - seq(from = 5, to = 15, length.out = length(cols)) } if (coinFlip3) { cols <- grep("Pos.Incr", colnames(dataMatrix)) dataMatrix[i, cols] <- dataMatrix[i, cols] + seq(from = 3, to = 15, length.out = length(cols)) } } # Illustrate data heatmap(dataMatrix, Colv = NA, Rowv = NA, margins = c(7, 2), labRow = "") svd_out <- svd(scale(dataMatrix)) library(lattice) b_clr <- c("steelblue", "darkred") key <- simpleKey( rectangles = TRUE, space = "top", points = FALSE, text = c("Positive", "Negative") ) key$rectangles$col <- b_clr b1 <- barchart(as.table(svd_out$v[, 1]), main = "First column", horizontal = FALSE, col = ifelse(svd_out$v[, 1] > 0, b_clr[1], b_clr[2] ), ylab = "Impact value", scales = list(x = list(rot = 55, labels = colnames(dataMatrix), cex = 1.1)), key = key ) b2 <- barchart(as.table(svd_out$v[, 2]), main = "Second column", horizontal = FALSE, col = ifelse(svd_out$v[, 2] > 0, b_clr[1], b_clr[2] ), ylab = "Impact value", scales = list(x = list(rot = 55, labels = colnames(dataMatrix), cex = 1.1)), key = key ) b3 <- barchart(as.table(svd_out$v[, 3]), main = "Third column", horizontal = FALSE, col = ifelse(svd_out$v[, 3] > 0, b_clr[1], b_clr[2] ), ylab = "Impact value", scales = list(x = list(rot = 55, labels = colnames(dataMatrix), cex = 1.1)), key = key ) b4 <- barchart(as.table(svd_out$v[, 4]), main = "Fourth column", horizontal = FALSE, col = ifelse(svd_out$v[, 4] > 0, b_clr[1], b_clr[2] ), ylab = "Impact value", scales = list(x = list(rot = 55, labels = colnames(dataMatrix), cex = 1.1)), key = key ) # Note that the fourth has the no pattern columns as the # chosen pattern, probably partly because of the previous # patterns already had been identified print(b1, position = c(0, 0.5, .5, 1), more = TRUE) print(b2, position = c(0.5, 0.5, 1, 1), more = TRUE) print(b3, position = c(0, 0, .5, .5), more = TRUE) print(b4, position = c(0.5, 0, 1, .5)) # Let's look at how well the SVD identifies # the most influential columns getSvdMostInfluential(dataMatrix, quantile = .8, similarity_threshold = .9, plot_threshold = .05, plot_selection = TRUE) par(org_par)
org_par <- par(ask = TRUE) set.seed(1345) # Simulate data with a pattern dataMatrix <- matrix(rnorm(15 * 160), ncol = 15) colnames(dataMatrix) <- c( paste("Pos.3:", 1:3, sep = " #"), paste("Neg.Decr:", 4:6, sep = " #"), paste("No pattern:", 7:8, sep = " #"), paste("Pos.Incr:", 9:11, sep = " #"), paste("No pattern:", 12:15, sep = " #")) for (i in 1:nrow(dataMatrix)) { # flip a coin coinFlip1 <- rbinom(1, size = 1, prob = 0.5) coinFlip2 <- rbinom(1, size = 1, prob = 0.5) coinFlip3 <- rbinom(1, size = 1, prob = 0.5) # if coin is heads add a common pattern to that row if (coinFlip1) { cols <- grep("Pos.3", colnames(dataMatrix)) dataMatrix[i, cols] <- dataMatrix[i, cols] + 3 } if (coinFlip2) { cols <- grep("Neg.Decr", colnames(dataMatrix)) dataMatrix[i, cols] <- dataMatrix[i, cols] - seq(from = 5, to = 15, length.out = length(cols)) } if (coinFlip3) { cols <- grep("Pos.Incr", colnames(dataMatrix)) dataMatrix[i, cols] <- dataMatrix[i, cols] + seq(from = 3, to = 15, length.out = length(cols)) } } # Illustrate data heatmap(dataMatrix, Colv = NA, Rowv = NA, margins = c(7, 2), labRow = "") svd_out <- svd(scale(dataMatrix)) library(lattice) b_clr <- c("steelblue", "darkred") key <- simpleKey( rectangles = TRUE, space = "top", points = FALSE, text = c("Positive", "Negative") ) key$rectangles$col <- b_clr b1 <- barchart(as.table(svd_out$v[, 1]), main = "First column", horizontal = FALSE, col = ifelse(svd_out$v[, 1] > 0, b_clr[1], b_clr[2] ), ylab = "Impact value", scales = list(x = list(rot = 55, labels = colnames(dataMatrix), cex = 1.1)), key = key ) b2 <- barchart(as.table(svd_out$v[, 2]), main = "Second column", horizontal = FALSE, col = ifelse(svd_out$v[, 2] > 0, b_clr[1], b_clr[2] ), ylab = "Impact value", scales = list(x = list(rot = 55, labels = colnames(dataMatrix), cex = 1.1)), key = key ) b3 <- barchart(as.table(svd_out$v[, 3]), main = "Third column", horizontal = FALSE, col = ifelse(svd_out$v[, 3] > 0, b_clr[1], b_clr[2] ), ylab = "Impact value", scales = list(x = list(rot = 55, labels = colnames(dataMatrix), cex = 1.1)), key = key ) b4 <- barchart(as.table(svd_out$v[, 4]), main = "Fourth column", horizontal = FALSE, col = ifelse(svd_out$v[, 4] > 0, b_clr[1], b_clr[2] ), ylab = "Impact value", scales = list(x = list(rot = 55, labels = colnames(dataMatrix), cex = 1.1)), key = key ) # Note that the fourth has the no pattern columns as the # chosen pattern, probably partly because of the previous # patterns already had been identified print(b1, position = c(0, 0.5, .5, 1), more = TRUE) print(b2, position = c(0.5, 0.5, 1, 1), more = TRUE) print(b3, position = c(0, 0, .5, .5), more = TRUE) print(b4, position = c(0.5, 0, 1, .5)) # Let's look at how well the SVD identifies # the most influential columns getSvdMostInfluential(dataMatrix, quantile = .8, similarity_threshold = .9, plot_threshold = .05, plot_selection = TRUE) par(org_par)
This is a general form of bezier line that can be used for cubic, quadratic, and more advanced Bézier lines.
gnrlBezierPoints(ctrl_points, length_out = 100L)
gnrlBezierPoints(ctrl_points, length_out = 100L)
ctrl_points |
The ctrl_points for the bezier control points. This should
either be a |
length_out |
The length of the return points, i.e. how fine detailed the points should be. |
library(grid) grid.newpage() l <- gnrlBezierPoints(data.frame(x = c(.1, -.1, .7, 1, 1, 0.1), y = c(.9, 0, 1, .8, .4, .1)), length_out = 100) grid.lines(l[,1], l[,2], gp=gpar(col="#550000", lwd = 4)) out_sizes <- 4:20 clrs <- colorRampPalette(c("orange", "darkblue"))(length(out_sizes)) for (i in out_sizes){ l <- gnrlBezierPoints(data.frame(x = c(.1, -.1, .7, 1, 1, 0.1), y = c(.9, 0, 1, .8, .4, .1)), length_out = i) grid.lines(l[,1], l[,2], gp=gpar(col=clrs[which(i == out_sizes)])) }
library(grid) grid.newpage() l <- gnrlBezierPoints(data.frame(x = c(.1, -.1, .7, 1, 1, 0.1), y = c(.9, 0, 1, .8, .4, .1)), length_out = 100) grid.lines(l[,1], l[,2], gp=gpar(col="#550000", lwd = 4)) out_sizes <- 4:20 clrs <- colorRampPalette(c("orange", "darkblue"))(length(out_sizes)) for (i in out_sizes){ l <- gnrlBezierPoints(data.frame(x = c(.1, -.1, .7, 1, 1, 0.1), y = c(.9, 0, 1, .8, .4, .1)), length_out = i) grid.lines(l[,1], l[,2], gp=gpar(col=clrs[which(i == out_sizes)])) }
has
in JavaScriptThis is a handy function for checking if item exist in a nested structure
has(sourceList, path)
has(sourceList, path)
sourceList |
The |
path |
A string that can be separated by [,] or ., the string |
Returns a boolean.
Other lodash similar functions:
retrieve()
has(list(a = list(b = 1)), "a.b")
has(list(a = list(b = 1)), "a.b")
Inserts a row and keeps the attributes copyAllNewAttributes
insertRowAndKeepAttr(m, r, v = NA, rName = "")
insertRowAndKeepAttr(m, r, v = NA, rName = "")
m |
matrix |
r |
row number where the new row should be inserted |
v |
optional values for the new row |
rName |
optional character string: the name of the new row. |
matrix
Returns a matrix with one more row than the provided matrix m
Max Gordon, Arne Henningsen
test <- matrix(1:4, ncol = 2) attr(test, "wow") <- 1000 test <- insertRowAndKeepAttr(test, 2) print(attr(test, "wow"))
test <- matrix(1:4, ncol = 2) attr(test, "wow") <- 1000 test <- insertRowAndKeepAttr(test, 2) print(attr(test, "wow"))
htmlTable
from a listBy putting all the output from the getDescriptionStatsBy
into a list, naming each element that we want in an rgroup
we can
automatically merge everything and create an object ready for the
htmlTable
.
mergeDesc(..., htmlTable_args = list())
mergeDesc(..., htmlTable_args = list())
... |
One or more elements coming from |
htmlTable_args |
Any arguments that should be passed to
|
matrix
Returns a matrix object of class descList
rgroup
valueThe value for the rgroup
is by default the name of the list element. If you have
passed a list without a name for that particular element or if you have passed a
matrix it will look for a label set by the Hmisc::label
function.
For those elements that have only one row no rgroup
is set, and the naming sequence
is the same as above but with an additional rownames
if the previous
two turn out empty. All this behavior is exemplified in the example.
The rgroup
value can be overridden by simply specifying a custom rgroup
when
calling the htmlTable
function.
colnames
of the matrixThe function chooses the colnames
from the first element in
the tlist
.
library(magrittr) library(dplyr) library(htmlTable) data(mtcars) mtcars %<>% mutate(am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")), vs = factor(vs, levels = 0:1, labels = c("V-shaped", "straight")), drat_prop = drat > median(drat), drat_prop = factor(drat_prop, levels = c(FALSE, TRUE), labels = c("High ratio", "Low ratio")), carb_prop = carb > 2, carb_prop = factor(carb_prop, levels = c(FALSE, TRUE), labels = c("≤ 2", "> 2")), across(c(gear, carb, cyl), factor)) # A simple bare-bone example mtcars %>% getDescriptionStatsBy(`Miles per gallon` = mpg, Weight = wt, `Carborators ≤ 2` = carb_prop, by = am) %>% htmlTable(caption = "Basic continuous stats from the mtcars dataset") invisible(readline(prompt = "Press [enter] to continue")) # For labeling & units we use set_column_labels/set_column_unit that use # the Hmisc package annotation functions mtcars %<>% set_column_labels(am = "Transmission", mpg = "Gas", wt = "Weight", gear = "Gears", disp = "Displacement", vs = "Engine type", drat_prop = "Rear axel ratio", carb_prop = "Carburetors") %>% set_column_units(mpg = "Miles/(US) gallon", wt = "10<sup>3</sup> lbs", disp = "cu.in.") mtcars %>% getDescriptionStatsBy(mpg, wt, `Gear†` = gear, drat_prop, carb_prop, vs, by = am, header_count = TRUE, use_units = TRUE, show_all_values = TRUE) %>% addHtmlTableStyle(pos.caption = "bottom") %>% htmlTable(caption = "Stats from the mtcars dataset", tfoot = "† Number of forward gears") invisible(readline(prompt = "Press [enter] to continue")) # Using the default parameter we can mtcars %>% getDescriptionStatsBy(mpg, wt, `Gear†` = gear, drat_prop, carb_prop, vs, by = am, header_count = TRUE, use_units = TRUE, default_ref = c(drat_prop = "Low ratio", carb_prop = "> 2")) %>% addHtmlTableStyle(pos.caption = "bottom") %>% htmlTable(caption = "Stats from the mtcars dataset", tfoot = "† Number of forward gears") invisible(readline(prompt = "Press [enter] to continue")) # We can also use lists tll <- list() tll[["Gear (3 to 5)"]] <- getDescriptionStatsBy(mtcars$gear, mtcars$am) tll <- c(tll, list(getDescriptionStatsBy(mtcars$disp, mtcars$am))) mergeDesc(tll, htmlTable_args = list(caption = "Factored variables")) %>% htmlTable::addHtmlTableStyle(css.rgroup = "") invisible(readline(prompt = "Press [enter] to continue")) tl_no_units <- list() tl_no_units[["Gas (mile/gallons)"]] <- getDescriptionStatsBy(mtcars$mpg, mtcars$am, header_count = TRUE) tl_no_units[["Weight (10<sup>3</sup> kg)"]] <- getDescriptionStatsBy(mtcars$wt, mtcars$am, header_count = TRUE) mergeDesc(tl_no_units, tll) %>% htmlTable::addHtmlTableStyle(css.rgroup = "") invisible(readline(prompt = "Press [enter] to continue")) # Other settings mtcars$mpg[sample(1:NROW(mtcars), size = 5)] <- NA getDescriptionStatsBy(mtcars$mpg, mtcars$am, statistics = TRUE) invisible(readline(prompt = "Press [enter] to continue")) # Do the horizontal version getDescriptionStatsBy(mtcars$gear, mtcars$am, statistics = TRUE, hrzl_prop = TRUE) invisible(readline(prompt = "Press [enter] to continue")) mtcars$wt_with_missing <- mtcars$wt mtcars$wt_with_missing[sample(1:NROW(mtcars), size = 8)] <- NA getDescriptionStatsBy(mtcars$wt_with_missing, mtcars$am, statistics = TRUE, hrzl_prop = TRUE, total_col_show_perc = FALSE) invisible(readline(prompt = "Press [enter] to continue")) ## Not run: ## There is also a LaTeX wrapper tll <- list( getDescriptionStatsBy(mtcars$gear, mtcars$am), getDescriptionStatsBy(mtcars$col, mtcars$am)) latex(mergeDesc(tll), caption = "Factored variables", file = "") ## End(Not run)
library(magrittr) library(dplyr) library(htmlTable) data(mtcars) mtcars %<>% mutate(am = factor(am, levels = 0:1, labels = c("Automatic", "Manual")), vs = factor(vs, levels = 0:1, labels = c("V-shaped", "straight")), drat_prop = drat > median(drat), drat_prop = factor(drat_prop, levels = c(FALSE, TRUE), labels = c("High ratio", "Low ratio")), carb_prop = carb > 2, carb_prop = factor(carb_prop, levels = c(FALSE, TRUE), labels = c("≤ 2", "> 2")), across(c(gear, carb, cyl), factor)) # A simple bare-bone example mtcars %>% getDescriptionStatsBy(`Miles per gallon` = mpg, Weight = wt, `Carborators ≤ 2` = carb_prop, by = am) %>% htmlTable(caption = "Basic continuous stats from the mtcars dataset") invisible(readline(prompt = "Press [enter] to continue")) # For labeling & units we use set_column_labels/set_column_unit that use # the Hmisc package annotation functions mtcars %<>% set_column_labels(am = "Transmission", mpg = "Gas", wt = "Weight", gear = "Gears", disp = "Displacement", vs = "Engine type", drat_prop = "Rear axel ratio", carb_prop = "Carburetors") %>% set_column_units(mpg = "Miles/(US) gallon", wt = "10<sup>3</sup> lbs", disp = "cu.in.") mtcars %>% getDescriptionStatsBy(mpg, wt, `Gear†` = gear, drat_prop, carb_prop, vs, by = am, header_count = TRUE, use_units = TRUE, show_all_values = TRUE) %>% addHtmlTableStyle(pos.caption = "bottom") %>% htmlTable(caption = "Stats from the mtcars dataset", tfoot = "† Number of forward gears") invisible(readline(prompt = "Press [enter] to continue")) # Using the default parameter we can mtcars %>% getDescriptionStatsBy(mpg, wt, `Gear†` = gear, drat_prop, carb_prop, vs, by = am, header_count = TRUE, use_units = TRUE, default_ref = c(drat_prop = "Low ratio", carb_prop = "> 2")) %>% addHtmlTableStyle(pos.caption = "bottom") %>% htmlTable(caption = "Stats from the mtcars dataset", tfoot = "† Number of forward gears") invisible(readline(prompt = "Press [enter] to continue")) # We can also use lists tll <- list() tll[["Gear (3 to 5)"]] <- getDescriptionStatsBy(mtcars$gear, mtcars$am) tll <- c(tll, list(getDescriptionStatsBy(mtcars$disp, mtcars$am))) mergeDesc(tll, htmlTable_args = list(caption = "Factored variables")) %>% htmlTable::addHtmlTableStyle(css.rgroup = "") invisible(readline(prompt = "Press [enter] to continue")) tl_no_units <- list() tl_no_units[["Gas (mile/gallons)"]] <- getDescriptionStatsBy(mtcars$mpg, mtcars$am, header_count = TRUE) tl_no_units[["Weight (10<sup>3</sup> kg)"]] <- getDescriptionStatsBy(mtcars$wt, mtcars$am, header_count = TRUE) mergeDesc(tl_no_units, tll) %>% htmlTable::addHtmlTableStyle(css.rgroup = "") invisible(readline(prompt = "Press [enter] to continue")) # Other settings mtcars$mpg[sample(1:NROW(mtcars), size = 5)] <- NA getDescriptionStatsBy(mtcars$mpg, mtcars$am, statistics = TRUE) invisible(readline(prompt = "Press [enter] to continue")) # Do the horizontal version getDescriptionStatsBy(mtcars$gear, mtcars$am, statistics = TRUE, hrzl_prop = TRUE) invisible(readline(prompt = "Press [enter] to continue")) mtcars$wt_with_missing <- mtcars$wt mtcars$wt_with_missing[sample(1:NROW(mtcars), size = 8)] <- NA getDescriptionStatsBy(mtcars$wt_with_missing, mtcars$am, statistics = TRUE, hrzl_prop = TRUE, total_col_show_perc = FALSE) invisible(readline(prompt = "Press [enter] to continue")) ## Not run: ## There is also a LaTeX wrapper tll <- list( getDescriptionStatsBy(mtcars$gear, mtcars$am), getDescriptionStatsBy(mtcars$col, mtcars$am)) latex(mergeDesc(tll), caption = "Factored variables", file = "") ## End(Not run)
The merge allows for a recursive component where the lists are compared on the subelement. If one does not contain that element it will get NA in for those parameters.
mergeLists( ..., lapplyOutput = NULL, sortNames = getOption("Gmisc.mergeList.sort", default = TRUE) )
mergeLists( ..., lapplyOutput = NULL, sortNames = getOption("Gmisc.mergeList.sort", default = TRUE) )
... |
Any number of lists that you want to merge |
lapplyOutput |
The |
sortNames |
Set to false if you don't want the names to be sorted. This can also be done via the option 'Gmisc.mergeList.sort'. |
Returns a list with all the given lists.
v1 <- list("a" = c(1, 2), b = "test 1", sublist = list(one = 20:21, two = 21:22)) v2 <- list("a" = c(3, 4), b = "test 2", sublist = list(one = 10:11, two = 11:12, three = 1:2)) mergeLists(v1, v2)
v1 <- list("a" = c(1, 2), b = "test 1", sublist = list(one = 20:21, two = 21:22)) v2 <- list("a" = c(3, 4), b = "test 2", sublist = list(one = 10:11, two = 11:12, three = 1:2)) mergeLists(v1, v2)
Moves a boxGrob
/boxPropGrob
by modifying it's viewport
.
This can be useful if you want to create a series of boxes whose position are relative
to each other and depend on each box's width/height.
moveBox( element, x = NULL, y = NULL, space = c("absolute", "relative"), just = NULL )
moveBox( element, x = NULL, y = NULL, space = c("absolute", "relative"), just = NULL )
element |
A |
x |
A |
y |
A |
space |
We can provide |
just |
The justification of an argument as used by |
The element with updated
Other flowchart components:
align
,
boxGrob()
,
boxPropGrob()
,
connectGrob()
,
coords()
,
distance()
,
spread
library(grid) grid.newpage() box <- boxGrob("A simple box", x = .5, y = .8) moveBox(box, x = -.2, space = "relative")
library(grid) grid.newpage() box <- boxGrob("A simple box", x = .5, y = .8) moveBox(box, x = -.2, space = "relative")
This function joins strings into a valid path. It is a simple version of python's
os.path.join
and fixes simple problems such as having/not having trailing /
in each section.
pathJoin(...)
pathJoin(...)
... |
A set of strings to join. Each may be a single string or a vector. If you provide vectors they can either be all of the same length or where there are two lengths where one is equal to 1. |
string
A string with the merged path
pathJoin("my_base_path/helpers", "superfunction.R") # 'my_base_path/helpers/superfunction.R' base_dir <- "/home/tester/images" out <- data.frame(filename = c("file1.png", "file2.png", "file3.png")) |> dplyr::mutate(full_path = pathJoin(base_dir, filename))
pathJoin("my_base_path/helpers", "superfunction.R") # 'my_base_path/helpers/superfunction.R' base_dir <- "/home/tester/images" out <- data.frame(filename = c("file1.png", "file2.png", "file3.png")) |> dplyr::mutate(full_path = pathJoin(base_dir, filename))
Add a p-value column to the results
prAddDescStats( results, x, by, statistics, statistics.suppress_warnings, statistics.sig_lim, statistics.two_dec_lim, html )
prAddDescStats( results, x, by, statistics, statistics.suppress_warnings, statistics.sig_lim, statistics.two_dec_lim, html )
results |
The results that we want to add the column to |
x |
If a data.frame it will be used as the data source for the variables in the |
by |
The variable that you want to split into different columns |
statistics |
Add statistics, fisher test for proportions and Wilcoxon for continuous variables. See details below for more customization. |
statistics.suppress_warnings |
Hide warnings from the statistics function. |
statistics.sig_lim |
The significance limit for < sign, i.e. p-value 0.0000312 should be < 0.0001 with the default setting. |
statistics.two_dec_lim |
The limit for showing two decimals. E.g.
the p-value may be 0.056 and we may want to keep the two decimals in order
to emphasize the proximity to the all-mighty 0.05 p-value and set this to
|
html |
If HTML compatible output should be used. If |
results with added column
Add a units column to the results
prAddDescUnitColumn(results, x, use_units, units_column_name)
prAddDescUnitColumn(results, x, use_units, units_column_name)
results |
The results that we want to add the column to |
x |
If a data.frame it will be used as the data source for the variables in the |
use_units |
If the Hmisc package's units() function has been employed
it may be interesting to have a column at the far right that indicates the
unit measurement. If this column is specified then the total column will
appear before the units (if specified as last). You can also set the value to
|
units_column_name |
The name of the units column. Used if use_units = TRUE |
results with added column
Helper for [getDescriptionStatsBy] that fixes empty values in matrix so that they are compatible with the matrix
prAddEmptyVals(t, missing_value)
prAddEmptyVals(t, missing_value)
t |
Output from [prNumericDescs], [prPropDescs], or [prFactorDescs]. |
missing_value |
Value that is substituted for empty cells. Defaults to "-" |
A fixed list
Add a total column to the results
prAddTotalDescColumn( results, x, by, numbers_first, total_col_show_perc, show_all_values, useNA, useNA.digits, html, digits, continuous_fn, factor_fn, prop_fn, percentage_sign, default_ref, header_count = NULL, add_total_col )
prAddTotalDescColumn( results, x, by, numbers_first, total_col_show_perc, show_all_values, useNA, useNA.digits, html, digits, continuous_fn, factor_fn, prop_fn, percentage_sign, default_ref, header_count = NULL, add_total_col )
results |
The results that we want to add the column to |
x |
If a data.frame it will be used as the data source for the variables in the |
by |
The variable that you want to split into different columns |
numbers_first |
If the number should be given or if the percentage should be presented first. The second is encapsulated in parentheses (). |
total_col_show_perc |
This is by default true but if requested the percentages are suppressed as this sometimes may be confusing. |
show_all_values |
Show all values in proportions. For factors with only two values
it is most sane to only show one option as the other one will just be a complement
to the first, i.e. we want to convey a proportion. For instance sex - if you know
gender then automatically you know the distribution of the other sex as it's 100 % - other %.
To choose which one you want to show then set the |
useNA |
This indicates if missing should be added as a separate
row below all other. See |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
html |
If HTML compatible output should be used. If |
digits |
The number of decimals used |
continuous_fn |
The method to describe continuous variables. The
default is |
factor_fn |
The method used to describe factors, see |
prop_fn |
The method used to describe proportions, see |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. |
default_ref |
The default reference when dealing with proportions. When using 'dplyr' syntax ('tidyselect') you can specify a named vector/list for each column name. |
header_count |
Set to |
add_total_col |
This adds a total column to the resulting table. You can also specify if you want the total column "first" or "last" in the column order. |
results with added column
boxPropGrob
Add a sub-label to boxPropGrob
prBuildSubLabel(label, prop, txt_gp, side = c("left", "right"))
prBuildSubLabel(label, prop, txt_gp, side = c("left", "right"))
label |
The text of the label |
prop |
The proportion |
txt_gp |
The style as defined by |
side |
The side that the label belongs to |
A textGrob
with he additional attributes width
and height
.
Sometimes we have an object that can be either a box,
a coprdinate, a unit or a numerical value and all we
want is a list
of coordinates that we can use for
calculating distance, alignment and other things.
prConvert2Coords(obj)
prConvert2Coords(obj)
obj |
A |
A list
with all the points that coords
returns
Creates coordinates for box
prCreateBoxCoordinates(viewport_data, extra_coordinate_functions = NULL)
prCreateBoxCoordinates(viewport_data, extra_coordinate_functions = NULL)
viewport_data |
The arguments that will be used for generating the |
extra_coordinate_functions |
A list with named functions if we want additional parameters |
list
of class coords
Helper to [getDescriptionStatsBy()]
prFactorDescs( x, by, factor_fn, hrzl_prop, html, digits, digits.nonzero, numbers_first, useNA, useNA.digits, percentage_sign, missing_value, names_of_missing )
prFactorDescs( x, by, factor_fn, hrzl_prop, html, digits, digits.nonzero, numbers_first, useNA, useNA.digits, percentage_sign, missing_value, names_of_missing )
x |
If a data.frame it will be used as the data source for the variables in the |
by |
The variable that you want to split into different columns |
factor_fn |
The method used to describe factors, see |
hrzl_prop |
This is default FALSE and indicates that the proportions are to be interpreted in a vertical manner. If we want the data to be horizontal, i.e. the total should be shown and then how these differ in the different groups then set this to TRUE. |
html |
If HTML compatible output should be used. If |
digits |
The number of decimals used |
digits.nonzero |
The number of decimals used for values that are close to zero |
numbers_first |
If the number should be given or if the percentage should be presented first. The second is encapsulated in parentheses (). |
useNA |
This indicates if missing should be added as a separate
row below all other. See |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. |
missing_value |
Value that is substituted for empty cells. Defaults to "-" |
names_of_missing |
Optional character vector containing the names of returned statistics,
in case all returned values for a given |
A [base::by] list
Helper for [getDescriptionStatsBy] that fixes row names
prFixDescRownames(results, t, name)
prFixDescRownames(results, t, name)
results |
A matrix with the results |
t |
The [base::by()] output |
name |
Name if row names are missing or the results is a single row |
The results with fixed names
Helper for [getDescriptionStatsBy] that retrieves the basic header names.
prGetDescHeader(by, html, header_count, already_table_format = FALSE)
prGetDescHeader(by, html, header_count, already_table_format = FALSE)
by |
The variable that you want to split into different columns |
html |
If HTML compatible output should be used. If |
header_count |
Set to |
already_table_format |
Just a boolean as we use this in the total column |
A vector with basic headers
Outputs a list of boxes as produced by either the spread or align functions for boxGrobs.
## S3 method for class 'Gmisc_list_of_boxes' print(x, ...)
## S3 method for class 'Gmisc_list_of_boxes' print(x, ...)
x |
A list of a set of ['boxGrob']/['boxPropGrob'] to plot |
... |
Ignored argument |
Helper to [getDescriptionStatsBy()]
prNumericDescs( x, by, hrzl_prop, continuous_fn, html, digits, digits.nonzero, numbers_first, useNA, useNA.digits, percentage_sign, missing_value, names_of_missing )
prNumericDescs( x, by, hrzl_prop, continuous_fn, html, digits, digits.nonzero, numbers_first, useNA, useNA.digits, percentage_sign, missing_value, names_of_missing )
x |
If a data.frame it will be used as the data source for the variables in the |
by |
The variable that you want to split into different columns |
hrzl_prop |
This is default FALSE and indicates that the proportions are to be interpreted in a vertical manner. If we want the data to be horizontal, i.e. the total should be shown and then how these differ in the different groups then set this to TRUE. |
continuous_fn |
The method to describe continuous variables. The
default is |
html |
If HTML compatible output should be used. If |
digits |
The number of decimals used |
digits.nonzero |
The number of decimals used for values that are close to zero |
numbers_first |
If the number should be given or if the percentage should be presented first. The second is encapsulated in parentheses (). |
useNA |
This indicates if missing should be added as a separate
row below all other. See |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. |
missing_value |
Value that is substituted for empty cells. Defaults to "-" |
names_of_missing |
Optional character vector containing the names of returned statistics,
in case all returned values for a given |
A [base::by] list
The function collapses a vector into an output useful when throwing errors, e.g. 1:3 becomes '1', '2', '3'
prPasteVec(x)
prPasteVec(x)
x |
The vector |
Helper to [getDescriptionStatsBy()]
prPropDescs( x, by, name, default_ref, prop_fn, html, digits, digits.nonzero, numbers_first, useNA, useNA.digits, percentage_sign, missing_value, names_of_missing, NEJMstyle )
prPropDescs( x, by, name, default_ref, prop_fn, html, digits, digits.nonzero, numbers_first, useNA, useNA.digits, percentage_sign, missing_value, names_of_missing, NEJMstyle )
x |
If a data.frame it will be used as the data source for the variables in the |
by |
The variable that you want to split into different columns |
name |
The name of the row |
default_ref |
The default reference when dealing with proportions. When using 'dplyr' syntax ('tidyselect') you can specify a named vector/list for each column name. |
prop_fn |
The method used to describe proportions, see |
html |
If HTML compatible output should be used. If |
digits |
The number of decimals used |
digits.nonzero |
The number of decimals used for values that are close to zero |
numbers_first |
If the number should be given or if the percentage should be presented first. The second is encapsulated in parentheses (). |
useNA |
This indicates if missing should be added as a separate
row below all other. See |
useNA.digits |
The number of digits to use for the
missing percentage, defaults to the overall |
percentage_sign |
If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable. |
missing_value |
Value that is substituted for empty cells. Defaults to "-" |
names_of_missing |
Optional character vector containing the names of returned statistics,
in case all returned values for a given |
NEJMstyle |
Adds - no (%) at the end to proportions |
A [base::by] list
get
in JavaScriptThis is a handy function for retrieving items deep in a nested structure without causing error if not found
retrieve(sourceList, path, default = NA)
retrieve(sourceList, path, default = NA)
sourceList |
The |
path |
A string that can be separated by [,] or ., the string |
default |
The value to return if the element isn't found |
Returns a sub-element from sourceList
or the default
value.
Other lodash similar functions:
has()
source <- list(a = list(b = 1, `odd.name` = 'I hate . in names', c(1,2,3))) retrieve(source, "a.b") retrieve(source, "a.b.1") retrieve(source, "a.odd\\.name") retrieve(source, "a.not_in_list")
source <- list(a = list(b = 1, `odd.name` = 'I hate . in names', c(1,2,3))) retrieve(source, "a.b") retrieve(source, "a.b.1") retrieve(source, "a.odd\\.name") retrieve(source, "a.not_in_list")
Add label attribute using 'dplyr' syntax using the [Hmisc::label()]
set_column_labels(x, ...)
set_column_labels(x, ...)
x |
The data frame that we want to label |
... |
Variable names with their intended label, e.g. 'mpg = "Miles per gallon"'. |
The original data.frame
Other Hmisc helpers:
set_column_units()
library(magrittr) data(mtcars) mtcars_with_labels <- mtcars %>% set_column_labels(mpg = "Gas", cyl = "Cylinders", hp = "Strength") Hmisc::label(mtcars_with_labels$mpg)
library(magrittr) data(mtcars) mtcars_with_labels <- mtcars %>% set_column_labels(mpg = "Gas", cyl = "Cylinders", hp = "Strength") Hmisc::label(mtcars_with_labels$mpg)
Add label attribute using 'dplyr' syntax using the [Hmisc::unit()]
set_column_units(x, ...)
set_column_units(x, ...)
x |
The data frame that we want to define units on |
... |
Variable names with their intended unit, e.g. 'hp = "Hp"'. |
The original data.frame
Other Hmisc helpers:
set_column_labels()
library(magrittr) data(mtcars) mtcars_with_units <- mtcars %>% set_column_units(wt = "1000 lbs") Hmisc::units(mtcars_with_units$wt)
library(magrittr) data(mtcars) mtcars_with_units <- mtcars %>% set_column_units(wt = "1000 lbs") Hmisc::units(mtcars_with_units$wt)
Spreads a set of boxGrob
/boxPropGrob
in either horizontal or vertical direction.
spreadVertical(..., .from = NULL, .to = NULL, .type = c("between", "center")) spreadHorizontal(..., .from = NULL, .to = NULL, .type = c("between", "center"))
spreadVertical(..., .from = NULL, .to = NULL, .type = c("between", "center")) spreadHorizontal(..., .from = NULL, .to = NULL, .type = c("between", "center"))
... |
A set of boxes to spread. Can also be a |
.from |
A box that the spread originates from. If left empty the entire |
.to |
A box that the spread ends at. If left empty the entire |
.type |
If |
list
with the boxes that have been spread
Other flowchart components:
align
,
boxGrob()
,
boxPropGrob()
,
connectGrob()
,
coords()
,
distance()
,
moveBox()
library(grid) grid.newpage() box1 <- boxGrob("B1", x = .2, y = .8) box2 <- boxGrob("B2\n\n\neach\nbox\neven\nspace\nbetween", x = .2, y = .8) box3 <- boxGrob("B3", x = .2, y = .8) box4 <- boxGrob("B4", x = .2, y = .8) box5 <- boxGrob("B5", x = .2, y = .8) spread_boxes <- spreadVertical(box1, box2, box3, a = box4, box5, .type = "between") for (b in spread_boxes) { print(b) } box1 <- boxGrob("B1\n\nanother group\ncenter oriented", x = .6, y = .8) box2 <- boxGrob("B2", x = .6, y = .8) box3 <- boxGrob("B3", x = .6, y = .8) box4 <- boxGrob("B4", x = .6, y = .8) box5 <- boxGrob("B5", x = .6, y = .8) spread_boxes <- spreadVertical(box1, box2, box3, a = box4, box5, .type = "center") for (b in spread_boxes) { print(b) }
library(grid) grid.newpage() box1 <- boxGrob("B1", x = .2, y = .8) box2 <- boxGrob("B2\n\n\neach\nbox\neven\nspace\nbetween", x = .2, y = .8) box3 <- boxGrob("B3", x = .2, y = .8) box4 <- boxGrob("B4", x = .2, y = .8) box5 <- boxGrob("B5", x = .2, y = .8) spread_boxes <- spreadVertical(box1, box2, box3, a = box4, box5, .type = "between") for (b in spread_boxes) { print(b) } box1 <- boxGrob("B1\n\nanother group\ncenter oriented", x = .6, y = .8) box2 <- boxGrob("B2", x = .6, y = .8) box3 <- boxGrob("B3", x = .6, y = .8) box4 <- boxGrob("B4", x = .6, y = .8) box5 <- boxGrob("B5", x = .6, y = .8) spread_boxes <- spreadVertical(box1, box2, box3, a = box4, box5, .type = "center") for (b in spread_boxes) { print(b) }
When adding a time span text we often don't want to write 3 jun - 10 jun but shorten it to 3 - 10 jun while retaining month and year info only if the span crosses between months or years.
time2spanTxt( times, day_month_glue_txt = getOption("Gmisc_time2spanTxt_day_month", default = "{mday(time)} {month(time, label = TRUE)}"), full_year_format = getOption("Gmisc_time2spanTxt_full_year", default = "{mday(time)} {month(time, label = TRUE)} {year(time)}"), start_stop_glue_txt = getOption("Gmisc_time2spanTxt_template", default = "{start} to {stop}") )
time2spanTxt( times, day_month_glue_txt = getOption("Gmisc_time2spanTxt_day_month", default = "{mday(time)} {month(time, label = TRUE)}"), full_year_format = getOption("Gmisc_time2spanTxt_full_year", default = "{mday(time)} {month(time, label = TRUE)} {year(time)}"), start_stop_glue_txt = getOption("Gmisc_time2spanTxt_template", default = "{start} to {stop}") )
times |
The dates or POSIX timestamps to used for time span |
day_month_glue_txt |
The |
full_year_format |
The |
start_stop_glue_txt |
The string used in the |
There are options that can be set using the options
:
Gmisc_time2spanTxt_day_month
The date with day + month as formatted by glue
where the time is passed as time
.
Gmisc_time2spanTxt_full_year
The full date with day + month + year as formatted by glue
where the time is passed as time
.
Gmisc_time2spanTxt_template
The merge of the stop & start elements using glue
.
string
A string describing the time span
time2spanTxt(as.POSIXct(c("2020-01-02", "2020-03-01", NA))) # 2 jan to 1 mar
time2spanTxt(as.POSIXct(c("2020-01-02", "2020-03-01", NA))) # 2 jan to 1 mar
This class simplifies the creating of transition plots. It also allows for advanced multi-column transitions.
Transition plots are a type of Sankey diagrams. These are a specific type of flow diagram, in which the width of the arrows is shown proportionally to the flow quantity. See Wikipedia for details.
id
Optional id. The render uses named viewports that require a unique id if multiple transition plots
are combined. In order to avoid having overlapping graphs we need to generate a unique id for each viewport
and thus this variable exists. If left empty it will create a counter that is stored in the options
("Gmisc.transitionClassCounter"
) and each viewport will have the name preceded with tc_[0-9]+
. Set this
if you intend to use seekViewport
.
transitions
This is a >= 3 dimensional array with the transitions. Should not be directly accessed.
box_width
The box width
box_txt
The texts of each box
box_label
Box labels
box_label_pos
The label's positions, either "top"/"bottom"
box_label_cex
The size of the box labels
box_cex
The font-size multiplier for the text within the boxes
arrow_type
The type of arrow to use, defaults to "gradient", but can also be "simple".
The corresponding functions are bezierArrowGradient
, and
bezierArrowSmpl
. Note The bezierGrob
has been deprecated
as it is no longer faster than the bezier arrows and there is a difference in design.
arrow_clr
The arrow color
arrow_rez
The resolution of the arrow
vertical_space
The space between the boxes
fill_clr
The box fill color
clr_bar
Shows a color bar if there are proportions. Can be "none"
, "top"
, "bottom"
clr_bar_clrs
Extracts the colors for the color bar from the fill_clr
if none is provided
clr_bar_cex
The size of the ticks in the color bar
clr_bar_subspace
If little or no difference exists at the low/high proportions of the spectrum then it can be of interest to focus the color change to the center leaving the tails constant
clr_bar_labels
The labels of the color bars. Defaults to the dim names for the proportions.
txt_clr
The text color within the boxes
txt_gpar
Similar to 'txt_clr' but for more advanced styling with fontfamily (see [grid::gpar()]). *Note* that col & cex are overridden.
title
The plot title if any
title_cex
The font-size multiplier for the title
skip_shadows
Skip the shadow effect on the boxes
mar
The margins for the plot.
min_lwd
The minimum line width that is still shown. The pixels will most likely not have the same fine resolution as the data and you therefore may want to hide lines that are smaller than a certain amount.
max_lwd
The maximum line width to show
lwd_prop_type
The line can either be proportional to the "set"
of transitions
(group of two box columns), to "all"
transitions, or to each "box"
. It defaults
to "all"
.
data
Internal storage variable. Should not be accessed directly.
addBoxStyle(fill, txt, gpar)
Adds colors or extends existing one so that they match the transition matrix. The fill corresponds to the fill_clr and txt corresponds to the txt_clr. If the colors are missing and the transitions consist of only two columns the default colors will be used. If the matrix is being extended and these values are missing the values from the previous last column will be used for the default columns.
addTransitions(mtrx, label, txt, fill_clr, txt_clr, txt_gpar)
Add a transition matrix. The input has to be a numerical matrix between 2 and 3 dimensions.
If you don't provide the txt field the box' text field will be deduced from the transition matrix'
dimnames. The fill_clr and txt_clr are passed on to the addBoxStyle
function.
arrowWidths(set_no, add_width)
Retrieves the details regarding arrow sizes for each arrow within the transition group
boxPositions(col)
The box positions as a list with scalars for the positions:
x The center x-position
y The center y-position
right The right edge
left The left edge
top The top edge
bottom The bottom edge
height The box height
width The box width
unit The unit used for the values (npc)
boxSizes(col)
Gets the size of the boxes. The col
argument should
be either an integer or 'last'
getDim()
Gets the current dimensions of the transitions
getTransitionSet(no, reduce_dim = FALSE)
Gets a specific set of transitions. If the reduce_dim
is set
to TRUE it will only return a 2-dimensional matrix even if the original
has a 3rd proportions dimension
getYProps(col)
Gets the proportions after removing the vertical_space
between the boxes
initialize(transitions, label, txt, fill_clr, txt_clr, txt_gpar, id, ...)
Set up a Transition object. The transitions
should be a 2D or 3D matrix
as defined in the $addTransitions
section and not as later internally stored.
noCols()
Gets the number of columns, i.e. the number of transitions
noRows(no)
Gets the number of boxes in each row. If multiple rows the number of rows may differ betwen each transition matrix we therefore need to specify what transitions that we refer to. If no value is specified it returns all of them.
render(new_page = TRUE)
Call this to render the full graph. The new_page
argument
is for creating a new plot, set this to FALSE
if you want to combine this plot with another or if you have
additional viewports that you intend to use.
trnstnSizes(set_no)
Gets the transitions per box as a 2D matrix. For the proportions
it also adds an attribute attr('props', prop_mtrx)
that
is a 2D matrix with the corresponding proportions.
# Transitions set.seed(1) n <- 10 my_data <- data.frame( Var_a = sample(c( "Test 1", "Test 2", "Test 3" ), size = n, replace = TRUE, prob = 3:1 ), Var_b = sample(c( "Test 1", "Test 2", "Test 3" ), size = n, replace = TRUE, prob = 1:3 ) ) mtrx <- with( my_data, table(Var_a, Var_b) ) # Initialize the transition plot transitions <- getRefClass("Transition")$new(mtrx, label = c("Before", "After")) # Render the plot transitions$render()
# Transitions set.seed(1) n <- 10 my_data <- data.frame( Var_a = sample(c( "Test 1", "Test 2", "Test 3" ), size = n, replace = TRUE, prob = 3:1 ), Var_b = sample(c( "Test 1", "Test 2", "Test 3" ), size = n, replace = TRUE, prob = 1:3 ) ) mtrx <- with( my_data, table(Var_a, Var_b) ) # Initialize the transition plot transitions <- getRefClass("Transition")$new(mtrx, label = c("Before", "After")) # Render the plot transitions$render()
This plot's purpose is to illustrate how states change before and after. In my research I use it before surgery and after surgery but it can be used in any situation where you have a change from one state to another
transitionPlot( transition_flow, type_of_arrow = c("grid", "simple", "gradient"), box_txt = rownames(transition_flow), tot_spacing = 0.2, box_width = 1/4, fill_start_box = "darkgreen", txt_start_clr = "white", fill_end_box = fill_start_box, txt_end_clr = txt_start_clr, cex = 2, min_lwd = if (type_of_arrow == "grid") 1 else unit(0.1, "mm"), max_lwd = if (type_of_arrow == "grid") 6 else unit(5, "mm"), lwd_prop_total = TRUE, arrow_clr = "#000000", abs_arrow_width = FALSE, overlap_bg_clr = "#FFFFFF", overlap_order = 1:nrow(transition_flow), overlap_add_width = if (type_of_arrow == "grid") 1.5 else unit(1, "mm"), box_prop, mar = unit(rep(3, times = 4), "mm"), main = NULL, box_label = NULL, box_label_pos = "top", box_label_cex = cex, color_bar = TRUE, color_bar_cex = cex * 0.33, color_bar_labels, color_bar_subspace = NULL, new_page = FALSE )
transitionPlot( transition_flow, type_of_arrow = c("grid", "simple", "gradient"), box_txt = rownames(transition_flow), tot_spacing = 0.2, box_width = 1/4, fill_start_box = "darkgreen", txt_start_clr = "white", fill_end_box = fill_start_box, txt_end_clr = txt_start_clr, cex = 2, min_lwd = if (type_of_arrow == "grid") 1 else unit(0.1, "mm"), max_lwd = if (type_of_arrow == "grid") 6 else unit(5, "mm"), lwd_prop_total = TRUE, arrow_clr = "#000000", abs_arrow_width = FALSE, overlap_bg_clr = "#FFFFFF", overlap_order = 1:nrow(transition_flow), overlap_add_width = if (type_of_arrow == "grid") 1.5 else unit(1, "mm"), box_prop, mar = unit(rep(3, times = 4), "mm"), main = NULL, box_label = NULL, box_label_pos = "top", box_label_cex = cex, color_bar = TRUE, color_bar_cex = cex * 0.33, color_bar_labels, color_bar_subspace = NULL, new_page = FALSE )
transition_flow |
This should be a matrix with the size of the transitions.
The unit for each cell should be number of observations, row/column-proportions
will show incorrect sizes. The matrix needs to be square. The best way to generate
this matrix is probably just do a |
type_of_arrow |
The types of arrow may be grid, simple, or gradient. Simple grid
arrows are the |
box_txt |
The text to appear inside of the boxes. If you need line breaks then you need to manually add a \n inside the string. |
tot_spacing |
The proportion of the vertical space that is to be left empty. It is then split evenly between the boxes. |
box_width |
The width of the box. By default the box is one fourth of the plot width. |
fill_start_box |
The fill color of the start boxes. This can either be a single value or a vector if you desire different colors for each box. If you specify box_prop then this has to be a 2 column matrix. |
txt_start_clr |
The text color of the start boxes. This can either be a single value or a vector if you desire different colors for each box. If you specify box_prop then this has to be a 2 column matrix. |
fill_end_box |
The fill color of the end boxes. This can either be a single value or a vector if you desire different colors for each box. If you specify box_prop then this has to be a 2 column matrix. |
txt_end_clr |
The text color of the end boxes. This can either be a single value or a vector if you desire different colors for each box. If you specify box_prop then this has to be a 2 column matrix. |
cex |
The cex |
min_lwd |
The minimum width of the line that we want to illustrate the tranisition with. |
max_lwd |
The maximum width of the line that we want to illustrate the tranisition with. |
lwd_prop_total |
The width of the lines may be proportional to either the other flows from that box, or they may be related to all flows. This is a boolean parameter that is set to true by default, i.e. relating to all flows. |
arrow_clr |
The color of the arrows. Usually black, can be a vector indicating each arrow from first to last arrow (counting from the top). If the vector is of the same length as the boxes then all box arrows will have the same color (that is all the arrows stemming from the left boxes) |
abs_arrow_width |
The width can either be absolute, i.e. each arrow headed for a box has the exact same width. The alternative is that the width is related to the line width. |
overlap_bg_clr |
In order to enhance the 3D perspective and to make it easier to follow arrows the arrows have a background color to separate them from those underneath. |
overlap_order |
The order from first->last for the lines. This means that the last line will be on top while the first one will appear at the bottom. This should be provided as a vector. |
overlap_add_width |
The width of the white cross-over line. You can specify this as a scalar multiplication of the current line width. In case of non-grid arrows then you can also have this as a unit which is recommended as it looks better. If the scalar is < 1 then the overlap is ignored. |
box_prop |
If you want the boxes to have proportions indicating some other factors then input
a matrix with quantiles for the proportions. Note the size must be |
mar |
A numerical vector of the form c(bottom, left, top, right) of the type |
main |
The title of the plot if any, default |
box_label |
A vector of length 2 if you want to label each box column |
box_label_pos |
The position of the label, either |
box_label_cex |
The cex of the label, defaults to the default cex |
color_bar |
If you have proportions inside the transition_flow variable
then the color_bar will automatically appear at the bottom unless you set
this to |
color_bar_cex |
The size of the tick labels for the color bar |
color_bar_labels |
The labels of the two proportions that make up the color bar.
Defaults to the labels of the third dimension for the |
color_bar_subspace |
If there is little or no difference at the low/high proportions of the spectrum then it can be of interest to focus the color change to the center leaving the tails constant |
new_page |
If you want the plot to appear on a new blank page then set this to |
void
# This example does not run since it # takes a little while to assemble the # arrows and RMD Check complains that this # is more than allowed for library(grid) par_org <- par(ask = TRUE) # Settings no_boxes <- 3 # Generate test setting transition_matrix <- matrix(NA, nrow = no_boxes, ncol = no_boxes) transition_matrix[1, ] <- 200 * c(.5, .25, .25) transition_matrix[2, ] <- 540 * c(.75, .10, .15) transition_matrix[3, ] <- 340 * c(0, .2, .80) grid.newpage() transitionPlot(transition_matrix, box_txt = c("First", "Second", "Third"), type_of_arrow = "simple", min_lwd = unit(1, "mm"), max_lwd = unit(6, "mm"), overlap_add_width = unit(1, "mm") ) # Setup proportions box_prop <- cbind(c(1, 0, 0.5), c(.52, .2, .8)) # From the Set2 Colorbrewer start_box_clr <- c("#8DA0CB", "#FC8D62") # Darken the colors slightly end_box_clr <- c( colorRampPalette(c(start_box_clr[1], "#000000"))(10)[2], colorRampPalette(c(start_box_clr[2], "#000000"))(10)[2] ) # Create a new grid grid.newpage() transitionPlot(transition_matrix, box_prop = box_prop, fill_start_box = start_box_clr, fill_end_box = end_box_clr, txt_start_clr = c("#FFFFFF", "#000000"), txt_end_clr = c("#FFFFFF", "#000000"), box_txt = c("First", "Second", "Third"), type_of_arrow = "gradient", min_lwd = unit(1, "mm"), max_lwd = unit(10, "mm"), overlap_add_width = unit(1, "mm") ) par(par_org)
# This example does not run since it # takes a little while to assemble the # arrows and RMD Check complains that this # is more than allowed for library(grid) par_org <- par(ask = TRUE) # Settings no_boxes <- 3 # Generate test setting transition_matrix <- matrix(NA, nrow = no_boxes, ncol = no_boxes) transition_matrix[1, ] <- 200 * c(.5, .25, .25) transition_matrix[2, ] <- 540 * c(.75, .10, .15) transition_matrix[3, ] <- 340 * c(0, .2, .80) grid.newpage() transitionPlot(transition_matrix, box_txt = c("First", "Second", "Third"), type_of_arrow = "simple", min_lwd = unit(1, "mm"), max_lwd = unit(6, "mm"), overlap_add_width = unit(1, "mm") ) # Setup proportions box_prop <- cbind(c(1, 0, 0.5), c(.52, .2, .8)) # From the Set2 Colorbrewer start_box_clr <- c("#8DA0CB", "#FC8D62") # Darken the colors slightly end_box_clr <- c( colorRampPalette(c(start_box_clr[1], "#000000"))(10)[2], colorRampPalette(c(start_box_clr[2], "#000000"))(10)[2] ) # Create a new grid grid.newpage() transitionPlot(transition_matrix, box_prop = box_prop, fill_start_box = start_box_clr, fill_end_box = end_box_clr, txt_start_clr = c("#FFFFFF", "#000000"), txt_end_clr = c("#FFFFFF", "#000000"), box_txt = c("First", "Second", "Third"), type_of_arrow = "gradient", min_lwd = unit(1, "mm"), max_lwd = unit(10, "mm"), overlap_add_width = unit(1, "mm") ) par(par_org)
Manually viewing a list object can be tricky where the natural print can be hard to work through. The config format *yaml* is increadibly dense and useful not only for writing configs but also viewing them which 'yamlDump' helps with.
yamlDump(x)
yamlDump(x)
x |
An object that |
void
some_fancy_list <- list(complex = list(some_data = 1:3, other_data = list(name = "Max")), simple = "awesome overview") yamlDump(some_fancy_list) #complex: # some_data: # - 1 # - 2 # - 3 # other_data: # name: Max #simple: awesome overview # If you got a character json you can also input it directly # and the function will automatically convert it to a list yamlDump('{"a":{"b":["1"]}}')
some_fancy_list <- list(complex = list(some_data = 1:3, other_data = list(name = "Max")), simple = "awesome overview") yamlDump(some_fancy_list) #complex: # some_data: # - 1 # - 2 # - 3 # other_data: # name: Max #simple: awesome overview # If you got a character json you can also input it directly # and the function will automatically convert it to a list yamlDump('{"a":{"b":["1"]}}')