strapply {gsubfn}

Apply a function over a string or strings.
Package: gsubfn
Version: 0.6-5

Description

Similar to gsubfn except that instead of performing substitutions it returns the output of FUN.

Usage

strapply(X, pattern, FUN = function(x, ...) x, backref = NULL, ..., 
    empty = NULL,
    ignore.case = FALSE, perl = FALSE, engine = getOption("gsubfn.engine"),
    simplify = FALSE, USE.NAMES = FALSE, combine = c)
strapplyc(X, pattern, backref, ignore.case = FALSE, simplify = FALSE, USE.NAMES = FALSE, engine = getOption("gsubfn.engine"))

Arguments

X
list or (atomic) vector of character strings to be used.
pattern
character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector.
FUN
a function, formula, character string, list or proto object to be applied to each element of X. See discussion in gsubfn.
backref
See gsubfn.
empty
If there is no match to a string return this value.
ignore.case
If TRUE then case is ignored in the pattern argument.
perl
If TRUE then engine="R" is used with perl regular expressions.
engine
Specifies which engine to use. If the R installation has tcltk capability then the tcl engine is used unless FUN is a proto object, in which case the "R" engine is used (regardless of the setting of this argument).
...
optional arguments to gsubfn.
simplify
logical or function. If logical, should the result be simplified to a vector or matrix, as in sapply, if possible? If function, that function is applied to the result with each component of the result passed as a separate argument. If the function form is used, it is typically specified as rbind.
USE.NAMES
logical; if TRUE and if X is character, use X as names for the result unless it had names already.
combine
combine is a function applied to the components of the result of FUN. The default is "c". "list" is another common choice. The default may change to "list" in the future.
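The empty argument is not exercised in the Examples section, so here is a minimal sketch of how it might be used. The input vector x and the variable names are made up for illustration, and the sketch assumes the gsubfn package is installed.

```r
library(gsubfn)

# Hypothetical input: the second string contains no digits.
x <- c("a1b22", "no digits", "c333")

# With the default empty = NULL, a string with no match yields NULL.
res_default <- strapply(x, "[0-9]+")

# Supplying empty substitutes a value for strings with no match.
res_na <- strapply(x, "[0-9]+", empty = NA_character_)

# Combined with simplify = c, the per-string results are concatenated
# into one vector, with NA standing in for the string without a match.
flat <- strapply(x, "[0-9]+", as.numeric, empty = NA, simplify = c)
```

Using a non-NULL empty value keeps one entry per unmatched input when the result is later flattened, which NULL would silently drop.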

Details

If FUN is a function then for each character string in X the pattern is repeatedly matched. Each such match, along with back references, if any, is passed to the function FUN, and the output of FUN is returned as a list. If FUN is a formula or proto object then it is interpreted in the way discussed in gsubfn.
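As a rough illustration of the two most common forms of FUN, the sketch below passes first an ordinary function and then a formula. The input strings are invented for the example; the formula form relies on gsubfn turning the free variable x into the function argument, as described in the gsubfn help page.

```r
library(gsubfn)

# Hypothetical input strings.
x <- c("10 apples and 3 pears", "7 plums")

# FUN as a function: every match is passed to as.numeric and the
# results are collected per input string.
nums <- strapply(x, "[0-9]+", as.numeric)

# FUN as a formula: equivalent to function(x) as.numeric(x) * 2.
doubled <- strapply(x, "[0-9]+", ~ as.numeric(x) * 2)

unlist(nums)     # 10 3 7
unlist(doubled)  # 20 6 14
```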

If FUN is a proto object or if perl=TRUE is specified then engine="R" is used and the engine argument is ignored.

If backref is not specified and engine="R" is specified or implied then a heuristic is used to calculate the number of backreferences. The primary situation that can fool it is if there are parentheses in the pattern that are not back references. In those cases the user will have to specify backref. If engine="tcl" then an exact algorithm is used and this problem does not occur.
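The following sketch shows how an explicit backref changes what FUN receives. It follows the convention documented in gsubfn: zero or a positive value passes the match followed by that many back references, and a negative value passes only the back references. The input string is made up for illustration.

```r
library(gsubfn)

x <- "a:b c:d"

# backref omitted: the two back references are detected and only
# they are passed to FUN, so paste0 sees ("a", "b") and ("c", "d").
both <- strapply(x, "(.):(.)", paste0)[[1]]

# backref = 0: only the full match is passed to FUN.
whole <- strapply(x, "(.):(.)", paste0, backref = 0)[[1]]

# backref = -1: only the first back reference is passed to FUN.
first <- strapply(x, "(.):(.)", c, backref = -1)[[1]]
```

Setting backref explicitly also sidesteps the heuristic entirely, which may be preferable whenever the pattern contains parentheses that do not capture.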

strapplyc is like strapply but specialized to FUN=c for speed. If the "tcl" engine is not available then it calls strapply and there will be no speed advantage.
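A small sketch comparing the two calls, under the assumption that gsubfn is installed. The input strings are borrowed from the Examples section; any speed difference depends on whether the "tcl" engine is available and is not measured here.

```r
library(gsubfn)

x <- c("the brown fox", "the eager beaver")

# strapplyc extracts the matches directly.
words_c <- strapplyc(x, "\\w+")

# The equivalent strapply call with FUN = c.
words_s <- strapply(x, "\\w+", c)

# Both return one character vector of words per input string.
words_c[[1]]
words_s[[1]]
```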

Values

A list of character strings.

See Also

See gsubfn. For regular expression syntax used in tcl see http://www.tcl.tk/man/tcl8.6/TclCmd/re_syntax.htm and for regular expression syntax used in R see the help page for regex.

Examples

strapply("12;34:56,89,,12", "[0-9]+")
 
# separate leading digits from rest of string
# creating a 2 column matrix: digits, rest
s <- c("123abc", "12cd34", "1e23")
t(strapply(s, "^([[:digit:]]+)(.*)", c, simplify = TRUE)) 
 
# same but create the matrix directly using simplify = rbind
strapply(s, "^([[:digit:]]+)(.*)", c, simplify = rbind)
 
# running window of 5 characters using 0-lookahead perl regexp
# Note that the three ( in the regexp will fool it into thinking there
# are three backreferences so specify backref explicitly.
x <- "abcdefghijkl"
strapply(x, "(.)(?=(....))",  paste0, backref = -2, perl = TRUE)[[1]]
 
# Note difference.  First gives character vector.  Second is the same.
# Third has same elements but is a list.
# Fourth gives list of two character vectors. Fifth is the same.
strapply("a:b c:d", "(.):(.)", c)[[1]]
strapply("a:b c:d", "(.):(.)", list, simplify = unlist) # same
 
strapply("a:b c:d", "(.):(.)", list)[[1]]
 
strapply("a:b c:d", "(.):(.)", c, combine = list)[[1]]
strapply("a:b c:d", "(.):(.)", c, combine = list, simplify = c) # same
 
# find second CPU_SPEED value given lines of config file
Lines <- c("DEVICE = 'PC'", "CPU_SPEED = '1999', '233'")
parms <- strapply(Lines, "[^ ',=]+", c, USE.NAMES = TRUE, 
    simplify = ~ lapply(list(...), "[", -1))
parms$CPU_SPEED[2]
 
# return first two words in each string
p <- proto(fun = function(this, x) if (count <=2) x)
strapply(c("the brown fox", "the eager beaver"), "\\w+", p)
 
 
## Not run:
# convert to chron
library(chron)
x <- c("01/15/2005 23:32:45", "02/27/2005 01:22:30")
x.chron <- strapply(x, "(../../....) (..:..:..)",  chron, simplify = c)
 
# time parsing of all 275,546 words from James Joyce's Ulysses
joyce <- readLines("http://www.gutenberg.org/files/4300/4300-8.txt") 
joycec <- paste(joyce, collapse = " ") 
system.time(s <- strapplyc(joycec, "\\w+")[[1]]) 
length(s) # 275546 
## End(Not run)

Documentation reproduced from package gsubfn, version 0.6-5. License: GPL (>= 2)