Functions

Introduction to R: Functions

Functions are blocks of R code that compute things. R comes with many built in functions, some of which you have already seen. For most simple mathematical calculations that you can think of, R probably has a built in function that will do them.

# take the square root of a number
sqrt(4)

## [1] 2

# compute the mean of a vector of numbers
a = c(1, 2, 4, 8, 16, 32)
mean(a)

## [1] 10.5

# compute the variance of a vector of numbers
var(a)

## [1] 140.7

# compute the sum of a vector
sum(a)

## [1] 63

# compute the difference between each item in a vector and the item preceding it (i.e., a[i]-a[i-1] for every index i except i=1)
diff(a)

## [1]  1  2  4  8 16

To find out more about any function, you can simply use ‘?’, so ‘?mean’ will bring up the help page for the function mean. If you are not sure what the name of the function is, but you know what you want it to do, you can use ‘??’. So, for example, if you wanted to do an ANOVA, but you weren’t sure what the function was, you could type the command ‘??anova’ and it would bring up a list of help pages that might contain relevant information.

One of the most important things to know about a function are its arguments; that is, what information the function requires in order to do what it is supposed to do. So, for example, if you look at the help page for the mean function, there is a “Description” section saying what the function does, followed by a “Usage” section showing how to call the function (and giving the default values of the arguments), and then an “Arguments” section explaining what is required for the function to work properly. So, for the mean function, there are three arguments, ‘x’, ‘trim’, and ‘na.rm’. Looking at the “Usage” section tells you that ‘trim’ and ‘na.rm’ are, by default, assigned to be 0 and FALSE, respectively. What that means is that you only need to set values for ‘trim’ and ‘na.rm’ if you want to modify them from their default values. Otherwise, all you need to specify is ‘x’, the first argument of the function. The “Arguments” section tells you that ‘x’ is supposed to a vector and that it can be numeric or logical. There is also a “Value” section explaining what is returned by the function.

This is well and good, but most of the time when we are wanting to compute things, there is no function that will do that computation for us. In such a case you must define your own function. As with any R function, you must indicate what the arguments of the function are, and the function must return something to you.

Let’s start simple. Let’s define a function that takes a single numeric value as an argument, adds nine to that number and squares the result. We can give this function any name we like, so it is better to give it a meaningful name. In this case, we will call our function ‘example.fn’. This function will have a single argument ‘x’.

example.fn = function(x) {
    # this bracket is necessary - it tells R that this function will be multiple
    # lines long manipulate x and assign it to a new variable y
    y = (x + 9)^2
    # return y
    return(y)
}

Now, when we call ‘example.fn’, whatever value we pass in will be assigned to the value ‘x’ inside the function and ‘y’ will be computed and returned. For example,

example.fn(3)

## [1] 144

example.fn(11)

## [1] 400

It is important to note that ‘x’ and ‘y’ do not exist outside of the function, so you cannot, for example, call ‘example.fn’ and then reference ‘y’ in a later calculation. If you do, you will get an error. Instead, you must store the result of the function in a variable, and then use that variable instead. So, if you wanted to use the result of ‘example.fn’ in another calculation, you would do

# call the function
z = example.fn(5)
z/5  # divide the result by 5

## [1] 39.2

# you could also do the following, as example.fn(5) returns a numeric value
# that can be manipulated like any other numeric value
example.fn(5)/5

## [1] 39.2

You can also write functions that combine multiple functions together. For example, assume that you want to compute the frequency of an allele in the next time step, given its frequency in the current time step. To do this, you need to first calculate the marginal fitness of the allele, and also the mean population fitness.

# a function that calculates marginal fitness of the allele
marg.fit = function(p) {
    # fitness of each genotype
    W_AA = 1
    W_Aa = 2
    W_aa = 0.1
    
    mf = p * W_AA + (1 - p) * W_Aa
    return(mf)
}

# a function that calculates mean population fitness
mean.fit = function(p) {
    W_AA = 1
    W_Aa = 2
    W_aa = 0.1
    
    mf = p^2 * W_AA + 2 * p * (1 - p) * W_Aa + (1 - p)^2 * W_aa
    return(mf)
}

# calculate allele frequency in the next time step
incr.allele = function(p) {
    na = p * marg.fit(p)/mean.fit(p)
    return(na)
}

incr.allele(0.5)

## [1] 0.5882

Notice that it was not a problem having both ‘marg.fit’ and ‘mean.fit’ return variables called ‘mf’ - that is because variables with those names live only inside the functions, so they cannot interfere with or overwrite one another.

You can also pass many values for arguments to functions at the same time. Going back to the allele frequency example, what if you want to know the allele frequency change for many values of p, not just a single value?

p = seq(0, 1, 0.1)  # set p to be vector of values from 0 to 1 in steps of 0.1
pt = incr.allele(p)  # calculate the allele frequency in the next time step, given different values of p
pt

##  [1] 0.0000 0.4213 0.4839 0.5209 0.5536 0.5882 0.6287 0.6796 0.7477 0.8454
## [11] 1.0000

You can also set as many variables as you want. For example, maybe you want to have ‘W_AA’, ‘W_Aa’, and ‘W_aa’ be specified as arguments to the function, rather than having them be set inside the function.

marg.fit.2 = function(p, W_AA, W_Aa) {
    mf = p * W_AA + (1 - p) * W_Aa
    return(mf)
}

mean.fit.2 = function(p, W_AA, W_Aa, W_aa) {
    mf = p^2 * W_AA + 2 * p * (1 - p) * W_Aa + (1 - p)^2 * W_aa
    return(mf)
}

incr.allele.2 = function(p, W_AA, W_Aa, W_aa) {
    na = p * marg.fit.2(p, W_AA, W_Aa)/mean.fit.2(p, W_AA, W_Aa, W_aa)
    return(na)
}

incr.allele.2(p = seq(0, 1, 0.1), W_AA = 1, W_Aa = 2, W_aa = 1)

##  [1] 0.0000 0.1610 0.2727 0.3592 0.4324 0.5000 0.5676 0.6408 0.7273 0.8390
## [11] 1.0000

incr.allele.2(p = seq(0, 1, 0.1), W_AA = 1, W_Aa = 0.01, W_aa = 1)

##  [1] 0.00000 0.01326 0.06089 0.15765 0.30945 0.50000 0.69055 0.84235
##  [9] 0.93911 0.98674 1.00000