Saturday, March 16, 2019

Writing Functions (part 1)

Writing functions

This post outlines the writing of a basic function. Writing functions in R [@R-base] is fairly simple, but the usefulness of function writing cannot be conveyed in a single post. I have therefore included “Part one” in the title, and I will add follow-up posts in time.

The basic code to write a function looks like this:

function_name <- function(){}

The code for the task you want your function to perform goes inside the curly brackets {}, and the object you wish the function to work on goes inside the parentheses ().
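As a minimal sketch of that skeleton, the hypothetical function below takes one object and adds 1 to it. The name add_one and its argument x are illustrative only and are not part of any package.

```r
# hypothetical example: "add_one" and "x" are illustrative names
add_one <- function(x){

      # the task the function performs goes inside the curly brackets
  x + 1
}

# the object the function works on goes inside the parentheses
add_one(2)
## [1] 3
```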

The problem

I have often found myself using a number of different functions together for multiple variables. For each variable, I need to re-type each function. For example, when looking at a variable, I would often run the functions mean(), sd(), min(), max(), and length() together. Each time I wanted to inspect a new variable, I had to type all five functions for the variable in question. For example, looking at the Temp variable, from the airquality dataset in the datasets package, would require typing the following: mean(airquality$Temp), sd(airquality$Temp), min(airquality$Temp), max(airquality$Temp), length(airquality$Temp). This can get very tedious and repetitive.

The solution

In response to repeatedly typing these functions together, I created the descriptives() function which combines these frequently used functions into a single function.

[1]: Most of what descriptives() does can also be achieved by the summary() function; however, sd() and length() are missing from its output.

The descriptives() function

The descriptives() function combines the functions mean(), sd(), min(), max(), and length() to return a table displaying the mean, standard deviation, minimum, maximum, and length of a vector.[1] The code for creating this function is below. Each line of code within the function is explained in the comment above it (denoted with the # symbol). The code below can be copied and pasted into your R session to create the descriptives() function.

descriptives <- function(x){

      # create an object "mean" which contains the mean of x
  mean <- mean(x, na.rm = TRUE)

      # create an object "sd" which contains the sd of x
  sd <- sd(x, na.rm = TRUE)

      # create an object "min" which contains the min of x
  min <- min(x, na.rm = TRUE)

      # create an object "max" which contains the max of x
  max <- max(x, na.rm = TRUE)

      # create an object "len" which contains the length of x
  len <- length(x)

      # combine the objects created into a table
  data.frame(mean, sd, min, max, len)
}

When you pass a vector x through the function descriptives(), it creates 5 objects which are then combined into a table. Running the function returns the table:

descriptives(airquality$Temp)
##       mean      sd min max len
## 1 77.88235 9.46527  56  97 153
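One detail worth checking is how descriptives() treats missing values: the mean, sd, min, and max are computed with na.rm = TRUE, but length() counts every element, including NA. The sketch below illustrates the difference using the Ozone variable from airquality, which contains missing values. The helper n_valid() is a hypothetical addition for illustration and is not part of the original function.

```r
# Ozone contains missing values (NA)
x <- airquality$Ozone

# length() counts every element, including NA
length(x)

# hypothetical helper (not part of descriptives()):
# count only the non-missing values
n_valid <- function(x){
  sum(!is.na(x))
}

n_valid(x)

# the difference between the two is the number of missing values
length(x) - n_valid(x)
```

For a variable with no missing values, such as Temp, the two counts are the same.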

Things to bear in mind when writing functions

  1. Try to give your function a name that is short and easy to remember.
  2. If you are writing a longer, more complex function, it may be useful to test it line by line before seeing if it “works”; this will help to identify any errors before they cause your function to fail.
  3. If the function returns an error, testing the code line by line will help you find the source of the error.
  4. The value of the final line of code in a function will be the “output” of the function.
  5. Objects created within the function are not saved in the global environment: in the descriptives() function, all that is returned is a table containing the variables specified. The individual objects that were created disappear when the function has finished running.
  6. The fact that objects created within a function disappear, as described above, can be very useful for keeping a tidy working environment.
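Points 4 and 5 can be checked directly. The sketch below uses a hypothetical function, range_width(), written only for this illustration; the names range_width, lo, hi, and width are assumptions and not part of any package.

```r
# hypothetical example: "range_width", "lo", and "hi" are illustrative names
range_width <- function(x){

      # "lo" and "hi" exist only while the function is running
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)

      # the value of the final line is what the function returns
  hi - lo
}

# save the output of the function in the object "width"
width <- range_width(airquality$Temp)
width

# check whether "lo" and "hi" exist in the environment
# the function was called from
exists("lo", inherits = FALSE)
exists("hi", inherits = FALSE)
```

In a fresh session, both exists() calls return FALSE: only the output saved in width remains after the function has finished running.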

Conclusion

I find myself writing functions regularly, for various tasks. Often a function may be specific to a particular task, or even to a particular dataset. One example of such a function builds on the previous post, in which I described how to create a dataframe from multiple files. In practice, I rarely create data frames exactly as described. I usually nest the “read.csv” function within a larger function that also sorts the data, creating a more manageable dataframe, better suited to my purposes; e.g., removing variables that are of no interest or computing/recoding variables. I can then run this function to build my dataframe at the start of a session.
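A minimal sketch of that idea follows. The function name read_and_tidy(), the temporary file, and the choice of which variables to remove, recode, and sort by are all assumptions made for illustration; this is not the function from the previous post.

```r
# write airquality to a temporary csv file so that the
# sketch is self-contained (hypothetical file)
path <- tempfile(fileext = ".csv")
write.csv(airquality, path, row.names = FALSE)

# hypothetical function: read a csv file and return a tidier dataframe
read_and_tidy <- function(file){

      # read the raw data
  raw <- read.csv(file)

      # remove variables that are of no interest (here: Solar.R and Day)
  tidy <- raw[, !(names(raw) %in% c("Solar.R", "Day"))]

      # recode Month as a factor
  tidy$Month <- as.factor(tidy$Month)

      # sort the rows by Temp
  tidy <- tidy[order(tidy$Temp), ]

      # the sorted, reduced dataframe is the output of the function
  tidy
}

# build the dataframe at the start of a session
aq <- read_and_tidy(path)
head(aq)
```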

References

Writing Functions (part 2)

The current post will follow on from the previous post and describe another use for writing functions.

R Markdown and reporting p values in APA format

The function described here is designed for use with R Markdown. I would write a post about how great R Markdown is, and how to use it, but there is already a wealth of information out there; see here, here, and here for a sample. This post relates to producing an APA formatted pdf using the papaja package [@aust_papaja_2017]. Specifically, I describe a function that can be used to report p values correctly according to APA guidelines.

The problem

One of the great things about R Markdown is the “in-line code” option, whereby, instead of typing numbers, you can insert the code for the value you wish to report, and when the document is compiled, the correct number is reported.

However, the reporting of a p value in APA format varies depending on what the p value actually is. It is consistently reported to three decimal places, with no “zero” preceding the decimal point. Values less than “.001” are reported as: “p < .001.” For example, a p value of “.8368621” would be reported as “p = .837”; while a p value of “.0000725” would be reported as “p < .001”.

The specific formatting requirements, and the way the reporting varies depending on the value being reported, mean that simply including in-line code to generate the p value is not always sufficient.

The solution

In order to remove the need to tweak the formatting each time I report a new p value, I have created a function to do it for me.[1]

[1]: The function described here, along with the descriptives() function described in the previous post, is part of a package I created called desnum [@R-desnum]. Writing functions as part of a package means that instead of writing the function anew for each session, you can just load the package. Follow-up posts will probably describe more functions in the desnum package. If you wish to install the desnum package, run the following code:

devtools::install_github("cillianmiltown/R_desnum")

The p_report() function

The p_report() function takes any number less than 1, and reports it as an APA formatted p value. Let's say you run a test and save the p value from that test in the object p1; all you need to type in your R Markdown document then is

*p* `r paste(p_report(p1))`

The p_report() function will remove the preceding zero, correctly identify whether “=” or “<” is needed, and report p1 to three decimal places. Nesting it within paste() ensures that its output is included in the compiled pdf.

As in the previous post, the code for creating the function is below, and each line of code within the function is explained in the comment above (denoted with the # symbol). Again, this code can be copied and pasted into your R session to create the p_report() function.

p_report <- function(x){

      # create an object "e" which contains x, the p value you are reporting,
      # rounded to 3 decimal places

  e <- round(x, digits = 3)

      # the next two lines of code print "< .001" if x is less than .001

  if (x < 0.001) 
    print(paste0("<", " ", ".001"))

      # if x is .001 or greater, the code below prints the object "e"
      # with an "=" sign, and with the preceding zero removed
      # (the "\\." in the pattern matches a literal decimal point)

  else
    print(
      paste0("=",
             " ",
             sub("^(-?)0\\.", "\\1.", sprintf("%.3f", e))))

}
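To see what the final line of the function does, the formatting steps can be run one at a time. The sketch below uses the example value from earlier in this post (.8368621); the object names p, e, and s are illustrative only. The decimal point in the pattern is escaped here so that it matches a literal "." only.

```r
# illustrative names: "p" and "s" are not part of p_report()
p <- 0.8368621

# round to 3 decimal places
e <- round(p, digits = 3)

# sprintf() converts the number to text with exactly 3 decimal places,
# so that trailing zeros are kept (e.g., 0.05 becomes "0.050")
s <- sprintf("%.3f", e)
s
## [1] "0.837"

# sub() removes the zero before the decimal point
sub("^(-?)0\\.", "\\1.", s)
## [1] ".837"

# paste0() adds the "=" sign
paste0("= ", sub("^(-?)0\\.", "\\1.", s))
## [1] "= .837"
```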

Usage

The best way to illustrate the usage of p_report() is through examples. We will use the airquality dataset and compare the variation in temperature (Temp) and wind speed (Wind) depending on the month.

Preparing the dataset

First we need to load the dataset and make it (more) usable.

      # create a dataframe df, containing the airquality dataset

df <- airquality

      # change the class of df$Month from "integer" to "factor"

df$Month <- as.factor(df$Month)

Wind

We can test for differences in wind speed depending on Month. Run an anova and save the p value in an object b.

    # create an object "aov" containing the summary of the anova

aov <- summary(aov(Wind~Month, data = df))

    # create an object "b" containing the p value of aov

b <- aov[[1]][["Pr(>F)"]][1]

The output of aov is:

##              Df Sum Sq Mean Sq F value  Pr(>F)   
## Month         4  164.3   41.07   3.529 0.00879 **
## Residuals   148 1722.3   11.64                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As you can see, the p value is 0.00879.

Including b in-line returns 0.0087901. However, we can pass b through p_report() by enclosing paste(p_report(b)) in back ticks, prefixed with r. Typing the following in an R Markdown document:

*p* `r paste(p_report(b))`

returns: p = .009.

Temp

Similarly, we can test for differences in temperature depending on Month. By using the same names for the objects, we can use the same in-line code to report the p values.

    # create an object "aov" containing the summary of the anova

aov <- summary(aov(Temp~Month, data = df))

    # create an object "b" containing the p value of aov

b <- aov[[1]][["Pr(>F)"]][1]

The output of aov is:

##              Df Sum Sq Mean Sq F value Pr(>F)    
## Month         4   7061  1765.3   39.85 <2e-16 ***
## Residuals   148   6557    44.3                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As you can see, the p value is <2e-16.

Running this through p_report() using:

*p* `r paste(p_report(b))`

returns: “p < .001”.

Conclusion

The p_report() function is an example of using R to make your workflow easier. R Markdown replaces the need to type the numbers you report with the option of including in-line code to generate these numbers. p_report() means that you do not have to worry about formatting issues when these numbers are reported. Depending on how you structure your code chunks around your writing, and how you name your objects, it may be possible to recycle sections of in-line code, speeding up the writing process. Furthermore, the principle behind p_report() can be applied to the writing of other functions (e.g., reporting F values or \(\chi^2\)).
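As a rough sketch of how the same principle might carry over to F values, the hypothetical function f_report() below formats an F statistic with its degrees of freedom, rounded to two decimal places. It is an illustration written for this post's Wind example, not a function confirmed to be in desnum, and the names f_report, f, df1, df2, and res are assumptions.

```r
# hypothetical function: "f_report" and its arguments are illustrative names
f_report <- function(f, df1, df2){

      # build the text "F(df1, df2) = value",
      # with f reported to 2 decimal places
  paste0("F(", df1, ", ", df2, ") = ", sprintf("%.2f", f))
}

# run the anova on Wind by Month and save the summary table in "res"
res <- summary(aov(Wind ~ as.factor(Month), data = airquality))[[1]]

# pass the F value and both degrees of freedom to f_report()
f_report(res[["F value"]][1], res[["Df"]][1], res[["Df"]][2])
```

With the values shown in the Wind output above (F = 3.529 on 4 and 148 degrees of freedom), this produces "F(4, 148) = 3.53".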

References