What are packages?

One of the many benefits of using R for data analysis is the community behind it. Although our focus here is specifically on language and linguistics, R is used across many research and industry areas, so different people use R to address different needs. As a result, there are a large number of packages for R. Essentially, an R package is a set of functions that someone else has written and bundled together. By downloading and installing an R package, you gain access to those functions.

But wait, what the heck is a function?

A function in R is a command: a pre-made set of instructions that you run by calling the function’s name. Function names are followed by a pair of parentheses. For example, sum() is a function.

The parentheses contain the arguments for a function - special instructions for the function, as well as the data you want the function to operate on. Functions can have more than one argument, depending on how the function was made.
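As a small illustration of the difference between data and instructions, here is a sketch using round(), a function that comes with R. The first value is the data to operate on, and digits is an instruction that controls how the result is produced. The number 3.14159 is just an arbitrary example value.

```r
# The first argument is the data; digits tells round() how many decimals to keep
round(3.14159, digits = 2)
## [1] 3.14

# Leaving digits out falls back to its default of 0 decimal places
round(3.14159)
## [1] 3
```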

print()

Let’s look at the print() function. This function will print the output of some operation to the R console. For example, if you execute the command print('hello world'), the function will print the string 'hello world' to the console.

print('hello world')
## [1] "hello world"

help()

The print() function comes with R. There are many other built-in functions, such as help(), which shows the documentation for an existing function. Let’s use it to understand what print() does. Pay attention - inside help() I type just the name of the function I want help for, print, without its parentheses:

help(print)
## Help on topic 'print' was found in the following packages:
## 
##   Package               Library
##   base                  /Library/Frameworks/R.framework/Resources/library
##   ggeffects             /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
## 
## 
## Using the first match ...

Print Values

Description

print prints its argument and returns it invisibly (via invisible(x)). It is a generic function which means that new printing methods can be easily added for new classes.

Usage

print(x, ...)

## S3 method for class 'factor'
print(x, quote = FALSE, max.levels = NULL,
      width = getOption("width"), ...)

## S3 method for class 'table'
print(x, digits = getOption("digits"), quote = FALSE,
      na.print = "", zero.print = "0",
      right = is.numeric(x) || is.complex(x),
      justify = "none", ...)

## S3 method for class 'function'
print(x, useSource = TRUE, ...)
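Two points from the help page above can be tried directly: print() returns its argument invisibly, and it behaves differently depending on the class of what it is given. The following is a minimal sketch; the variable name greeting and the example values are made up for illustration.

```r
# print() shows the value and also returns it, so the result can be stored
greeting <- print("hello world")
## [1] "hello world"
greeting
## [1] "hello world"

# For a factor, the factor method is used: no quotes, and the levels are listed
print(factor(c("a", "b", "a")))
## [1] a b a
## Levels: a b
```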

There are other ways of getting help for functions and commands in R, such as typing ? before the function name:

?print
## Help on topic 'print' was found in the following packages:
## 
##   Package               Library
##   base                  /Library/Frameworks/R.framework/Resources/library
##   ggeffects             /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
## 
## 
## Using the first match ...
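If you do not remember the exact name of a function, base R has a couple of other tools that may help. This is a brief sketch: apropos() lists objects whose names match a pattern, and example() runs the examples from a help page. The pattern "^seq" (names starting with "seq") is just an example search.

```r
# List everything whose name starts with "seq"
matches <- apropos("^seq")
print(matches)

# Run the examples from the bottom of the sum() help page
example(sum)
```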

sum()

Let’s look at one more base function - sum(). This will sum together all of the numbers that you provide to the function. For example…

One number…

sum(1)
## [1] 1

Two numbers…

sum(1,2)
## [1] 3

Even more numbers…

sum(1,2,3,4,5,6,7,8,9,10)
## [1] 55
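sum() also shows how an argument can act as a special instruction. The sketch below passes a range of numbers at once and then uses the na.rm argument, which tells sum() to drop missing values (written NA in R) before adding. The values are arbitrary examples.

```r
# A range of numbers can be passed as a single argument
sum(1:10)
## [1] 55

# A missing value makes the total missing too...
sum(c(1, NA, 3))
## [1] NA

# ...unless na.rm = TRUE tells sum() to remove missing values first
sum(c(1, NA, 3), na.rm = TRUE)
## [1] 4
```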

seq()

Most functions aren’t as flexible as sum(), which accepts any number of values. They expect specific arguments, and leaving out a required one (or supplying one the function does not recognize) usually results in an error. It’s always good to open the help documentation for a function and scroll down to the examples to see how it is used.
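To see what such an error looks like without stopping your script, here is a minimal sketch using sqrt(), which needs exactly one argument. tryCatch() is used here only to capture the error message as text; the variable name msg is made up for illustration.

```r
# Calling sqrt() with no arguments is an error.
# tryCatch() captures the message instead of halting the script.
msg <- tryCatch(sqrt(), error = function(e) conditionMessage(e))
print(msg)
```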

Let’s look at another function called seq():

help(seq)

Sequence Generation

Description

Generate regular sequences. seq is a standard generic with a default method. seq.int is a primitive which can be much faster but has a few restrictions. seq_along and seq_len are very fast primitives for two common cases.

Usage

seq(...)

## Default S3 method:
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
    length.out = NULL, along.with = NULL, ...)

seq.int(from, to, by, length.out, along.with, ...)
...

The seq() function creates a sequence, and it has several arguments that control how the sequence is built. The first two arguments are from and to, which define the boundaries of the sequence. We can set them inside the function call by using their names.

A sequence from 1 to 8:

seq(from = 1, to = 8)
## [1] 1 2 3 4 5 6 7 8

Notice that you can actually type the names of the arguments and use = to specify their values. The seq() function also has a by argument - it allows you to specify the size of the sequence’s increments.

A sequence from 10 to 100, incremented by 5 each time:

seq(from = 10, to = 100, by = 5)
##  [1]  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90  95 100
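The help page also lists a length.out argument. As a short sketch, it can be used instead of by when you know how many values you want rather than the step size, and by itself can be negative to count downwards. The numbers here are arbitrary examples.

```r
# Five evenly spaced values between 0 and 1
seq(from = 0, to = 1, length.out = 5)
## [1] 0.00 0.25 0.50 0.75 1.00

# A descending sequence needs a negative increment
seq(from = 10, to = 1, by = -3)
## [1] 10  7  4  1
```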

Arguments also have default positions in a function. Compare:

# not naming the arguments (matched by position)
seq(1,10)
##  [1]  1  2  3  4  5  6  7  8  9 10
# naming the arguments
seq(from = 1, to = 10)
##  [1]  1  2  3  4  5  6  7  8  9 10
# naming the arguments in a different order
seq(to = 10, from = 1)
##  [1]  1  2  3  4  5  6  7  8  9 10

What is the default argument order for seq()?
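One way to check your answer, besides reading the Usage section of the help page, is to ask R for the argument names of the function. This sketch assumes you want the default method, seq.default(), which is what seq() uses for ordinary numbers.

```r
# List the argument names of the default seq() method, in positional order
names(formals(seq.default))

# Unnamed values are matched to those names by position,
# so these two calls give the same result
identical(seq(1, 10, 2), seq(from = 1, to = 10, by = 2))
## [1] TRUE
```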

New packages = new functions

Sooner or later you will want to install a new package to do something that R cannot do out of the box. There are several ways to get new packages, but the most common is the install.packages() function. You put the name of the package you want inside the function, making sure to wrap the name in matching quote marks ('package' or "package").

Let’s practice this by installing a silly R package called cowsay - here is how to do it:

install.packages('cowsay')

You will see output that looks like this after you install it:

> install.packages('cowsay')
also installing the dependencies ‘fortunes’, ‘rmsfact’

trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.4/fortunes_1.5-4.tgz'
Content type 'application/x-gzip' length 208887 bytes (203 KB)
==================================================
downloaded 203 KB

trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.4/rmsfact_0.0.3.tgz'
Content type 'application/x-gzip' length 19378 bytes (18 KB)
==================================================
downloaded 18 KB

trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-arm64/contrib/4.4/cowsay_0.9.0.tgz'
Content type 'application/x-gzip' length 400292 bytes (390 KB)
==================================================
downloaded 390 KB


The downloaded binary packages are in
    /var/folders/4z/2cllrsm12cj_hhk_6cqygd_00000gn/T//RtmpuWey8q/downloaded_packages
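A package only needs to be installed once, so scripts that you run repeatedly may want to skip the download when the package is already there. The following is a sketch of one common pattern, not an official recipe: install_if_missing() is a hypothetical helper name, built on requireNamespace(), which reports whether a package is available.

```r
# install_if_missing() is a hypothetical helper, not part of R itself
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE if the package is already installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is now available
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage (commented out so that nothing is downloaded by accident):
# install_if_missing("cowsay")
```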

Once you’ve installed a package, you need to load it into your R session before you can use it. We do this with the library() function. You type the name of the package inside the function, like this:

library(cowsay)
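To confirm that a package has been loaded, you can look at the search path, which lists everything R currently has attached. This sketch uses parallel, a package that ships with R, purely so that the example runs without installing anything; the same check works for cowsay or any other package.

```r
library(parallel)

# Attached packages appear on the search path as "package:<name>"
"package:parallel" %in% search()
## [1] TRUE
```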

What is the cowsay package? Just like functions, we can also use help() with packages. We just need to modify the function to tell it to show the package index, which lists all the functions and other parts of a package.

help(package = 'cowsay')

It seems that cowsay has three components:

  • a data set called animals
  • a function called endless_horse()
  • a function called say()
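The functions in that index can also be listed from the console. This is a hedged sketch using getNamespaceExports(), which returns the names a package exports; data sets may not appear in this list, and the check with requireNamespace() is there only so the code does not fail if cowsay is not installed.

```r
# List the exported names of cowsay, if the package is available
if (requireNamespace("cowsay", quietly = TRUE)) {
  print(sort(getNamespaceExports("cowsay")))
} else {
  message("cowsay is not installed; run install.packages('cowsay') first")
}
```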

Let’s work with say(). Read the help page for it.

help(say)

Sling messages and warnings with flair

Description

Sling messages and warnings with flair

Usage

say(
  what = "Hello world!",
  by = "cat",
  type = NULL,
  what_color = NULL,
  by_color = NULL,
  length = 18,
  fortune = NULL,
  ...
)

Let’s try it out:

say()
## 
##  -------------- 
## Hello world! 
##  --------------
##     \
##       \
##         \
##             |\___/|
##           ==) ^Y^ (==
##             \  ^  /
##              )=*=(
##             /     \
##             |     |
##            /| | | |\
##            \| | |_|/\
##       jgs  //_// ___/
##                \_)
## 

What other values can we give? Running the function without valid arguments gives us an error, but helpfully shows us the possible values for the by argument, which controls the animal we see.

say(what = '', by = '')
## Error in match.arg(by, c(choices = names(animals), "rms", "random")): 'arg' should be one of "cow", "chicken", "chuck", "clippy", "poop", "bigcat", "ant", "pumpkin", "ghost", "spider", "rabbit", "pig", "snowman", "frog", "hypnotoad", "shortcat", "longcat", "fish", "signbunny", "facecat", "behindcat", "stretchycat", "anxiouscat", "longtailcat", "cat", "trilobite", "shark", "buffalo", "grumpycat", "smallcat", "yoda", "mushroom", "endlesshorse", "bat", "bat2", "turkey", "monkey", "daemon", "egret", "duckling", "duck", "owl", "squirrel", "squirrel2", "goldfish", "alligator", "stegosaurus", "whale", "wolf", "rms", "random"
say(what = 'moo!', by = 'cow')
## 
##  ----- 
## moo! 
##  ------ 
##     \   ^__^ 
##      \  (oo)\ ________ 
##         (__)\         )\ /\ 
##              ||------w|
##              ||      ||

Accessing functions from packages without loading the package

You can use the :: operator to access a function from an installed package without loading the package into your R session, using the format package::function(). This is also a nice way to keep track of which functions come from which packages:

cowsay::say()
## 
##  -------------- 
## Hello world! 
##  --------------
##     \
##       \
##         \
##             |\___/|
##           ==) ^Y^ (==
##             \  ^  /
##              )=*=(
##             /     \
##             |     |
##            /| | | |\
##            \| | |_|/\
##       jgs  //_// ___/
##                \_)
## 
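The same format works for any installed package, including those that ship with R. As a brief sketch, the tools package is installed with R but is not normally loaded by library() at startup, so its toTitleCase() function is a convenient example of calling a function through ::.

```r
# Call toTitleCase() from the tools package without running library(tools)
tools::toTitleCase("hello world")
## [1] "Hello World"
```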

What does the endless_horse() function do?

help(endless_horse)

Try it out (press ESC to stop the horse…)

endless_horse()