5210 Chapter 3 Lecture Notes (Day 4)

Shane Mueller
Sept 13 2018

Programming concepts

A few core programming concepts give you a lot of capability in R.

They let you reuse and automate analyses, save time, and repeat processes:

  • Functions
  • Looping
  • Conditionals

Because R allows using functions on entire data vectors, it often blurs the line between conditionals and iterating.
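
A minimal sketch of that blurring, using made-up data (the names scores, labels_loop, and labels_vec are hypothetical): the same recoding written once as an explicit loop with if, and once as a single vectorized ifelse call.

```r
# Hypothetical data, for illustration only
set.seed(1)
scores <- runif(8)

# Loop version: iterate explicitly and test each element with if
labels_loop <- character(length(scores))
for (i in seq_along(scores)) {
  if (scores[i] > 0.5) {
    labels_loop[i] <- "high"
  } else {
    labels_loop[i] <- "low"
  }
}

# Vectorized version: ifelse tests every element in one call
labels_vec <- ifelse(scores > 0.5, "high", "low")

# Both approaches give the same labels
identical(labels_loop, labels_vec)
```

The vectorized form is both a conditional and an iteration, which is why the two ideas are hard to separate in R.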

Functions in R

Functions are defined using the function() function, and the result is assigned to a name.

  • Don't get function names and variable names confused.
  • These two are the same thing:
se <- function(x)
{
   stdev <- sd(x)
   se <- stdev/sqrt(length(x))
   return(se)
}

se <- function(x) {sd(x)/sqrt(length(x))}

Scoping Rules

  • Anything defined outside the function is available to the function.
  • What happens in the function stays in the function.
  • You must return the results in order to see anything you did.
x <- 100
doit <- function(){
   x <- x + 1
   return(x)
}
doit()
[1] 101
x
[1] 100


In R, function arguments are named:

  • They can have default values.
  • If an argument's name is not specified in the call, arguments are assigned in order.
  • If an argument is not given, its default is used.
  • Argument defaults can be calculated based on other parameters.
fn <- function(x, y=NULL,title="My Title")
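
A small sketch of these argument rules. The function describe and its arguments are hypothetical, made up only for illustration; note that the default for label is calculated from x.

```r
# Hypothetical function, for illustration only
describe <- function(x, digits = 2, label = paste("n =", length(x))) {
  # label's default is computed from x when label is first used
  paste0(label, ": mean ", round(mean(x), digits))
}

vals <- c(1, 2, 3, 4)
describe(vals)                          # all defaults used
describe(vals, 1)                       # second argument matched by position
describe(vals, label = "Test data")     # named argument; digits keeps its default
describe(label = "Reordered", x = vals) # named arguments can come in any order
```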

Nameless (lambda) functions

  • You don't need to name a function to use it:

x2 <- data.frame(a=runif(10),b=runif(10)+.5)

lapply(x2, function(x){mean(x)})
$a
[1] 0.4069604

$b
[1] 0.8926438

Exercise: Functions

  • Make a function that computes the min, median, and max of a data set x and returns them as a list of three elements.
  • Create some fake data from at least two different distributions and test it out.

Exercise Solution: Functions

mmm <- function(x)
{
  ##returned here as a vector of three values, matching the output below
  c(min(x), median(x), max(x))
}

##See how these change when you have more samples:
mmm(1/runif(1000))
[1]    1.001720    1.965887 2248.495039
mmm(1/runif(100))
[1]  1.028069  2.216996 66.805299
data <- runif(100) + 1/rnorm(100)
mmm(data)
[1] -197.8342889    0.2196956  110.9483544


The most common conditional statement is the if statement.

##Conditional Branching
if( 0 ) {
  print("This will never print")
} else {
  print("But this will")
}
[1] "But this will"
[1] "more"

Other conditional logic

  • ifelse operates on an entire vector at once.
  • switch allows multiple conditions.
  • which finds the positions of matching values in a vector, so you can select or replace them.
##this changes lowercase a to uppercase:
x <- sample(letters[1:5],10,replace=T)
x2 <- ifelse(x=="a","A",x)
x2
 [1] "b" "b" "e" "b" "e" "c" "e" "d" "A" "b"
##the same recoding using which:
x[which(x=="a")] <- "A"
x
 [1] "b" "b" "e" "b" "e" "c" "e" "d" "A" "b"
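
switch is listed above without an example, so here is a minimal sketch. The helper summary_stat and its type keywords are hypothetical, chosen only to illustrate picking one of several branches by name.

```r
# Hypothetical helper, for illustration only
summary_stat <- function(x, type = "mean") {
  switch(type,
         mean   = mean(x),
         median = median(x),
         range  = max(x) - min(x),
         stop("Unknown type: ", type))  # unnamed last argument is the fallback
}

vals <- c(2, 4, 9)
summary_stat(vals)            # uses the default type, "mean"
summary_stat(vals, "median")
summary_stat(vals, "range")
```

Compared with a chain of if/else if statements, switch keeps each keyword next to the code it selects.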

Exercise: Conditionals

Write a function that will take as its first argument a data vector (e.g., something produced by runif(1000)), and as its second argument a keyword which tells the function whether to plot a histogram or a scatterplot.

Exercise Solution: Conditionals

x <- exp(rnorm(1000)*.3)
myplot <- function(x,type="scatter"){
  if(type=="scatter"){
    plot(x)   ##Plot a regular plot here
  }else if(type=="histogram"){
    hist(x) ##Plot a histogram
  }
}
myplot(x)
myplot(x,type="histogram")


[Two plots: a scatterplot of x and a histogram of x]

Exercise: Conditionals 2

Write a new mean function that does not return an error when given a factor. Instead, it returns the modal (most common) value of that factor. Then use that function with lapply and sapply on x2. Use:

x2 <- data.frame(a=runif(100),b=runif(100),c= as.factor(sample(LETTERS,100,replace=T)))

Exercise Solution: Conditionals 2

x2 <- data.frame(a=runif(100),b=runif(100),c= as.factor(sample(LETTERS,100,replace=T)))
newmean <- function(data){
  if(is.factor(data)){
    tab <- table(data)
    return(names(tab)[which.max(tab)])  ##most common level
  } else {
    return(mean(data))
  }
}

Exercise Solution: Conditionals 2 pt 2

newmean(x2$a)
[1] 0.4977649
newmean(x2$c)
[1] "K"
lapply(x2,newmean)
$a
[1] 0.4977649

$b
[1] 0.4403626

$c
[1] "K"

sapply(x2,newmean)
                  a                   b                   c 
"0.497764854519628" "0.440362612789031"                 "K" 

Looping and Iteration

Looping and iteration are methods for repeating some code or operation many times. Usually, iteration refers to repeating an operation across the elements of a data set, and looping is more general.

Important methods for this:

  • for()
  • apply()
  • tapply and aggregate

Methods to avoid unless you know what you are doing:

  • while, repeat
  • lapply, sapply
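
A brief sketch of apply, tapply, and aggregate from the list above, using made-up data (m, d, and the result names are hypothetical):

```r
# Hypothetical data, for illustration only
m <- matrix(1:6, nrow = 2)            # 2 rows, 3 columns
row_sums <- apply(m, 1, sum)          # apply sum over each row
col_sums <- apply(m, 2, sum)          # apply sum over each column

d <- data.frame(group = c("a", "a", "b", "b"),
                score = c(1, 3, 10, 20))

# tapply: split score by group, then take the mean of each piece
group_means <- tapply(d$score, d$group, mean)

# aggregate: the same group means, returned as a data frame
group_table <- aggregate(score ~ group, data = d, FUN = mean)

row_sums
col_sums
group_means
group_table
```

apply works over the rows or columns of a matrix, while tapply and aggregate repeat an operation within groups of a data set.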

The for loop

This keyword iterates a block of code over a set of values.

  • The for loop is usually the only thing you need
  • Sequences defined with a:b are usually represented implicitly, so R does not actually create the full set of numbers in memory.
j <- 1
for(i in 1:1000000)
  j <- j + runif(1)
j
[1] 500314.9

Two ways to iterate over a vector with for

Version 1

x <- sample(LETTERS)
out <- ""
for(i in 1:length(x))
  out <- paste(out, x[i],sep="")

Version 2

out <- ""
for(i in x)
  out <- paste(out, i,sep="")
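
One caution about Version 1, sketched under the assumption that the vector might be empty: 1:length(x) counts down from 1 to 0 in that case, whereas seq_along(x) gives a zero-length sequence. The variable names here are hypothetical.

```r
empty <- character(0)

idx_colon <- 1:length(empty)    # this is 1:0, which counts down
idx_along <- seq_along(empty)   # zero-length sequence

length(idx_colon)   # a loop over this would run its body twice
length(idx_along)   # a loop over this would not run at all

# With seq_along, the loop body is skipped for an empty vector
n_runs <- 0
for (i in seq_along(empty)) n_runs <- n_runs + 1
n_runs
```

Version 2 (iterating over the values directly) avoids the issue as well.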

ifelse and which

Useful for recoding:

vals <- sample(c("man","WOMAN"),10,replace=T)
coded <- ifelse(vals=="man",1,2)
coded2 <- vals
coded2[which(vals=="man")] <- "MAN"
coded
 [1] 2 2 1 1 2 1 2 2 2 2
coded2
 [1] "WOMAN" "WOMAN" "MAN"   "MAN"   "WOMAN" "MAN"   "WOMAN" "WOMAN"
 [9] "WOMAN" "WOMAN"

Exercise: Looping

Create a series of 1,000,000 letters of the alphabet using

items <- sample(letters,1000000,replace=T)
  • Write functions that will replace all 'a' values with an 'A', and 'b' with a 'B'.
  • Write one that uses an if statement, one that uses ifelse, and one that uses which.
  • Run the function inside a system.time() statement, and see which is the fastest.
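
As a reminder of how system.time() is used, here is a sketch on a different, made-up task (this is not the exercise solution): adding up a vector with an explicit loop versus the built-in sum. The function loop_sum is hypothetical.

```r
# Hypothetical task (not the exercise): add up a vector two ways
vals <- runif(100000)

loop_sum <- function(v) {
  total <- 0
  for (val in v) total <- total + val
  total
}

# system.time evaluates the expression and reports user, system, and elapsed time
system.time(s1 <- loop_sum(vals))   # explicit for loop
system.time(s2 <- sum(vals))        # built-in vectorized sum

# Check that both approaches agree, up to floating-point tolerance
all.equal(s1, s2)
```

Timings vary from run to run and between machines, so compare methods on the same data and repeat the measurement if the difference is small.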

Exercise Solution: Looping

  • See R code in walkthrough document.


Some topics from discussion

  • Local/Global variables and scoping
  • Usefulness of wrapper functions
  • When you would choose one looping method over another
  • Others?