5210 Chapter 3 Lecture Notes (Day 4)

Shane Mueller
Sept 13 2018

Programming concepts

There are a few core concepts that give you a lot of capabilities in R.

They let you reuse analysis, automate analysis, save time, repeat processes etc.

  • Functions
  • Looping
  • Conditionals

Because R allows using functions on entire data vectors, it often blurs the line between conditionals and iterating.
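
As a small illustration of that blurring, the sketch below recodes a vector twice: once with an explicit loop and an if statement, and once with a single vectorized ifelse call. The vector name scores and the 0.5 cutoff are made up for this example.

```r
set.seed(1)
scores <- runif(5)   # hypothetical data, for illustration only

## Explicit iteration plus a conditional
labels_loop <- character(length(scores))
for (i in seq_along(scores)) {
  if (scores[i] > 0.5) {
    labels_loop[i] <- "high"
  } else {
    labels_loop[i] <- "low"
  }
}

## One vectorized call does the iterating and the branching together
labels_vec <- ifelse(scores > 0.5, "high", "low")

## Both approaches give the same labels
identical(labels_loop, labels_vec)
```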

Functions in R

Functions are defined using the function() function, and the result is assigned to a name.

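  • Don't get function names and variable names confused: in the first definition below, the se assigned inside the body is a local variable that happens to share the function's name.
  • These two definitions are equivalent: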
se <- function(x)
{
   stdev <- sd(x)
   se <-  stdev/sqrt(length(x))
   return(se)
}

se <- function(x) {sd(x)/sqrt(length(x))}
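
A quick check, on made-up data, that the long and short forms agree. The names se_long and se_short are used here only so that both versions can exist at the same time; the short form works because a function returns the value of its last evaluated expression.

```r
## Long form: intermediate variable and an explicit return()
se_long <- function(x) {
  stdev <- sd(x)
  result <- stdev / sqrt(length(x))
  return(result)
}

## Short form: the last evaluated expression is the return value
se_short <- function(x) sd(x) / sqrt(length(x))

vals <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical data
se_long(vals)
se_short(vals)
all.equal(se_long(vals), se_short(vals))
```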

Scoping Rules

  • Anything defined outside the function is available to the function.
  • What happens in the function stays in the function.
  • You must return the results in order to see anything you did.
x <- 100
doit <- function(){
   x <- x + 1
  return(x)
}
print(doit())
[1] 101
print(x)
[1] 100
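
A second sketch of the same rules, using made-up names (counter, bump, local_only): a variable created inside a function does not exist once the function has finished, and the global value it read is left untouched.

```r
counter <- 10

bump <- function() {
  local_only <- counter * 2   # reads the global counter; local_only lives only inside bump
  counter <- counter + 1      # creates a local counter; the global one is unchanged
  c(doubled = local_only, counter = counter)
}

res <- bump()
res                    # doubled = 20, counter = 11
counter                # still 10
exists("local_only")   # FALSE: the local variable is gone
```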

Arguments

In R, function arguments are named:

  • They can have default values.
  • If arguments are not named in the call, they are assigned in order.
  • If an argument is not given, its default is used.
  • Argument defaults can be calculated based on other parameters
fn <- function(x, y=NULL,title="My Title")
{
  print(paste(x,y,title))
}
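
The rules above can be sketched with a hypothetical function, describe, written only for this illustration. Its default for label is computed from n, which in turn defaults to the length of x.

```r
## n defaults to length(x); label defaults to a string built from n
describe <- function(x, n = length(x), label = paste("n =", n)) {
  paste(label, "mean =", round(mean(x), 2))
}

vals <- c(1, 2, 3, 4)

describe(vals)                        # "n = 4 mean = 2.5"  (all defaults used)
describe(vals, 10)                    # "n = 10 mean = 2.5" (second position fills n)
describe(label = "Scores:", x = vals) # "Scores: mean = 2.5" (named, in any order)
```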

Nameless (lambda) functions

You don't need to name a function to use it:

set.seed(111)
x2 <- data.frame(a=runif(10),b=runif(10)+.5)

lapply(x2, function(x){mean(x)})
$a
[1] 0.4069604

$b
[1] 0.8926438

Exercise: Functions

  • Make a function that computes the min, median, and max of a data set x and returns them as a list of three elements.
  • Create some fake data from at least two different distributions and test it out.

Exercise Solution: Functions

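mmm <- function(x)
{
 ## c() returns the three values as a numeric vector;
 ## wrap them in list() instead if a true list is required
 c(min(x),median(x),max(x))
}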

## See how these change when you have more samples:
mmm(1/runif(1000))
[1]    1.001720    1.965887 2248.495039
mmm(1/runif(100))
[1]  1.028069  2.216996 66.805299
data <- runif(100) + 1/rnorm(100)
mmm(data)
[1] -197.8342889    0.2196956  110.9483544

Conditionals

The most common conditional statement is the if statement.

##Conditional Branching
if( 0 )
{
  print("This will never print")
}
if(1)
{
  print("But this will")
}
[1] "But this will"
if(runif(1)<.5)
{
  print("less")
}else{
  print("more")
}
[1] "more"

Other conditional logic

  • ifelse operates on an entire vector at once.
  • switch allows multiple conditions.
  • which allows selecting matching values from a vector.
## This changes lowercase a to uppercase:
x <- sample(letters[1:5],10,replace=T)
x2 <-ifelse(x=="a","A",x)
x[which(x=="a")]<-"A"
x
 [1] "b" "b" "e" "b" "e" "c" "e" "d" "A" "b"
x2
 [1] "b" "b" "e" "b" "e" "c" "e" "d" "A" "b"
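
switch is listed above but not shown. A minimal sketch, using a hypothetical function center: the keyword in type picks one branch, and the final unnamed argument acts as the fallback when nothing matches.

```r
## type selects which summary to compute; the unnamed last argument is the default
center <- function(x, type = "mean") {
  switch(type,
         mean    = mean(x),
         median  = median(x),
         trimmed = mean(x, trim = 0.1),
         stop("unknown type: ", type))
}

vals <- c(1, 2, 3, 4, 100)   # hypothetical data with one outlier
center(vals, "mean")     # 22
center(vals, "median")   # 3
```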

Exercise: Conditionals

Write a function that will take as its first argument a data vector (e.g., something produced by runif(1000)), and as its second argument a keyword which tells the function whether to plot a histogram or a scatterplot.

Exercise Solution: Conditionals

x <- exp(rnorm(1000)*.3)
myplot <- function(x,type="scatter")
{
  if(type=="scatter")
  {
    plot(x)   ##Plot a regular plot here
  }else if(type=="histogram")
  {
    hist(x)   ##Plot a histogram
  }else{
    warning("unknown plot type: ", type)
  }
}

myplot(x,"histogram")

[Figure: histogram of x]

myplot(x,"scatter")

[Figure: scatterplot of x]

Exercise: Conditionals 2

Write a new mean function that does not fail when given a factor (base mean() returns NA with a warning in that case). Instead, it should return the modal (most common) value of that factor. Then apply that function to x2 with lapply and sapply. Use:

x2 <- data.frame(a=runif(100),b=runif(100),c= as.factor(sample(LETTERS,100,replace=T)))

Exercise Solution: Conditionals 2

x2 <- data.frame(a=runif(100),b=runif(100),c= as.factor(sample(LETTERS,100,replace=T)))
newmean <- function(data)
{
  if(is.factor(data))
  {
    tab <- table(data)
    names(tab)[which.max(tab)]
  } else {
    return(mean(data) )
  }
}

Exercise Solution: Conditionals 2 pt 2

newmean(x2$a)
[1] 0.4977649
newmean(x2$c)
[1] "K"
lapply(x2,newmean)
$a
[1] 0.4977649

$b
[1] 0.4403626

$c
[1] "K"
sapply(x2,newmean)
                  a                   b                   c 
"0.497764854519628" "0.440362612789031"                 "K" 

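The quoted numbers in the sapply output appear because sapply simplifies its results into a single vector, and a vector can hold only one type, so the numeric means are converted to character alongside the factor's mode. A small sketch with a stand-in list (mixed is made up here, in place of the actual newmean results):

```r
mixed <- list(a = 0.5, b = 0.25, c = "K")   # stand-in for the three newmean results

## lapply keeps each element's own type
as_list <- lapply(mixed, identity)
sapply(as_list, class)      # a and b are "numeric", c is "character"

## sapply simplifies to one vector, so everything becomes character
as_vector <- sapply(mixed, identity)
class(as_vector)            # "character"

## The numbers can be recovered by converting back explicitly
as.numeric(as_vector[c("a", "b")])
```

When the results have mixed types, lapply is the safer choice.
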
Looping and Iteration

Looping and iteration are methods for repeating some code or operation many times. Usually, iteration refers to repeating an operation across elements of a data set, while looping is more general.

Important methods for this:

  • for()
  • apply()
  • tapply and aggregate

Methods to avoid unless you know what you are doing:

  • while, repeat
  • lapply, sapply
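
A brief sketch of apply, tapply, and aggregate on tiny made-up data (the objects m and dat are hypothetical). apply works over the rows or columns of a matrix; tapply and aggregate both summarize a variable within groups, returning a named array and a data frame respectively.

```r
## apply: repeat a function over rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)
apply(m, 1, sum)             # row sums: 9 12
apply(m, 2, sum)             # column sums: 3 7 11

## tapply and aggregate: repeat a function within each group
dat <- data.frame(group = c("a", "a", "b", "b"),
                  score = c(1, 3, 10, 20))
group_means <- tapply(dat$score, dat$group, mean)   # a = 2, b = 15
group_means
agg <- aggregate(score ~ group, data = dat, FUN = mean)
agg
```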

The for loop

This keyword iterates a block of code over a set of values.

  • The for loop is usually the only thing you need
  • Sequences written as a:b are usually represented implicitly (in recent versions of R), so the loop does not actually create that whole set of numbers in memory.
j <- 1
for(i in 1:1000000)
{
  j <- j + runif(1)  
}
print(j)
[1] 500314.9

Two ways to iterate over a vector with for

Version 1

x <- sample(LETTERS)
out <- ""
for(i in 1:length(x))
  out <- paste(out, x[i],sep="")
out
[1] "SHBOKAWGXILPYDMRTEJVFZQUNC"

Version 2

out <- ""
for(i in x)
  out <- paste(out, i,sep="")
out
[1] "SHBOKAWGXILPYDMRTEJVFZQUNC"
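
One caution about Version 1, sketched below with made-up objects: when the vector is empty, 1:length(x) counts down from 1 to 0 rather than producing nothing, so the loop body would run twice. seq_along(x) avoids this. For this particular task, paste with collapse also builds the same string without any loop.

```r
empty <- character(0)

1:length(empty)     # 1 0  (counts down, so a loop over it runs twice)
seq_along(empty)    # integer(0), so a loop over it never runs

runs <- 0
for (i in seq_along(empty)) runs <- runs + 1
runs                # 0

## Joining letters without a loop
x <- c("S", "H", "B")
paste(x, collapse = "")   # "SHB"
```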

ifelse and which

Useful for recoding:

vals <- sample(c("man","WOMAN"),10,replace=T)
coded <- ifelse(vals=="man",1,2)
coded2 <- vals
coded2[which(vals=="man")] <- "MAN"
coded
 [1] 2 2 1 1 2 1 2 2 2 2
coded2
 [1] "WOMAN" "WOMAN" "MAN"   "MAN"   "WOMAN" "MAN"   "WOMAN" "WOMAN"
 [9] "WOMAN" "WOMAN"

Exercise: Looping

Create a series of 1,000,000 letters of the alphabet using

items <- sample(letters,1000000,replace=T)
  • Write functions that will replace all 'a' values with an 'A', and 'b' with a 'B'.
  • Write one that uses an if statement, one that uses ifelse, and one that uses which.
  • Run the function inside a system.time() statement, and see which is the fastest.

Exercise Solution: Looping

  • See R code in walkthrough document.
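
For reference, one possible sketch of the three approaches follows. It is not the code from the walkthrough document: the function names recode_if, recode_ifelse, and recode_which are made up here, and the vector is shortened to 100,000 items to keep the run quick. Timings vary by machine, so none are shown.

```r
set.seed(5210)
items <- sample(letters, 100000, replace = TRUE)   # shorter than the exercise's 1,000,000

## Version 1: for loop with if statements, one element at a time
recode_if <- function(v) {
  for (i in seq_along(v)) {
    if (v[i] == "a") {
      v[i] <- "A"
    } else if (v[i] == "b") {
      v[i] <- "B"
    }
  }
  v
}

## Version 2: nested ifelse over the whole vector
recode_ifelse <- function(v) {
  ifelse(v == "a", "A", ifelse(v == "b", "B", v))
}

## Version 3: which() to find positions, then assign
recode_which <- function(v) {
  v[which(v == "a")] <- "A"
  v[which(v == "b")] <- "B"
  v
}

system.time(r1 <- recode_if(items))
system.time(r2 <- recode_ifelse(items))
system.time(r3 <- recode_which(items))

## All three should produce the same recoded vector
identical(r1, r2) && identical(r2, r3)
```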

Questions?

Some topics from discussion

  • Local/Global variables and scoping
  • Usefulness of wrapper functions
  • When you would choose one looping method over another
  • Others?