5210 Chapter 3 Lecture Notes (Day 4)

Shane Mueller
Sept 13 2018

Programming concepts

A few core programming concepts give you a lot of capability in R.

They let you reuse and automate analyses, save time, and repeat processes. The three covered here are:

Functions
Looping
Conditionals

Because many R functions operate on entire vectors at once, R often blurs the line between conditionals and iteration.
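As a small illustration, the same recoding can be written as a loop with if or as a single vectorized ifelse() call. The vector v is made up for this sketch:

```r
v <- c(-2, 5, -1, 3)

# Loop + if: test each element individually
out_loop <- numeric(length(v))
for (i in seq_along(v)) {
  if (v[i] < 0) out_loop[i] <- 0 else out_loop[i] <- v[i]
}

# Vectorized: ifelse() tests and recodes every element in one call
out_vec <- ifelse(v < 0, 0, v)

identical(out_loop, out_vec)  # TRUE
```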

Functions in R

Functions are defined using the function() function, and the result is assigned to a name.

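Don't confuse function names with variable names. In the first version below, se is both the name of the function and a local variable inside it. This works, but it is easy to mix the two up.
The two definitions below are equivalent: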

se <- function(x)
{
  stdev <- sd(x)               # sample standard deviation
  se <- stdev/sqrt(length(x))  # a local variable, not the function itself
  return(se)
}

se <- function(x) {sd(x)/sqrt(length(x))}  # one-line equivalent
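This is a quick check, using a made-up vector, that the two definitions give the same answer. The names se_long and se_short are hypothetical and are used only so both versions can exist at the same time:

```r
se_long <- function(x)
{
  stdev <- sd(x)
  se <- stdev/sqrt(length(x))
  return(se)
}
se_short <- function(x) {sd(x)/sqrt(length(x))}

vals <- c(2, 4, 4, 4, 5, 5, 7, 9)
se_long(vals)                              # about 0.756
all.equal(se_long(vals), se_short(vals))   # TRUE
```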

Scoping Rules

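Variables defined outside the function (for example, in the global environment) are visible inside it.
What happens in the function stays in the function: assigning with <- inside a function creates a local variable and does not change the outside one.
You must return the results in order to see anything you did.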

x <- 100
doit <- function(){
  x <- x + 1   # reads the global x, then assigns the result to a new local x
  return(x)
}
print(doit())

[1] 101

print(x)

[1] 100
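The same rule applies to arguments. In this sketch, the function changes its own copy of the vector and the caller's vector is untouched. The name double_first is made up for the example:

```r
v <- c(1, 2, 3)
double_first <- function(vec) {
  vec[1] <- vec[1] * 2   # modifies the function's local copy only
  vec                    # the last value is returned
}
res <- double_first(v)
res   # 2 2 3
v     # 1 2 3: unchanged outside the function
```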

Arguments

In R, function arguments are named:

They can have default values.
If argument names are not given in the call, arguments are matched by position.
If an argument is omitted, its default is used.
Argument defaults can be computed from other arguments.

fn <- function(x, y=NULL,title="My Title")
{
  print(paste(x,y,title))
}
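fn shows default values but does not show positional matching or a default computed from another argument. Here is a sketch that covers both, using a hypothetical function first_sum whose n argument defaults to the length of x:

```r
first_sum <- function(x, n = length(x), na.rm = FALSE) {
  # n's default is computed from x when n is first used
  sum(x[seq_len(n)], na.rm = na.rm)
}
v <- c(5, 1, 4, NA)
first_sum(v, 3)                  # matched by position: n = 3, gives 10
first_sum(na.rm = TRUE, x = v)   # matched by name, in any order: n = 4, gives 10
first_sum(v)                     # all defaults: NA, because of the missing value
```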

Nameless (lambda) functions

You don't need to name a function to use it:

set.seed(111)
x2 <- data.frame(a=runif(10),b=runif(10)+.5)

lapply(x2, function(x){mean(x)})

$a
[1] 0.4069604

$b
[1] 0.8926438
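sapply() accepts anonymous functions in the same way and simplifies the result to a named vector. Here is a sketch with a small made-up data frame:

```r
df <- data.frame(a = c(1, 4, 2), b = c(10, 30, 20))
# The anonymous function is defined inline and never given a name
widths <- sapply(df, function(col) max(col) - min(col))
widths   # a: 3, b: 20
```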

Exercise: Functions

Make a function that computes the min, median, and max of a data set x and returns them as a list of three elements.
Create some fake data from at least two different distributions and test it out.

Exercise Solution: Functions

mmm <- function(x)
{
  c(min(x), median(x), max(x))
}

## See how these change when you have more samples:
mmm(1/runif(1000))

[1]    1.001720    1.965887 2248.495039

mmm(1/runif(100))

[1]  1.028069  2.216996 66.805299

data <- runif(100) + 1/rnorm(100)
mmm(data)

[1] -197.8342889    0.2196956  110.9483544
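The solution above returns a numeric vector, but the exercise asks for a list. This sketch shows a variant that returns a named list of three elements. The name mmm_list is hypothetical:

```r
mmm_list <- function(x)
{
  list(min = min(x), median = median(x), max = max(x))
}
res <- mmm_list(c(3, 1, 2, 10))
res$median   # 2.5
```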

Conditionals

The most common conditional statement is the if statement:

##Conditional Branching
if( 0 )   # 0 is treated as FALSE
{
  print("This will never print")
}
if(1)     # any nonzero number is treated as TRUE
{
  print("But this will")
}

[1] "But this will"

if(runif(1)<.5)
{
  print("less")
}else{
  print("more")
}

[1] "more"
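if() tests a single TRUE or FALSE value. Since R 4.2.0, a condition with more than one element is an error. This sketch collapses a vector test with any() before branching. The scores vector is made-up data:

```r
scores <- c(55, 72, 90)
# any() reduces the vector of comparisons to a single TRUE/FALSE
if (any(scores < 60)) {
  msg <- "at least one score is below 60"
} else {
  msg <- "all scores are 60 or above"
}
msg
```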

Other conditional logic

ifelse operates on an entire vector at once.
switch chooses among multiple alternatives.
which returns the positions of TRUE values, so it can select matching elements of a vector.

This code changes lowercase "a" to uppercase "A" in two ways:

x <- sample(letters[1:5], 10, replace = TRUE)
x2 <- ifelse(x == "a", "A", x)   # vectorized: returns a new vector
x[which(x == "a")] <- "A"        # which() gives the positions to overwrite
x

 [1] "b" "b" "e" "b" "e" "c" "e" "d" "A" "b"

x2

 [1] "b" "b" "e" "b" "e" "c" "e" "d" "A" "b"
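switch is listed above but not shown. This sketch uses a hypothetical area function. When the first argument is a character string, switch() evaluates only the alternative whose name matches, and a final unnamed alternative acts as the fallback:

```r
area <- function(shape, size)
{
  switch(shape,
         circle = pi * size^2,
         square = size^2,
         stop("unknown shape: ", shape))   # fallback when no name matches
}
area("square", 3)   # 9
area("circle", 1)   # pi
```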

Exercise: Conditionals

Write a function that will take as its first argument a data vector (e.g., something produced by runif(1000)), and as its second argument a keyword which tells the function whether to plot a histogram or a scatterplot.

Exercise Solution: Conditionals

x <- exp(rnorm(1000)*.3)
myplot <- function(x, type="scatter")
{
  if(type=="scatter")
  {
    plot(x)   ## Plot a regular plot here
  } else if(type=="histogram")
  {
    hist(x)   ## Plot a histogram
  } else {
    warning("unknown type: use \"scatter\" or \"histogram\"")
  }
}

myplot(x,"histogram")

[Plot: histogram of x]

myplot(x,"scatter")

[Plot: x plotted against its index]

Exercise: Conditionals 2

Write a new mean function that, when given a factor, returns the modal (most common) value of that factor instead of NA and a warning (which is what mean() does). Then use that function with lapply() and sapply() on x2. Use:

x2 <- data.frame(a=runif(100),b=runif(100),c= as.factor(sample(LETTERS,100,replace=T)))

Exercise Solution: Conditionals 2

x2 <- data.frame(a=runif(100),b=runif(100),c= as.factor(sample(LETTERS,100,replace=T)))
newmean <- function(data)
{
  if(is.factor(data))
  {
    tab <- table(data)                   # count each level
    return(names(tab)[which.max(tab)])   # name of the most common level
  } else {
    return(mean(data))
  }
}

Exercise Solution: Conditionals 2 pt 2

newmean(x2$a)

[1] 0.4977649

newmean(x2$c)

[1] "K"

lapply(x2,newmean)

$a
[1] 0.4977649

$b
[1] 0.4403626

$c
[1] "K"

sapply(x2,newmean)

                  a                   b                   c 
"0.497764854519628" "0.440362612789031"                 "K"
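sapply() returned character strings for all three columns, including a and b. sapply() simplifies its results into a single atomic vector, which can hold only one type, so the numeric means were coerced to match "K". lapply() keeps each result's type. In this sketch, df and mode_or_mean are hypothetical stand-ins for x2 and newmean:

```r
df <- data.frame(a = c(0.2, 0.4), grp = factor(c("K", "K")))
mode_or_mean <- function(v) {
  if (is.factor(v)) names(which.max(table(v))) else mean(v)
}
s <- sapply(df, mode_or_mean)   # simplified: one character vector
l <- lapply(df, mode_or_mean)   # list: each element keeps its own type
class(s)     # "character"
class(l$a)   # "numeric"
```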

Looping and Iteration

Looping and iteration are methods for repeating code or an operation many times. Usually, iteration refers to repeating an operation across the elements of a data set, while looping is more general.

Important methods for this:

for()
apply()
tapply and aggregate

Methods to avoid unless you know what you are doing:
while, repeat
lapply, sapply
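apply(), tapply(), and aggregate() are listed above but not demonstrated. This minimal sketch uses made-up data. apply() runs a function over the rows or columns of a matrix, while tapply() and aggregate() apply a function within groups:

```r
m <- matrix(1:6, nrow = 2)      # 2 rows, 3 columns
row_sums <- apply(m, 1, sum)    # MARGIN = 1: rows, gives 9 12
col_sums <- apply(m, 2, sum)    # MARGIN = 2: columns, gives 3 7 11

df <- data.frame(group = c("x", "y", "x", "y"), score = c(1, 2, 3, 6))
group_means <- tapply(df$score, df$group, mean)          # x = 2, y = 4
agg <- aggregate(score ~ group, data = df, FUN = mean)   # same, as a data frame
agg
```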

The for loop

The for keyword runs a block of code once for each value in a vector or list.

The for loop is usually the only looping construct you need.
In recent versions of R (3.5.0 and later), sequences written as a:b are stored compactly, so 1:1000000 does not actually create that set of numbers in memory.

j <- 1
for(i in 1:1000000)
{
  j <- j + runif(1)  
}
print(j)

[1] 500314.9
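The loop above adds one random number at a time. Because runif() is vectorized, the same total can also be computed in a single call. This sketch uses 1,000 draws and set.seed() so that the two versions can be compared:

```r
set.seed(1)
j <- 1
for (i in 1:1000) {
  j <- j + runif(1)            # one draw per iteration
}

set.seed(1)
j_vec <- 1 + sum(runif(1000))  # the same 1,000 draws, generated at once

all.equal(j, j_vec)   # TRUE (up to floating-point rounding)
```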

Two ways to iterate over a vector with for

Version 1

x <- sample(LETTERS)
out <- ""
for(i in 1:length(x))
  out <- paste(out, x[i],sep="")
out

[1] "SHBOKAWGXILPYDMRTEJVFZQUNC"

Version 2

out <- ""
for(i in x)
  out <- paste(out, i,sep="")
out

[1] "SHBOKAWGXILPYDMRTEJVFZQUNC"
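Version 1 has a pitfall. When x is empty, 1:length(x) is 1:0, which is c(1, 0), so the loop still runs twice. seq_along(x) avoids this, and Version 2 also runs zero times on an empty vector. Here is a sketch:

```r
x <- character(0)   # an empty vector

n_colon <- 0
for (i in 1:length(x)) n_colon <- n_colon + 1   # 1:0 is c(1, 0): two iterations

n_seq <- 0
for (i in seq_along(x)) n_seq <- n_seq + 1      # empty sequence: zero iterations

c(n_colon, n_seq)   # 2 0
```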

ifelse and which

Useful for recoding:

vals <- sample(c("man","WOMAN"),10,replace=T)
coded <- ifelse(vals=="man",1,2)
coded2 <- vals
coded2[which(vals=="man")] <- "MAN"
coded

 [1] 2 2 1 1 2 1 2 2 2 2

coded2

 [1] "WOMAN" "WOMAN" "MAN"   "MAN"   "WOMAN" "MAN"   "WOMAN" "WOMAN"
 [9] "WOMAN" "WOMAN"

Exercise: Looping

Create a series of 1,000,000 letters of the alphabet using

items <- sample(letters,1000000,replace=T)

Write functions that replace every 'a' with 'A' and every 'b' with 'B'.
Write one that uses an if statement, one that uses ifelse, and one that uses which.
Run each function inside system.time() and see which is fastest.

Exercise Solution: Looping

See R code in walkthrough document.
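The walkthrough code is not reproduced here. The following is only a rough sketch of the three approaches. The names recode_if, recode_ifelse, and recode_which are hypothetical and do not come from the walkthrough. Timings vary by machine, so no results are shown:

```r
set.seed(2018)
items <- sample(letters, 1000000, replace = TRUE)

# 1. Element-by-element loop with if
recode_if <- function(x) {
  for (i in seq_along(x)) {
    if (x[i] == "a") {
      x[i] <- "A"
    } else if (x[i] == "b") {
      x[i] <- "B"
    }
  }
  x
}

# 2. Nested ifelse() over the whole vector
recode_ifelse <- function(x) {
  ifelse(x == "a", "A", ifelse(x == "b", "B", x))
}

# 3. which() finds the positions, then assign to them
recode_which <- function(x) {
  x[which(x == "a")] <- "A"
  x[which(x == "b")] <- "B"
  x
}

system.time(r1 <- recode_if(items))
system.time(r2 <- recode_ifelse(items))
system.time(r3 <- recode_which(items))
identical(r1, r2) && identical(r2, r3)   # all three give the same result
```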

Questions?

Some topics from discussion

Local/Global variables and scoping
Usefulness of wrapper functions
When you would choose one looping method over another
Others?