Shane Mueller
Sept 13 2018
There are a few core concepts that give you a lot of capabilities in R.
They let you reuse analysis, automate analysis, save time, repeat processes etc.
Because R allows using functions on entire data vectors, it often blurs the line between conditionals and iterating.
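As a minimal sketch of this point (the vector temps and the cutoff of 20 are made up for illustration), a single vectorized comparison tests every element at once, with no explicit loop or if statement:

```r
# Hypothetical data: five temperature readings
temps <- c(12, 25, 18, 30, 21)

# The comparison is applied to every element, giving a logical vector
is_warm <- temps > 20

# ifelse() picks a label per element based on that logical vector
labels <- ifelse(is_warm, "warm", "cool")
print(labels)

# Logical indexing keeps only the elements where the condition is TRUE
print(temps[is_warm])
```

Here the conditional (temps > 20) and the iteration over elements happen in the same expression.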
Functions are defined using the function() function, and the result is assigned to a name.
se <- function(x)
{
stdev <- sd(x)
se <- stdev/sqrt(length(x))
return(se)
}
##The same function, written more compactly
se <- function(x) {sd(x)/sqrt(length(x))}
x <- 100
doit <- function(){
x <- x + 1 ##Creates a local x; the global x is not changed
return(x)
}
print(doit())
[1] 101
print(x)
[1] 100
In R, function arguments are named:
fn <- function(x, y=NULL,title="My Title")
{
print(paste(x,y,title))
}
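The calls below sketch how such a function might be used; the argument values are arbitrary. Arguments can be matched by position or by name, and named arguments can appear in any order. Because title has a default value, it can be left out.

```r
fn <- function(x, y = NULL, title = "My Title") {
  print(paste(x, y, title))
}

# Positional matching: x = "a", y = "b", title keeps its default
r1 <- fn("a", "b")

# Named matching: the order of the arguments no longer matters
r2 <- fn(title = "Other Title", y = "b", x = "a")
```

Naming arguments tends to make calls easier to read when a function has several optional settings.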
You don't need to name a function to use it:
set.seed(111)
x2 <- data.frame(a=runif(10),b=runif(10)+.5)
lapply(x2, function(x){mean(x)})
$a
[1] 0.4069604
$b
[1] 0.8926438
mmm <- function(x)
{
c(min(x),median(x),max(x))
}
##See how these change when you have more samples:
mmm(1/runif (1000))
[1] 1.001720 1.965887 2248.495039
mmm(1/runif (100))
[1] 1.028069 2.216996 66.805299
data <- runif (100) + 1/rnorm (100)
mmm(data)
[1] -197.8342889 0.2196956 110.9483544
The most common conditional statement is the if statement.
##Conditional Branching
if( 0 )
{
print("This will never print")
}
if(1)
{
print("But this will")
}
[1] "But this will"
if(runif(1)<.5)
{
print("less")
}else{
print("more")
}
[1] "more"
x <- sample(letters[1:5],10,replace=TRUE)
x2 <- ifelse(x=="a","A",x) ##Recode into a new vector
x[which(x=="a")] <- "A" ##Recode in place
x
[1] "b" "b" "e" "b" "e" "c" "e" "d" "A" "b"
x2
[1] "b" "b" "e" "b" "e" "c" "e" "d" "A" "b"
Write a function that will take as its first argument a data vector (e.g., something produced by runif(1000)), and as its second argument a keyword which tells the function whether to plot a histogram or a scatterplot.
x <- exp(rnorm(1000)*.3)
myplot <- function(x,type="scatter")
{
if(type=="scatter")
{
plot(x) ##Plot a regular plot here
}else if(type=="histogram")
{
hist(x) ##Plot a histogram
}else{
warning("Unknown plot type: ", type)
}
}
myplot(x,"histogram")
myplot(x,"scatter")
Write a new mean function that does not return NA with a warning when given a factor. Rather, it returns the modal (most common) value of that factor. Then use that function in lapply and sapply on x2.
Use:
x2 <- data.frame(a=runif(100),b=runif(100),c= as.factor(sample(LETTERS,100,replace=T)))
newmean <- function(data)
{
if(is.factor(data))
{
tab <- table(data)
names(tab)[which.max(tab)]
} else {
return(mean(data) )
}
}
newmean(x2$a)
[1] 0.4977649
newmean(x2$c)
[1] "K"
lapply(x2,newmean)
$a
[1] 0.4977649
$b
[1] 0.4403626
$c
[1] "K"
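The list returned by lapply keeps each element's own type (numeric means, a character mode). sapply instead simplifies the results into a single vector, and since a vector holds only one type, numeric means are converted to character strings whenever a factor column is present. A small sketch with made-up data (the data frame d is hypothetical):

```r
d <- data.frame(a = c(1, 2, 3),
                c = factor(c("x", "y", "x")))

newmean <- function(data) {
  if (is.factor(data)) {
    tab <- table(data)
    names(tab)[which.max(tab)]   # most common level
  } else {
    mean(data)
  }
}

as_list <- lapply(d, newmean)   # list: each element keeps its type
as_vec  <- sapply(d, newmean)   # vector: everything shares one type

print(class(as_list$a))
print(class(as_vec))
```

If the numeric values are needed later, the lapply result is usually the safer one to keep.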
sapply(x2,newmean)
a b c
"0.497764854519628" "0.440362612789031" "K"
Looping and iteration are methods for repeating some code or operation many times. Usually, iteration refers to repeating an operation across elements of a data set, and looping is more general.
Important methods for this:
tapply and aggregate
lapply, sapply
Methods to avoid unless you know what you are doing:
while, repeat
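As a minimal sketch of tapply and aggregate: both split a variable by group and apply a function to each piece. The data frame scores and its columns are invented for illustration.

```r
# Hypothetical data: six scores from two groups
scores <- data.frame(group = c("a", "a", "a", "b", "b", "b"),
                     score = c(1, 2, 3, 10, 20, 30))

# tapply returns a named array, one value per group
by_group <- tapply(scores$score, scores$group, mean)
print(by_group)

# aggregate returns a data frame, one row per group
agg <- aggregate(score ~ group, data = scores, FUN = mean)
print(agg)
```

tapply is convenient for a quick summary of one variable, while aggregate keeps the result in a data frame that can be used in further analysis.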
The for keyword iterates a block of code over a set of values.
j <- 1
for(i in 1:1000000)
{
j <- j + runif(1)
}
print(j)
[1] 500314.9
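A loop like this can often be replaced by a single call on a whole vector. The sketch below uses a smaller count (1000, chosen arbitrarily) and a fixed seed so that the two versions can be compared; both draw the same random numbers, so the totals should agree up to floating-point rounding.

```r
n <- 1000

# Loop version: add one random number at a time
set.seed(42)
j_loop <- 1
for (i in 1:n) {
  j_loop <- j_loop + runif(1)
}

# Vectorized version: draw all n numbers at once and sum them
set.seed(42)
j_vec <- 1 + sum(runif(n))

print(all.equal(j_loop, j_vec))
```

The vectorized form is shorter and typically much faster for large n.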
Version 1
x <- sample(LETTERS)
out <- ""
for(i in 1:length(x))
out <- paste(out, x[i],sep="")
out
[1] "SHBOKAWGXILPYDMRTEJVFZQUNC"
Version 2
out <- ""
for(i in x)
out <- paste(out, i,sep="")
out
[1] "SHBOKAWGXILPYDMRTEJVFZQUNC"
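For this particular task a loop is not required: paste accepts a collapse argument that joins all elements of a vector into one string. A short sketch, using a fixed vector instead of a random sample so the result is predictable:

```r
x <- c("S", "H", "B", "O")

# Loop version, as in Version 2 above
out <- ""
for (i in x) {
  out <- paste(out, i, sep = "")
}

# Vectorized version: collapse joins the elements with no separator
out2 <- paste(x, collapse = "")

print(out)
print(out2)
```

Both approaches give the same string; the loop versions are mainly useful for seeing how for steps through indices or values.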
ifelse and which are useful for recoding:
vals <- sample(c("man","WOMAN"),10,replace=T)
coded <- ifelse(vals=="man",1,2)
coded2 <- vals
coded2[which(vals=="man")] <- "MAN"
coded
[1] 2 2 1 1 2 1 2 2 2 2
coded2
[1] "WOMAN" "WOMAN" "MAN" "MAN" "WOMAN" "MAN" "WOMAN" "WOMAN"
[9] "WOMAN" "WOMAN"
Create a series of 1,000,000 letters of the alphabet using
items <- sample(letters,1000000,replace=T)
Replace the 'a' values with an 'A', and 'b' with a 'B'. Write one version that uses an if statement, one that uses ifelse, and one that uses which.
Some topics from discussion