Shane Mueller
Sept 11 2018
data <- read.table("c5data.txt")
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
1 15 1 0 6668 0 652 300 653 300 1 0 0 1.00000 1.00000
2 15 1 1 6701 33 652 306 653 300 1 33 33 6.08276 7.08276
3 15 1 2 6732 64 652 313 653 300 1 31 64 13.03840 20.12120
4 15 1 3 6771 103 652 321 653 301 1 39 103 20.02500 40.14620
5 15 1 4 6797 129 651 327 653 301 0 26 103 26.07680 66.22300
6 15 1 5 6820 152 650 332 653 301 0 23 103 31.14480 97.36780
Read in the data file c5data.txt from the menu after setting the working directory. Then copy the generated command into an .R file and run it directly from there.
- Understand how to change the headers before, during, and after reading the data in.
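A minimal sketch of the three options, assuming a whitespace-delimited file. The files are temporary stand-ins written by the example itself, and the column names (subject, trial, time) are hypothetical; they are not the real meaning of the columns in c5data.txt.

```r
## Stand-in file with no header row (hypothetical data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("15 1 6668", "15 2 6701", "15 3 6732"), tmp)

## During: supply the names while reading
d1 <- read.table(tmp, col.names = c("subject", "trial", "time"))

## After: read with the default names V1, V2, ... and then rename
d2 <- read.table(tmp)
colnames(d2) <- c("subject", "trial", "time")

## Before: put the names in the first line of the file and use header = TRUE
tmp2 <- tempfile(fileext = ".txt")
writeLines(c("subject trial time", "15 1 6668", "15 2 6701"), tmp2)
d3 <- read.table(tmp2, header = TRUE)
```

All three data frames end up with the same column names; which route is most convenient depends on whether you control the file.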
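dat <- matrix(runif(1000), 100, 10)   ##100 x 10 matrix of random numbers
colnames(dat) <- letters[1:10]
write.csv(dat, "random.csv")          ##row names are written as a first column
newdat <- read.csv("random.csv")      ##that column is read back as X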
There are a number of ways to look at an object and see how it is stored:
'data.frame': 31 obs. of 3 variables:
$ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
$ Height: num 70 65 63 72 81 83 66 75 80 75 ...
$ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
[1] "Girth" "Height" "Volume"
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
[24] 24 25 26 27 28 29 30 31
[1] "data.frame"
Girth Height Volume
Min. : 8.30 Min. :63 Min. :10.20
1st Qu.:11.05 1st Qu.:72 1st Qu.:19.40
Median :12.90 Median :76 Median :24.20
Mean :13.25 Mean :76 Mean :30.17
3rd Qu.:15.25 3rd Qu.:80 3rd Qu.:37.30
Max. :20.60 Max. :87 Max. :77.00
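Output of this kind can be produced with base R inspection functions. The sketch below assumes the built-in trees data set; the exact calls behind the output above are not shown in the source, so treat these as likely candidates only.

```r
data(trees)        # built-in data set
str(trees)         # compact structure: type and first values of each column
names(trees)       # column names
dim(trees)         # number of rows and columns
class(trees)       # "data.frame"
summary(trees)     # per-column quartiles and mean
head(trees, 3)     # first three rows
```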
Sorting is useful, and the built-in sort function will do this for a vector:
[1] 0.02291064 0.15750374 0.22634794 0.24906685 0.37653700 0.61181822
[7] 0.74620890 0.85396900 0.88831001 0.93120421
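A short sketch of sort on a vector of random numbers. The seed is arbitrary and only there for reproducibility, so the values will not match the output above.

```r
set.seed(1)                              # arbitrary seed (an assumption)
v <- runif(10)
s <- sort(v)                             # ascending; v itself is unchanged
s_desc <- sort(v, decreasing = TRUE)     # descending
s
s_desc
```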
This won't work for data frames, where you may want to sort a frame by the values of one column. Use order for this (not to be confused with the similar rank function):
ord <- order(trees$Height)
[1] 3 20 2 7 14 1 19 4 24 16 23 8 10 15 12 13 25 21 11 9 22 28 29
[24] 30 5 26 27 6 17 18 31
This gives the row indexes in order from the least to the greatest value. If you use these indexes to subset a vector or data frame, the result is reordered accordingly:
[1] 63 64 65 66 69 70 71 72 72 74 74 75 75 75 76 76 77 78 79 80 80 80 80
[24] 80 81 81 82 83 85 86 87
Girth Height Volume
3 8.8 63 10.2
20 13.8 64 24.9
2 8.6 65 10.3
7 11.0 66 15.6
14 11.7 69 21.3
1 8.3 70 10.3
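To make the difference between order and rank concrete, here is a small sketch on a three-element vector, followed by a data-frame sort that breaks ties in Height by Volume. The tie-breaking column is an illustrative choice, not something the text above prescribes.

```r
x <- c(30, 10, 20)
order(x)       # 2 3 1: positions of the smallest, next smallest, ... value
rank(x)        # 3 1 2: the rank of each element, in its original position
x[order(x)]    # 10 20 30, the same as sort(x)

## Sort a data frame by one column, breaking ties with a second
data(trees)
sorted <- trees[order(trees$Height, trees$Volume), ]
head(sorted)
```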
The type argument of plot controls how points are drawn; type="b" plots points connected by lines. First, plot tree height against volume in the original order, connecting adjacent values. Then re-sort the data by tree height and re-plot. Finally, put the data in a random order and re-plot.
data(trees)
par(mfrow = c(1, 3))
plot(trees$Volume, trees$Height, type = "b")
ord <- order(trees$Height)
plot(trees$Volume[ord], trees$Height[ord], type = "b")
ord <- sample(1:nrow(trees))
plot(trees$Volume[ord], trees$Height[ord], type = "b")
party <- c("R","R","D","R","R","D","D","D","R","R","D")
gender <- c("M","M","F","F","F","F","M","M","F","M","M")
vote <- c("A","B","A","A","A","B","A","A","B","B","A")
survey <- data.frame(party,gender,vote)
##look at values of each variable:
D R
5 6
  A B
F 3 2
M 4 2
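Counts like these can be produced with table, which tabulates one factor or cross-tabulates several. The sketch repeats the survey data so that it runs on its own; that the source used table rather than a similar function such as xtabs is an assumption.

```r
party  <- c("R","R","D","R","R","D","D","D","R","R","D")
gender <- c("M","M","F","F","F","F","M","M","F","M","M")
vote   <- c("A","B","A","A","A","B","A","A","B","B","A")
survey <- data.frame(party, gender, vote)

table(survey$party)                        # counts per party
tab <- table(survey$gender, survey$vote)   # gender by vote
tab
prop.table(tab, 1)                         # proportions within each row
```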
aggregate and tapply work by dividing data by the levels of one or more categorical variables, applying a function to each group, and returning a data structure that recombines these values.
set.seed(111); x <- rnorm(500) ##generate random numbers
y <- x + runif(500,-.3,.3) ##related random numbers
dat3 <- data.frame(x=x,y=y) ##create a data frame
dat3$factor <- factor(round(dat3$x/10,1)*10) ##make bins from x
dat3$group <- sample(c("A","B"),500,replace=TRUE) ##random group labels
dat3.agg <- aggregate(dat3$y,list(bin=dat3$factor),mean)
##aggregate x by the same bins:
dat3.agg$xvals <- aggregate(dat3$x,list(bin=dat3$factor),mean)$x
bin x xvals
1 -3 -2.9694215 -3.00802536
2 -2 -1.8622799 -1.84992413
3 -1 -0.8908935 -0.89592586
4 0 0.0382442 0.04730851
5 1 0.9096850 0.91197362
6 2 1.7929266 1.82510517
7 3 2.6943167 2.69485129
tapply works like aggregate, but returns an array (here a named vector) rather than a data frame.
##use tapply to aggregate y by the bins:
tapply(dat3$y,list(bin=dat3$factor),mean)
-3 -2 -1 0 1 2
-2.9694215 -1.8622799 -0.8908935 0.0382442 0.9096850 1.7929266
bin A B
-3 -3.1106901 -2.75751854
-2 -1.6859730 -1.92360406
-1 -0.8322358 -0.95914982
0 0.0284143 0.04672135
1 0.8842331 0.93338158
2 1.8288174 1.76002681
3 2.7074811 2.61533043
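A two-way table like the one above can be obtained by passing two grouping factors to tapply. The sketch below rebuilds dat3 so that it runs on its own. The name cellmeans is hypothetical, and the exact call used in the source is not shown, so this is one plausible way to get such a table.

```r
set.seed(111)
x <- rnorm(500)
y <- x + runif(500, -.3, .3)
dat3 <- data.frame(x = x, y = y)
dat3$factor <- factor(round(dat3$x / 10, 1) * 10)   # bins of x
dat3$group  <- sample(c("A", "B"), 500, replace = TRUE)

## mean of y for every bin-by-group cell; the result is a matrix
cellmeans <- tapply(dat3$y,
                    list(bin = dat3$factor, group = dat3$group),
                    mean)
cellmeans
```

With two factors, tapply returns a matrix with one row per bin and one column per group; cells with no observations would be NA.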
apply applies a function to each row (1) or each column (2) of a matrix or data frame.
m <- matrix(runif(28),7,4)
#Find minimum in each column
apply(m,2,min) ##By column
[1] 0.05638315 0.17026205 0.20461216 0.17142021
#Find maximum in each row:
apply(m,1,max) ##By row
[1] 0.7625511 0.6690217 0.7489722 0.6249965 0.8821655 0.7703016 0.8819536
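As a sanity check on the margin argument, the sketch below compares apply against the dedicated colMeans and rowSums functions. The seed is arbitrary, so the values differ from the output above.

```r
set.seed(42)                      # arbitrary seed (an assumption)
m <- matrix(runif(28), 7, 4)
colmin <- apply(m, 2, min)        # one value per column (length 4)
rowmax <- apply(m, 1, max)        # one value per row (length 7)

## For means and sums, the dedicated functions give the same answers
all.equal(apply(m, 2, mean), colMeans(m))
all.equal(apply(m, 1, sum), rowSums(m))
```

The length of the result is a quick way to confirm which margin was used: 4 for columns and 7 for rows in this matrix.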
weight Time Chick Diet
1 42 0 1 1
2 51 2 1 1
3 59 4 1 1
4 64 6 1 1
5 76 8 1 1
6 93 10 1 1
Follow along in walkthrough