---
title: "Generalized Additive Models (GAMs)"
author: "Shane T. Mueller shanem@mtu.edu"
date: "`r Sys.Date()`"
output:
  pdf_document: default
  html_document:
    
    df_print: paged
  rmdformats::readthedown:
    gallery: yes
    highlight: kate
    self_contained: no
  word_document:
    reference_docx: ../template.docx
always_allow_html: yes
---


```{r knitr_init, echo=FALSE, cache=FALSE}
library(knitr)
library(rmdformats)

## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
	             cache=TRUE,
               prompt=FALSE,
               tidy=TRUE,
               comment=NA,
               message=FALSE,
               warning=FALSE)
opts_knit$set(width=75)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


# Generalized Additive Models (GAMs)
Generalized Additive Models (GAMs) are an advance over glms that allow you to integrate and combine transformations of the input variables, including things like lowess smoothing.  The basic approach is framed similar to a glm, but we can specify transforms within the syntax of the model, these transforms get applied optimally to predict the outcome, and then look at the transformed relationship as part of the model.

A GAM is essentially a regression model, but the gam library permits glms and mixed effects models as well. A binomial glm is logistic regression and essentially a classifier, so it is easy to generalize.

GAMs have been around since the 1990s, but have recently come into resurgence as a means  of developing more interpretable models.  The flexibility of having numerous input transforms gives them a lot of potential predictive power that simpler regression models do not have, but the ability to trace the link makes them easier to understand than something like SVMs.


### Libraries used:

* ```gam```


Let's first look at the iphone data set. The gam library works best with numeric predictors and outcomes, so we will do some transformations first:

```{r,fig.width=8,fig.height=6}
library(gam)
iphone <- read.csv("data_study1.csv")
values <- iphone[,-1]
values$phone <- 0+(iphone$Smartphone=="iPhone") ## Convert to a 0/1 number
values$Gender <- 0+(values$Gender=="female")  #convert to a number
```


# A basic GAM model

A simple GAM looks basically like a glm. We can see that initially, the model gets 69% accuracy, which is about as good as any of the classifiers we developed were (of course, we would worry about cross-validation).  One thing the gam library does is plot the relationship between each predictor and the outcome. In this case, they are all linear predictors.  This visualization is important because it lets us assess individual relationships.


```{r,fig.width=10,fig.height=8}
library(DAAG)
model <- gam(phone~.,data=values,family="binomial",trace=T)
confusion(predict(model)>0,iphone$Smartphone)

predict(model,type="terms")
par(mfrow=c(2,2))
plot(model,se=TRUE)
```

```{r}
summary(model)
```
The ```summary()```  function conducts a basic ANOVA table.  Based  on this, we might want to select a smaller set of variables


```{r,fig.width=10,fig.height=8}
model2 <- gam(phone~Age+
                Honesty.Humility+Extraversion+Emotionality+
                Avoidance.Similarity+Phone.as.status.object+Social.Economic.Status,
              data=values,family="binomial",trace=T)
confusion(predict(model2)>0,iphone$Smartphone)

par(mfrow=c(2,2))
plot(model2,se=TRUE)
summary(model2)
```

Accuracy suffers for the smaller model a little bit, but hopefully we are going to have a more general model that does not overfit.

# Incorporating transforms into a GAM


The advantage of a GAM is that we can specify various transforms within the predictor set, such as poly(), log(), and s()--which is a loess-style smoothing function.  s(variable,n) will permit a smoothing function with 4 degrees of freedom--sort of a polynomial of the 4th order.  Let's try this with two of the strongest predictors. We will add a 4th-order smoothed function of age, and a polynomial of phone-as-status, and a smoothed function of SES, which we might suspect could be non-linear.


```{r,fig.width=10,fig.height=8}

model3 <- gam(phone~s(Age,4)+
                Honesty.Humility+Extraversion+Emotionality+
                Avoidance.Similarity+poly(Phone.as.status.object,4)+
                s(Social.Economic.Status,5),
              data=values,family="binomial",trace=T)
confusion(predict(model3)>0,iphone$Smartphone)

par(mfrow=c(2,2))
plot(model3,se=TRUE)
summary(model3)
anova(model2,  model3) ##this compares the two models

```
We can see the linear and non-linear effects of age separately.  By adding some  transformed variables, it looks like we improve the fit somewhat. Furthermore, we can see the relationships between the input variables and the output as the impact the prediction through the transform. Notice that age seems to have two 'humps': one around age 20 and one around age 50. This sort of tells us that age impacts iphone ownership such that younger people buy them for status, and older people buy them maybe because they have more money or they feel they are better/simpler, and the 25-40 are stuck in a too-poor-and-don't-care zone. 


The explicit model comparison with ANOVA is possible because we have specified nested models. Here, the more complex model is significantly better--this is saying the additional transforms are worth it.


# Stepwise variable and transform selection

We can use a special step function to explore tradeoffs of different transforms.  The scope list gives all the possible transforms to consider, and it picks the version or versions of each transform that is the best.


```{r,fig.width=10,fig.height=8}
step.model <- step.Gam(model3,
                       scope=list("Age"=~1+Age+s(Age,1)+s(Age,2)+s(Age,3) + log(Age),
                                  "Phone.as.status.object"=~1+s(Phone.as.status.object) 
                                                            +s(Phone.as.status.object,3)+log(Phone.as.status.object),
                                  "Social.Economic.Status"=~1+s(Social.Economic.Status,3)),
                       direction="both",
                       trace=T
                       )
print(step.model)
summary(step.model)

par(mfrow=c(2,2))
plot(step.model,se=TRUE)
confusion(predict(step.model)>0,iphone$Smartphone)

```

Notice the ANOVA separates out the parametric and non-parametric predictors because they need to be tested differently.  Overall, the prediction accuracy doesn't change much by including these higher-order transforms, but it tells us something about the relationship that may be informative. If we find non-linearities, it can reveal important things about the causal relationship that we would never see if we  just assumed all relationships are linear.