---
title: Lasso Regression Example using glmnet package in R
author: Jihong Zhang
date: '2019-02-19'
categories:
- R
- manual
tags:
- Lasso
- glmnet
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = FALSE)
```
For more details, please refer to the glmnet vignette: <https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html#lin>
This post shows how to use the `glmnet` package to fit a lasso regression and how to visualize the output. A description of the data can be found [here](http://myweb.uiowa.edu/pbreheny/data/whoari.html).
```{r packages, message=FALSE, warning=FALSE, include=FALSE, paged.print=FALSE}
library(glmnet)
library(tidyr)
library(dplyr)
library(ggplot2)
```
```{r pressure}
dt <- readRDS(url("https://s3.amazonaws.com/pbreheny-data-sets/whoari.rds"))
attach(dt)
fit <- glmnet(X, y)
```
## Visualize the coefficients
```{r}
plot(fit)
```
### Label the path
```{r}
plot(fit, label = TRUE)
```
The summary table below shows, from left to right, the number of nonzero coefficients (`Df`), the percent (of null) deviance explained (`%Dev`), and the value of $\lambda$ (`Lambda`).
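One way to display that table is simply to print the fitted object:

```{r}
# print the lambda path summary: Df, %Dev, and Lambda for each step
print(fit)
```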
We can get the actual coefficients at a specific $\lambda$ within the range of the fitted sequence:
```{r}
# extract coefficients at lambda = 0.1 (returned as a sparse matrix)
coeffs <- coef(fit, s = 0.1)
# keep only the nonzero coefficients and their variable names
coeffs.dt <- data.frame(name = coeffs@Dimnames[[1]][coeffs@i + 1], coefficient = coeffs@x)
# order the variables by coefficient, largest first
coeffs.dt[order(coeffs.dt$coefficient, decreasing = TRUE), ]
```
We can also make predictions at specific values of $\lambda$ using new input data:
```{r}
# simulate new predictor data with the same dimensions as X (for illustration only)
nx <- matrix(rnorm(nrow(dt$X) * ncol(dt$X)), nrow = nrow(dt$X), ncol = ncol(dt$X))
pred <- predict(fit, newx = nx, s = c(0.1, 0.05))
head(pred, 20)
```
`cv.glmnet` is the function that performs cross-validation here (10-fold by default).
```{r}
X <- dt$X
y <- dt$y
cv.fit <- cv.glmnet(X, y)
```
Plotting the object shows the mean cross-validated error (MSE) across the $\lambda$ sequence, with the selected $\lambda$'s marked by dashed vertical lines.
```{r}
plot(cv.fit)
```
We can view the selected $\lambda$'s and the corresponding coefficients. For example:
```{r}
cv.fit$lambda.min
cv.fit$lambda.1se
```
`lambda.min` returns the value of $\lambda$ that gives the minimum mean cross-validated error. The other $\lambda$ saved is `lambda.1se`, which gives the most regularized model whose error is within one standard error of the minimum. To use it, we only need to replace `lambda.min` with `lambda.1se` above.
```{r}
# helper: convert coefficients from glmnet / cv.glmnet objects to a data.frame
coeff2dt <- function(fitobject, s) {
  coeffs <- coef(fitobject, s)
  coeffs.dt <- data.frame(name = coeffs@Dimnames[[1]][coeffs@i + 1], coefficient = coeffs@x)
  # order the variables by coefficient, largest first
  return(coeffs.dt[order(coeffs.dt$coefficient, decreasing = TRUE), ])
}
coeff2dt(fitobject = cv.fit, s = "lambda.min") %>% head(20)
```
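For comparison, here is a sketch of the same extraction at `lambda.1se`, using the helper defined above:

```{r}
# coefficients of the more regularized (one-standard-error) model
coeff2dt(fitobject = cv.fit, s = "lambda.1se") %>% head(20)
```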
```{r}
coeffs.table <- coeff2dt(fitobject = cv.fit, s = "lambda.min")
ggplot(data = coeffs.table) +
  geom_col(aes(x = name, y = coefficient, fill = coefficient > 0)) +
  xlab(label = "") +
  ggtitle(expression(paste("Lasso Coefficients with ", lambda, " = 0.0275"))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none")
```
## Elastic net
`glmnet` also fits elastic net models. As an example, we can set $\alpha = 0.2$:
```{r}
# alpha = 0.2 mixes the lasso and ridge penalties; the weights give double weight to the last 100 observations
fit2 <- glmnet(X, y, alpha = 0.2, weights = c(rep(1, 716), rep(2, 100)), nlambda = 20)
print(fit2, digits = 3)
```
According to the default internal settings, the computations stop if either the fractional change in deviance down the path is less than $10^{-5}$ or the fraction of explained deviance reaches 0.999.
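These thresholds are controlled by `glmnet.control()` (arguments `fdev` and `devmax`, respectively). As a minimal sketch, setting `fdev = 0` makes `glmnet` compute every requested $\lambda$; the defaults can then be restored with `factory = TRUE`:

```{r}
# lower the stopping threshold so the full lambda path is computed
glmnet.control(fdev = 0)
fit2.full <- glmnet(X, y, alpha = 0.2, nlambda = 20)  # illustrative refit
# restore the default internal settings
glmnet.control(factory = TRUE)
```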
```{r}
# coefficient paths against log(lambda)
plot(fit2, xvar = "lambda", label = TRUE)
# plot against %deviance explained
plot(fit2, xvar = "dev", label = TRUE)
```
```{r}
# predictions for the first five observations at lambda = 0.03
predict(fit2, newx = X[1:5, ], type = "response", s = 0.03)
```