These are notes for STAT 224: Applied Regression Analysis, taught by James Dignam.

## Lecture 3

- Hypothesis tests on the slope assess the importance of the predictor

## Lecture 4

- Can do all the regular things (confidence intervals, etc) to $\beta$s
- $R^2$ is a measure of explained variation
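
As a minimal sketch of $R^2$ as explained variation, the snippet below fits a simple least-squares line and computes $R^2 = 1 - \text{SSE}/\text{SST}$. The function name `fit_simple_ols` and the data are made up for illustration and do not come from the lecture.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # SSE: variation left unexplained by the line; SST: total variation in y
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - sse / sst


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r2 = fit_simple_ols(x, y)
print(f"intercept={b0:.3f}, slope={b1:.3f}, R^2={r2:.4f}")
```

An $R^2$ close to 1 means the line accounts for most of the variation in the response.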

## Lecture 5

- MLR is regression with many predictors
- Each coefficient is the change in the mean response for a one-unit change in its predictor, holding all other predictors constant

## Lecture 6

- F-Test tests whether all the coefficients are simultaneously 0
- Standardized residuals have mean 0 and variance 1, and are also dependent
- Internally studentized residuals are approx. normal and have mean 0 and variance 1
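
A rough sketch of internally studentized residuals for simple linear regression, assuming the usual definition $r_i = e_i / (s\sqrt{1 - h_{ii}})$ with leverage $h_{ii} = 1/n + (x_i - \bar{x})^2 / S_{xx}$. The function name and data are illustrative, not from the lecture.

```python
import math


def internally_studentized_residuals(x, y):
    """Residuals from a simple least-squares fit, each scaled by its own estimated standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual mean square with n - 2 degrees of freedom (intercept and slope estimated)
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    # Leverage of each point in simple regression
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / math.sqrt(s2 * (1 - h)) for e, h in zip(residuals, leverages)]


# Hypothetical data, for illustration only
r = internally_studentized_residuals([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print([round(ri, 3) for ri in r])
```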

## Lecture 7

- Regression relies on assumptions about the model’s form and about the errors
- Mean of response is a linear function of predictors (linearity)
- Errors need to be normally distributed
- Errors need to have mean of 0
- Errors and response have constant variance over predictors
- Errors are independent of each other

## Lecture 8

- Ordinal and nominal categorical variables
- Represented in regression by dummy/indicator variables

- Different categories need to be mutually exclusive
- Omitted category is the base/control/reference category
- Don’t keep interactions in the model without the corresponding main effects in the model (talk about effects of both, they go together now)
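
To make the dummy-variable idea concrete, here is a small sketch that builds one indicator column per non-reference category. The helper `dummy_code` and the `regions` data are hypothetical.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator list per category, omitting the reference category."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical nominal variable with three mutually exclusive categories
regions = ["north", "south", "west", "south", "north"]
dummies = dummy_code(regions, reference="north")
print(dummies)
```

Because "north" is omitted, each indicator's coefficient would be read as a difference from the "north" group.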

## Lecture 9

- A confounder is a variable related to the predictor of interest and causally related to the outcome, but not on the causal pathway
- Temporal ordering necessary in causal relationships
- Effect modifiers are factors related to both predictors and response (they modify the strength of the association)
- Control of confounding is key in explanatory modeling
- Positive confounding overestimates the effect; negative confounding underestimates it

## Lecture 10

- When linear regression assumptions are violated, transform
- When transforming, all estimates and confidence intervals are expressed in that scale

- Different violations call for different transformations
- non-normality is the least serious violation
- log transformation works when process/relationship is on a multiplicative scale
- square root form can have more stable variance over values
- Box-Cox Transformation is a power transformation (find $\lambda$)
- Rank order preserving
- $\lambda = -1$ is an inverse transform etc.
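
A minimal sketch of the Box-Cox family, assuming the standard form $(y^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $\log y$ for $\lambda = 0$. The function name is illustrative, and choosing $\lambda$ from data is not shown.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transform of a single positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)  # limiting case as lambda -> 0
    return (y ** lam - 1) / lam


values = [0.5, 1.0, 2.0, 4.0]
for lam in (-1, 0, 0.5, 1):
    print(lam, [round(box_cox(v, lam), 3) for v in values])
```

For every $\lambda$ shown, the transformed values keep the same ordering as the originals, which is the rank-order-preserving property noted above.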

## Lecture 11

- Logit transform percentages ($\text{logit}(p) = \log\frac{p}{1-p}$)
- Reasons to transform:
  - Unequal variance in the response variable
  - Response variable not normally distributed (distribution the variable comes from has variance related to the mean)
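
The logit transform and its inverse can be sketched as follows. The function names are illustrative; proportions of exactly 0 or 1 are rejected because the logit is undefined there.

```python
import math


def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


def inv_logit(z):
    """Map a log odds value back to a proportion."""
    return 1 / (1 + math.exp(-z))


for p in (0.1, 0.5, 0.8):
    print(p, round(logit(p), 4), round(inv_logit(logit(p)), 4))
```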

- Weighted Least Squares (WLS) permits differential influence of data points
- Autocorrelation is when error terms are correlated
- Still apparent randomness among the residuals, but some kind of serial pattern that suggests lack of independence (implies heteroscedasticity)
- Same observations in different time periods
- Spatially close observations
- Experiments run in batches

- Durbin-Watson Test detects serial correlation
- Cochrane-Orcutt transformation corrects for autocorrelated errors
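
A short sketch of the Durbin-Watson statistic, assuming the usual definition $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$ on residuals in time order. The residual series below are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series
runs_of_same_sign = [1, 1, -1, -1]   # neighbours tend to agree
alternating_sign = [1, -1, 1, -1]    # neighbours tend to disagree
print(durbin_watson(runs_of_same_sign), durbin_watson(alternating_sign))
```

The statistic lies between 0 and 4: values near 2 are consistent with no first-order serial correlation, values well below 2 suggest positive correlation, and values well above 2 suggest negative correlation.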

## Lecture 12

- Multicollinearity is when some or all predictors in a model are correlated with each other
- Not an error; arises from the lack of independent information about predictor variables in the dataset
- Threatens the interpretation that a coefficient is the increase in the mean of Y for a one-unit increase in X

- How much can we permit, and what do we do if there’s too much?
- All statistically independent means predictors are *orthogonal*
- If collinearity is strong and ignored, all CIs become larger and $\beta$s become unstable (change substantially when other variables are added/removed)
- Less of a concern for predictions; more of a concern for explaining a phenomenon

- Signs of multicollinearity:
  - F test is significant but all individual t-tests are nonsignificant
  - A $\beta$ has opposite sign than expected
  - General instability in estimates
  - Large standard errors

- Variance Inflation Factor (VIF), where $R_j^2$ is the $R^2$ from regressing predictor $j$ on the other predictors: $$\text{VIF}_j = \frac{1}{1-R_j^2}$$
- Greater than 10 is multicollinear by convention
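
As a hedged illustration of the VIF formula, the sketch below handles only the two-predictor case, where $R_j^2$ is the squared Pearson correlation between the two predictors. The helper names and data are hypothetical; with more predictors, $R_j^2$ would come from a full regression of predictor $j$ on the rest.

```python
import math


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical predictors
x1 = [1, 2, 3, 4]
uncorrelated = [1, -1, -1, 1]
nearly_collinear = [1, 2, 3, 5]
print(vif_two_predictors(x1, uncorrelated), vif_two_predictors(x1, nearly_collinear))
```

Uncorrelated predictors give a VIF of 1, while the nearly collinear pair lands above the conventional cutoff of 10.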

- What to do:
  - Get more / better data
  - Omit redundant variables
  - Constrained regression
  - Principal Components

## Lecture 13

- Variable selection:
  - Forward selection: add the predictor with the highest correlation with Y, then add the predictor with the highest partial correlation
  - Backward selection: begin with all predictors, remove the one with the smallest t statistic (in absolute value)
  - Stepwise selection: add a predictor as in forward selection, consider omitting based on backward
  - Common approach: choose significant variables, put them in an MLR and remove the non-significant ones, repeat until no more removal criteria are met

- Residual Mean Square is related to $R^2$: $\text{RMS} = \frac{\text{SSE}}{n - p}$
- Use AIC and BIC to compare models, whether nested or not (those metrics balance information extracted from the data and number of parameters)
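
A sketch of how AIC and BIC trade fit against parameter count for a linear model with normal errors, assuming the common forms $n\log(\text{SSE}/n) + 2k$ and $n\log(\text{SSE}/n) + k\log n$ (additive constants dropped). Here $k$ is the number of estimated coefficients; some texts also count the error variance. The function names and numbers are illustrative.

```python
import math


def aic_gaussian(sse, n, k):
    """AIC up to an additive constant for a normal-errors linear model."""
    return n * math.log(sse / n) + 2 * k


def bic_gaussian(sse, n, k):
    """BIC up to an additive constant; the per-parameter penalty grows with log(n)."""
    return n * math.log(sse / n) + k * math.log(n)


# Hypothetical comparison: a larger model lowers SSE slightly but pays for extra parameters
n = 50
small = {"sse": 50.0, "k": 3}
large = {"sse": 48.5, "k": 6}
for name, m in (("small", small), ("large", large)):
    print(name, round(aic_gaussian(m["sse"], n, m["k"]), 3),
          round(bic_gaussian(m["sse"], n, m["k"]), 3))
```

Lower values are preferred under both criteria; in this made-up comparison the small model wins on both.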

## Lecture 14

- Logistic regression is used when the response variable $Y$ is binary (taking values 0 or 1)
- The model predicts the logit of the response as a linear function of predictors
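
To illustrate a model that is linear on the logit scale, the sketch below uses made-up coefficients (an intercept of -1 and a slope of 0.7, both assumptions) and checks that a one-unit increase in the predictor multiplies the odds by $e^{\beta_1}$.

```python
import math


def predicted_probability(intercept, slope, x):
    """P(Y = 1 | x) when logit(p) = intercept + slope * x."""
    return 1 / (1 + math.exp(-(intercept + slope * x)))


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


# Hypothetical coefficients, for illustration only
b0, b1 = -1.0, 0.7
p_at_2 = predicted_probability(b0, b1, 2)
p_at_3 = predicted_probability(b0, b1, 3)
odds_multiplier = odds(p_at_3) / odds(p_at_2)
print(round(odds_multiplier, 6), round(math.exp(b1), 6))
```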

- Risk difference is the difference between the proportion with the event in one group and the proportion with the event in the other
- Odds ratio is ratio of odds of event in each exposure group
- Difference of log odds is basic effect measure (same as log odds ratio)
- Use MLE because error structure is different
- Null hypothesis is independence vs. association
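
The effect measures above can be sketched from a 2x2 table. The cell labels `a`, `b`, `c`, `d` and the counts are assumptions made for illustration.

```python
import math


def two_by_two_measures(a, b, c, d):
    """a, b = exposed with/without the event; c, d = unexposed with/without the event."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    odds_exposed = a / b
    odds_unexposed = c / d
    odds_ratio = odds_exposed / odds_unexposed
    return {
        "risk_difference": risk_exposed - risk_unexposed,
        "odds_ratio": odds_ratio,
        "log_odds_ratio": math.log(odds_ratio),
        "difference_of_log_odds": math.log(odds_exposed) - math.log(odds_unexposed),
    }


# Hypothetical counts
measures = two_by_two_measures(a=20, b=80, c=10, d=90)
print({key: round(value, 4) for key, value in measures.items()})
```

The last two entries agree, matching the note that the difference of log odds is the log odds ratio.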

## Lecture 15

- Even in logistic regression, each covariate's coefficient is an **adjusted effect**, meaning the other predictors are held constant
- Coefficients are log odds ratios ($e^{\beta_1}$ is the odds ratio)
- Multiple logistic regression analysis:
- Likelihood Ratio – like F-Test for linear model, surfaces significant predictors
- LR has $\chi^2$ distribution with DF = # of model parameters
- As in SLR, redundant with test of only one predictor
- Hosmer-Lemeshow test measures goodness-of-fit (can identify systematic variation that is not explained)
- AIC and BIC can also be used
- ROC curve plots sensitivity vs. false positive rate (perfect prediction has area under curve of 1, best classifier is in upper left corner)
- Can tell you what classification threshold to use (w. validation in independent samples)
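
A minimal sketch of building an ROC curve from predicted scores and computing the area under it with the trapezoid rule. The function names, scores, and labels are made up; it assumes both classes are present and that a case is classified positive when its score is at or above the threshold.

```python
def roc_points(scores, labels):
    """(false positive rate, sensitivity) pairs, from the strictest threshold to the loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing classified positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted probabilities and true outcomes
scores = [0.9, 0.8, 0.3, 0.2]
perfect_labels = [1, 1, 0, 0]
imperfect_labels = [1, 0, 1, 0]
print(area_under_curve(roc_points(scores, perfect_labels)))
print(area_under_curve(roc_points(scores, imperfect_labels)))
```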

## Lecture 16

- Generalized linear models are a framework for unifying theory and estimation models (need different: response, link function, error term, model)
- Link function addresses how linear predictor $X\beta$ relates to $E(Y)$
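
One way to write the role of the link function, using $g$ for the link and $\mu = E(Y)$ (both symbols are notation introduced here, not from the lecture), together with the standard links for the models covered in these notes:

$$g(\mu) = X\beta, \qquad \mu = E(Y)$$

$$\text{linear: } g(\mu) = \mu \qquad \text{logistic: } g(\mu) = \log\frac{\mu}{1-\mu} \qquad \text{Poisson: } g(\mu) = \log \mu$$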

- Poisson Regression (models count variables i.e. can’t be negative, as outcome)
- Assumption: conditional on the predictors, the mean and variance of the outcome are equal
- Don’t use when variance exceeds the mean (overdispersed Poisson random variable) or too many zeros (zero-inflated Poisson)
- Use goodness-of-fit test to determine whether the model form fits data
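
A rough check of the equal mean and variance assumption, assuming fitted means from a Poisson model are already available: the Pearson dispersion statistic $\sum (y_i - \hat{\mu}_i)^2 / \hat{\mu}_i$ divided by the residual degrees of freedom. The function name, counts, and fitted means below are hypothetical.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-square divided by residual degrees of freedom for a Poisson fit."""
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi_square / (len(counts) - n_params)


# Hypothetical observed counts and fitted means from a two-parameter model
counts = [2, 4, 6, 8]
fitted_means = [3.0, 3.0, 7.0, 7.0]
dispersion = pearson_dispersion(counts, fitted_means, n_params=2)
print(round(dispersion, 4))
```

Values near 1 are consistent with the Poisson assumption, while values well above 1 point toward overdispersion.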