CompBioMethods: Generalized Linear Mixed Effect Models

Tamas Schauer

Microscopy Data

cells were stained for centromeric marker
outcome: number of centromeric clusters (counts)
conditions: Target vs. Control RNAi
replicate: experimental day
4 microscopy slides in total
cells are not independent on the same slide


# insert data
dat <- data.frame(experiment = rep(c("day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day2", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1", "day1"),2),
                 RNAi = rep(c("Control", "Target"), each = 100),
                 counts = c(6, 6, 4, 6, 3, 5, 10, 9, 5, 4, 8, 4, 9, 7, 7, 2, 6, 9, 4, 3, 5, 4, 4, 2, 8, 4, 5, 6, 3, 7, 7, 4, 6, 3, 3, 3, 7, 5, 5, 4, 5, 5, 5, 6, 5, 3, 3, 4, 5, 5, 2, 3, 4, 4, 4, 7, 5, 4, 6, 4, 5, 4, 4, 5, 4, 4, 5, 2, 5, 4, 5, 4, 6, 10, 5, 6, 4, 11, 3, 13, 3, 6, 3, 5, 5, 4, 4, 4, 4, 6, 8, 6, 4, 11, 5, 4, 5, 1, 2, 4, 16, 6, 9, 7, 11, 12, 6, 3, 7, 8, 8, 12, 7, 9, 5, 9, 5, 10, 7, 6, 7, 19, 3, 13, 7, 6, 13, 8, 8, 11, 7, 13, 10, 6, 4, 13, 3, 4, 2, 10, 6, 8, 6, 9, 9, 5, 4, 20, 2, 5, 7, 5, 10, 4, 7, 5, 6, 3, 4, 7, 9, 8, 16, 6, 10, 8, 5, 6, 4, 11, 6, 2, 6, 8, 11, 6, 5, 8, 8, 7, 10, 9, 7, 4, 6, 8, 2, 10, 7, 8, 3, 4, 10, 13, 6, 12, 8, 1, 5, 6))

# group for each slide
dat$slide <- factor(paste(dat$RNAi, dat$experiment, sep = "_"))

# relevel RNAi
dat$RNAi <- relevel(dat$RNAi, ref = "Control")

# check factor levels
str(dat)


'data.frame':   200 obs. of  4 variables:
 $ experiment: Factor w/ 2 levels "day1","day2": 2 2 2 2 2 2 2 2 2 2 ...
 $ RNAi      : Factor w/ 2 levels "Control","Target": 1 1 1 1 1 1 1 1 1 1 ...
 $ counts    : num  6 6 4 6 3 5 10 9 5 4 ...
 $ slide     : Factor w/ 4 levels "Control_day1",..: 2 2 2 2 2 2 2 2 2 2 ...


# simple summary
summary(dat)


 experiment      RNAi         counts                slide   
 day1:100   Control:100   Min.   : 1.000   Control_day1:50  
 day2:100   Target :100   1st Qu.: 4.000   Control_day2:50  
                          Median : 6.000   Target_day1 :50  
                          Mean   : 6.275   Target_day2 :50  
                          3rd Qu.: 8.000                    
                          Max.   :20.000

Data Visualization

visualization of the datapoints (beeswarm) and boxplot
effect looks ‘clear’ by eye
it does not look normally distributed (also tied values)
variance does not look homogeneous


library(beeswarm)
boxplot(counts ~ slide, data = dat, ylim = c(0,20))
beeswarm(counts ~ slide, data = dat, pch=19, col="#00000088", add = TRUE)

Data Distribution

inspecting data distribution by qqplot
Poisson and negative binomial looks better


library(car)
library(MASS)

par(mfrow = c(1,4), cex=1.1)
attach(dat)

# normal distribution (it does not look "normal")
qqp(counts, "norm", pch = 20)

# lognormal distribution
qqp(counts, "lnorm", pch = 20)

# Poisson distribution
poisson <- fitdistr(counts, "Poisson")
qqp(counts, "pois", lambda = poisson$estimate, pch = 20)

# negative binomial distribution (Poisson + overdispersion)
negbinom <- fitdistr(counts, "negative binomial")
qqp(counts, "nbinom", size = negbinom$estimate[1], mu = negbinom$estimate[2], pch = 20)

Mixed Effect models

the model contains fixed and random effects
beta - vector of fixed effects
u - vector of random effects
formula:

Wikipedia Links:

https://en.wikipedia.org/wiki/Multilevel_model

https://en.wikipedia.org/wiki/Mixed_model

general scheme how to choose fixed and mixed effect models

Source: http://www.flutterbys.com.au/stats/tut/tut11.2a.html

MASS glmmPQL

fit a glmm
fixed effect: RNAi (treatment)
random intercept: slide (groups)
negative binomial family


# fit a fixed effect glm for estimating theta (dispersion) parameter
glm_fixed <- glm.nb(counts ~ RNAi, data = dat)
theta <- glm_fixed$theta

# use theta in glmm
glmm <- glmmPQL(counts ~ RNAi, random = ~1|slide, data = dat, 
                family = negative.binomial(link = "log", theta = theta))
summary(glmm)


Linear mixed-effects model fit by maximum likelihood
 Data: dat 
  AIC BIC logLik
   NA  NA     NA

Random effects:
 Formula: ~1 | slide
        (Intercept)  Residual
StdDev: 0.009718551 0.9935715

Variance function:
 Structure: fixed weights
 Formula: ~invwt 
Fixed effects: counts ~ RNAi 
                Value  Std.Error  DF  t-value p-value
(Intercept) 1.6174061 0.04919156 196 32.87975  0.0000
RNAiTarget  0.3988294 0.06472175   2  6.16222  0.0253
 Correlation: 
           (Intr)
RNAiTarget -0.76 

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.0948686 -0.4920140 -0.1595450  0.3940259  4.0237412 

Number of Observations: 200
Number of Groups: 4

get means by exponentiation


library(nlme)
as.numeric(c(exp(fixed.effects(glmm))[1], 
             exp(fixed.effects(glmm))[1] * exp(fixed.effects(glmm))[2]))


[1] 5.04 7.51

Conclusion: Target RNAi significantly increases the number of centromeric clusters
Mean difference: ~2.5, p-value: ~0.03, n of experiments: 2, n of cells: 200

Subsampling

check whether number of observations (cells) influence results
subsample (half) and replicate 100x for obtaining p-values


pvals <- replicate(n = 100, expr = {
              
dat_sub <- dat[sample(1:200, size = 100),]

glmm_sub <- glmmPQL(counts ~ RNAi, random = ~1|slide, data = dat_sub, 
                    family = negative.binomial(link = "log", theta = theta),
                    verbose = F)

coef(summary(glmm_sub))[2,5]})

summary(pvals)


   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.02691 0.04282 0.05253 0.05698 0.06512 0.13585

Conclusion: the number of cells has an influence but not so dramatic
it would be interesting to see how more replicates (exp. days) would change results

Detailed Tutorial

Many thanks to these pages:

http://www.flutterbys.com.au/stats/tut/tut9.1.html

http://www.flutterbys.com.au/stats/tut/tut11.2a.html

Data Source

The dataset was provided by Natalia Kochanova (LMU, BMC, Molecular Biology, Imhof group)

Generalized Linear Mixed Effect Models

Table of Contents

Microscopy Data

Data Visualization

Data Distribution

Mixed Effect models

MASS glmmPQL

Subsampling

Detailed Tutorial

Data Source

Corrections

Citation