PCA with Non-normalized and Normalized Data


Zayn Z
Senior majoring in BME and CS.

PCA with Non-normalized and Normalized Data

With this visualization, we are comparing PCA with non-normalized and normalized data. We are encoding the categorical data, spots, using the geometric primitive of point. We encoded the quantitative data, PC1, using position on the x-axis. We also encoded the quantitaive data, PC2, using position on the y-axis. We created two plots similar plots, one in which PCA was conducted on non-normalized gene expression data and one in which PCA was conducted on normalized gene expression data. Normalization was done by taking the logarithm and using a scale factor of 100. We see that normalization scales the data so it better captures the variance.

library(Rtsne)
library(patchwork)

dim(data)
data[1:10, 1:10]

pos<-data[,2:3]
gexp<-data[,4:ncol(data)]

topgene<-names(sort(apply(gexp, 2, var), decreasing=TRUE)[1:1000])

gexpfilter<-gexp[,topgene]
dim(gexpfilter)

gexpfilternorm <- log10(gexpfilter/rowSums(gexpfilter)*100 + 1)

pcsfilter<-prcomp(gexpfilter)
pcsnorm<-prcomp(gexpfilternorm)

p1<-ggplot(as.data.frame(pcsfilter$x))+geom_point(aes(x=PC1,y=PC2))+ggtitle("Non-normalized Data")

p2<-ggplot(as.data.frame(pcsnorm$x))+geom_point(aes(x=PC1,y=PC2))+ggtitle("Normalized Data")
p1+p2