PCA with Non-normalized and Normalized Data

Senior majoring in BME and CS.

With this visualization, we are comparing PCA with non-normalized and normalized data. We are encoding the categorical data, spots, using the geometric primitive of point. We encoded the quantitative data, PC1, using position on the x-axis. We also encoded the quantitaive data, PC2, using position on the y-axis. We created two plots similar plots, one in which PCA was conducted on non-normalized gene expression data and one in which PCA was conducted on normalized gene expression data. Normalization was done by taking the logarithm and using a scale factor of 100. We see that normalization scales the data so it better captures the variance.

library(Rtsne)
library(patchwork)

dim(data)
data[1:10, 1:10]

pos<-data[,2:3]
gexp<-data[,4:ncol(data)]

topgene<-names(sort(apply(gexp, 2, var), decreasing=TRUE)[1:1000])

gexpfilter<-gexp[,topgene]
dim(gexpfilter)

gexpfilternorm <- log10(gexpfilter/rowSums(gexpfilter)*100 + 1)

pcsfilter<-prcomp(gexpfilter)
pcsnorm<-prcomp(gexpfilternorm)

p1<-ggplot(as.data.frame(pcsfilter$x))+geom_point(aes(x=PC1,y=PC2))+ggtitle("Non-normalized Data")

p2<-ggplot(as.data.frame(pcsnorm$x))+geom_point(aes(x=PC1,y=PC2))+ggtitle("Normalized Data")
p1+p2

05 Feb 2024

« Comparison Between Normalized and Non-Normalized Gene Expression Prior to Dimensional Reduction Analysis of most expressed gene (POSTN): PCA, t-SNE, and UMAP Plots »