Gene Expression Data Analysis Suite


Sample Datasets for aiding experimentation and analysis


Click on the following images to download sample datasets for experimentation.  Some of the datasets do not have columns for gene names and/or gene functions.  In some cases, columns from several original microarray datasets had to be combined into one file.  Where a text file was very large (about 2 MB or more), it has been compressed into a single .zip file.  All the files are publicly available on the Internet.  The sources of any copyrighted data files and images provided are gratefully acknowledged.  Most of the datasets can be obtained from NCBI’s Gene Expression Omnibus at


Caution: Start experimentation with the smaller datasets, and move on to the larger ones only after the work succeeds on them.  Datasets range in size from 50 KB to over 2 GB.  Algorithms should be able to accept such large datasets as input and hold them in primary memory.  Matrix operations sometimes require more than two copies of the entire dataset in memory at once, which can exhaust the available memory and cause the computer to hang indefinitely.
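
As a rough illustration of the caution above, the following Python sketch estimates how much primary memory a genes x samples matrix would occupy. It assumes 8-byte floating-point values and three simultaneous copies of the matrix; both figures, and the function name estimate_memory_mb, are illustrative assumptions rather than properties of any particular algorithm.

```python
def estimate_memory_mb(genes, samples, copies=3, bytes_per_value=8):
    """Approximate memory, in MB, needed to hold `copies` dense
    genes x samples matrices of fixed-width numeric values.

    The defaults (3 copies, 8 bytes per value) are assumptions;
    change them to match the algorithm and data type in use.
    """
    if min(genes, samples, copies, bytes_per_value) < 0:
        raise ValueError("all arguments must be non-negative")
    total_bytes = genes * samples * bytes_per_value * copies
    return total_bytes / (1024 * 1024)


# Example: the 13411 x 40 lymphoma dataset listed on this page
print(round(estimate_memory_mb(13411, 40), 2), "MB")
```

The in-memory size can differ considerably from the size of the text file on disk, so an estimate of this kind is only a starting point for deciding whether a dataset will fit.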



1. Breast Cancer, Human
   Size of dataset (genes x samples): 3226 x 22
   Description: Expression of 3226 genes in 21 patients (one patient's data appears twice, giving 22 samples), with 7 patients each for the BRCA1, BRCA2 and sporadic cases.
   Reference: Hedenfalk et al. (2001)
   File size: 349 KB

2. Sugarcane, dataset GDS203
   Size of dataset (genes x samples): 3072 x 12
   Description: Expression profile of sugarcane plantlet leaves exposed to cold for 0, 3, 6, 12, 24 and 48 hours. Thirty-four cold-induced ESTs were identified, of which 23 were novel cold-responsive genes that had not previously been reported.
   Reference: Nucleic Acids Res. 2002 Jan 1;30(1):207-10
   File size: 224 KB

3. MIFLC (Mus musculus), dataset GDS590
   Size of dataset (genes x samples): 1185 x 9
   Description: Molecular events of muscle injury examined in the extensor digitorum longus (EDL) muscle of 3-month-old male C57BL/6 mice following lengthening contractions. The EDL was examined at 6 and 72 hours and compared to a non-treated control.
   Reference: Nucleic Acids Res. 2002 Jan 1;30(1):207-10
   File size: 77 KB

4. Arabidopsis thaliana, dataset GDS101
   Size of dataset (genes x samples): 1322 x 18
   Description: Examination of gene expression induced in plantlets treated with UV-B light and gamma rays.
   Reference: Nucleic Acids Res. 2002 Jan 1;30(1):207-10
   File size: 154 KB

5. Homo sapiens, dataset GDS71
   Size of dataset (genes x samples): 1440 x 12
   Description: Use of protein microarrays to characterize patterns of variation in hundreds of thousands of different proteins and antibodies in clinical or research applications.
   Reference: Nucleic Acids Res. 2002 Jan 1;30(1):207-10
   File size: 117 KB

6. Mus musculus, dataset GDS346
   Size of dataset (genes x samples): 3168 x 7
   Description: Effect of the 5-HT2A serotonin receptor agonist (±)2,5-dimethoxy-4-iodoamphetamine (DOI) in the somatosensory cortex. 129Sv mice injected with DOI (2 mg/kg or 10 mg/kg), 1 hour, to study the effect of the serotonergic hallucinogen in the somatosensory cortex.
   Reference: Nucleic Acids Res. 2002 Jan 1;30(1):207-10
   File size: 203 KB

7. Yeast dataset
   Size of dataset (genes x samples): 6221 x 80
   Description: A time-series dataset covering various conditions, such as different cell cycles, sporulation and diauxic shift, recorded at frequent intervals.
   Reference: Eisen et al. (1998)
   File size: 844 KB

8. Helicobacter pylori dataset
   Size of dataset (genes x samples): 4608 x 4
   Description: Case-controlled study from Britta Bjorkholm and Lars Engstrand.
   Reference: Bjorkholm et al. (2001)
   File size: 481 KB

9. Lymphoma, Human
   Size of dataset (genes x samples): 13411 x 40
   Description: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling.
   Reference: Alizadeh et al. (2000)
   File size: 1117 KB
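
A minimal sketch of how one of the sample files might be read for a first experiment is given here. It assumes the file is tab-delimited text with one gene per row; the delimiter, the number of leading annotation columns (gene name, gene function), the presence of a header row and the function name load_expression are all assumptions, because the layout differs from file to file. The max_rows argument makes it possible to try an algorithm on the first few genes before loading a whole dataset.

```python
import csv
import math


def to_float(text):
    """Convert a field to float; blank or non-numeric fields become NaN."""
    try:
        return float(text)
    except ValueError:
        return math.nan


def load_expression(path, label_columns=0, has_header=False,
                    max_rows=None, delimiter="\t"):
    """Read a delimited text file into (labels, values).

    label_columns: number of leading annotation columns per row
                   (0 for files without gene names or functions).
    has_header:    skip the first line if it holds column titles.
    max_rows:      stop after this many data rows (None reads all).
    """
    labels, values = [], []
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        if has_header:
            next(reader, None)  # discard the column titles
        for row in reader:
            if max_rows is not None and len(values) >= max_rows:
                break
            if not row:
                continue  # ignore completely empty lines
            labels.append(row[:label_columns])
            values.append([to_float(field) for field in row[label_columns:]])
    return labels, values
```

For example, a call such as load_expression("yeast.txt", label_columns=1, has_header=True, max_rows=100) would read only the first 100 genes; the file name here is hypothetical.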



