Gene Expression Data Analysis Suite


 

Sample Datasets for aiding experimentation and analysis

 

Click on the following images to download a sample dataset for experimentation.  Some of the datasets do not have a column for gene names and/or gene functions.  In some cases, columns from several original microarray data files had to be combined into one file.  Text files that were very large (about 2 MB or more) have been compressed into a single .zip file.  All the files are publicly available on the Internet.  The sources of any copyrighted data files and images provided are gratefully acknowledged.  Most of the datasets can be obtained from NCBI’s Gene Expression Omnibus at http://www.ncbi.nlm.nih.gov/geo.
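Because some files carry gene name and gene function columns and others do not, a loader has to work out where the numeric expression values begin. The following is a minimal Python sketch of one way to do that, not the suite's own reader. It assumes the TXT files are tab-delimited with a single header row and that any annotation columns come before the expression values; the function name load_expression_table is hypothetical.

```python
import csv
import io


def _is_number(cell):
    """Return True if the cell parses as a floating-point number."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def load_expression_table(handle, delimiter="\t"):
    """Read a delimited expression table from an open text handle.

    Assumptions (not stated by the dataset page): the first row is a
    header, and annotation columns (gene name, gene function) precede
    the numeric expression columns.

    Returns (header, annotations, values), where annotations holds the
    leading non-numeric cells of each row and values holds the floats.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])
    rows = [row for row in reader if row]
    if not rows:
        return header, [], []

    # Use the first data row to count the leading annotation columns.
    first = rows[0]
    n_ann = 0
    while n_ann < len(first) and not _is_number(first[n_ann]):
        n_ann += 1

    annotations = [row[:n_ann] for row in rows]
    # Empty or non-numeric expression cells become NaN (missing value).
    values = [
        [float(c) if _is_number(c) else float("nan") for c in row[n_ann:]]
        for row in rows
    ]
    return header, annotations, values


# Usage with a small in-memory table (made-up values, for illustration only).
sample = "gene\tfunction\ts1\ts2\nG1\trepair\t0.5\t-1.2\nG2\tunknown\t\t2.0\n"
header, annotations, values = load_expression_table(io.StringIO(sample))
```

A file without annotation columns yields an empty annotation list for every row. One limitation of this heuristic: a gene identifier that is itself purely numeric would be mistaken for an expression value.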

 

Caution: Begin experimentation with the smaller datasets, and move on to the larger ones only after the work succeeds on them.  Datasets range from 50 KB to over 2 GB.  Algorithms should be able to accept such large datasets as input and hold them in primary memory.  Matrix operations sometimes require more than two copies of the entire dataset in memory at once, which can cause the computer to hang indefinitely.
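A rough memory estimate before loading a dataset can help avoid such hangs. The sketch below multiplies genes x samples by the bytes per value and by the number of simultaneous copies. The defaults of 8 bytes per value (double precision) and 3 copies are assumptions chosen for illustration, and the function names are hypothetical.

```python
def estimate_memory_bytes(n_genes, n_samples, copies=3, bytes_per_value=8):
    """Estimate bytes needed to hold `copies` dense copies of the matrix.

    Assumes a dense genes x samples matrix of fixed-width numbers;
    8 bytes per value corresponds to double precision.
    """
    return n_genes * n_samples * bytes_per_value * copies


def fits_in_memory(n_genes, n_samples, budget_bytes, copies=3, bytes_per_value=8):
    """Return True if the estimated footprint is within the given budget."""
    needed = estimate_memory_bytes(n_genes, n_samples, copies, bytes_per_value)
    return needed <= budget_bytes


# Example: the 13411 x 40 lymphoma dataset against a 64 MiB budget.
lymphoma_bytes = estimate_memory_bytes(13411, 40)
lymphoma_ok = fits_in_memory(13411, 40, 64 * 1024**2)
```

The size of the text file on disk is not the same as the in-memory size, so the estimate should be computed from the matrix dimensions rather than from the file size.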

 

 


Columns: Sl. No.; Image; Title, organism; Size of dataset (genes x samples); Description; Reference; File Extn; Size

1. Breast Cancer, Human
   Size of dataset (genes x samples): 3226 x 22
   Description: Expression data for 3226 genes from 21 patients, 7 each of BRCA1, BRCA2 and sporadic cases, with the data of one patient included twice.
   Reference: Hedenfalk et al. (2001)
   File Extn: TXT
   Size: 349 KB

2. Sugarcane, dataset GDS203
   Size of dataset (genes x samples): 3072 x 12
   Description: Expression profile of sugarcane plantlet leaves exposed to cold for 0, 3, 6, 12, 24 and 48 hours. Thirty-four cold-induced ESTs were identified, of which 23 were novel cold-responsive genes that had not previously been reported.
   Reference: Nucleic Acids Res. 2002 Jan 1;30(1):207-10
   File Extn: TXT
   Size: 224 KB

3. MIFLC (Mus musculus), dataset GDS590
   Size of dataset (genes x samples): 1185 x 9
   Description: Molecular events of muscle injury examined in the extensor digitorum longus (EDL) muscle of 3-month-old male C57BL/6 mice following lengthening contractions. EDL examined at 6 and 72 hours and compared to non-treated control.
   Reference: Nucleic Acids Res. 2002 Jan 1;30(1):207-10
   File Extn: TXT
   Size: 77 KB

4. Arabidopsis thaliana, dataset GDS101
   Size of dataset (genes x samples): 1322 x 18
   Description: Examination of gene expression induced by UV-B light and gamma-ray treated plantlets.
   Reference: Nucleic Acids Res. 2002 Jan 1;30(1):207-10
   File Extn: TXT
   Size: 154 KB

5. Homo sapiens, dataset GDS71
   Size of dataset (genes x samples): 1440 x 12
   Description: Use of protein microarrays to characterize patterns of variation in hundreds of thousands of different proteins and antibodies in clinical or research applications.
   Reference: Nucleic Acids Res. 2002 Jan 1;30(1):207-10
   File Extn: TXT
   Size: 117 KB

6. Mus musculus, dataset GDS346
   Size of dataset (genes x samples): 3168 x 7
   Description: Effect of the 5-HT2A serotonin receptor agonist (±)2,5-dimethoxy 4-iodoamphetamine (DOI) in somato-sensory cortex. 129Sv mice injected with DOI (2 mg/kg or 10 mg/kg), 1 hour, to study the serotonergic hallucinogen effect in somato-sensory cortex.
   Reference: Nucleic Acids Res. 2002 Jan 1;30(1):207-10
   File Extn: TXT
   Size: 203 KB

7. Yeast dataset
   Size of dataset (genes x samples): 6221 x 80
   Description: A time-series dataset covering various conditions such as different cell cycles, sporulation, diauxic shift, etc., recorded at frequent intervals.
   Reference: Eisen et al. (1998)
   File Extn: ZIP
   Size: 844 KB

8. Helicobacter pylori dataset
   Size of dataset (genes x samples): 4608 x 4
   Description: Case-controlled study from Britta Bjorkholm and Lars Engstrand.
   Reference: Bjorkholm et al. (2001)
   File Extn: XLS
   Size: 481 KB

9. Lymphoma, Human
   Size of dataset (genes x samples): 13411 x 40
   Description: Distinct Types of Diffuse Large B-Cell Lymphoma Identified by Gene Expression Profiling
   Reference: Alizadeh et al. (2000)
   File Extn: ZIP
   Size: 1117 KB

 

 


 
