Gene Expression Data Analysis Suite
Sample Datasets for aiding experimentation and analysis
Click on the following images to download sample datasets for
experimentation. Some of the datasets do not have columns for gene names
and/or gene functions. In some cases, columns from several original
microarray data files had to be combined into one file. Text files that were
very large (about 2 MB or more) have been compressed into .zip files. All the
files are publicly available on the Internet. The sources of any copyrighted
data files and images provided are gratefully acknowledged. Most of the
datasets can be obtained from NCBI’s Gene Expression Omnibus at
http://www.ncbi.nlm.nih.gov/geo.
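Because some files lack the gene name and/or gene function columns, a loader needs to be told how many leading label columns to expect. The following is a minimal sketch, not a description of the files' exact layout: it assumes tab-delimited text, treats empty or non-numeric cells as missing values, and uses hypothetical function and parameter names (`read_expression_text`, `read_first_text_member`, `label_columns`, `skip_rows`).

```python
import csv
import io
import math
import zipfile


def _to_float(cell):
    """Convert one cell to float; empty or non-numeric cells become NaN."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


def read_expression_text(text, delimiter="\t", label_columns=0, skip_rows=0):
    """Parse delimited text into (labels, matrix).

    label_columns: number of leading non-numeric columns (for example gene
    name and gene function); use 0 for files that have neither.
    skip_rows: number of header lines to ignore (assumed, may be 0).
    """
    labels, matrix = [], []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for index, row in enumerate(reader):
        if index < skip_rows or not row:
            continue
        labels.append(row[:label_columns])
        matrix.append([_to_float(cell) for cell in row[label_columns:]])
    return labels, matrix


def read_first_text_member(zip_source, encoding="utf-8"):
    """Return the decoded contents of the first file inside a .zip archive.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        first_name = archive.namelist()[0]
        return archive.read(first_name).decode(encoding)
```

For a file that has both label columns, `read_expression_text(text, label_columns=2)` keeps the two labels per gene separately from the numeric expression values. The delimiter, header count and text encoding should be checked against each downloaded file.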
Caution: Start experimentation with the smaller datasets and move on to the
large ones only after the work runs successfully. Dataset sizes range from
50 KB to over 2 GB. Algorithms should be able to take such large datasets as
input and hold them in primary memory. Matrix operations sometimes require
more than two copies of the entire dataset in memory, which can cause the
computer to hang indefinitely.
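A rough way to check beforehand whether a dataset fits in memory is to multiply its dimensions by the storage size of one value and by the number of copies the algorithm keeps. The sketch below assumes dense storage at 8 bytes per value (double precision) and three simultaneous copies; both figures, and the function names, are illustrative assumptions rather than properties of any particular algorithm.

```python
def parse_dimensions(text):
    """Turn a 'genes x samples' string such as '13411 x 40' into two ints."""
    genes, samples = (int(part) for part in text.lower().split("x"))
    return genes, samples


def estimate_matrix_bytes(genes, samples, bytes_per_value=8, copies=1):
    """Estimate bytes needed to hold `copies` dense genes-by-samples matrices."""
    return genes * samples * bytes_per_value * copies


if __name__ == "__main__":
    genes, samples = parse_dimensions("13411 x 40")
    needed = estimate_matrix_bytes(genes, samples, copies=3)
    print(f"{genes} x {samples}: about {needed / 2**20:.1f} MiB for 3 copies")
```

The estimate covers only the numeric matrix. Gene labels, intermediate results and interpreter overhead add to it, so it is best read as a lower bound.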
| Sl. No. | Title, organism | Size of dataset (genes x samples) | Description | Reference | File extension | File size |
|---|---|---|---|---|---|---|
| 1. | Breast Cancer, Human | 3226 x 22 | 3226 genes from 21 patients (the data of one patient is repeated, giving 22 samples), comprising 7 patients each of BRCA1, BRCA2 and sporadic cases. | Hedenfalk et al. (2001) | TXT | 349 KB |
| 2. | Sugarcane, dataset GDS203 | 3072 x 12 | Expression profile of sugarcane plantlet leaves exposed to cold for 0, 3, 6, 12, 24 and 48 hours. Thirty-four cold-induced ESTs were identified, of which 23 were novel cold-responsive genes that had not previously been reported. | Nucleic Acids Res. 2002 Jan 1;30(1):207-10 | TXT | 224 KB |
| 3. | MIFLC (Mus musculus), dataset GDS590 | 1185 x 9 | Molecular events of muscle injury examined in 3-month-old male C57BL/6 extensor digitorum longus (EDL) muscle following lengthening contractions. EDL examined at 6 and 72 hours and compared to non-treated control. | Nucleic Acids Res. 2002 Jan 1;30(1):207-10 | TXT | 77 KB |
| 4. | Arabidopsis thaliana, dataset GDS101 | 1322 x 18 | Examination of gene expression induced in plantlets by UV-B light and gamma-ray treatment. | Nucleic Acids Res. 2002 Jan 1;30(1):207-10 | TXT | 154 KB |
| 5. | Homo sapiens, dataset GDS71 | 1440 x 12 | Use of protein microarrays to characterize patterns of variation in hundreds of thousands of different proteins and antibodies in clinical or research applications. | Nucleic Acids Res. 2002 Jan 1;30(1):207-10 | TXT | 117 KB |
| 6. | Mus musculus, dataset GDS346 | 3168 x 7 | Effect of the 5-HT2A serotonin receptor agonist (±)2,5-dimethoxy 4-iodoamphetamine (DOI) in the somatosensory cortex. 129Sv mice were injected with DOI (2 mg/kg or 10 mg/kg, 1 hour) to study the serotonergic hallucinogen effect. | Nucleic Acids Res. 2002 Jan 1;30(1):207-10 | TXT | 203 KB |
| 7. | Yeast dataset | 6221 x 80 | A time series dataset covering various conditions such as different cell cycles, sporulation, diauxic shift, etc., recorded at frequent intervals. | Eisen et al. (1998) | ZIP | 844 KB |
| 8. | Helicobacter pylori dataset | 4608 x 4 | Case-controlled study from Britta Bjorkholm and Lars Engstrand. | Bjorkholm et al. (2001) | XLS | 481 KB |
| 9. | Lymphoma, Human | 13411 x 40 | Distinct Types of Diffuse Large B-Cell Lymphoma Identified by Gene Expression Profiling. | Alizadeh et al. (2000) | ZIP | 1117 KB |