How can i extract data from data from table using a string of words.

5 views (last 30 days)
I am trying to extract data from a file but using a nested loop and if statement has not worked. The main file has column 1 as genes ranging form (1:60000) its a very large file. These genes have names like DUOX1 and so on. Now each column has the patient name a total of 45. Each gene has an expression number for each patient. What i want to do is to extract a certain amount of genes by using their names to extract all the expression for all patients.
the loop that i have been trying has been:
%16 TDS Genes
data_table=readtable("Normal and Tumor TPM.xlsx","Sheet","Normal Expression","VariableNamingRule","preserve");
tds_genes_table=readtable("16 TDS.xlsx","Sheet","Normal");
genes_16 = string(tds_genes_table.Properties.VariableNames);
n = max(size(data_table)) %gives total amount of genes in the main file "data". Each has a name like DUOX1
m = max(size(genes_16)) %contains gene names that i want to extract from data
%loop to match 16 tds genes with patient and expression
for i = 1:n
for j = 1:m
if i=j
expression(j,:) = data_table(i,:)
end
end
end
  6 Comments
Jose Mendoza Garcia
Jose Mendoza Garcia on 9 Sep 2022
what i want to do is extract these 16 genes form the large file with all the expression information
this are the 16 genes:
DIO1 DIO2 DUOX1 DUOX2 FOXE1 GLIS3 NKX2-1 PAX8 SLC26A4 SLC5A5 SLC5A8 TG THRA THRB TPO TSHR

Sign in to comment.

Answers (2)

Cris LaPierre
Cris LaPierre on 9 Sep 2022
I think the Access Data in a Table page is what you want. You might also be interested in the height and width functions for tables. Also, I think this is a scenario where it is best to set the TextType to string.
I think the code might look something like this.
data_table=readtable("Normal and Tumor TPM.xlsx","Sheet","Normal Expression","VariableNamingRule","preserve","TextType","string")
tds_genes_table=readtable("16 TDS.xlsx","Sheet","Normal","TextType","string")
genes_16 = string(tds_genes_table.Properties.VariableNames)
idx = ismember(data_table.Gene_ID,genes_16)
expression = data_table(idx,:)

dpb
dpb on 9 Sep 2022
Edited: dpb on 9 Sep 2022
data_table=readtable("Normal and Tumor TPM.xlsx","Sheet","Normal Expression","VariableNamingRule","preserve");
geneslist=strtrim({'DIO1';'DIO2';'DUOX1';'DUOX2';'FOXE1';'GLIS3';'NKX2-1';'PAX8';'SLC26A4 ';'SLC5A5 ';'SLC5A8 ';'TG ';'THRA ';'THRB ';'TPO ';'TSHR'});
data_table.GENEVARIABLENAME=categorical(data_table.GENEVARIABLENAME); % turn to categorical if isn't; you don't tell us what the variable names are...
tWantedByGene=data_table(matches,data_table.GENEVARIABLENAME,geneslist),:); % select rows with matching genes
should be all you're looking for...
As always, attaching a representative section of the data file would facilitate writing actual code to address any problems.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!