How can I read from the REST API from uniprot with matlab - the data is produced in chunks

I tried reading this using the matlab command
taxonID = 9606
url = ['https://rest.uniprot.org/uniprotkb/search?query=taxonomy_id:' num2str(taxonID) '&fields=accession%2Cgene_oln%2Cgene_primary%2Cec%2Cxref_geneid%2Cxref_refseq%2Cmass%2Csequence&format=tsv&size=500&sort=protein_name%20asc'];
uniprotDL = webread(url);
The problem is that I only get the first 500 lines. The size parameter cannot be set larger than 500; according to UniProt, larger result sets have to be retrieved using paging. Their Python example is as follows (not exactly the same query, but very similar):
import re
import requests
from requests.adapters import HTTPAdapter, Retry

re_next_link = re.compile(r'<(.+)>; rel="next"')
retries = Retry(total=5, backoff_factor=0.25, status_forcelist=[500, 502, 503, 504])
session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retries))

def get_next_link(headers):
    # The URL of the next page is given in the Link response header
    if "Link" in headers:
        match = re_next_link.match(headers["Link"])
        if match:
            return match.group(1)

def get_batch(batch_url):
    # Yield one page at a time until no next link is returned
    while batch_url:
        response = session.get(batch_url)
        response.raise_for_status()
        total = response.headers["x-total-results"]
        yield response, total
        batch_url = get_next_link(response.headers)

url = 'https://rest.uniprot.org/uniprotkb/search?fields=accession%2Ccc_interaction&format=tsv&query=Insulin%20AND%20%28reviewed%3Atrue%29&size=500'
interactions = {}
for batch, total in get_batch(url):
    # Skip the header row that is repeated on every page
    for line in batch.text.splitlines()[1:]:
        primaryAccession, interactsWith = line.split('\t')
        interactions[primaryAccession] = len(interactsWith.split(';')) if interactsWith else 0
    print(f'{len(interactions)} / {total}')
As you can see, they download the data in chunks. However, we want to do this in MATLAB, since it is part of another package that we are building. Is it possible to do this in MATLAB?

Answers (1)

Chetan on 7 Sep 2023
I understand that you are trying to read a paged API and retrieve all of the available data.
In MATLAB, you can implement pagination to download the data from the UniProt REST API in chunks.
A basic while loop around the webread function is enough: request one page per iteration and stop the loop when a request returns an empty result.
The following example, which uses a different public API with a page parameter, shows the approach:
baseURL = 'https://api.punkapi.com/v2/beers';
url = [baseURL '?per_page=10'];
beers = batchReadAPI(url);
numBeers = numel(beers);
disp(['Downloaded ' num2str(numBeers) ' beers']);

function data = batchReadAPI(baseURL)
    % Request successive pages until the API returns an empty page.
    data = [];
    page = 1;
    options = weboptions('ContentType', 'json', 'CharacterEncoding', 'UTF-8');
    while true
        url = [baseURL '&page=' num2str(page)];
        response = webread(url, options);
        if isempty(response)
            break;   % no more results
        end
        data = [data; response]; %#ok<AGROW>
        page = page + 1;
    end
end
Refer to the MATLAB documentation for "webread" for more details, such as changing the content type and the other available options.
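The loop above relies on a page parameter. The Python example in the question instead follows the URL given in the Link response header (rel="next"), and webread returns only the response data, not the headers. Below is a minimal sketch of that variant. It assumes the matlab.net.http interface (R2016b or later), and it assumes that the TSV body arrives as text with the header row repeated on every page, as the Python example suggests. The function names readAllUniprotPages and nextLinkFromHeader are hypothetical, chosen only for illustration, and the sketch has not been run against the live service.

```matlab
function lines = readAllUniprotPages(url)
% Sketch: collect all TSV rows by following the rel="next" URL in the Link header.
% Assumption: every page repeats the TSV header row.
    lines = strings(0, 1);
    firstPage = true;
    url = string(url);
    while strlength(url) > 0
        request = matlab.net.http.RequestMessage();      % GET is the default method
        response = send(request, matlab.net.URI(url));
        if response.StatusCode ~= matlab.net.http.StatusCode.OK
            error('uniprot:httpError', 'Request failed with status %s', ...
                char(response.StatusCode));
        end

        body = response.Body.Data;
        if isnumeric(body)                               % raw bytes: decode as UTF-8
            body = native2unicode(reshape(body, 1, []), 'UTF-8');
        end
        pageLines = splitlines(string(body));
        pageLines(pageLines == "") = [];                 % drop empty trailing lines
        if ~firstPage && ~isempty(pageLines)
            pageLines(1) = [];                           % drop the repeated header row
        end
        lines = [lines; pageLines]; %#ok<AGROW>
        firstPage = false;

        % No Link header (or no rel="next" entry) means this was the last page.
        linkField = response.getFields('Link');
        if isempty(linkField)
            url = "";
        else
            url = nextLinkFromHeader(linkField(1).Value);
        end
    end
end

function nextUrl = nextLinkFromHeader(linkValue)
% Extract the URL between <...> that is marked rel="next"; return "" if absent.
    tok = regexp(char(linkValue), '<([^>]+)>;\s*rel="next"', 'tokens', 'once');
    if isempty(tok)
        nextUrl = "";
    else
        nextUrl = string(tok{1});
    end
end
```

With the URL from the question, the call would be rows = readAllUniprotPages(url); each element of rows is one tab-separated line, with the header row first. The x-total-results header used in the Python example could be read in the same way with getFields if a progress report is needed.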
I hope these suggestions help you resolve the issue you are facing.
Best regards,
Chetan Verma
