Problem with using fopen

Joel Karlsson on 6 Apr 2021
Edited: Joel Karlsson on 6 Apr 2021
The goal is not just to get the words from a PDF, as extractFileText(filename) does, but also the position of each sentence. My approach is to read the PDF and then decompress its FlateDecode streams to obtain this information. After decoding, the information can look like this:
I found a Python script* that works, and I want to translate it into MATLAB.
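The approach described above can be sketched in Python: read the raw bytes, inflate the FlateDecode streams, then read text positions from the page content. This is a minimal illustration under simplifying assumptions, not the script mentioned above. `inflate_streams` and `text_positions` are hypothetical helper names. Streams are located with a regex instead of their /Length entry. Only the BT, Td and Tj operators are tracked; Tm, TD, T*, TJ, font encodings and string escapes are ignored. The resulting positions are therefore only meaningful for simple PDFs.

```python
import re
import zlib

# Heuristic: "stream" + EOL, then the raw body up to "endstream".
# Assumption: a real parser would use each object's /Length and /Filter entries.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# Tiny subset of the content-stream syntax: BT (begin text),
# "tx ty Td" (move to next line, offset by tx ty) and "(string) Tj" (show string).
NUM = rb"(-?(?:\d+(?:\.\d*)?|\.\d+))"
TOKEN_RE = re.compile(
    rb"\bBT\b"
    rb"|" + NUM + rb"\s+" + NUM + rb"\s+Td"
    rb"|\(((?:[^()\\]|\\.)*)\)\s*Tj",
    re.DOTALL,
)


def inflate_streams(pdf_bytes):
    """Return the inflated body of every stream that zlib can decode."""
    decoded = []
    for match in STREAM_RE.finditer(pdf_bytes):
        inflater = zlib.decompressobj()
        try:
            data = inflater.decompress(match.group(1))
        except zlib.error:
            continue  # not FlateDecode (or damaged): skip it
        # The EOL before "endstream" is left over in inflater.unused_data.
        if inflater.eof:
            decoded.append(data)
    return decoded


def text_positions(content):
    """Return (x, y, raw_string) for each Tj, tracking only BT and Td."""
    x = y = 0.0
    found = []
    for m in TOKEN_RE.finditer(content):
        if m.group(3) is not None:      # "(string) Tj": record current line start
            found.append((x, y, m.group(3)))
        elif m.group(1) is not None:    # "tx ty Td": offset from start of current line
            x += float(m.group(1).decode())
            y += float(m.group(2).decode())
        else:                           # "BT": text matrix reset to identity
            x = y = 0.0
    return found
```

In practice, `pdf_bytes` would come from `open("TestCOA.pdf", "rb").read()`, and each decoded content stream would be passed to `text_positions`.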
Here is the problem:
Python:
pdf = open("TestCOA.pdf","rb").read()  # Python reads the file perfectly
MATLAB:
fileID = fopen("TestCOA.pdf",'rb','n','us-ascii');
A = fscanf(fileID,'%c')  % reads some characters, but mixed with invalid characters (<?>)
fclose(fileID);
pdf = py.open("TestCOA.pdf","rb").read()  % same result with the Python integration syntax
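A likely explanation for the <?> characters is an assumption, not something confirmed in this thread. The compressed stream bodies contain bytes above 0x7F, which have no US-ASCII representation. Any view of them as text therefore shows replacement characters, even though the bytes in the file are fine. The Python sketch below shows the same effect with `bytes.decode("ascii", errors="replace")`, using made-up sample data: the content survives only while it is kept as bytes. In MATLAB, one byte-preserving option is `fread(fileID, Inf, '*uint8')`, which returns the raw bytes as uint8 values instead of characters.

```python
import zlib

# Hypothetical stand-in for a FlateDecode stream body (not taken from TestCOA.pdf).
original = b"BT 72 712 Td (Hello) Tj ET"
raw = zlib.compress(original)

# Viewing the binary data as US-ASCII text: each byte >= 0x80 becomes U+FFFD.
as_text = raw.decode("ascii", errors="replace")
print(as_text.count("\ufffd"), "bytes could not be shown as ASCII")

# Kept as bytes, the same data still inflates back to the original content.
print(zlib.decompress(raw))
```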
I uploaded an example PDF to try it out. I hope someone can help me figure this out. :)

Answers (0)