Can MATLAB jsondecode() be used to decode a massive json file?

What I want to do is parse this massive Cast Vote Record for an election in Alaska in 2022. The file size is 373M. I am not sure yet what class of field I am looking for, and I was hoping to view the file as a text file, but it contains no line breaks or indentation; evidently all of the whitespace was removed. The file is impossible to read with any text editor I have.
I would like to parse it, one field at a time.
I have once in my life written numerical output from a MATLAB script to a JSON file, simply following the rules and having my choice of how to format it. It was still an ugly job. I just don't think I can write my own parser, but I was hoping jsondecode() could help me.

Accepted Answer

Harsha Vardhan on 17 Feb 2024
Edited: Harsha Vardhan on 17 Feb 2024
Hi,
I see that you are trying to parse a huge JSON file using MATLAB.
This can be done using the 'jsondecode' and 'jsonencode' functions as below:
After extracting the contents of the zip file - https://www.elections.alaska.gov/results/22SSPG/CVR_Export_20220908084311.zip , many JSON files are available in the extracted folder. Among them, we will parse the largest one, 'CvrExport.json' (364MB), using the MATLAB code below. The code stores the decoded JSON data in the 'data' variable. It then writes a formatted (pretty-printed) copy of the JSON data to the file 'output.json'.
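% Read the whole JSON file into a character vector
jsonStr = fileread('CvrExport.json');
% Decode JSON data into MATLAB structs, cell arrays, and numeric arrays
data = jsondecode(jsonStr);
% Open text file for writing
fid = fopen('output.json', 'w');
% Write formatted JSON to the text file
% (the PrettyPrint option and the Name=Value syntax require R2021a or later)
fprintf(fid, '%s\n', jsonencode(data, PrettyPrint=true));
% Close the text file
fclose(fid);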
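Before running this on the 364MB file, it may help to try the same decode and re-encode steps on a tiny JSON string. The following is only a sketch: the sample string is made up for illustration (it reuses a few field names from the output shown in this answer), and it uses the comma-separated 'PrettyPrint', true form, which still needs R2021a or later. It also shows one thing to watch for: jsondecode turns a one-element JSON array of objects into a 1×1 struct, so the re-encoded text is not guaranteed to match the original file character for character.

```matlab
% Hypothetical miniature input; not taken from the real CvrExport.json.
sampleStr = '{"Version":"5.5.52.6","Sessions":[{"RecordId":1,"Cards":[{"Id":515}]}]}';

% Decode: the one-element "Sessions" and "Cards" arrays become 1x1 structs.
data = jsondecode(sampleStr);
disp(class(data.Sessions));        % struct, not cell

% Drill down with the dot operator.
cardId = data.Sessions.Cards.Id;

% Re-encode with indentation (PrettyPrint requires R2021a or later).
pretty = jsonencode(data, 'PrettyPrint', true);
disp(pretty);
```

In the re-encoded text, "Sessions" and "Cards" appear as single objects rather than arrays, which is consistent with the "Cards": { ... } entry in the 'more' output further down.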
Now, we can parse one field at a time using the dot operator as below.
data
data =
  struct with fields:
    Version: '5.5.52.6'
    ElectionId: '2022 Primary Election and Special General'
    Sessions: {192289×1 cell}
% data.Sessions is a 192289×1 cell array in which each cell holds a 1×1 struct. We access the first session as below.
data.Sessions{1}
ans =
  struct with fields:
    TabulatorId: 91100
    BatchId: 1
    RecordId: 1
    CountingGroupId: 2
    ImageMask: 'D:\NAS\2022 Primary Election and Special General\Results\Tabulator91100\Batch001\Images\91100_00001_000001*.*'
    SessionType: 'QRVote'
    VotingSessionIdentifier: ''
    UniqueVotingIdentifier: ''
    Original: [1×1 struct]
Similarly, the JSON can be parsed further by descending into the 'Original' field.
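As a rough sketch of descending into 'Original', the loop below walks Sessions → Original → Cards → Contests → Marks and collects [CandidateId, Rank] pairs for one contest, then counts the rank-1 marks per candidate. The field names are the ones visible in the output shown in this answer; the miniature JSON string, the contest Id (5), and the candidate IDs are made-up stand-ins so that the example runs on its own. jsondecode can return a struct, a struct array, a cell array, or an empty [] for a JSON array depending on its contents, so each level is first normalized to a cell array.

```matlab
% Hypothetical miniature data in the same shape as the output shown above.
jsonStr = ['{"Sessions":[' ...
    '{"TabulatorId":1,"Original":{"Cards":[{"Id":515,"Contests":[{"Id":5,"Marks":[' ...
    '{"CandidateId":141,"Rank":1},{"CandidateId":142,"Rank":2}]}]}]}},' ...
    '{"TabulatorId":2,"BatchId":7,"Original":{"Cards":[{"Id":515,"Contests":[{"Id":5,"Marks":[' ...
    '{"CandidateId":142,"Rank":1}]}]}]}},' ...
    '{"TabulatorId":3,"Original":{"Cards":[{"Id":515,"Contests":[{"Id":5,"Marks":[]}]}]}}]}'];
data = jsondecode(jsonStr);

contestId = 5;            % assumed contest of interest
pairs = zeros(0, 2);      % each row will be [CandidateId, Rank]

sessions = data.Sessions;
if ~iscell(sessions), sessions = num2cell(sessions); end
for s = 1:numel(sessions)
    cards = sessions{s}.Original.Cards;
    if ~iscell(cards), cards = num2cell(cards); end        % struct or [] -> cell
    for c = 1:numel(cards)
        contests = cards{c}.Contests;
        if ~iscell(contests), contests = num2cell(contests); end
        for k = 1:numel(contests)
            if contests{k}.Id ~= contestId, continue; end
            marks = contests{k}.Marks;
            if ~iscell(marks), marks = num2cell(marks); end
            for m = 1:numel(marks)
                pairs(end+1, :) = [marks{m}.CandidateId, marks{m}.Rank]; %#ok<SAGROW>
            end
        end
    end
end

% Count first-choice (Rank == 1) marks per candidate.
firstChoice = pairs(pairs(:, 2) == 1, 1);
[ids, ~, idx] = unique(firstChoice);
counts = accumarray(idx, 1);
disp([ids, counts]);
```

The real file may contain fields or layouts that this sketch does not anticipate, so it is worth checking a few sessions by hand before trusting a tally.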
The second problem you mentioned is being unable to view the JSON file in a text editor. This will be a problem for the 'output.json' file too, since it is also a huge file. A workaround is to view a few lines at a time. For example, the following MATLAB script displays the first 200 lines of the 'output.json' file.
% Open the text file for reading
fid = fopen('output.json', 'r');
% Read the first 200 lines
numLines = 200;
for i = 1:numLines
    % 'tline' avoids shadowing the built-in 'line' function
    tline = fgetl(fid);
    % fgetl returns the number -1 (not text) at end of file
    if ~ischar(tline)
        break;
    end
    disp(tline);
end
% Close the file
fclose(fid);
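The line-by-line approach relies on 'output.json' having line breaks. For the original single-line file, a similar peek is possible by reading a fixed number of characters with fread instead of lines. This is only a sketch: it creates a small temporary stand-in file (the file name and contents are made up) so that it runs on its own; for the real data you would open 'CvrExport.json' and choose a larger nChars.

```matlab
% Create a small single-line stand-in file so the sketch is self-contained.
sampleFile = [tempname '.json'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s', '{"Version":"5.5.52.6","Sessions":[{"TabulatorId":91100,"BatchId":1}]}');
fclose(fid);

% Read only the first nChars characters instead of the whole file.
nChars = 40;
fid = fopen(sampleFile, 'r');
chunk = fread(fid, nChars, '*char')';   % fread returns a column; transpose to a row
fclose(fid);
disp(chunk);

% Remove the temporary file.
delete(sampleFile);
```

Because only nChars characters are read, this does not load the full file into memory, which may matter on a laptop with limited RAM.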
You can also page through the huge output.json file using the 'more' command in the Windows Command Prompt (CMD). Sample output is shown below:
%Command
more +1 output.json
%Output
  "Version": "5.5.52.6",
  "ElectionId": "2022 Primary Election and Special General",
  "Sessions": [
    {
      "TabulatorId": 91100,
      "BatchId": 1,
      "RecordId": 1,
      "CountingGroupId": 2,
      "ImageMask": "D:\\NAS\\2022 Primary Election and Special General\\Results\\Tabulator91100\\Batch001\\Images\\91100_00001_000001*.*",
      "SessionType": "QRVote",
      "VotingSessionIdentifier": "",
      "UniqueVotingIdentifier": "",
      "Original": {
        "PrecinctPortionId": 404,
        "BallotTypeId": 5,
        "IsCurrent": true,
        "Cards": {
          "Id": 515,
          "PaperIndex": 0,
          "Contests": [
            {
              "Id": 5,
              "ManifestationId": 59,
              "Undervotes": 0,
              "Overvotes": 0,
              "OutstackConditionIds": [],
              "Marks": {
                "CandidateId": 141,
                "ManifestationId": 904,
                "PartyId": 14,
                "Rank": 1,
-- More (0%) --
You may refer here for documentation of the 'jsondecode' and 'jsonencode' functions:
  1. https://www.mathworks.com/help/matlab/ref/jsondecode.html
  2. https://www.mathworks.com/help/matlab/ref/jsonencode.html
Hope this helps in resolving your query!
  1 Comment
robert bristow-johnson on 19 Feb 2024
Thank you so much. This looks like an extremely complete answer. There are smaller JSON files that I'll try that out with. Then I'll go after the big one, but I fear my laptop will choke. That's gotta be close to the memory limit for the whole thing.
Just FYI, I am trying to determine and tally the first- and second-choice rankings on each ballot for any of the three major candidates. So there are 9 numbers I want to extract, to verify what we have here: https://drive.google.com/file/d/1y32bPVmq6vb6SwnMn6vwQxzoJfvrv6ID/view .

Release

R2019b
