Index and Search Dataset Arrays
Note
The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
Ways To Index and Search
There are many ways to index into dataset arrays. For example, for a dataset array, ds, you can:
Use () to create a new dataset array from a subset of ds. For example, ds1 = ds(1:5,:) creates a new dataset array, ds1, consisting of the first five rows of ds. Metadata, including variable and observation names, transfers to the new dataset array.
Use variable names with dot notation to index individual variables in a dataset array. For example, ds.Height indexes the variable named Height.
Use observation names to index individual observations in a dataset array. For example, ds('Obs1',:) gives data for the observation named Obs1.
Use observation or variable numbers. For example, ds(:,[1,3,5]) gives the data in the first, third, and fifth variables (columns) of ds.
Use logical indexing to search for observations in ds that satisfy a logical condition. For example, ds(ds.Gender=='Male',:) gives the observations in ds where the variable named Gender, a nominal array, has the value Male.
Use ismissing to find missing data in the dataset array.
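The examples that follow do not exercise ismissing, so here is a minimal sketch of how it might be used. The toy data, the variable names Height and Name, and the observation names are assumptions made for illustration; they are not part of the hospital sample data.

```matlab
% Hypothetical toy data: one NaN and one empty character vector.
Height = [170; NaN; 165];
Name = {'SMITH'; 'JONES'; ''};
ds = dataset(Height, Name, 'ObsNames', {'Obs1'; 'Obs2'; 'Obs3'});

% Logical array the same size as ds; true marks a missing element.
tf = ismissing(ds);

% Observations that contain at least one missing value.
dsMissing = ds(any(tf,2),:);

% Observations with no missing values.
dsComplete = ds(~any(tf,2),:);
```

By default, ismissing treats NaN in numeric variables and empty character vectors in cell arrays of character vectors as missing, so the second and third observations are flagged here.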
Examples
Common Indexing and Searching Methods
This example shows several indexing and searching methods for dataset arrays.
Load the sample data.
load hospital;
size(hospital)
ans = 1×2
100 7
The dataset array has 100 observations and 7 variables.
Index a variable by name. Return the minimum age in the dataset array.
min(hospital.Age)
ans = 25
Delete the variable Trials.
hospital.Trials = [];
size(hospital)
ans = 1×2
100 6
Index an observation by name. Display measurements on the first five variables for the observation named PUE-347.
hospital('PUE-347',1:5)
ans =
               LastName     Sex       Age    Weight    Smoker
    PUE-347    {'YOUNG'}    Female    25     114       false
Index variables by number. Create a new dataset array containing the first four variables of hospital.
dsNew = hospital(:,1:4);
dsNew.Properties.VarNames(:)
ans = 4x1 cell
{'LastName'}
{'Sex' }
{'Age' }
{'Weight' }
Index observations by number. Delete the last 10 observations.
hospital(end-9:end,:) = [];
size(hospital)
ans = 1×2
90 6
Search for observations by logical condition. Create a new dataset array containing only females who smoke.
dsFS = hospital(hospital.Sex=='Female' & hospital.Smoker==true,:);
dsFS(:,{'LastName','Sex','Smoker'})
ans =
               LastName         Sex       Smoker
    LPD-746    {'MILLER'   }    Female    true
    XBR-291    {'GARCIA'   }    Female    true
    AAX-056    {'LEE'      }    Female    true
    DTT-578    {'WALKER'   }    Female    true
    AFK-336    {'WRIGHT'   }    Female    true
    RBA-579    {'SANCHEZ'  }    Female    true
    HAK-381    {'MORRIS'   }    Female    true
    NSK-403    {'RAMIREZ'  }    Female    true
    ILS-109    {'WATSON'   }    Female    true
    JDR-456    {'SANDERS'  }    Female    true
    HWZ-321    {'PATTERSON'}    Female    true
    GGU-691    {'HUGHES'   }    Female    true
    WUS-105    {'FLORES'   }    Female    true
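Because the table data type is recommended over dataset, the same indexing patterns can be sketched on a table. This is an illustrative sketch only: it assumes the original hospital sample data is reloaded, converts it with dataset2table, and uses result variable names (T, minAge, rowByName, firstFour, femaleSmokers) that are hypothetical.

```matlab
% Reload the sample dataset array and convert it to a table.
% Observation names carry over as row names.
load hospital;
T = dataset2table(hospital);

% Index a variable by name with dot notation.
minAge = min(T.Age);

% Index a row by name, keeping the first five variables.
rowByName = T('PUE-347',1:5);

% Index variables by number.
firstFour = T(:,1:4);

% Search by logical condition: females who smoke.
femaleSmokers = T(T.Sex=='Female' & T.Smoker,:);
```

The parentheses, dot, and logical indexing forms match the dataset array syntax shown above, which should make existing indexing code straightforward to carry over.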