
getpdb

Retrieve protein structure data from the Protein Data Bank (PDB) database

    Description

    PDBStruct = getpdb(PDBid) searches the PDB database for the protein structure record specified by the identifier PDBid and returns the MATLAB® structure PDBStruct, which contains a field for each PDB record.

    The Protein Data Bank (PDB) database is an archive of experimentally determined 3-D biological macromolecular structure data.

    PDBStruct = getpdb(PDBid,Name=Value) specifies options using one or more name-value arguments in addition to the input argument, PDBid.


    Examples


    Retrieve the structure information for the electron transport (heme) protein with PDB identifier 5CYT, read it into the MATLAB® structure pdbstruct, and save it to the PDB-formatted file electron_transport.pdb in the MATLAB® Current Folder.

    pdbstruct = getpdb("5CYT",ToFile="electron_transport.pdb")

    pdbstruct = struct with fields:
                   Header: [1×1 struct]
                    Title: 'REFINEMENT OF MYOGLOBIN AND CYTOCHROME C'
                 Compound: [4×23 char]
                   Source: [4×38 char]
                 Keywords: 'ELECTRON TRANSPORT (HEME PROTEIN)'
           ExperimentData: 'X-RAY DIFFRACTION'
                  Authors: 'T.TAKANO'
             RevisionDate: [1×7 struct]
               Superseded: [1×1 struct]
                  Journal: [1×1 struct]
                  Remark1: [1×1 struct]
                  Remark2: [1×1 struct]
                  Remark3: [1×1 struct]
                  Remark4: [2×59 char]
                Remark100: [3×59 char]
                Remark200: [49×59 char]
                Remark280: [6×59 char]
                Remark290: [32×59 char]
                Remark300: [6×59 char]
                Remark350: [13×59 char]
                Remark500: [105×59 char]
                Remark525: [13×59 char]
                Remark620: [15×59 char]
                Remark800: [5×59 char]
             DBReferences: [1×1 struct]
        SequenceConflicts: [1×2 struct]
                 Sequence: [1×1 struct]
                Heterogen: [1×2 struct]
            HeterogenName: [1×2 struct]
                  Formula: [1×3 struct]
                    Helix: [1×5 struct]
                     Link: [1×5 struct]
                     Site: [1×1 struct]
                   Cryst1: [1×1 struct]
                  OriginX: [1×3 struct]
                    Scale: [1×3 struct]
                    Model: [1×1 struct]
             Connectivity: [1×52 struct]
                   Master: [1×1 struct]
                SearchURL: 'https://files.rcsb.org/download/5CYT.pdb'
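    In the output above, multiline records such as Compound and Source appear as char matrices (for example, [4×23 char]), while single-line records such as Keywords are character vectors. A minimal sketch of reading both kinds, assuming the Bioinformatics Toolbox and an internet connection; the variable name compoundLines is illustrative only:

```matlab
% Retrieve 5CYT without writing a file (requires an internet connection)
pdbstruct = getpdb("5CYT");

% Compound is a char matrix with one row per line of text;
% cellstr converts the rows to a cell array and removes trailing blanks
compoundLines = cellstr(pdbstruct.Compound);
fprintf("%s\n", compoundLines{:});

% Single-line records, such as Keywords, are plain character vectors
disp(pdbstruct.Keywords)
```

    The same cellstr conversion applies to the RemarkN fields, which are also char matrices.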
    
    

    Input Arguments


    PDBid

    Unique identifier for a protein structure record in the PDB database, specified as a string or character vector. Each structure in the PDB database is represented by a four-character alphanumeric identifier. For example, 4hhb is the identifier for hemoglobin.

    Data Types: char | string
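    Because an identifier is four alphanumeric characters, a cheap format check can catch typos before any network request is made. This is a sketch only: the check is an illustrative assumption based on the format described above, not validation that getpdb itself is documented to perform.

```matlab
% Hypothetical pre-check based on the four-character alphanumeric format
pdbID = "4hhb";   % string scalar; the character vector '4hhb' is also accepted
isWellFormed = strlength(pdbID) == 4 && all(isstrprop(char(pdbID), 'alphanum'));

if isWellFormed
    hemoglobin = getpdb(pdbID);   % requires an internet connection
else
    error("'%s' is not a four-character alphanumeric PDB identifier.", pdbID);
end
```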

    Name-Value Arguments


    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: ToFile="electron_transport.pdb"
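    Several name-value arguments can be combined in one call, and their order does not matter. A sketch assuming an internet connection; the argument names SequenceOnly and TimeOut are assumed here to be the names of the sequence-only and connection-timeout options described below. The comma-separated 'Name',Value form is the syntax used before R2021a and is still accepted.

```matlab
% Name=Value syntax (R2021a and later); the order of the pairs does not matter
seqA = getpdb("4HHB", SequenceOnly=true, TimeOut=20);
seqB = getpdb("4HHB", TimeOut=20, SequenceOnly=true);

% Equivalent comma-separated syntax, also valid in releases before R2021a
seqC = getpdb('4HHB', 'SequenceOnly', true, 'TimeOut', 20);
```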

    ToFile

    File name, or path and file name, for saving the PDB-formatted data, specified as a string or character vector. If you specify only a file name, getpdb saves the file in the MATLAB Current Folder.

    Data Types: char | string
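    To save the file somewhere other than the current folder, pass a full path. A minimal sketch, assuming an internet connection and the Bioinformatics Toolbox function pdbread for reading the saved copy back; the temporary-folder location and the variable names are illustrative choices:

```matlab
% Hypothetical output location: a file in the system temporary folder
outFile = fullfile(tempdir, '5CYT.pdb');

% Download the record and also write the PDB-formatted text to outFile
pdbstruct = getpdb("5CYT", ToFile=outFile);

% Read the saved copy back with pdbread, which parses local PDB files
fromDisk = pdbread(outFile);
disp(fromDisk.Title)
```

    Reading from the saved file avoids repeated downloads when the same entry is analyzed several times.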

    SequenceOnly

    Control for returning only the protein sequence, specified as true or false. The default is false, which returns the full structure. If SequenceOnly is true and there is one sequence, getpdb returns it as a character array; if there are multiple sequences, getpdb returns them as a cell array.

    Data Types: logical
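    Because the return type depends on how many sequences the entry contains, it can help to normalize the result before processing it. A sketch assuming an internet connection and that the option is named SequenceOnly:

```matlab
% Request only the sequence data (requires an internet connection)
seq = getpdb("4HHB", SequenceOnly=true);

% Normalize: wrap a single character array in a cell so that
% the one-sequence and multi-sequence cases share the same loop
if ~iscell(seq)
    seq = {seq};
end

for k = 1:numel(seq)
    fprintf("Sequence %d: %d residues\n", k, numel(seq{k}));
end
```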

    TimeOut

    Connection timeout for retrieving data from the PDB database, specified as a positive number, in seconds.

    Data Types: double
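    On a slow connection, a longer timeout can be combined with error handling. A sketch assuming that the option is named TimeOut and that a failed or timed-out request raises a MATLAB error:

```matlab
% Allow up to 30 seconds for the connection
pdbstruct = [];
try
    pdbstruct = getpdb("5CYT", TimeOut=30);
catch err
    % Assumed: a timeout or other network failure is raised as an error
    warning("getpdb failed: %s", err.message);
end

if isempty(pdbstruct)
    disp("No data retrieved; try again or increase TimeOut.")
end
```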

    Output Arguments


    PDBStruct

    MATLAB structure containing a field for each PDB record, returned as a structure array. This table summarizes the possible PDB records and the corresponding fields in PDBStruct.

    PDB Database Record             Field in the MATLAB Structure
    HEADER                          Header
    OBSLTE                          Obsolete
    TITLE                           Title
    CAVEAT                          Caveat
    COMPND                          Compound
    SOURCE                          Source
    KEYWDS                          Keywords
    EXPDTA                          ExperimentData
    AUTHOR                          Authors
    REVDAT                          RevisionDate
    SPRSDE                          Superseded
    JRNL                            Journal
    REMARK 1                        Remark1
    REMARK n (n = 2 through 999)    Remarkn (for example, Remark350)
    DBREF                           DBReferences
    SEQADV                          SequenceConflicts
    SEQRES                          Sequence
    FTNOTE                          Footnote
    MODRES                          ModifiedResidues
    HET                             Heterogen
    HETNAM                          HeterogenName
    HETSYN                          HeterogenSynonym
    FORMUL                          Formula
    HELIX                           Helix
    SHEET                           Sheet
    TURN                            Turn
    SSBOND                          SSBond
    LINK                            Link
    HYDBND                          HydrogenBond
    SLTBRG                          SaltBridge
    CISPEP                          CISPeptides
    SITE                            Site
    CRYST1                          Cryst1
    ORIGXn                          OriginX
    SCALEn                          Scale
    MTRIXn                          Matrix
    TVECT                           TranslationVector
    MODEL                           Model
    ATOM                            Atom
    SIGATM                          AtomSD
    ANISOU                          AnisotropicTemp
    SIGUIJ                          AnisotropicTempSD
    TER                             Terminal
    HETATM                          HeterogenAtom
    CONECT                          Connectivity
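    Judging from the 5CYT output in the example, which has a Helix field but no Sheet field, the structure contains only the fields for records that are present in the entry. A sketch of checking for optional records before using them, assuming an internet connection; the list of record names is an arbitrary illustrative choice:

```matlab
% Retrieve 5CYT (requires an internet connection)
pdbstruct = getpdb("5CYT");

% Check optional records with isfield before indexing into them
optionalRecords = ["Helix", "Sheet", "SSBond", "ModifiedResidues"];
for name = optionalRecords
    fprintf("%-17s present: %d\n", name, isfield(pdbstruct, name));
end
```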

    The Sequence field is also a structure containing sequence information in the following subfields:

    • NumOfResidues

    • ChainID

    • ResidueNames — Contains the three-letter codes for the sequence residues.

    • Sequence — Contains the single-letter codes for the sequence residues.

    Note

    If the sequence has modified residues, then the ResidueNames subfield might not correspond to the standard three-letter amino acid codes. In this case, the Sequence subfield will contain the modified residue code in the position corresponding to the modified residue. The modified residue code is provided in the ModifiedResidues field.
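    The Sequence subfields can be read in a loop. A sketch assuming an internet connection and, as an unverified assumption, that a multichain entry such as 4HHB returns one element of Sequence per chain (the 5CYT example returns a single element):

```matlab
% Retrieve hemoglobin (requires an internet connection)
pdbstruct = getpdb("4HHB");

% Assumed: one element of Sequence per chain
for k = 1:numel(pdbstruct.Sequence)
    s = pdbstruct.Sequence(k);
    fprintf("Chain %s: %d residues\n", s.ChainID, s.NumOfResidues);
    fprintf("  %s\n", s.Sequence);   % single-letter residue codes
end
```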

    The Model field is a structure or an array of structures containing coordinate information. If the PDB record contains one model, the Model field is a structure containing the coordinate information for that model. If it contains multiple models, the Model field is a structure array with one element per model. The Model field contains the following subfields:

    • Atom

    • AtomSD

    • AnisotropicTemp

    • AnisotropicTempSD

    • Terminal

    • HeterogenAtom
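    Because Model can hold several models, code that reads coordinates usually selects one element first. A sketch assuming an internet connection; the isfield check guards against entries without HETATM records:

```matlab
% Retrieve 5CYT (requires an internet connection)
pdbstruct = getpdb("5CYT");

% One element per model; X-ray entries such as 5CYT typically have one
nModels = numel(pdbstruct.Model);
fprintf("%d model(s)\n", nModels);

% Count ATOM and HETATM records in the first model
firstModel = pdbstruct.Model(1);
fprintf("ATOM records: %d\n", numel(firstModel.Atom));
if isfield(firstModel, "HeterogenAtom")
    fprintf("HETATM records: %d\n", numel(firstModel.HeterogenAtom));
end
```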

    The Atom field is also an array of structures containing the following subfields:

    • AtomSerNo

    • AtomName

    • altLoc

    • resName

    • chainID

    • resSeq

    • iCode

    • X

    • Y

    • Z

    • occupancy

    • tempFactor

    • segID

    • element

    • charge

    • AtomNameStruct — Contains three subfields: chemSymbol, remoteInd, and branch.
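    The X, Y, and Z subfields can be gathered into a coordinate matrix for geometric calculations. A sketch assuming an internet connection; strtrim is applied to AtomName in case names are stored with padding, and alternate locations (altLoc) are not filtered here:

```matlab
% Retrieve 5CYT and take the ATOM records of the first model
pdbstruct = getpdb("5CYT");
atoms = pdbstruct.Model(1).Atom;   % struct array, one element per ATOM record

% Gather coordinates into an N-by-3 matrix
xyz = [[atoms.X]', [atoms.Y]', [atoms.Z]'];

% Select alpha-carbon atoms by name
isCA = strcmp(strtrim({atoms.AtomName}), 'CA');
caXYZ = xyz(isCA, :);

% Geometric center of the alpha carbons
centroid = mean(caXYZ, 1);
fprintf("%d CA atoms, centroid = (%.2f, %.2f, %.2f)\n", nnz(isCA), centroid);
```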

    Version History

    Introduced in R2006a