Protein Assemblies: Nature-Inspired and Designed Nanostructures

Ordered protein assemblies are attracting interest as next-generation biomaterials with a remarkable range of structural and functional properties, leading to potential applications in biocatalysis, materials templating, drug delivery and vaccine development. This Review covers ordered protein assemblies including protein nanowires/nanofibrils, nanorings, nanotubes, designed two- and three-dimensional ordered protein lattices and protein-like cages including polyhedral virus-like cage structures. The main focus is on designed ordered protein assemblies, in which the spatial organization of the proteins is controlled by tailored noncovalent interactions (including metal ion binding interactions, electrostatic interactions and ligand–receptor interactions among others) or by careful design of modified (mutant) proteins or de novo constructs. The modification of natural protein assemblies including bacterial S-layers and cage-like and rod-like viruses to impart novel function, e.g. enzymatic activity, is also considered. A diversity of structures have been created using distinct approaches, and this Review provides a summary of the state-of-the-art in the development of these systems, which have exceptional potential as advanced bionanomaterials for a diversity of applications.


INTRODUCTION
Nature exploits protein assemblies of many types, ranging from viruses, microtubules and bacterial pili to large multiprotein complexes and bacterial S-layers. Extended protein assemblies are structural components of the extracellular matrix and of biofilms, for example, as well as of cell motility structures. Many viruses and bacterial microcompartments (BMCs) comprise ordered protein assemblies forming cages (discussed further in Section 5 below) around nucleic acids in the case of viruses or proteins in the case of BMCs, although other classes of viruses have extended assemblies (several are mentioned in Sections 2 and 3 below). Proteins can also assemble around metal centers, for example in heme proteins, or they can form large multicomponent assemblies such as ribosomes. Advances in computational protein design and genetic engineering methods have recently enabled the rational design of protein subunit structures to form regular 1D, 2D and 3D superstructures including nanowires, nanotubes, 2D and 3D lattices and cage structures. This Review describes research in the field of engineered protein nanostructures, and summarizes the various methods that have been employed to fabricate such ordered protein assemblies. We do not consider natural protein assemblies, although modified variants are discussed. Also not included is a discussion of natural protein crystal structures, which is a large separate subject in its own right.
Engineered protein assemblies are attractive for future applications due to their potential for large scale biosynthesis and their biofunctionality and biocompatibility. New emergent properties are expected from the nanostructured materials. Nanowire and nanotube structures may be created that have potentially valuable structural or (opto-) electronic properties. Assemblies based on enzymes may have enhanced catalytic behavior. Synthetic vaccines can be designed that are inspired by, or based on, existing virus structures. Other applications for protein cages may exploit their potential to encapsulate cargo.
In the most typical approach, fusion protein assemblies are created with predefined symmetry elements in order to define the directionality of the superstructure. 1−12 Less commonly, proteins or peptides may be covalently linked (via flexible peptide or other spacers) in order to preimpose directionality of assembly. 13,14 In a different approach, noncovalent interactions can be employed by modification of specific residues at the protein surface to enable π−π stacking 15−18 or metal ion coordination, 19−22 for example. Again, the position and relative orientations of modification sites have to be chosen to design protein assemblies with specific symmetries which lead to defined superstructures. Alternatively, host−guest or ligand−receptor interactions may be employed. 15−17,23 In the simplest case, binary structures can be formed through electrostatic interactions. 24−34
The topic of ordered protein assemblies has been the focus of a number of previous reviews, 35−41 and has been touched upon in broader reviews of protein-based materials. 42 Here, the state-of-the-art in this emerging field is summarized and exemplified by selected works which elegantly highlight the precision control of nanoscale protein assemblies that can be achieved using advanced design and synthesis methods.

NANOFIBRIL AND NANORING STRUCTURES
Linear arrays of proteins, i.e., nanofibrils or nanowires, result from one-dimensional (1D) assembly through suitable binary head−tail interactions such as lock-and-key or ligand−receptor interactions or complementary binding interactions. As an example of assembly guided by ligand−receptor/lock-and-key interactions, linear structures have been constructed through cucurbit[8]uril (CB[8]) host−guest interactions. 23 Dimers of glutathione S-transferase (GST) were modified at the two N-termini with the tripeptide FGG, which forms a host−guest complex with CB[8] with a high binding constant. The resulting nanowires had diameters of 5 nm and were tens of nm in length.
Using complementary binding peptide interactions, Usui et al. created so-called "nanolego" building blocks based on pairing of homotetrameric proteins modified at each of the four subunit C-termini by a peptide (Figure 1a). 1 Two protein fusion building blocks, each containing four corner peptides, were produced, the two types having complementary binding peptide units (Nanolego A and B, Figure 1a). Self-assembly of a mixture of the two leads to linear aggregation and nanofibril formation (Figure 1b). The tetrameric protein scaffold was a superoxide reductase (SOR) and the peptide units were either a mouse PDZ (signaling protein) domain or a PDZ-binding peptide (these two reversibly associate with high binding affinity). In a further development, the PDZ peptides were modified with cysteine mutations to enable stepwise extension of the aggregates based on disulfide bond formation. Finally, the fabrication of finite length aggregates (3-mers) was studied by introducing capping units with only two of the four binding units. 1
Protein nanorings have been created using chemical inducers of protein dimerization, for example using a dimeric methotrexate, MTX2-C9, which has high binding affinity for dihydrofolate reductase, DHFR (which was modified with extended linkers between the two subunits). 43 Methotrexate is a therapeutic molecule used in chemotherapy and the treatment of autoimmune diseases including arthritis, which acts by inhibiting DHFR. Figure 2a,b show the proposed assembly mechanism along with the DHFR2 variants prepared, while Figure 2c shows a representative TEM image of nanorings (20−28 nm in outer diameter). The closure into ring structures with defined sizes (i.e., number of subunits) is proposed to result from the balance between conformational flexibility and the entropy of oligomerization. 43 The same group has demonstrated ring structures with 8−30 nm diameter using fusion proteins DHFR-Hint1 (dihydrofolate reductase-histidine triad nucleotide binding) with a variable length peptide spacer between the Hint1 unit and the DHFR protein. 2 These fusion proteins were polymerized with a dimeric enzyme inhibitor molecule. The ratio of intra- to intermolecular polymerization could be controlled via adjustment of fusion protein concentration, leading to oligomers containing 2−12 monomers. Intermolecular cyclization was also favored by reduction of the length of the linking peptide. 2
Small tetramer and trimer oligomers ("nanorings") can be made from complementary coiled-coil forming peptides joined by a disordered flexible (GN)x peptide linker (Figure 3). 14 Parallel dimeric coiled-coil formation favors the trimeric and tetrameric structures, which were detected by analytical ultracentrifugation. The peptides with the shortest linker, GN1, formed fibril structures (Figure 3), as imaged by TEM. 14 Peptide fibril structures are not considered further in this section, as this subject has been extensively reviewed elsewhere. 44−47
The rod-shaped M13 bacteriophage has been used as a nanowire template scaffold for a variety of applications. In one example, photocatalytic structures based on chemical grafting of photosensitizer and catalyst molecules to the M13 major coat protein p8 were reported. 48 The M13 bacteriophage comprises approximately 2700 copies of α-helical coat proteins arranged around the viral DNA. Proteins with either chemically linked zinc porphyrin photosensitizer or an iridium oxide catalyst (attached noncovalently using an IrO2-binding peptide) were coassembled, producing fiber (nanowire) structures. Light-driven water splitting was observed to be catalyzed by the assemblies. 48 The same strategy was used to produce cobalt oxide nanowires using p8 coat proteins modified with metal ion-binding tetra-glutamate sequences. 49 Additional incorporation of gold-binding peptide produced hybrid gold−cobalt oxide wires as electrodes, which improved the charge storage capacity of model lithium ion batteries. 49 In a further development, the M13 bacteriophage has been genetically engineered to incorporate a terminal peptide that binds single wall carbon nanotubes (SWCTs) and to incorporate peptides within the major coat protein that bind amorphous iron phosphate. These iron phosphate-based nanowire materials demonstrated excellent performance as cathodes for lithium ion batteries. 50 An engineered M13 bacteriophage was also developed to produce a Co−Pt hybrid material with superparamagnetic properties. 51 Another recent example involves the use of M13 bacteriophages as scaffolds for metal deposition onto nanofoam meshes prepared by glutaraldehyde cross-linking of M13 modified with a glutamic acid-rich peptide at the N-terminus. 52 The free-standing metal nanofoams prepared may have use in the development of novel electrodes.
Nanorings can be created by coassembly of modified M13 bacteriophages and linker molecules. 53 The M13 bacteriophage was genetically engineered to incorporate antistreptavidin and hexahistidine peptides at opposite ends. Stoichiometric addition of streptavidin-NiNTA linkers (NiNTA: Ni(II)-nitrilotriacetic acid, a hexahistidine-binding motif) led to the reversible formation of nanorings. 53 Work on M13 bacteriophage and other viruses as scaffolds for nanomaterials development has been reviewed. 54
Electrostatic interactions between oppositely charged proteins can drive assembly into nanoring structures. In a recent example, supercharged Cerulean and GFP (Green Fluorescent Protein) variants were mixed to form toroid (nanoring) structures comprising two stacked octameric rings, as revealed by high resolution cryo-EM. 30

NANOTUBE STRUCTURES
In this section, protein nanotubes created by design or modification of natural nanotube structures (rod-like viruses in particular) are considered. Peptide nanotube structures are not discussed in this section, this topic having been the subject of several recent reviews. 55−60
A subunit of cytochrome c that comprises a four-helix-bundle heme protein has been used as a stable building block to produce nanotube and two- or three-dimensional (2D, 3D) crystalline structures via metal-coordination interactions. 19 A variant of the protein modified to present two metal-binding bis-histidine motifs on its surface forms a C2-symmetric dimer structure stabilized by Zn2+ binding. Furthermore, one Zn2+ binding site is left open to binding by another dimer, the binding sites being positioned to favor orthogonal assembly, leading to helical chains. Appropriate solution conditions (high pH, or low pH and high relative Zn2+ ion concentration) lead to fast formation of nanotubes, whereas the opposite conditions lead to slow nucleation into 2D and 3D crystals. Figure 4 shows a cryo-EM image with a model for the helical arrangement of the tetrameric (Zn2+-linked pair of dimers) building unit. 19 The same cytochrome c variant was later coupled via cysteine cross-linking to produce a dimer which was used to construct a tetrameric aggregating unit via Zn2+ coordination. 19 The tetrameric unit wraps helically to form the walls of nanotubes, of which two types were observed depending on the solution conditions (pH, buffer and Zn2+ excess). Slow nucleation conditions lead to the formation of 2D and 3D crystals in this system, as discussed in Section 4.
The formation of nanotubes was observed by inducing the association of the homotetrameric protein soybean agglutinin (SBA) using a ligand containing both a galactose-based sugar Biomacromolecules Review unit to bind the protein and an aromatic motif (Rhodamine B, RhB) to drive π−π stacking interactions ( Figure 5). 15 A model for the helical wrapping of the proteins to form the nanotube wall could be obtained, based on electron tomographic imaging. The nanotube growth kinetics can be changed by temperature adjustment and the nanotubes could be dissociated by adding β-cyclodextrin which binds the RhB. 15 These structures resemble protein microtubule structures which are formed from dimers of the protein tubulin and have an outer diameter 24 nm and lengths up to tens of micrometers. Microtubules are involved in mitosis and this, for example, is the basis of the activity of the anticancer drug Taxol which hinders microtubule depolymerization, promoting the arrest of mitosis and death of cancer cells. 61,62 Tobacco mosaic virus (TMV) has a rod-like structure, consisting of an array of coat proteins wrapped around an RNA core, leading to a nanotube capsid structure. This has been exploited to produce protein nanotubes by modifying the coat proteins, which has benefits because it is suggested that the display of antigens in a regular array leads to an enhanced immunogenicity compared to that induced by free proteins. 63 Palmer and co-workers modified TMV coat proteins to enable biotinylation which then allows binding of streptavidin-tagged proteins, exemplified with GFP and an N-terminal fragment of the canine oral papillomavirus L2 protein. In both cases, nanotube structures which elicited higher immunogenic response than unconjugated (and unassembled) protein were observed. 63 In a similar fashion, an S123C mutant TMV coat protein has been used as a handle to attach electron donor and acceptor fluorescence chromophores, creating a scaffold for light harvesting. 
64,65 Unmodified TMV has a negatively charged surface and this has been used to template the deposition of cationically modified gold nanoparticles via electrostatic interactions, leading to the formation of twisted fiber bundle structures. 34 Magnetic alignment of these structures yielded a plasmonic polarizer.
It is possible to design peptides that adsorb to particular species or surfaces. Using this inherent design flexibility, DeGrado's group have developed antiparallel homohexameric coiled coil peptides that assemble into nanotubes around SWCTs. 66 The peptides incorporate hydrophobic units, for example the C β methyl of Ala, to facilitate binding to the hexagonal array of carbon atoms at the nanotube surface.

2D AND 3D CRYSTAL STRUCTURES
Planar assembled protein structures exist in nature; for example, S-layers are monolayers of (glyco)protein structure in the membrane of archaea and certain bacteria. 67 S-layer protein structures have been reconstructed using tetrameric fusion proteins comprising one of a number of S-protein fragments and three streptavidin units. 3 It was shown that these engineered constructs can form planar (oblique) lattice structures on flat surfaces or on cell wall fragments containing cell wall polymer. The regular display of the streptavidin units enables binding of biotin and biotinylated proteins such as ferritin in a regular pattern. 3 It has been suggested that the periodic structure of S-layers is ideal for the development of affinity matrices used in DNA, protein or antibody detection chips. 68 S-layer proteins have also been examined in vaccine development, due to the ability to present antigens at surfaces, along with adjuvant properties. 68 S-layer structures additionally have potential in the development of immobilized biocatalysts, this having been demonstrated with fusion constructs incorporating extremophile enzymes. 69,70 The fluorescent protein GFP has been incorporated into S-layer fusion proteins, enabling the creation of fluorescent biomarkers, pH indicators and the fluorescence imaging of the uptake of S-layered liposomes into cells. 68 Nanoparticle arrays can also be templated using the periodic structure of S-layers, modified to display gold-binding cysteine residues, 71 or utilizing S-layer pores to grow cadmium sulfide quantum dots. 72 Further details on these applications can be found in a review on S-layer structures. 68
Small cross-shaped aggregates (Figure 6) have been created using the C4-symmetric tetrameric catalytic protein RhuA, L-rhamnulose-1-phosphate aldolase. 73 Each aldolase subunit was modified with a His6 tag for oriented binding to a planar surface as well as two tethered biotin units to bind streptavidin with defined orientation.
TEM revealed the presence of the expected cross-shaped network aggregates (Figure 6) on lipid monolayers when mixing biotinylated RhuA (bR) and bR modified with four streptavidins (bR.S4). The size of the small aggregates could be expanded by adding spacers of bisbiotinylated streptavidin (bbS, Figure 6). Rod spacers created by mixing bbS and S led to extended string-like structures. 73 In a further development, the authors incorporated a Ca2+-binding β-helix fragment of the enzyme serralysin between two PGAL (6-phospho-β-galactosidase) proteins in a PGAL-β-PGAL construct and showed Ca2+-dependent switching, with a change in the separation of the two domains in the dumbbell-shaped fusion construct. 73
The importance of designing the correct interface between proteins in multiprotein assemblies has been emphasized by several groups. 24,74−76 Grueninger et al. emphasized the importance of rigid side chain contacts and designed mutants of proteins with such enhanced contacts. 74 In particular, they modified monomeric PGAL to favor dimer formation by enriching contacts across local 2-fold axes, produced tetramers from dimeric O-acetylserine sulfhydrylase (Oas) and urocanase (Uro), and modified tetrameric RhuA to favor pairwise association at the C4 axis surfaces. They also adapted the mycobacterial porin MypA to give a D8-symmetric unit forming tail−tail dimers. Figure 7 shows the modified protein structures along with the symmetry axes (and Figure 7d shows a TEM image showing linear association of the modified RhuA tetramer shown in Figure 7b). 74
RhuA was used as a building block for 2D lattice assembly in a study where aggregation was controlled through several types of interaction introduced by selective protein mutations. 20 Specifically, single-disulfide, double-disulfide or double-histidine (metal coordinating) mutants were prepared. The self-assembly process is reversible via oxidation/reduction (of disulfide interactions between cysteines) or, in the case of the double-histidine variant, using the zinc ion chelator EDTA. Figure 8 shows the 2D lattices resulting from the assembly process, which have different symmetries and a high degree of regularity. The C89 RhuA variant was observed to form a number of defect-free 2D lattice polymorphs as a result of the dynamic single disulfide bond flexibility. This material shows ideal auxetic behavior, undergoing longitudinal expansion upon transverse stretching. 
20 Recent all-atom molecular dynamics simulations suggest that the free-energy landscape of these lattices is governed by solvent reorganization entropy. 77 Two-dimensional crystals as well as nanoribbon and nanowire structures were observed using the homotetrameric protein LecA from Pseudomonas aeruginosa. 16 LecA is galactose-specific (which influences its infectivity) and mixing LecA assembly-inducing ligands containing galactopyranoside derivatives with pendant rhodamine B (RhB) units induces association of the proteins due to π−π stacking of the RhB units (cf. Figure 5 and associated discussion in the preceding section). Several stacking modes are possible depending on the ligand spacer length, which influences the geometry of π−π stacking interactions (Figure 9). This leads to assembly into the observed ribbon, 2D crystal or nanowire structures. 16 Three-dimensional crystal structures can be formed by exploiting the sugar-lectin binding and π−π stacking interactions, using concanavalin A (con A), a homotetrameric protein with D 2 symmetry and mannose or lactose-based ligands incorporating aromatic Rhodamine B units to drive dimerization via π−π stacking (cf. for example Figure 5 and Figure 9). 17 Platelet-shaped crystals were noted, and single crystal X-ray diffraction enabled determination of the distinct structures of the crystals formed with different ligand linkers. 17 The methods discussed so far rely on protein site modification, chemical coupling, or production of fusion proteins with defined geometries to drive self-assembly. In contrast, Sinclair et al. developed a class of fusion protein comprising units taken from protein assemblies with different rotational symmetries, linked at their termini along one symmetry axis. 4 The fusion constructs are suitable for high-  level, soluble expression in E. coli. These can aggregate into 1D or 2D structures termed crysalins. 
The components may be homologous (comprising only one type of subunit) or heterologous (combining two types of subunit). Using streptavidin/Streptag I as heterologous D2 assemblies and DsRed as a homologous D2 assembly, linear assemblies were observed (Figure 10a), whereas combination of E. coli ALAD (aminolevulinic acid dehydratase) as D4 homologous assembly and streptavidin/Streptag I as D2 heterologous unit led to 2D lattices (Figure 10b). The same D4 homologous unit with Lac21E/Lac21K (heterotetrameric coiled coil peptides based on a Lac repressor protein sequence, stabilized by Glu/Lys interactions 78 ) as a C2 building block led to a different 2D lattice (Figure 10c). 4
An alternative approach is de novo design of proteins to create 2D lattices. Gonen et al. used the Rosetta protein modeling software to design proteins to form 2D lattices with defined symmetries. 75 Specifically, from among the 17 distinct 2D lattice structures that can be formed from 3D objects, they selected a subset with two unique interfaces and building blocks with internal point symmetry. The designed proteins were then expressed in genetically engineered E. coli and the structures assembled in solution were observed by cryo-TEM. 75 Figure 11 shows the targeted 2D lattices along with representations of the protein packings and the designed interface structures, together with TEM images and projection maps with overlaid design models. The construct for the P321 lattice is a trimer of β-helices, that for the P42₁2 lattice is based on tetrameric α-helices, and that for P6 is based on α-helical hexamers. 75 The same concept was used to design protein cage structures, as discussed further in Section 5.
A novel route to porous 2D crystal structures was developed based on screening the protein data bank for small oligomeric proteins with defined rotational symmetry, with a central pore smaller than 5 nm, interfaces engineered to avoid steric clashes, flexible loops and termini oriented such that the C-terminus in     which forms a hexameric aggregate with a ∼3 nm pore ( Figure  12a). The protein was engineered such that the subunits were linked with a six-residue linker (Figure 12b), leading to the formation of the 2D honeycomb lattice shown in Figure 12c. Self-assembly was induced by addition of calcium ions, since each subunit coordinates one Ca 2+ ion. TEM observations confirmed the formation of the expected 2D honeycomb lattice (Figure 12d). Lanci et al. computationally designed a threehelix coiled coil peptide to form honeycomb (p6 symmetry) lattices, by designing a charged outer interface in a homotrimeric coiled coil (Figure 13a) to favor pairwise complementary electrostatic interactions between helices in neighboring peptides. 24 A single crystal structure for the designed protein confirmed the intended designed structure (Figure 13b). 24 Honeycomb lattices are discussed further in Section 5, since such structures have been observed (in the case of building blocks with flexible linkers) to curve into cage structures. 76 In a pioneering paper, Dotan et al. showed that cubic structures based on diamond lattices can be prepared by creating dimers of the lectin concanavalin A, which has a tetrameric structure. 79 The proteins were dimerized using bis-mannopyranoside, leading to a dumbbell shaped dimer which packs into a diamond lattice due to the imposed configuration of the protein subunits. As mentioned in Section 3, a modified cytochrome protein comprising dimers of C 2 -symmetric dimers via histidine-mediated Zn 2+ -coordination that forms nanotubes can also assemble into 2D and 3D crystal structures under slow nucleation conditions. 
19 The protein ferritin is interesting for protein nanomaterials design due to its highly symmetric cage-like structure. Ferritins comprise 24 subunits which assemble into a pseudospherical shell with octahedral symmetry. Each subunit consists of a four α-helix bundle and a fifth short E-helix. 25 The E-helices form the C 4 -symmetric channels (Figure 14), of which there are six in a ferritin shell (along with eight C 3 axes). The channels in ferritin have sizes between 0.3 and 0.4 nm and allow the transport of small ions and molecules. Yang et al. expanded the C 4 -symmetric pore size ( Figure 14) in mature soybean seed ferritin (mSSF) by E helix deletion from the H-1 half of the subunits (Figure 14). 25 The expanded pore was able to accommodate poly(L-lysine) (degree of polymerization = 15), leading to tethering of the ferritin cages into a square array via electrostatic interactions (ferritin is rich in acidic residues). 25 Zhou et al. have exploited the symmetry within the subunits of a ferritin protein in order to substitute aromatic residues located near the C 4 symmetry axes (Figure 15a,b). 18 The substitution of phenylalanine (F) or tyrosine (Y) residues at a single site within each of the 24 protein subunits was designed to induce directional aromatic stacking π−π interactions. These were shown to lead to the formation of planar 2D lattices of ferritin molecules in the case of F-substituted proteins (Figure 15c) or 3D cubic lattices in the case of Ysubstituted proteins (Figure 15d) as shown schematically in Figure 15a. 18 Three-dimensional protein crystals can be developed as the basis of a new type of metal−organic framework (MOF). 21, 22 The Tezcan group has developed MOFs based on ferritin arrays, initially substituting a Zn 2+ -binding histidine residue at residue 122 near to the C 3 axis. 21 The ferritin proteins were then noncovalently linked using benzene-1,4-dicarboxylic acid, leading to bcc crystal structures. 
21 This work was extended to other divalent metal ions and alternative dihydroxamate linkers, leading to a range of MOFs with body-centered cubic or tetragonal lattices. 22
Native ferritin (human heavy chain) forms an fcc lattice (Figure 16a−c) mediated by Ca2+ K86Q interactions (Figure 16d). Tezcan and co-workers have shown that this lattice can be used as a template for polymerization of a polymer hydrogel network and that, by appropriate choice of a responsive polymer, polymer gels containing embedded ferritin lattices can be obtained that swell reversibly on change of ionic strength or pH (Figure 16e). 80 The polymer chosen was poly(acrylate-co-acrylamide), which was prepared by free radical polymerization in the presence of APS (ammonium persulfate) and TEMED (N,N,N′,N′-tetramethylethylenediamine) as initiators and with N,N′-methylenebis(acrylamide) as cross-linker; NaCl was included to limit swelling during polymerization. Postpolymerization swelling was initiated by placement in deionized water (Figure 16e), leading to an increase in lattice parameter from a = 19 nm to a = 23 nm, determined by SAXS. The expansion was isotropic, as confirmed by the uniform expansion of the faceted polyhedral gel crystals observed by optical microscopy. The gels exhibited self-healing behavior; for example, cracks in the polyhedral crystals induced by ion-induced contraction were observed to spontaneously seal due to the dynamic nature of the bonds between polymer chains and ferritin proteins. 80
Complexes of a zinc phthalocyanine bearing eight cationic groups with a tetra-anionic pyrene derivative can bind to the anionic surface patches on ferritin (apoferritin), inducing crystallization into fcc packed cocrystals (Figure 17), with a lattice spacing a = 20 nm. 81 The crystals retain the photoactivity of the phthalocyanine dye molecules including fluorescence and light-induced singlet oxygen production.
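The symmetry counts quoted earlier in this section for the ferritin cage (24 subunits, octahedral symmetry, six C4 channels and eight C3 axes) can be checked with a short group-theory sketch. The Python code below is an illustration added here, not a method from the cited works. It assumes the cage can be idealized as having the pure rotational symmetry of an octahedron, builds the rotations as signed permutation matrices, and counts the points at which the 4-fold and 3-fold axes pierce the shell. All function names are hypothetical.

```python
import itertools

import numpy as np


def octahedral_rotations():
    """Return the proper rotations of a cube/octahedron as 3x3 integer matrices.

    These are the signed permutation matrices with determinant +1.
    """
    rotations = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1, -1), repeat=3):
            m = np.zeros((3, 3), dtype=int)
            for row in range(3):
                m[row, perm[row]] = signs[row]
            if int(round(np.linalg.det(m))) == 1:
                rotations.append(m)
    return rotations


def rotation_order(m):
    """Smallest n >= 1 such that applying m n times gives the identity."""
    identity = np.eye(3, dtype=int)
    power = identity
    for n in range(1, 5):  # element orders in this group are 1, 2, 3 or 4
        power = power @ m
        if np.array_equal(power, identity):
            return n
    raise ValueError("matrix is not an octahedral rotation")


def axis_pole(m):
    """Vector along the rotation axis, read from the antisymmetric part of m.

    For a rotation by angle t about unit axis n this equals 2*sin(t)*n, so it is
    nonzero for 3-fold and 4-fold rotations, and a rotation and its inverse give
    the two opposite poles of the same axis.
    """
    return (int(m[2, 1] - m[1, 2]), int(m[0, 2] - m[2, 0]), int(m[1, 0] - m[0, 1]))


rotations = octahedral_rotations()
c4_poles = {axis_pole(m) for m in rotations if rotation_order(m) == 4}
c3_poles = {axis_pole(m) for m in rotations if rotation_order(m) == 3}

print(len(rotations), len(c4_poles), len(c3_poles))  # prints: 24 6 8
```

Under this idealization there is one rotation per subunit position (24), six 4-fold poles and eight 3-fold poles, that is, three C4 axes and four C3 axes that each pierce the shell twice.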
Simply using electrostatic interactions between oppositely charged proteins, it is possible to produce cocrystals of avidin (which has a net positive charge) and cowpea chlorotic mottle virus (CCMV), which has a net negative charge, by mixing in aqueous solution. 32 The crystals had a bcc structure with lattice spacing a = 35 nm. The use of avidin further enabled the pre- or post-assembly functionalization of the crystals with biotinylated molecules such as fluorescent dyes, enzymes or gold nanoparticles. 32 In another example, binary crystals have been produced by cocrystallization of oppositely charged ferritin, surface modified with either basic (arginine or lysine) or acidic (glutamic acid or aspartic acid) residues. 27 The binary crystal had tetragonal symmetry. Metal oxide nanoparticles could be sequestered within either or both protein cages. This concept was developed to show that the crystallization could be modulated by metal ion (Mg2+) concentration, the binary tetragonal lattice at low Mg2+ concentration being replaced with a unitary cubic lattice at high [Mg2+]. 82 In a related study, binary crystal structures were fabricated using mixtures of anionic proteins and cation-coated gold nanoparticles. 31 The anionic proteins were cage-like ferritin (apoferritin or magnetoferritin) or CCMV, and these can encapsulate RNA or superparamagnetic iron oxide particles.
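The lattice parameters quoted in this section can be converted into center-to-center particle separations and swelling ratios with elementary cubic-lattice geometry. The following Python sketch is an illustration added here under stated assumptions, not an analysis from the cited works. It treats every lattice site as equivalent (a simplification for binary cocrystals such as the bcc avidin−CCMV system) and uses the standard nearest-neighbor distances a/√2 for fcc and a√3/2 for bcc. The function and variable names are hypothetical.

```python
import math

# Nearest-neighbor distance divided by the cubic lattice parameter a,
# assuming one particle per lattice site.
NEAREST_NEIGHBOR_FACTOR = {
    "sc": 1.0,
    "bcc": math.sqrt(3) / 2,
    "fcc": 1 / math.sqrt(2),
}

# Number of lattice sites in the conventional cubic unit cell.
SITES_PER_CELL = {"sc": 1, "bcc": 2, "fcc": 4}


def nearest_neighbor_distance(a_nm, lattice):
    """Center-to-center distance (nm) between nearest lattice sites."""
    return a_nm * NEAREST_NEIGHBOR_FACTOR[lattice]


def site_density(a_nm, lattice):
    """Number of lattice sites per cubic nanometer."""
    return SITES_PER_CELL[lattice] / a_nm**3


def volume_swelling_ratio(a_initial_nm, a_final_nm):
    """Volume ratio for an isotropic change in the cubic lattice parameter."""
    return (a_final_nm / a_initial_nm) ** 3


# Lattice parameters quoted in the text (nm).
d_fcc_before = nearest_neighbor_distance(19.0, "fcc")  # ferritin gel before swelling
d_fcc_after = nearest_neighbor_distance(23.0, "fcc")   # ferritin gel after swelling
d_bcc = nearest_neighbor_distance(35.0, "bcc")         # avidin-CCMV cocrystal
swelling = volume_swelling_ratio(19.0, 23.0)

print(f"fcc, a = 19 nm: nearest-neighbor distance {d_fcc_before:.1f} nm")
print(f"fcc, a = 23 nm: nearest-neighbor distance {d_fcc_after:.1f} nm")
print(f"bcc, a = 35 nm: nearest-neighbor distance {d_bcc:.1f} nm")
print(f"volume ratio for 19 nm -> 23 nm: {swelling:.2f}")
```

With these inputs, the fcc nearest-neighbor distance grows from about 13.4 nm to about 16.3 nm on swelling, which corresponds to a volume increase of roughly 77%. For comparison, ferritin is commonly quoted as having an outer diameter of about 12 nm, a value not stated in this Review.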
Electrostatic interactions can lead to the formation of a fcc lattice, as exemplified in mixtures of a P22 bacteriophage coat protein (virus-like particle, VLP) and a G6 (sixth generation) PAMAM dendrimer. 28,83 PAMAM dendrimers are cationic poly(amido amine) particles. The P22 bacteriophage coat protein (CP) was modified with a short anionic peptide (VAALEKE) 2 at the C terminus, producing an anionic particle. Mixing in appropriate proportions in suitable ionic strength conditions leads to the formation of a cubic lattice. Alternatively, amorphous aggregates could be prepared using a ditopic protein linker that binds the CP at multiple symmetry-specific sites. This linker can also be used to "cement" the ordered cubic structures formed in mixtures with PAMAM dendrimers, stabilizing the assembly against increase in ionic strength. 28 Fusion proteins of the P22 CP with the enzymes ketoisovalerate decarboxylase (KivD) or alcohol dehydrogenase A (AdhA) formed capsid structures similar to those of the unmodified CP. 83 The enzymatic activity was found to be retained in the G6 dendrimer-modified CP assemblies, enzymes being confined within the VLPs. In a similar fashion, PAMAM dendrimers can be used to produce binary crystal structures (with hcp or fcc structure) with ferritin. 33 The lattice constant is controlled by the size of the dendrimer (i.e., the generation number).

CAGE STRUCTURES AND POLYHEDRAL NANOPARTICLES
Many viruses and also some proteins such as ferritins 84 or carboxysomes 85 (involved in carbon fixation by bacteria) naturally form pseudospherical polyhedral cage structures. Clathrin-coated vesicles also have a cage structure, built from triskelion (three-arm) subunits of the Clathrin heavy chain (with bound light chains). 86 Clathrin can form tetrahedral mini-coat, hexagonal barrel or soccer ball structures in vitro. 86 A discussion of these structures, and those of viruses, is outside the scope of the present review, although examples of virus-like protein nanoparticle assemblies and of virus-derived assemblies are considered.
Controlling the association of coiled coil peptides by design has enabled the assembly of cage structures. Woolfson and coworkers designed a two-component system comprising a homotrimeric coiled coil linked to one of two heterodimeric coiled coils (containing complementary charged residues) through an external disulfide bond between cysteine residues (Figure 18a). 87 The building blocks are expected to form a honeycomb lattice; however, due to the inherent conformational flexibility, closed shell structures termed SAGEs (self-assembled cage-like particles) were observed, with a diameter of approximately 100 nm. 87 Later, Ryadnov's group developed cysteine-linked homodimeric coiled coils with three different faces such that complementary electrostatic interactions between neighboring dimers would favor formation of a honeycomb lattice (Figure 18b) or so-called tecto-dendrimer unit. 26 Again, curling up into virus-like cages was observed in practice, with a diameter of approximately 12−18 nm. The cage-like particles were able to transfect RNA and DNA. In related work, Castelletto et al. have prepared covalently linked "triskelion" three-arm peptides containing the self-complementary β-sheet sequence RRWTWE, based on a sequence from lactoferrin. 13 These associate to give honeycomb lattices which curve into cage structures or capsules, able to encapsulate and deliver siRNA, and with additional antimicrobial activity. The antimicrobial activity was ascribed to membrane pore formation, as imaged by AFM using model supported lipid bilayers. 13
Attaching coiled-coil peptides to the free C-terminus of a trimeric aldolase protein (KDPG aldolase from Thermotoga maritima) enables the design of cage-like assemblies by mixing homologues with complementary heterodimer-forming coiled coils. 88 After expression of the C-terminal extended aldolase in E. coli, TEM and AUC (analytical ultracentrifugation) confirmed the presence of small assemblies in solution, with typical diameters of 10−20 nm. 88,89 A dimer was reported to be the most common assembled structure, although some tetrahedral and octahedral cages were detected. 89 This work was extended by using an esterase C₃-symmetric trimer linked via flexible spacers to C-terminally attached helical peptides, designed to form tetrameric coiled coils. 5 The fusion protein was expected to form octahedral cage structures. Mass spectrometry, AUC and TEM imaging confirmed the formation of such structures, provided the length of the spacer was sufficient.
Figure 18. (a) Schematic for coiled coil peptide assembly designed to self-assemble into a honeycomb lattice (which is observed to curve into a cage structure). 87 Left: a homotrimeric coiled coil is linked via cysteine disulfide cross-linking to a heterodimeric coiled coil. Mixing of either the top building block (center, green and red) termed Hub A with coiled coil module B (basic coil peptide, blue) or Hub B (center bottom 3-arm structure, green and blue) and module A (acidic coil peptide, red) leads to the formation of a honeycomb lattice (right). (b) Design of a dendrimer-like coiled coil peptide which forms a cage structure. 26
Yeates and co-workers have produced nanocage structures from fusion proteins, using the concept shown in Figure 19. The fusion protein comprised trimeric bromoperoxidase and the dimeric M1 matrix protein of influenza virus, connected by a nine-residue helical linker. The fusion protein was expected to have a tetrahedral shape, favoring the formation of dodecameric cage structures, which indeed were observed by TEM, after recombinant expression in E. coli and preparation of aqueous solutions. 6 A crystal structure for the dodecameric cage structure was later obtained. 90 The authors also reported a fusion protein that forms helical filaments based on the M1 protein fused to carboxylesterase linked by a 5-residue α-helical linker. 6 In a similar fashion, fusion proteins designed to encode the information necessary to direct assembly have been used to produce 24-subunit cage structures, based on positioning of trimeric building blocks along each of the 3-fold symmetry axes of a tetrahedron. 7 The protein structure and interaction modeling software Rosetta 91 was then used to design the sequences at the interfaces of the building blocks, in order to enhance the stability of the interface through packing of suitable hydrophobic residues. In addition, structures were assembled from four trimeric and six dimeric building blocks aligned along the respective tetrahedral symmetry axes. After screening for solubility and compatibility with self-assembly, constructs were selected for experimental study. TEM images showed that the fusion proteins expressed in E. coli self-assembled into the designed structures in solution, and crystal structures were obtained for some of the assemblies. 7 Developing this concept, icosahedral protein cages have been created by design of 60-subunit fusion proteins using trimeric protein scaffolds arranged with icosahedral symmetry (i.e., arranging the trimer 3-fold symmetry axis to be coincident with the 3-fold axes of the icosahedron). 8 The distance from the icosahedron center and the rotation angle of each trimer about its axis were then optimized for close packing, minimizing steric clashes. The hydrophobic interfaces between the trimer building blocks were then filled by computer-assisted design of amino acid sequences. Figure 20 shows cryo-TEM images along with reconstructions from the model design, confirming the icosahedral cage structure. 8 In an extension of this work, this group also presented 120-subunit icosahedral protein cages with sizes 24−40 nm in diameter based again on designed fusion proteins, but using heteromeric components. 9 Combinations of distinct building blocks among dimers, trimers and pentamers (according to the icosahedral symmetry elements) were used. For example, 12 pentameric and 20 trimeric building blocks aligned along the 5-fold and 3-fold icosahedral symmetry axes can produce an icosahedral protein cage, which can also be constructed from combinations of pentamers and dimers or trimers and dimers. 9 In a parallel development of the helical oligomer fusion strategy, a cubic cage-forming structure was designed and expressed in E. coli. 10 The intention was to create a porous material, resembling a MOF, although long-range cubic ordering was not observed.
The fusion protein comprises trimeric KDPG aldolase (the same used by Patterson et al. 88) linked via a four-residue helical linker to the dimeric domain of protein FkpA (Figure 21a). The designed 24-subunit cage structure with octahedral symmetry is shown in Figure 21b. Single crystal X-ray diffraction and TEM confirmed the presence of the cubic cage structures after incubation of solutions of the fusion protein, although 12-mers, 18-mers and 24-mers were also detected by mass spectrometry analysis, TEM and SAXS. 10
Ferritin, which is widely used to prepare protein lattices as discussed in the preceding section, is also a cage-like protein. 92 Mutants have been engineered with Cys residues in metal-binding domains in order to sequester gold formed by reduction from Au³⁺ ions. A crystal structure of the cage with bound gold was obtained. 92 Modification of ferritin nanocages by attachment of PEG facilitates penetration of the nanoparticles into tumor tissue and airway mucus. 93,94 The PEG surface coating density was optimized by mixing highly PEGylated ferritin (attached via surface amines) with the native ferritin by disassembling the proteins and then reassembling using pH control. The anticancer drug doxorubicin was conjugated to PEGylated ferritin via an acid-labile linker to create a therapeutic delivery vehicle. 93
The size of protein cages can be tuned by modification of the surface charge, as exemplified by recent work on the capsid-forming enzyme AaLS, which in its native form adopts an icosahedral shape (60 subunits). 95,96 Directed evolution led to a supercharged luminal capsid surface, better able to encapsulate oppositely charged (positively charged) cargo, in particular HIV protease, with an expansion in cage size corresponding to 180 or 240 subunits. 95 The structure of the expanded supercharged cages was investigated in detail using cryo-TEM and was found to comprise tetrahedrally and icosahedrally arranged pentameric units. 96 By mixing negatively supercharged AaLS with cationically supercharged ferritin, nested cage structures are obtained. 29 This is a good example in which tuning of electrostatic interactions on protein surfaces can be used to create new assemblies, in this case so-called Matryoshka-type structures.
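The building-block counts quoted above for the designed tetrahedral, octahedral and icosahedral cages follow from the order of the rotational symmetry group: a cyclic Cₙ oligomer placed on an n-fold axis appears (group order)/n times. The following is a minimal sketch of this bookkeeping, assuming ideal, fully formed cages in which every building block sits on a symmetry axis; the function names are illustrative and not taken from any of the cited design software.

```python
# Order of the rotational symmetry group and the rotation axes it contains.
POINT_GROUPS = {
    "tetrahedral": {"order": 12, "axes": (2, 3)},
    "octahedral": {"order": 24, "axes": (2, 3, 4)},
    "icosahedral": {"order": 60, "axes": (2, 3, 5)},
}

def oligomer_copies(symmetry: str, n: int) -> int:
    """Number of C_n oligomers needed to occupy every n-fold axis position of the cage."""
    group = POINT_GROUPS[symmetry]
    if n not in group["axes"]:
        raise ValueError(f"{symmetry} symmetry has no {n}-fold axis")
    return group["order"] // n

def two_component_subunits(symmetry: str, n1: int, n2: int) -> int:
    """Total chains in a cage built from two separate oligomers (C_n1 and C_n2)."""
    return n1 * oligomer_copies(symmetry, n1) + n2 * oligomer_copies(symmetry, n2)

def single_component_subunits(symmetry: str) -> int:
    """Total chains when the cage is built from one type of chain (e.g., a fusion protein)."""
    return POINT_GROUPS[symmetry]["order"]

# Tetrahedral cage: four trimers and six dimers
print(oligomer_copies("tetrahedral", 3), oligomer_copies("tetrahedral", 2))
# Icosahedral cage: 12 pentamers and 20 trimers, 120 subunits in total
print(oligomer_copies("icosahedral", 5), oligomer_copies("icosahedral", 3))
print(two_component_subunits("icosahedral", 5, 3))
# One-chain cages: 12 (tetrahedral), 24 (octahedral), 60 (icosahedral) subunits
print([single_component_subunits(s) for s in POINT_GROUPS])
```

This reproduces the four trimeric and six dimeric building blocks of the tetrahedral designs, the 12 pentamers and 20 trimers of the 120-subunit icosahedral cages, and the 12-, 24- and 60-subunit single-component cages. It says nothing about partially assembled species such as the 12-mers and 18-mers detected for the cubic cage.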
Self-assembling peptide nanoparticles (SAPNs) have been designed based on peptides that contain two α-helical domains linked by a two-glycine residue spacer; one of the oligomerization domains comprises a coiled coil that forms pentamers, while the other is from a trimeric coiled-coil domain (Figure 22a). 97,98 The peptides are positioned to lie on the C₅ and C₃ symmetry axes respectively of an icosahedron or dodecahedron. The nanoparticles containing 60 or 180 peptides were modeled based on an icosahedral structure (Figure 22b). 98 The former nanoparticle structure is favored for a de novo designed sequence containing cysteine residues (for which there is the potential for disulfide cross-linking) 97 whereas the latter results from a modified construct with alanines replacing the cysteines and with extended terminal domains. 98 The systems form roughly spherical nanoparticles with a diameter of 16 nm (for the 60 subunit protein) 97 or 27 nm (for the 180 subunit protein). 98 In an extension of this research, variant SAPNs were prepared and characterized by SANS, STEM (which enables molar mass estimation) and DLS. 99 Based on the determined particle size (the core radius from SANS was 35−37 nm) and molar mass, it was proposed that these larger nanoparticles contain 240, 300, or 360 peptides, which were modeled as virus-like polyhedra. 99
Fusion of a de novo designed protein that forms a dimeric folded four-helix bundle with a trimeric domain from T4 bacteriophage fibritin leads to oligomers comprising multiples of 6-mers, as shown in Figure 23. 11 Fitting of SAXS data enabled the envelope shape of the aggregates in solution to be obtained, which indicated the presence of tetrahedral and barrel-shaped assemblies. 11
Virus-like particles (VLPs) have been modified to create nanoreactors, based on enzymes incorporated as fusion proteins with the scaffold proteins (SPs), which form the inner shell of viruses and are surrounded by coat proteins (CPs). This is exemplified by the N-terminal conjugation of alcohol dehydrogenase (AdhD) to the SP of bacteriophage P22 (Figure 24). 12 A P22 VLP is composed of approximately 420 copies of a 46.6 kDa coat protein (CP) that assembles into an icosahedral capsid with the aid of approximately 100−330 copies of a 33.6 kDa scaffolding protein (SP). In the AdhD-SP conjugate, the C-terminal α-helical scaffold protein facilitates coassembly with the P22 CP, leading to particles indistinguishable from those of native P22. The AdhD gene is inserted into the pET11 expression vector (Figure 24). The catalytic activity was maintained; furthermore, since P22 undergoes structural transitions on heating which lead to expansion or pore formation, the accessibility of the tethered enzymes can be adjusted thermally. 12 In a parallel study, encapsulation of thermostable CelB glycosidase inside the P22 capsid was demonstrated using the same concept, again with no loss of enzyme activity and without impairing the ability of the P22 to undergo thermally induced morphology changes. 100 The packaging of fluorescent proteins on the interior surface of P22 VLPs was demonstrated in a similar fashion. 101 The concept was later extended to incorporate multiple (2 or 3) fused enzymes, including CelB and dimeric ADP-dependent glucokinase and also monomeric ATP-dependent galactokinase in the 3-enzyme construct. 102 These enzymes can catalyze a cascade of coupled reactions, demonstrated with lactose as substrate. The activity of all encapsulated enzymes was confirmed, and the kinetic parameters were measured.
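The copy numbers and subunit masses quoted above for P22 give a rough estimate of the total protein mass of a VLP. The sketch below is simple arithmetic on the approximate values in the text; it assumes unmodified SP, so for enzyme-SP fusions (whose masses are not given here) the result should be read as a lower bound. The names `CP_COPIES`, `SP_KDA` and `vlp_mass_mda` are illustrative.

```python
# Approximate values quoted in the text for bacteriophage P22.
CP_COPIES, CP_KDA = 420, 46.6   # coat protein: copies per VLP, mass in kDa
SP_KDA = 33.6                   # scaffolding protein mass in kDa
SP_COPIES_RANGE = (100, 330)    # approximate copies of SP per VLP

def vlp_mass_mda(sp_copies: int) -> float:
    """Total protein mass (MDa) of a VLP containing `sp_copies` unmodified SPs."""
    return (CP_COPIES * CP_KDA + sp_copies * SP_KDA) / 1000.0

low, high = (vlp_mass_mda(n) for n in SP_COPIES_RANGE)
print(f"coat protein shell: {CP_COPIES * CP_KDA / 1000:.1f} MDa")
print(f"shell plus scaffold: {low:.1f} to {high:.1f} MDa")
```

With these numbers the coat protein shell alone is about 19.6 MDa, and the scaffold adds between roughly 3.4 and 11.1 MDa depending on the loading.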

CONCLUSIONS AND FUTURE PERSPECTIVES
In summary, a variety of approaches have been successfully demonstrated to assemble proteins into defined aggregates including cages and 1D, 2D or 3D structures. Protein assembly can be induced by noncovalent interactions such as metal-ion mediated pairing or hydrophobic side-chain interface engineering or electrostatic interactions using modified proteins or by de novo design of proteins. Protein mutants can be created exploiting C- or N-terminal modifications or site-selective modifications, utilizing suitable residues such as cysteines located with respect to protein subunit symmetry axes. A range of natural multi-subunit proteins can be used for this purpose, there being a range of proteins with suitable C₂, C₃, C₆, D₂ and D₄ subunit symmetries, among others, which can be used to produce 2D and 3D lattices, while an essential element of large cage structures is the inclusion of pentameric proteins in the design. It is possible to produce cages and 2D and 3D lattices with a remarkable degree of precision in the ordering using protein assemblies, provided appropriate design rules are followed.
It will be interesting to follow further research developments that lead to the design and creation of novel lattice structures (and possibly aperiodic quasicrystals). Perhaps inspiration can be taken from the field of DNA origami, utilizing stronger noncovalent interactions (such as multiple hydrogen bonds between nucleic acids) than have been exploited thus far. Other superstructures such as multiring (and interlinked) assemblies can be envisaged in analogy with the field of rotaxanes, with the related challenge to construct novel protein motors, inspired by or distinct from natural ones.
Coiled coil proteins/peptides are an attractive design unit for simple de novo designed assemblies including polyhedral particles, ring structures, planar lattices and linear assemblies, although coiled coils are combined with other elements to create cage and 3D lattice structures. On the other hand, assemblies based on natural proteins such as enzymes can enable the potential exploitation of the native function, for example biocatalysis. Native and mutant proteins can be produced recombinantly (commonly using E. coli expression vectors), leading to the potential to scale up the synthesis.
An alternative method to produce functional protein-based materials is to use protein assemblies as templates or scaffolds, as exemplified by the modification of polyhedral or rod-like virus capsids with desired function by engineering of the coat or scaffold proteins. Another example is the use of bacterial S-layers to produce two-dimensional protein arrays, modified to enable metal templating or to create planar catalysts by positioning enzymes. Materials with remarkable catalytic and optoelectronic properties have been engineered in this way. These may have a role in addressing important challenges, for example in photocatalytic water oxidation or in CO₂ fixation, as discussed above, and related applications in clean energy generation can be envisaged, by choice of appropriate enzymes. Since enzyme cascades have important roles in vivo, their engineering using protein assemblies is also an exciting avenue for future developments.
As well as applications in biocatalysis, protein assemblies have potential in the creation of novel porous materials for separation, and cage-like structures can be used to encapsulate and deliver cargo such as drug molecules in a targeted manner (exploiting or modifying the protein coating to target particular cell functionalities). Alternatively, the intrinsic properties of such particles could be used to induce immunogenicity, with potential additional benefits arising from self-adjuvant properties. Another class of therapeutic approaches may involve the modification of the assembly pathway of protein superstructures such as microtubules, which is the basis of Taxol's anticancer activity. There are many related examples of protein structures (e.g., extracellular protein assemblies, ion pumps etc.) involved in disease progression which have not yet been targeted.
As yet, there are few examples of dynamic engineered protein assemblies, although in one recent example it has been demonstrated that a transition in 2D lattice structure of RhuA variant crystals (discussed in detail in Section 4) can be achieved by vigorous mixing and sedimentation (or by reversible Ca²⁺-induced switching). 77 There is considerable scope to produce new responsive materials by incorporating biological motor protein elements (myosins, dyneins, ATPases etc.). This is an area with great potential to produce innovative active biomaterials.
Considering the impressive examples outlined in this Review, it should be clear that protein materials are very promising components of next-generation structural and functional biomaterials based on the unprecedented diversity of structures and properties that have evolved in natural proteins or can be designed into de novo constructs.

REFERENCES