Size-Tunable Nanoneedle Arrays for Influencing Stem Cell Morphology, Gene Expression and Nuclear Membrane Curvature.

High-aspect-ratio nanostructures have emerged as versatile platforms for intracellular sensing and biomolecule delivery. Here, we present a microfabrication approach in which a combination of reactive ion etching protocols were used to produce high-aspect-ratio, nondegradable silicon nanoneedle arrays with tip diameters that could be finely tuned between 20 and 700 nm. We used these arrays to guide the long-term culture of human mesenchymal stem cells (hMSCs). Notably, we used changes in the nanoneedle tip diameter to control the morphology, nuclear size, and F-actin alignment of interfaced hMSCs and to regulate the expression of nuclear lamina genes, Yes-associated protein (YAP) target genes, and focal adhesion genes. These topography-driven changes were attributed to signaling by Rho-family GTPase pathways, differences in the effective stiffness of the nanoneedle arrays, and the degree of nuclear membrane impingement, with the latter clearly visualized using focused ion beam scanning electron microscopy (FIB-SEM). Our approach to design high-aspect-ratio nanostructures will be broadly applicable to design biomaterials and biomedical devices used for long-term cell stimulation and monitoring.

This means that by definition cells aligned to the x-axis are mapped to the same value (0), hence why the peak looks roughly twice the height of those at 90. During imaging for the image-based cell profiling there was a slight uncertainty in the registration of the nanoneedle array to the microscope camera/image frame, hence a slight broadening in peaks in some cases (e.g. the difference in height between the substrates iv and v may not be significant).         Representative fluorescent images of stained lipid (red) in hMSCs cultured on flat, nanopillar and nanoneedle substrates, and quantification of extracted stain using absorbance measurements (mean ± SD, N = 2). This shows that all substrates were able to support chemically stimulated adipogenesis, and there was no evidence of material-driven differentiation under basal conditions. Scale bars = 400 µm.
(b) Alizarin Red S staining used to visualize calcium deposits after 21 days of osteogenic differentiation. Representative digital camera images of stained calcium (red) deposited by hMSCs cultured on flat, nanopillar and nanoneedle substrates, and quantification of extracted stain using absorbance measurements (mean ± SD, N = 2). This shows no evidence of material-driven differentiation under basal conditions, while flat and nanopillar substrates were able to support osteogenesis. Nanoneedles, meanwhile, showed greatly reduced calcium staining, suggesting that this material substrate impairs differentiation down this lineage. Scale bars = 4 mm.

EXPERIMENTAL METHODS
Stiffness of nanopillars and nanoneedles. The theoretical stiffness and deformation profile of nanopillars and nanoneedles were solved using Euler-Bernoulli beam theory. The governing equation relates the applied moment (M), the elastic modulus (E) of the material, and the second moment of the area (I), to the deflection (v) of the beam: Each of the parameters (M, E, and I) can vary as a function of position (x, y, z) along the beam. Since the system was modeled for a concentrated load at the apex of the structure, a position-dependent moment is developed, i.e. M(x). The moment is evaluated as: ( ) = • , where F is the concentrated load and x is the distance from the applied load. The elastic modulus (E) of silicon was assumed to remain constant throughout the etching process; thus, E is taken to be 129.5 GPa, which is the provided value from the manufacturer. Finally, the second moment of the area, which is a geometric factor, changes along the axis of symmetry, i.e. I(x). For a solid conical structure, I(x) is a function of the tip diameter (A), base diameter (B), and length (L): Implementing these conditions, the governing equation for deflection of a solid conical structure becomes: The following boundary conditions were applied: Eq. 4 states that at x = L the slope of deflection is equal to 0. Furthermore, Eq. 5 states that the deflection at x = L is equal to 0, i.e. no deformation at the base of the structure. Following integration and implementation of the boundary conditions, the solution for the deflection of a solid conical beam is: The solution was validated using 3 different methods. The first step was to set A = B, i.e. a cylindrical beam, and to compare the beam deflection of Eq. 6 to the well-known solution [1] : Figure S12 demonstrates perfect agreement between the solutions. The second validation step was to compare this generalized solution for deflection at any point along the length of a conical beam to the solution of McCutcheon [2] , which only gives the apical deflection of a solid cone.
The subscript (A) denotes properties of the tip, i.e. tip deflection (vA) and apical second moment of the area (IA). Figure S13 again demonstrates perfect agreement with the known solution. The final validation step was to demonstrate that the solution is accurate for any point along the length of the beam and thus gives a deflection profile of the entire beam. To achieve this the beam was modeled, meshed, and solved using FEBio [3] , an open-source finite element solver. Each beam was meshed with 100 elements in the longitudinal axis (x) and 64 radial elements. The material was modeled as an isotropic linear elastic solid with a Poisson's ratio of 0.3, and physical dimensions: L = 1 mm, A = 0.1 mm, B = 0.2 mm.
A traction force (F = 0.01 N) was imposed at the apex of each cone with the base remaining fixed (zero displacement and rotation). Deformation profiles for each element along the longitudinal axis are plotted in Figure S14 and demonstrate good agreement. It should be noted that discrepancies do exist between the two solutions due to the modeling assumptions, e.g. Euler's solution assumes infinitesimal strains and small rotations. Therefore Eq. 6 and any solutions based on Euler's beam theory are only valid for small deflections. While it is not proven here, we assume that the hMSCs produce small traction forces and thus small beam deflections allowing us to use Eq. 6. High-content screening of cell images and modeling of cell features. The following describes both the image analysis, subsequent data handling, and then modeling work. This was carried out broadly in two sections: 1. Automated image analysis using a custom image-analysis pipeline in CellProfiler (CellProfiler 3.0.0, Broad Institute). [4] 2. ata analysis and modeling using custom scripts written in R (R 3.5.2, R Core Team: http://www.Rproject.org/).
Image-based cell profiling has been conducted as stated below: Pre-processing of images: Each microscope image was pre-processed using ImageJ, to split each combined field-of-view into separate image files for each channel, using a macro written in ImageJ. [5] Pipeline construction: Immunofluorescent microscopy images were imported into CellProfiler. Initially, a small test set of images were manually pre-selected, representative of each of the experimental conditions to manually tune the pipeline parameters. In the following, parameters were tuned interactively using CellProfiler's graphical user interface.
Summary of pipeline (bold text refers to binary objects generated by the pipeline):  File import: o Metadata is assigned to image files, channels are mapped to their respective fields-of-view.
o Pixel intensities are rescaled to the range 0 and 1 for calculation purposes (default approach for CellProfiler).
 Primary segmentation of nucleus o Cell nuclei were identified from the image of DAPI fluorescence. The thresholding method was a two-class adaptive Otsu threshold. An adaptive thresholding approach, using a window size of 50 pixels, was qualitatively observed to be more robust across a wide variety of substrate types.
o Objects smaller than 4.1 µm and larger than 20.3 µm were considered to be a segmentation error and discarded.
 Secondary segmentation of cell body o The cell body for each cell was identified by using the nucleus as a kernel to perform a secondary segmentation on the tubulin  channel image.
o A global minimum cross-entropy thresholding approach, combined with a propagation algorithm to help distinguish between clumped objects, was found to give the best results.
o The threshold of this step was tuned so that the resulting binary object captured the majority of the cell body region, but not necessarily all protrusions.
 Secondary segmentation of cell body + cell protrusions o Using the cell body as a kernel, another secondary segmentation was performed on a maximum projection of the actin, tubulin , and YAP channels. Prior to the projection, all channels were normalized to their maximum/minimum values to maximize the contribution of each to the projection. This projection was only used for this segmentation, not for any subsequent intensity measurements.
o Segmentation was performed using an adaptive Otsu algorithm, two-class, with a windowsize of 10 pixels.
o This object incorporates both the cell body and total protrusion area. o In addition, the required threshold level was also recorded. were measured for each of the binary objects defined above. Additional measurements made by the pipeline, but not included in analysis:  Perinuclear area determination: o Commonly, the perinuclear area, as defined by a fixed-width ring surrounding the nucleus, is used to normalize for cell thickness variations. [6] o For highly rounded or elongated cells on nanoneedles, in particular where the cell nucleus is frequently collocated with the cell membrane in highly elongated cells, it was found particularly challenging to consistently define a peri-nuclear without introducing segmentation artifacts, hence this approach was not used here.  o Initial tests identified that while many features, such as intensity texture parameters, provided good class separation during linear discriminant analysis, these features were also convoluted with the appearance of the nanoneedle array in the underlying image data.
Hence it was unclear if the signal being measured was from the cell texture, from the pattern of nanoneedles, or most likely, a mixture of both.

Batch analysis:
Due the number of images involved, pipeline processing was sub-divided into smaller batches using a custom script written in Microsoft Powershell. Each batch was called as single command-line call to a new CellProfiler instance, which was automatically assigned asynchronously to individual logical processor cores of a server (Microsoft Windows Server 2012 R2, Intel Xeon CPU E5-2630 @ 2.6 GHz with 12 logical cores, 96 GB RAM). Analysis took approximately 19.5 hours, after which the individual data files were concatenated using another PowerShell script. This resulted in a single csv file for each binary object, plus additional folders and files containing image and pipeline metadata.
General note on data plots derived from measurements from image-based cell profiling: Data shown in the manuscript was exported from R as csv files and plotted using OriginPro. Other plots were generated using the package ggplot2 [7] and ggcorrplot (https://CRAN.R- o ggplot2 (3.1.0), [7] o dplyr (0.  caret (6.0-81), [9]  ggcorrplot (0. o Any cell where the median intensity of either the nucleus or cytoplasm, on any of the actin, tubulin , or YAP channels, was less than double the channel background intensity were removed due to the poor signal-to-noise ratio.
o Finally, all object datasets were cross-referenced to remove any cell that did not have a complete set of measurements across all features and objects, to ensure one-to-one mapping between datasets in the analysis.
 Image and cell numbers:  The measurement modules included in CellProfiler produce a very large number of data fields per cell (> 1,500 for the pipeline used here). Not all of these are useful, and some features are highly correlated, which can cause significant problems in the proper modeling of the data. Some features are discrete, or cannot be appropriately transformed for the modeling undertaken here (for example the Euler number of the binary objects). These features were excluded from the analysis. Hence, the number of features was reduced by selecting those which form a complementary set of data about each cell. Table S1 details the features selected.

Data transformation and normalization:
Many of the measured features, in particular the shape parameters, have highly skewed distributions. Such distributions violate the assumption of normally distributed data for many statistical techniques.
Measurements were transformed to become more normal-like by applying a generalized logarithm function, as described by Laufer et al. [10,11] : where x is the data point to be transformed and c is a scaling factor. The scaling factor is the third percentile of the empirical distribution function of the measured variable. At = 0 this equation is the same as a normal logarithm, but for values of > 0 this function transforms the output for values of < 0 , avoiding infinity errors in the subsequent logarithm.
In addition, a robust Z-score was used to normalize batch-to-batch variations in the three technical replicates included in the image-based cell profiling. Using the flat substrate of each replicate as the control, a robust Z-score for each feature was calculated as: Robust Z-score = (Value of data point − Median of the same variable on a flat control) (Median absolute deviation of the same variable on the flat control) The robust Z-score uses the median and median absolute deviation, rather than the mean and standard deviation typically used in a Z-score, to minimize the effect of outliers.

Identification and removal of highly-collinear features:
Before the data can be modeled, highly correlated features should be removed. Manual feature selection (described above) helps to reduce the risk of multicollinearity between features during modeling. However, this alone may not identify all strongly correlated features. Failing to remove these features can result in highly unstable models, making their interpretation meaningless. Multiple techniques exist for tackling this problem, here we use an approach discussed by Caicedo et al. [10] The Spearman's rank correlation coefficient was calculated for all pairs of features, an example is shown in Figure S13 Table S1 for the full list of input features, and Supplementary Table 2 for list of which features were kept in each model.

Linear discriminant analysis model construction and validation:
We are most interested in the ultimate behavior of cells on the substrate, so the modeling here considered only cells from the 72-hour timepoint. The following analysis uses the cleaned, transformed and normalized data, as described above.
Linear discriminant analysis (LDA) was used here, as we have a number of non-metric classes (different sharpness nanoneedles and a flat control) with a large number of metric variable measurements, plus the resulting model can be interpreted to infer information about the underlying features. [12] LDA acts to maximize the separation between data points belonging to different classes (in this case different substrates). It does this by creating one or more discriminant functions, that can be used to score each cell. Each function is a linear combination of variables, where each variable here represents one of the features that was included in the model (e.g. cell area, nucleus solidity, etc.). Each variable is weighted by a coefficient, which represents the degree to which that variable contributes to the discriminant function. For a given cell, the discriminant function scores represent the class the cell either belongs too (during supervised training) or has been assigned to (during model validation).
To build each model, the relevant dataset was randomly split into two halves. Fifty percent of the cells were designated as a training dataset. The remainder were designated as the test (also known as holdout) dataset. An LDA based classifier was trained on the training dataset, using the MASS R package. [9] The trained classifier was then used to predict the class of each cell in the test dataset. This approach allows the specificity and sensitivity of the model to be directly assessed, by measuring how many cells were correctly and incorrectly classified, as we ultimately know which substrate they belonged to. As a further validation, each model was trained again on a dataset with randomized class labels, which resulted in a model with no significant separation of the test dataset. The stability of the model was assessed by running the script three times, using different seed values for the random number generator in R. The random number generator is used to sample data values to create the training and test datasets, so this is a check to make sure that our interpretation of the results is valid for more than just a single training/test dataset. The overall accuracy of the five-, three-and two-class models, remained within 2 % agreement for all three iterations, i.e. building the model from a sample of the data doesn't overly affect the final model accuracy. Figure 2c in   Two-class model -incorporating cells from two substrate types (sharp nanoneedles, flat substrate).  In each case, the optimum number of discriminant functions for each model was one minus the number of classes, i.e. four, two and one respectively, as checked using the caret R package. [9] Figure S15 shows the confusion matrices for each of the three models. These matrices represent how many cells were correctly identified, or if misclassified, into which class they were incorrectly assigned. The three models together illustrate how it is relatively easy to identify cells on flat, nanopillar and sharp nanoneedles (the three extreme cases), classification becomes considerably harder for sharp, and blunt types of nanoneedles. This is consistent with flat, nanopillar and sharp nanoneedles being the most distinctly different substrates. Further details about the specificity and sensitivity (figures of merit derived from the confusion matrix) for each model and each class is shown Tables S3 to S5). Figure S15. Confusion matrices (heatmaps) for the five-, three-and two-class models respectively.
The number and color fill of each square indicates the number of cells that were classified to a given class.
Interpretation of the LDA model: In order to simplify the interpretation of the discriminant functions, discriminant loadings were calculated. For each model, the scores assigned to each cell in the test dataset were correlated with the original measurements of each feature to produce the discriminant loading. These loadings represent the strength of the correlation between the discriminant function and the underlying data.
The five-and three-class models are described by more than one discriminant function. To further aid the interpretation of these models, each discriminant loading was weighted by the relative contribution of that discriminant function. For example, in the three-class model, the first discriminant function represents accounts for roughly 90% of the between-class variance, hence the discriminant loadings attributed to this function contribute more to the overall class separation.
Here we use a weighted approach reported by Hair et al. referred to as the potency value: [12] Potency value of variable on function = (Discriminant loading ) × Eigenvalue of discriminant function Sum of all eigenvalues across all significant functions which can be combined to form a composite potency index for each variable: Composite potency index of variable =

Sum of potency values of variable across all significant discriminant functions
The composite potency index is a relative measure of the importance of a given variable to the overall discriminant functions. Table S1. Description of main image-based cell profiling features included as initial inputs to the LDA model Calculated as:

× Cell body area perimeter
The form factor equals 1 for a circle, and less for irregularly shaped cells. Cell mean radius Calculated as the mean of all the distances between every pixel inside the cell body and the nearest pixel outside of the cell body.

Cell perimeter
Perimeter of the cell body.
Cell major axis length An equivalent ellipse, is fitted to the cell body. The properties of this ellipse are used to determine a number of parameters, including major and minor axis length, and the orientation of the cell with respect to the image frame.
These values provide information about the size, elongation, and relative orientation of cells.
Note: measures of orientation are not included in modeling due to uncertainty in absolute orientation, but the description is included here for completeness.

Cell minor axis length
Cell orientation

Cell solidity
The convex hull of the cell body is calculated. Here, the convex hull can be visualized by imaging a rubber band stretched around the cell, defining a region that fully encloses the cell body. The solidity is then the ratio: Cell body area Cell body convex hull A solidity of 1 represents a cell where the edge does not fold back in on itself. Irregularly shaped cells have solidities of less than 1. Nucleus morphology

Nuclear area
As above, but for the nucleus. Nuclear compactness Nuclear eccentricity Nuclear extent Nuclear form factor Nuclear mean radius Nuclear major axis length Nuclear minor axis length Nuclear perimeter Nuclear solidity Cell body + cell protrusions morphology

Maximum radius
The maximum distance from any pixel inside the cell body + cell protrusions to a pixel outside of the object. Here, it represents a measure of maximum cell size. Cell protrusion ratio As defined above.

Cell body intensity CTCF actin intensity
Corrected total cell fluorescence, as defined above. CTCF tubulin  intensity CTCF YAP intensity Mass displacement actin For a given channel, this represents the distance between the center of mass of the greyscale intensities inside the cell body, and the geometric center of the cell body.
A mass displacement of zero indicates the majority of the measured intensity is clustered perfectly in the center of the cell. Larger values suggest the intensity is clustered towards the edge of the cell body.

Mass displacement tubulin  Mass displacement YAP
Nucleus intensity YAP ratio As defined above.
Cytoplasm intensity Actin/ tubulin  ratio As defined above.
Voronoi cell area Local cell density As defined above.

Fiveclass
Cell eccentricity, cell major axis length, cell mean radius, cell minor axis length, cell solidity, local cell density, cell protrusion ratio, cell CTCF actin, cell CTCF tubulin , cell mass displacement actin, cell mass displacement tubulin , cell mass displacement YAP, cytoplasm ratio of actin to tubulin , nuclear extent, nuclear form factor, nuclear minor axis length, nuclear solidity, nuclear/cytoplasm YAP ratio Threeclass Cell eccentricity, cell major axis length, cell mean radius, cell minor axis length, cell solidity, local cell density, cell protrusion ratio, cell CTCF actin, cell mass displacement actin, cell mass displacement tubulin , cell mass displacement YAP, cytoplasm ratio of actin to tubulin , nuclear extent, nuclear form factor, nuclear minor axis length, nuclear solidity, nuclear/cytoplasm YAP ratio Twoclass Cell eccentricity, cell extent, cell form factor, cell major axis length, cell mean radius, cell minor axis length, cell local cell density, cell protrusion ratio, cell CTCF actin, cell mass displacement actin, cell mass displacement tubulin , cell mass displacement YAP, cytoplasm ratio of actin to tubulin , nuclear extent, nuclear form factor, nuclear minor axis length, nuclear solidity, nuclear/cytoplasm YAP ratio