STATISTICA

By Deb Watson (Mar. 2015)

Before you start

  • First, get your Excel spreadsheet formatted properly.  Do not include any formulas or equations; these will leave strange blank spots in your Statistica workbook.
  • Example (P15 and Adult Stats tab in Adult + Developmental SER Summary for 2011 CLM Retreat SfN.xlsx)
  • Import the sheet from Excel: change "Files of Type" to Excel Files.  Select your workbook... Open; then Import Selected Sheet to a Spreadsheet, pick the one you are interested in from those that appear... OK.  Check "Get variable names from first row" and "Import cell formatting", OK
  • If you get an error about text values, just check the box "Do this for all variables when the data is text" then click "Import as Text Labels"

Basic Statistics: T-tests

  • Statistics... Basic Statistics...  T-test, independent, by groups...   then pick your Dependent Variables and Grouping Variable (I used "Age"); you will get a printout of the t-test results.
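As a sanity check on the Statistica printout, the separate-variance (Welch) form of the independent-samples t can be computed by hand. This is a minimal sketch in plain Python with made-up numbers, not real data; note that Statistica's default independent t uses the pooled variance, so its value can differ when group variances are unequal:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Separate-variance (Welch) t statistic and its degrees of freedom."""
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    n1, n2 = len(a), len(b)
    se2 = v1 / n1 + v2 / n2            # squared SE of the mean difference
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# made-up example values standing in for two Age groups
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```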

Basic Statistics: Descriptive Statistics

  • After the spreadsheet is formatted and imported
  • Statistics… Basic Statistics… OK

 

  • Select the variable of interest (example Sp/um)
  • Switch to the Advanced Tab and select any additional measures (i.e. Median).  Add any By Group measures as needed (i.e. Condition). Then click Summary.

 

  • For Skewness analyses, switch to the Normality Tab, increase Number of intervals to 40, and add a check mark to Shapiro-Wilk W test (if the sample size is small).  Add any By Group measures as needed (i.e. Condition). Then click Histograms.
    • The Histogram will also give you p-values for the KS and Lilliefors statistics
      • For large samples (n>300), don’t use the Shapiro-Wilk test because it is too sensitive
      • The KS test compares against a fully specified theoretical distribution, while Lilliefors estimates the parameters from your own data.
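For reference, the moment-based skewness and excess kurtosis these menus report can be sketched in a few lines of Python. The example data are made up; note that Statistica applies a small-sample adjustment to the raw moment estimators, so its values will differ slightly for small n:

```python
def skew_kurtosis(xs):
    """Moment-based sample skewness (g1) and excess kurtosis (g2)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

g1, g2 = skew_kurtosis([1, 2, 3, 4, 5])   # symmetric data: g1 = 0
g1s, _ = skew_kurtosis([1, 1, 1, 10])     # long right tail: g1 > 0
```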

 


Distribution Fitting

  • If the above descriptive statistics have led you to determine that your data are skewed, it may be worth investigating further as follows.
  • Statistics… More Distributions… ‘Fit Distribution’… OK

  • Under the Quick tab, click Variables… and select your continuous and/or discrete variables… OK… OK
    • For the majority of the postsynaptic data – the measurements are ‘continuous’ (can have an infinite number of values between two points, i.e. PSD area or SpHD) NOT discrete (count data, 0, 1, 2 etc – i.e. docked vesicles)

 

  • First click on “Summary Graph” to create a paneled window with your histogram and ‘red’ normal line, the KS and Lilliefors statistics, the Normal P-Plot, Descriptive Statistics, and a box plot
  • Next click on “Summary Statistics” for the complete Descriptive statistics table
  • “Distribution Summary Statistics” – which will provide the KS statistic and its associated p-value, and the Anderson-Darling (AD) statistic and its p-value
  • Finally, generate a Q-Q plot (you want these points to line up along the ‘red’ diagonal line) with minimal deviation at the tails
  • After you generate all of these items for your non-transformed data (i.e. SpHD), repeat on your transformed variable (nl SpHD) – ideally any issues of normality are resolved and you can proceed with parametric statistics.  If not – you are stuck with non-parametric tests (Kruskal-Wallis)
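The transform-then-recheck loop described above can be sketched numerically. The values below are made-up stand-ins for a right-skewed measure like SpHD, purely to show that the natural-log ("nl") transform pulls the skewness down:

```python
import math

def skewness(xs):
    """Moment-based sample skewness (g1)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# made-up right-skewed values standing in for a measure like SpHD
raw = [math.exp(x) for x in (-2, -1, -1, 0, 0, 0, 1, 1, 2)]
nl = [math.log(x) for x in raw]  # the natural-log ("nl") transform

# the raw data have a long right tail; the transform pulls it back in
```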


Factorial ANOVAs -  Used for Dendrite Data (i.e. sp/μm)

  • After the spreadsheet is formatted and imported
  • Statistics… ANOVA… Select either One-way or Factorial ANOVA… OK

 

  • Select the variables to be analyzed – Dependent is the variable that was measured or quantified (i.e. Sp/um) and Categorical Predictors (factors) are the independent variables (i.e. Experiment and Condition)

 

  • Then click on Factor Codes; first select Zoom to make sure all of the categories are listed.  For example, under Condition it should say Control and TBS.  Click OK, then ALL, then OK again.  For this type of analysis, leave Between Effects alone and run the Full factorial.  Click OK one last time.
  • Click More Results to expand the menu from the compact view to the full view
  • First switch tabs to the Summary Tab and click on Test all Effects to generate an ANOVA table

 

  • Switch to the Means tab to generate your means and graphs.  Then click All Marginal Tables on the 2nd row – by Observed, Weighted.  Using Unweighted will give you the WRONG standard errors.  DO NOT USE!  If you want to plot an effect, switch the drop-down as necessary to examine the effect in question (i.e. Expt, Cond, or Expt*Cond).  Be sure to check Show/plot means +/- standard errors; otherwise your plot will have confidence intervals, not SEMs.  Then click Plot (again on the second row).

  • Finally, if you have any significant interactions to examine, switch to the Post-hoc tab.  For main effects there is no point in running post-hocs, since you already know from the ANOVA table whether the effect is significant, and the means have already been generated.  The post-hoc test is only for evaluating which means differ within an interaction.  Just like with the Means tab, pick the effect you want to examine from the drop-down menu.  Finally, click on the test you want to use – in this case, Tukey HSD.

  • You can return to the Variable Menu to run the next comparison by clicking Modify on the Right.
  • The next important step is to check your Assumptions (Independence, Normality, Homogeneity of Variance; see below)
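What "Test all Effects" does can be illustrated for the simplest one-way case: compare the between-group mean square to the within-group mean square. A full factorial partitions the sums of squares further (into main effects and the interaction), but the F-ratio logic is the same. A hedged sketch with made-up numbers:

```python
def oneway_f(groups):
    """One-way ANOVA F: between-group vs. within-group mean squares."""
    all_x = [x for g in groups for x in g]
    grand = sum(all_x) / len(all_x)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b = len(groups) - 1
    df_w = len(all_x) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w

# made-up numbers standing in for Sp/um in two conditions
f, df_b, df_w = oneway_f([[1, 2, 3], [2, 3, 4]])
```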

Hierarchical Nested ANOVAs – Used for Spine Dimensions and PSD area

  • These steps are very well described and illustrated in the SOP that KH created (Statistica Nested Anova Analysis.doc).  They are only summarized briefly here.
  • Import the Excel sheet you need to analyze (see steps above if you need to Import as Text Values)
  • Statistics… Advanced Models… General Linear… General Linear Models…

 

  • Select Cases if you want to analyze by a specific number (i.e. spines with head diameters < .55 um; or v40<.55) or if you want to sort the parameter by specific text (cross-sectioned synapses; or v25=‘xs’)
  • Ok
  • Variables…
    • Dependent (PSD area, “qntCONfa” or Delta PSD area, “Delta qntCONfa” or Spine Length, “ProtLen” or Delta spine length, “Delta ProtLen”) – Y axis
      • If your data have been determined to be skewed, you should normalize your data – with a log normal transform (http://www.unm.edu/~marcusj/datatransforms.pdf).
      • If the skew is now eliminated – proceed with parametric analysis, but remember to check the homogeneity of variance (see below).
    • Categorical (Experiment or Animal, Condition, Unique dendrite number; “Expt”, “Cond”, “CodeDenNum”)
    • Continuous (none) – Only use for ANCOVAs
    • Ok
  • Factor Codes (click ALL for each)
  • Between Effects
    • Click the radio button for “Use Custom Effects for the between design”
    • Highlight all 3 on the left underneath “Predictor Variables Categorical”
    • Click “Hierarc. Nest” underneath “Method”
    • Under condition, “Cond”, highlight Expt – nesting each condition within experiment
    • Under unique dendrite name, “CodeDenNum”, highlight both Expt and Cond – nesting each dendrite within experiment and condition
  • Finally to examine the main effect of condition, you need to click on Cond on the left (highlighted blue below) and then click Add under Method.
  • Ok, Ok, Ok
  • Ignore Missing Data warning, Ok
  • Click More results
  • If you want to analyze by individual groups, click By Group, Grouping Variable (t-post).  I’ll probably do my analyses by time-point using individual spreadsheets.
  • Summary… Test all effects
  • Means… “Plot or show means for effect:” (select any of the options for which you want the means).  Be sure to click on “All marginal tables” near “Observed, weighted”. To graph a difference, click on the “Plot” button by “Observed, weighted”.  Be sure to check the box “Show/plot means +/- standard errors”.
  • Post-hoc… “Effect:” (select any effect that you want to explore with a post-hoc test from the drop-down list). Tukey HSD.


  • The last step is checking that your assumptions are correct: (Good website) - You have to take out the effects of all the Xs before you look at the distribution of Y.  (I just realized that you do this to your entire data set across groups)
  1. Independence of observations – this is an assumption of the model that simplifies the statistical analysis.
  2. Normality – the distribution itself and the distribution of the residuals are normal.
  3. Equality (or "homogeneity") of variances, called homoscedasticity — the variance of data in groups should be the same.


How to test?  You can check all three with a few residual plots – a Q-Q plot of the residuals for normality (2), and a scatterplot of residuals against X or the predicted values of Y to check 1 and 3.
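The Q-Q construction mentioned here is just the sorted residuals plotted against theoretical normal quantiles. Python's `statistics.NormalDist` can generate the theoretical coordinates; the (i + 0.5)/n plotting-position convention used below is one of several, and Statistica's may differ slightly:

```python
from statistics import NormalDist

def qq_points(residuals):
    """(theoretical normal quantile, observed value) pairs for a Q-Q plot."""
    n = len(residuals)
    observed = sorted(residuals)
    # plotting positions (i + 0.5)/n; other conventions exist
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

# made-up residuals; on a plot these pairs should hug the diagonal
pts = qq_points([-1.2, 0.3, -0.4, 0.9, 0.1])
```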


  • How/Why we Test Assumptions (Much of the info was gleaned from the “Elementary Concepts in Statistics” section within STATISTICA’s help menu)
    • Homogeneity of Variances (i.e. the groups have the same variance, you want these to be non-significant)
      • Levene’s test (ANOVA), but F statistic is quite robust against violations of this assumption (from STATISTICA help)
      • Brown-Forsythe test – uses the median, whereas Levene’s test uses the mean, which can be an issue if your data are skewed
        • Statistics… Basic Statistics… Breakdown & one-way ANOVA… OK

 

  • Variables… Dependent (i.e. Y-axis, PSD area, SpHD, nl SpHD, etc.) and Grouping Variable (i.e. X-axis, Cond)…
  • Codes for grouping variables… All… OK.  OK
  • ‘ANOVA & tests’ tab… Levene tests… Brown-Forsythe tests
  • I like to do these before and after the nl transformation (should be non-significant at least after the transformation).
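Under the hood, the Brown-Forsythe test is just a one-way ANOVA run on absolute deviations from each group's median (Levene's uses deviations from the mean instead). A rough pure-Python sketch of that idea, with made-up groups:

```python
from statistics import median

def brown_forsythe(groups):
    """Brown-Forsythe statistic: one-way ANOVA F computed on absolute
    deviations from each group's median (Levene's test uses the mean)."""
    devs = [[abs(x - median(g)) for x in g] for g in groups]
    all_d = [d for g in devs for d in g]
    grand = sum(all_d) / len(all_d)
    means = [sum(g) / len(g) for g in devs]
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(devs, means))
    ss_w = sum((d - m) ** 2 for g, m in zip(devs, means) for d in g)
    df_b, df_w = len(devs) - 1, len(all_d) - len(devs)
    return (ss_b / df_b) / (ss_w / df_w)

# equal spread in both groups -> F near 0; unequal spread -> larger F
```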
  • Normality (see 3. Distribution Fitting above)
    • Can test with skewness and kurtosis (see above in Descriptive Statistics)
      • Any skewness > 2 and kurtosis > 3 are bad!
        • Can also calculate the skew z-score
          • z = skewness / SES, where SES = sqrt[(6n(n-1)) / ((n-2)(n+1)(n+3))]
          • Can test with Shapiro-Wilk test, but do not rely on this if your sample is large (n>300, http://www.surrey.ac.uk/psychology/current/statistics/index.htm). (see above in Descriptive Statistics)
            • Kolmogorov-Smirnov d, Lilliefors, and Anderson-Darling tests are good alternatives
            • Q-Q and/or P-P plots
      • Overall, the F-test is remarkably robust to deviations from normality.  The skewness of the distribution usually does not have a sizable effect on the F statistic. If the N per cell is fairly large, then deviations from normality do not matter much at all because of the central limit theorem, according to which the sampling distribution of the mean approximates the normal distribution, regardless of the distribution of the variable in the population. (from STATISTICA help)
    • Residuals (distance between data and the linear fit)
      • Histogram – within-cell residuals relative to red ‘normal’ line (I still don’t think this graph tells you much)
      • Normal p-p plot – plots your data in comparison to a ‘normal’ red line.  It is the difference from the group mean.
        • If there is a general lack of fit, and the data seem to form a clear pattern (e.g., an S or U shape) around the line, then the variable may have to be transformed in some way (e.g., a log transformation to "pull-in" the tail of the distribution).
        • I still don’t understand why if this one looks good, why the other residual normal plot is an issue
          • Doesn’t it show that the log transform worked?
          • Also - normality is the least important assumption; almost all ANOVA procedures are robust to minor departures from normality
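The skew z-score mentioned above divides the skewness by the standard-error formula given in the notes. A small sketch; the rule of thumb that |z| > 1.96 flags significant skew at alpha = .05 is a common convention, not something stated in this document:

```python
import math

def skew_z(xs):
    """Skewness divided by its standard error,
    SES = sqrt[(6n(n-1)) / ((n-2)(n+1)(n+3))]."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    ses = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return g1 / ses

# symmetric data give z = 0; a long right tail gives z > 0
```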

 


  • Residuals 1 Tab
    • Click ‘Normal’ – test for whether your residuals are normal
      • If there is a general lack of fit, and the data seem to form a clear pattern (e.g., an S shape) around the line, then the variable may have to be transformed in some way (e.g., a log transformation to "pull-in" the tail of the distribution).
      • You can click ‘Predicted and residuals’ – to get the raw residuals (observed vs. predicted) and then can test if those are normal (see Descriptive Statistics Above)

 


Power Analysis

  • Statistics… Power Analysis… Sample Size Calculation…  Two Means, t-Test, Ind Samples

 


  • Enter the parameters (mu1 and mu2 are the means you are comparing, Alpha set at 0.05, Sigma is the pop SD, and Power Goal can be lowered to 0.80), you will need to calculate the population standard deviation (Sigma) separately from Basic Statistics.  2-tailed.  OK…

 

  • Calculate N and examine the table to decide whether you have sufficient N
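The mu1/mu2/Sigma parameterization above maps onto the textbook normal-approximation formula for sample size. A hedged sketch; Statistica's exact t-based calculation gives a slightly larger N (e.g. 64 rather than 63 per group for the classic d = 0.5, alpha = .05, power = .80 case):

```python
import math
from statistics import NormalDist

def n_per_group(mu1, mu2, sigma, alpha=0.05, power=0.80):
    """Normal-approximation N per group, two-sided two-sample comparison."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for alpha (2-tailed)
    z_b = NormalDist().inv_cdf(power)          # z for the power goal
    delta = abs(mu1 - mu2)                     # difference in means
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)
```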


Homogeneity of Slopes and Analysis of Covariance (ANCOVA)

  • These tests are used to examine whether two dependent variables (i.e. SpHD and PSD area) co-vary (i.e. ‘are related’) with respect to a given predictor (i.e. Cond).  Typically illustrated with a scatterplot. (Linear regression of this sort is really similar to GLM – website)
  • Because ANCOVA assumes equal slopes you must do the Homogeneity of Slopes test first!
  • Use the same spreadsheet as 4. Hierarchical Nested ANOVAs.
  • Statistics… Advanced Models… General Linear… Select “Homogeneity-of-slopes model” from the menu at the left.  If you have specific cases to run, you can select them as needed from Select Cases (i.e. maybe only the Control condition).  To do this, check Enable Selection Conditions, then click the radio button for Specific, selected by: and then enter your criteria (i.e. v10=’Control’). Click OK, OK.

 

  • Select Variables.
    • Dependent (PSD area, “qntCONfa”; Spine Head Diameter, “Sp HD”; or summed endo Surface Area, “Endo SA”) – Y axis
    • Categorical (Just Condition, “Cond” – there is NO nesting)
    • Continuous Predictor or Blocking Variable (PSD area, “qntCONfa” or Spine Surface Area, “Sp SA”) – X axis.  Click OK. 

 

  • Click Factor Codes… Zoom to double check all your factors are there.  OK… All… OK… OK… More Results… Switch to the Summary Tab… Test all Effects.  If your Categorical Predictor is significant (i.e. red) then you have a significant Covariate.  If not, the two do not co-vary.  If you have a significant interaction term (i.e. Cond*Sp HD is red) – you cannot do an ANCOVA, you have to use a Separate Slopes Design.  Examine your relationship using a scatterplot, which is detailed below in Item 9.
  • Depending on how this analysis turned out – you either do an ANCOVA (no interaction) next or use the Separate slopes Design (interaction)
    • Statistics… Advanced Models… General Linear… Select “Analysis of Covariance” from the menu at the left.  Again, if you have specific cases to run, select them as needed from Select Cases (i.e. maybe only the Control condition).  Click OK, OK.

 

  • Select Variables.
    • Dependent (PSD area, “qntCONfa”; Spine Head Diameter, “Sp HD”; or summed endo Surface Area, “Endo SA”) – Y axis
    • Categorical (Just Condition, “Cond” – there is NO nesting)
    • Continuous Predictor or Blocking Variable (PSD area, “qntCONfa” or Spine Surface Area, “Sp SA”) – X axis.  Click OK. 
    • Click Factor Codes… Zoom to double check all your factors are there.  OK… All… OK… OK… More Results… Switch to the Summary Tab… Test all Effects.  If your Categorical Predictor (i.e. Cond) is significant (red) then you have a significant Covariate.  If not, the two do not co-vary.  Examine your relationship using a scatterplot, which is detailed below in Item 9.


  • Statistics… Advanced Models… General Linear… Select “Separate-slopes model” from the menu at the left.  Again, if you have specific cases to run, select them as needed from Select Cases (i.e. maybe only the Control condition).  Click OK, OK.

 

  • Select Variables.
    • Dependent (PSD area, “qntCONfa”; Spine Head Diameter, “Sp HD”; or summed endo Surface Area, “Endo SA”) – Y axis
    • Categorical (Just Condition, “Cond” – there is NO nesting)
    • Continuous Predictor or Blocking Variable (PSD area, “qntCONfa” or Spine Surface Area, “Sp SA”) – X axis.  Click OK. 
    • Click Factor Codes… Zoom to double check all your factors are there.  OK… All… OK… OK… More Results… Switch to the Summary Tab… Test all Effects.  If your Categorical Predictor is significant (i.e. red) then you have a significant Covariate.  If not, the two do not co-vary.  If you have a significant interaction term (i.e. Cond*Sp HD is red) – not only do your variables co-vary, the effect of the categorical predictor is moderated by the covariate (i.e. the effect of Condition on PSD area depends on SpHD).  Examine your relationship using a scatterplot, which is detailed below in Item 9.
    • In addition to the usual assumptions for ANOVA (independence, normality, homogeneity of variances) there are two others:
      • I have found several websites that discuss what to do when your data are not independent (website)
      • Or if your data do not have equal slopes (website) – you cannot run the ANCOVA under any circumstances.  But there are a few ways to handle this particular issue:
        • Remove the covariate and run nested ANOVA on it
        • Run the full GLM (with the interaction) and describe in results.
          • You can also evaluate at different levels of the covariate (website) – (i.e. SpHD)
          • Means tab, under the Least Squares Means change the radio button to User-defined values.  Click Define.  Enter the first level (I chose .25 – a spot to the left of the intersection).
          • The options in the Covariate values group box determine at what values the continuous predictor variables (covariates) will be set for the computation of least squares means.  By default, the covariates are held at their respective overall means.  You can also specify User-defined values; after selecting this option button, click Define to display the Values for Covariates dialog box and specify the values.  Finally, you can compute the Adjusted means – the predicted means after "adjusting" for the variation of the covariate means over the cells in the current effect.  Adjusted means are widely discussed in the traditional analysis of covariance (ANCOVA) literature; see, for example, Finn (1974), Pedhazur (1973), or Winer, Brown, and Michels (1991).  The Adjusted means option button is only available in full factorial designs, and the Covariate values group box is not available in the ANOVA module.  (from STATISTICA help)
          • Switch to the Planned comps tab
            • Least squares means are also sometimes called predicted means, because they are the predicted values when all factors in the model are either held at their means or the factor levels for the respective means.
            • Levels that are to be compared against each other are assigned positive or negative integer values; however, it is important that the sum of such contrasts is equal to zero
            • For condition, this would be 1 and -1 because we have two levels – Control and LTP.
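The sum-to-zero rule for contrast coefficients is easy to sketch. The means below are made-up least squares means for Control and LTP, purely for illustration; with coefficients 1 and -1 the contrast is just their difference:

```python
def contrast_estimate(means, coeffs):
    """Planned-comparison contrast; coefficients must sum to zero."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to 0")
    return sum(c * m for c, m in zip(coeffs, means))

# made-up least squares means for Control and LTP
diff = contrast_estimate([0.30, 0.42], [1, -1])  # Control minus LTP
```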


Nonparametric Analysis

  • If the assumptions of normality and homogeneity of variances are not resolved with natural log transformations, nonparametric tests must be performed.  Note, however, that hierarchical nesting is not possible with these tests
  • Statistics… Nonparametrics… Comparing multiple indep. Samples (groups)… OK

 

  • From the Quick tab, select “Variables”, then select your dependent variable (i.e. Sp HD or PSD area).  Next select the Indep. (grouping) variable (i.e. Cond).  OK.
  • Click “Summary: Kruskal-Wallis ANOVA & Median test” to run the analysis (if the effect is significant, it will be listed as the last sentence in the header, e.g. “Kruskal-Wallis test: H ( 6, N=1004) = 20.51504, p=0.0022”).
  • To compare groups as in a posthoc analysis, click “Multiple comparisons of mean ranks for all groups”.
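The H statistic in that header line comes from ranking all values together and comparing mean ranks across groups. A bare-bones sketch with made-up groups; it omits the tie correction (which Statistica does apply), so values will differ slightly when ties are present:

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H: rank all values together, compare mean ranks.
    No tie correction here, so results differ slightly from Statistica's
    when ties are present."""
    pooled = sorted(x for g in groups for x in g)

    def midrank(v):
        lo = pooled.index(v)            # first position of v (0-based)
        hi = lo + pooled.count(v) - 1   # last position of v
        return (lo + hi) / 2 + 1        # average 1-based rank (handles ties)

    n = len(pooled)
    return 12 / (n * (n + 1)) * sum(
        sum(midrank(x) for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

# identical groups give H near 0; well-separated groups give larger H
```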

 


Correlations & Scatterplots

  • If you imported a new Excel spreadsheet, you might need to add the file to your workbook and make the sheet active.  Right-click on the sheet and click "Use as Active Input"
  • Graphs... Scatterplot... On the Quick Tab, select Variables... Set your X and Y variables
  • Switch to the Advanced Tab, Check "Corr and p (linear fit)"
  • If you want to do it by group – Click the “By Group” tab, and then select “Cond”
  • OK
  • Your output should look like this:

 



  • You can copy the r value and p-value by double-clicking on the value, then copying and pasting into your Excel spreadsheet
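The r value being copied here is the ordinary Pearson correlation from the linear fit. For reference, a minimal sketch (the p-value requires the t distribution, so only r is computed; the data below are made up):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # cross-products
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# perfectly linear made-up data give r = 1 (or -1 for a negative slope)
```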


  • I also figured out how to get multiple categories illustrated on the same graph
    • Graphs… Scatterplot.  On the Quick Tab, select Variables... Set your X and Y variables.  Keep “Regular” selected as Graph Type.
    • Switch to the Advanced Tab, Check “R square” and "Corr and p (linear fit)"
    • Switch to Categorized Tab.  Check X-Categories On.  Click the radio button next to Codes.  At the bottom click “Specify Codes”, select the correct Variable (i.e. Cond), click “All”, OK.  OK.
    • You might have to check “Overlaid” under Layout to plot together.
    • The output should resemble the following:
    • There is an interaction between Cond * SpHD (covariate), which shows that at low SpHDs LTPs have bigger PSDs, and at higher SpHDs the Controls have larger PSDs.  Also, we already know that PSD and spine dimensions are positively skewed (rightward) from our analyses above, Item 3.  It might be necessary to perform a data transformation (you can do log or natural log, square-root, arcsine, etc.)
    • This is how the graph looks after doing the nl transformation.  The ln transform makes the data normally distributed, and now the slopes are parallel, which is critical for ANCOVA.
  • Finally, if you need to plot on a log axis:
    • Click the Options 1 tab.  In the Scaling area, click Type: Logarithmic under Axis: X.  Repeat for Y.