STATISTICA

Before you start

  • First, get your Excel spreadsheet formatted properly.  Do not have any formulas or equations; these will leave strange blank spots in your Statistica workbook.

  • Example (P15 and Adult Stats tab in Adult + Developmental SER Summary for 2011 CLM Retreat SfN.xlsx)

  • Import the sheet from Excel: change "Files of Type" to Excel Files, select your workbook... Open; then Import Selected Sheet to a Spreadsheet and pick the one you are interested in from those that appear... OK.  Check "Get variable names from first row" and "Import cell formatting", then OK

  • If you get an error about text values, just check the box "Do this for all variables when the data is text", then click "Import as Text Labels"

Basic statistics: T-tests

  • Statistics... Basic Statistics...  T-test, independent, by groups...   then pick your Dependent Variables and Grouping Variable (I used "Age"); you will get a results spreadsheet with the group means, t-value, df, and p-value for each dependent variable
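If you want to sanity-check Statistica's output outside the GUI, the same comparison can be sketched in Python with SciPy. The numbers and group names below are made up for illustration only:

```python
# Independent-samples t-test grouped by "Age" (P15 vs Adult).
# Data and group names are invented for illustration.
from scipy import stats

p15   = [0.81, 0.95, 0.88, 1.02, 0.79, 0.91]   # e.g. Sp/um at P15
adult = [1.10, 1.25, 1.03, 1.18, 1.22, 1.08]   # e.g. Sp/um in adults

# equal_var=True is the classic Student t-test that Statistica reports;
# equal_var=False would give Welch's t-test instead.
t, p = stats.ttest_ind(p15, adult, equal_var=True)
print(f"t = {t:.3f}, p = {p:.4f}")
```

Welch's version (equal_var=False) is the safer choice when the group variances differ.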

Basic Statistics: Descriptive Statistics

  • After the spreadsheet is formatted and imported

  • Statistics… Basic Statistics… OK

  • Select the variable of interest (example Sp/um)

  • Switch to the Advanced Tab and select any additional measures (i.e. Median).  Add any By Group measures as needed (i.e. Condition). Then click Summary.

  • For skewness analyses, switch to the Normality tab, increase Number of intervals to 40, and add a check mark to Shapiro-Wilk W test (if sample size is small).  Add any By Group measures as needed (i.e. Condition). Then click Histograms.

    • The histogram will also give you p-values for the KS statistic & Lilliefors

      • For large samples (n>300), don’t use Shapiro-Wilk because it’s too sensitive

      • KS tests against a fully specified theoretical distribution, while Lilliefors estimates the parameters from your own data.
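As a rough outside-the-GUI illustration of that difference, SciPy exposes both tests (the sample here is simulated; the Lilliefors correction itself lives in statsmodels as statsmodels.stats.diagnostic.lilliefors):

```python
# Shapiro-Wilk vs Kolmogorov-Smirnov on one simulated sample.
# Here KS is fed the mean/sd estimated from the data; strictly, KS
# assumes those parameters are known in advance ("theoretical"), and
# Lilliefors is the corrected version for estimated parameters.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=0.2, size=50)   # small, roughly normal sample

w, p_sw = stats.shapiro(x)                    # preferred when n is small
d, p_ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
print(f"Shapiro-Wilk W = {w:.3f} (p = {p_sw:.3f}); KS d = {d:.3f} (p = {p_ks:.3f})")
```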

 

Distribution Fitting

  • If the above descriptive statistics have led you to determine that your data is skewed, it may be worth investigating further, as follows.

  • Statistics… More Distributions… ‘Fit Distribution’… OK

  • Under the Quick tab, click Variables… and select your continuous and/or discrete variables… OK… OK

    • For the majority of the postsynaptic data – the measurements are ‘continuous’ (can have an infinite number of values between two points, i.e. PSD area or SpHD) NOT discrete (count data, 0, 1, 2 etc – i.e. docked vesicles)

  • First click on “Summary Graph” to create a paneled window with your histogram & normal ‘red’ line and the KS & Lilliefors statistics, the normal P-P plot, descriptive statistics, and a box plot

  • Next click on “Summary Statistics” for the complete Descriptive statistics table

  • “Distribution Summary Statistics” – which will provide the KS statistic and its associated p-value, and the Anderson-Darling (AD) statistic and its p-value

  • Finally, generate a Q-Q plot (you want these points to line up along the ‘red’ diagonal line) with minimal deviation at the tails

  • After you generate all of these items for your non-transformed data (i.e. SpHD), repeat on your transformed variable (nl SpHD) – ideally any issues of normality are resolved and you can proceed with parametric statistics.  If not – you are stuck with non-parametric tests (Kruskal-Wallis)
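The transform-then-retest workflow above can be sketched in Python (simulated, right-skewed data standing in for SpHD; "nl" is the natural-log transform):

```python
# Simulated right-skewed measure (stand-in for SpHD): log-transform it
# and compare skewness before and after.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sphd = rng.lognormal(mean=-0.7, sigma=0.5, size=200)   # right-skewed

nl_sphd = np.log(sphd)   # the "nl" (natural log) transform

print("skew before:", round(stats.skew(sphd), 2))
print("skew after :", round(stats.skew(nl_sphd), 2))
# If the transform does not fix normality, fall back to non-parametrics,
# e.g. stats.kruskal(group1, group2) for a Kruskal-Wallis test.
```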

 

Factorial ANOVAs -  Used for Dendrite Data (i.e. sp/μm)

  • After the spreadsheet is formatted and imported

  • Statistics… ANOVA… Select either One-way or Factorial ANOVA… OK

  • Select the variables to be analyzed – Dependent is the variable that was measured or quantified (i.e. Sp/um) and Categorical Predictors (factors) are the independent variables (i.e. Experiment and Condition)

  • Then Click on Factor Codes, first select Zoom to make sure all of the categories are listed.  For example under condition, it should say Control and TBS.  Click OK, then ALL. Then OK again.  For this type of analysis leave Between Effects alone and run the Full factorial.  Click OK one last time.

  • Click More Results to expand the menu and reveal the full set of result tabs

  • First switch tabs to the Summary Tab and click on Test all Effects to generate an ANOVA table

  • Switch to the Means tab to generate your means and graphs.  Then click All Marginal Tables on the 2nd row – by Observed, Weighted.

    • Using Unweighted will give you the WRONG standard errors.  DO NOT USE!

    • If you want to plot an effect, switch the drop-down as necessary to examine the effect in question (i.e. Expt, Cond, or Expt*Cond).

    • Be sure to check Show/plot means +/- standard errors; otherwise your plot will have confidence intervals, not SEMs.  Then click Plot (again on the second row).

  • Finally, if you have any significant interactions to examine, switch to the Post-hoc tab.  For main effects there is no point in running post-hocs, since the ANOVA table already tells you whether the effect is significant and the means have already been generated.  The post-hoc test is only for evaluating which means differ within an interaction.  Just as with the Means tab, you have to pick the effect you want to examine from the drop-down menu.  Finally, click on the test you want to use – in this case, Tukey HSD.

  • You can return to the Variable Menu to run the next comparison by clicking Modify on the Right.

  • The next important step is to check your Assumptions (Independence, Normality, Homogeneity of Variance; see below)
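For a cross-check of the full factorial outside Statistica, the same design can be run with statsmodels. The data, level names, and the column name sp_per_um below are invented for illustration:

```python
# Two-way (Expt x Cond) full-factorial ANOVA with statsmodels.
# Data, level names, and the column name sp_per_um are invented.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "sp_per_um": [1.1, 1.3, 1.2, 1.5, 1.6, 1.4,
                  1.0, 1.2, 1.1, 1.7, 1.8, 1.6],
    "Expt": ["E1"] * 6 + ["E2"] * 6,
    "Cond": (["Control"] * 3 + ["TBS"] * 3) * 2,
})

# C() marks categorical predictors; '*' expands to main effects plus the
# interaction, i.e. the "full factorial" design
model = smf.ols("sp_per_um ~ C(Expt) * C(Cond)", data=df).fit()
table = anova_lm(model, typ=2)   # the "Test all effects" ANOVA table
print(table)
```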

Hierarchical Nested ANOVAs – Used for Spine Dimensions and PSD area

  • These steps are very well described and illustrated in the SOP that KH created (Statistica Nested Anova Analysis.doc).  They are only summarized briefly here.

  • Import the Excel sheet you need to analyze (see steps above if you need to Import as Text Values)

  • Statistics… Advanced Models… General Linear… General Linear Models…

  • Select Cases if you want to restrict the analysis by a numeric cutoff (i.e. spines with head diameters < .55 um; or v40<.55) or if you want to sort the parameter by specific text (cross-sectioned synapses; or v25=‘xs’)

  • Ok

  • Variables…

    • Dependent (PSD area, “qntCONfa” or Delta PSD area, “Delta qntCONfa” or Spine Length, “ProtLen” or Delta spine length, “Delta ProtLen”) – Y axis

      • If your data have been determined to be skewed, you should normalize your data – with a log normal transform (http://www.unm.edu/~marcusj/datatransforms.pdf).

      • If the skew is now eliminated – proceed with parametric analysis, but remember to check the homogeneity of variance (see below).

    • Categorical (Experiment or Animal, Condition, Unique dendrite number; “Expt”, “Cond”, “CodeDenNum”)

    • Continuous (none) – Only use for ANCOVAs

    • Ok

  • Factor Codes (click ALL for each)

  • Between Effects

    • Click the radio button for “Use Custom Effects for the between design”

    • Highlight all 3 on the left underneath “Predictor Variables Categorical”

    • Click “Hierarc. Nest” underneath “Method”

    • Under condition, “Cond”, highlight Expt – nesting each condition within experiment

    • Under unique dendrite name, “CodeDenNum”, highlight both Expt and Cond – nesting each dendrite within experiment and condition

  • Finally, to examine the main effect of condition, click on Cond on the left and then click Add under Method.

  • Ok, Ok, Ok

  • Ignore Missing Data warning, Ok

  • Click More results

  • If you want to analyze by individual groups, click By Group, Grouping Variable (t-post).  I’ll probably do my analyses by time-point using individual spreadsheets.

  • Summary… Test all effects

  • Means… “Plot or show means for effect:” (select any of the effects for which you want means).  Be sure to click on “All marginal tables” near “Observed, weighted”. To graph a difference, click on the “Plot” button by “Observed, weighted”.  Be sure to check the box “Show/plot means +/- standard errors”.

  • Post-hoc… “Effect:” (select any effect that you want to explore with a post-hoc test from the drop-down list). Tukey HSD.
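The nesting above can also be written in formula terms, which may help make the design concrete. The sketch below uses statsmodels on simulated data; note that a plain OLS ANOVA tests every term against the residual, whereas Statistica's hierarchical option also chooses the correct error term for Cond:

```python
# Nesting in formula terms: Expt nested within Cond is written
# C(Cond):C(Expt); dendrite nested within both would extend this to
# C(Cond):C(Expt):C(CodeDenNum). Data are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
rows = []
for cond in ["Control", "TBS"]:
    for expt in ["E1", "E2", "E3"]:
        shift = 0.15 if cond == "TBS" else 0.0
        for _ in range(8):                      # measurements per experiment
            rows.append({"Cond": cond, "Expt": expt,
                         "psd_area": rng.normal(0.05 + shift, 0.02)})
df = pd.DataFrame(rows)

model = smf.ols("psd_area ~ C(Cond) + C(Cond):C(Expt)", data=df).fit()
table = anova_lm(model, typ=1)
print(table)
```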

 

  • The last step is checking that your assumptions hold.  You have to take out the effects of all the Xs (i.e. work with residuals) before you look at the distribution of Y.  (You do this to your entire data set, across groups.)

  1. Independence of observations – this is an assumption of the model that simplifies the statistical analysis.

  2. Normality – the distribution itself and the distribution of the residuals are normal.

  3. Equality (or "homogeneity") of variances, called homoscedasticity — the variance of data in groups should be the same.

 

How to test?  You can check all three with a few residual plots: a Q-Q plot of the residuals for normality, and a scatterplot of residuals against X or the predicted values of Y to check 1 and 3.
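A minimal sketch of those two residual checks in Python (simulated one-way data; with real output you would use the residuals from your fitted model):

```python
# The two residual checks in code: a Q-Q correlation for normality and
# residuals vs. predicted for independence/homoscedasticity.
# Simulated one-way data; with real data use your model's residuals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
groups = {g: rng.normal(mu, 0.1, 30)
          for g, mu in [("Control", 1.0), ("TBS", 1.3)]}

# Residual = observation minus its group mean; predicted = the group mean
resid = np.concatenate([v - v.mean() for v in groups.values()])
predicted = np.concatenate([np.full(v.size, v.mean()) for v in groups.values()])

# Q-Q check: r near 1 means the points hug the diagonal
(osm, osr), (slope, intercept, r) = stats.probplot(resid, dist="norm")
print(f"Q-Q correlation r = {r:.3f}")
# For assumptions 1 and 3, plot resid against predicted and look for
# structure (dependence) or fanning (heteroscedasticity).
```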

 

  • How/Why we Test Assumptions (Much of the info was gleaned from the “Elementary Concepts in Statistics” section within STATISTICA’s help menu)

    • Homogeneity of Variances (i.e. the groups have the same variance, you want these to be non-significant)

      • Levene’s test (ANOVA); however, the F statistic is quite robust against violations of this assumption (from STATISTICA help)

      • Brown-Forsythe test – uses the median, whereas Levene’s test uses the mean, which can be an issue if your data is skewed

        • Statistics… Basic Statistics… Breakdown & one-way ANOVA… OK

        • Variables… Dependent (i.e. Y-axis, PSD area, SpHD, nl SpHD, etc.) and Grouping Variable (i.e. X-axis, Cond)…

        • Codes for grouping variables… All… OK.  OK

        • ‘ANOVA & tests’ tab… Levene tests… Brown-Forsythe tests

  • I like to do these before and after the nl transformation (should be non-significant at least after the transformation).
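SciPy runs both versions through one function, which makes the mean-vs-median distinction concrete (groups here are simulated with deliberately unequal spread):

```python
# Levene's test uses group means; Brown-Forsythe is the same test on
# medians, which is safer for skewed data. SciPy exposes both through
# the `center` argument. Simulated groups with unequal spread:
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
control = rng.normal(1.0, 0.10, 40)
tbs     = rng.normal(1.0, 0.30, 40)   # three times the spread

stat_mean, p_levene = stats.levene(control, tbs, center="mean")    # Levene
stat_med,  p_bf     = stats.levene(control, tbs, center="median")  # Brown-Forsythe
print(f"Levene p = {p_levene:.4f}; Brown-Forsythe p = {p_bf:.4f}")
# A significant p-value means the homogeneity assumption is violated.
```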

  • Normality (see Distribution Fitting above)

    • Can test with skewness and kurtosis (see above in Descriptive Statistics)

      • Skewness > 2 or kurtosis > 3 is bad!

        • Can also calculate the skew z-score

          • z = skewness / SE, where SE = sqrt[ 6n(n-1) / ((n-2)(n+1)(n+3)) ]

          • Can test with Shapiro-Wilk test, but do not rely on this if your sample is large (n>300, http://www.surrey.ac.uk/psychology/current/statistics/index.htm). (see above in Descriptive Statistics)

            • Kolmogorov-Smirnov d, Lilliefors, and Anderson-Darling tests are good alternatives

            • Q-Q and/or P-P plots

      • Overall, the F-test is remarkably robust to deviations from normality.  The skewness of the distribution usually does not have a sizable effect on the F statistic. If the N per cell is fairly large, then deviations from normality do not matter much at all because of the central limit theorem, according to which the sampling distribution of the mean approximates the normal distribution, regardless of the distribution of the variable in the population. (from STATISTICA help)
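A quick way to apply the skew z-score formula above (the sample values are made up; scipy.stats.skew gives the sample skewness):

```python
# Skew z-score: sample skewness divided by its standard error,
# SE = sqrt( 6n(n-1) / ((n-2)(n+1)(n+3)) ).
import math
from scipy import stats

def skew_z(x):
    n = len(x)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return stats.skew(x) / se

sample = [0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 0.9, 1.4, 2.2]  # right-skewed
print(f"skew z = {skew_z(sample):.2f}")   # |z| > ~2 flags real skew
```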

    • Residuals (distance between data and the linear fit)

      • Histogram – within-cell residuals relative to red ‘normal’ line (I still don’t think this graph tells you much)

      • Normal P-P plot – plots the within-cell residuals (each observation’s difference from its group mean) against a ‘normal’ red line.

        • If there is a general lack of fit, and the data seem to form a clear pattern (e.g., an S or U shape) around the line, then the variable may have to be transformed in some way (e.g., a log transformation to "pull-in" the tail of the distribution).

        • I still don’t understand why, if this one looks good, the other residual normal plot is an issue

          • Doesn’t it show that the log transform worked?

          • Also – normality is the least important assumption; almost all ANOVA procedures are robust to minor departures from normality

 

  • Residuals 1 Tab

    • Click ‘Normal’ – test for whether your residuals are normal

      • If there is a general lack of fit, and the data seem to form a clear pattern (e.g., an S shape) around the line, then the variable may have to be transformed in some way (e.g., a log transformation to "pull-in" the tail of the distribution).

      • You can click ‘Predicted and residuals’ to get the raw residuals (observed vs. predicted), and then test whether those are normal (see Descriptive Statistics above)