Displaying Possible Normal Characteristics

After you run the scanner under the Normal tab, the Possible Normal Characteristics area displays a prediction for each field, which the KS6800A software calculates using the Shapiro-Wilk test for normal distribution.

The columns that appear in this area are:

  1. Feature Id—Name of the measurement field considered for normal distribution analysis.
  2. Description—Prediction by the KS6800A software of whether a field is normally distributed or not. The types of description are:
  • There is a chance that <field_name> is normally distributed.
  • The field <field_name> is probably not normally distributed but may still be clustered around a mean.
  3. Score—A value that the KS6800A software derives using the Shapiro-Wilk test. Smaller values indicate that the field is less likely to be normally distributed; a score of 0.5 or above indicates a strong chance that the field is normally distributed. This test is extremely sensitive when there are a large number (thousands) of samples, and may yield a poor score even if the data in the field is fairly normal.
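
The relationship between the Score and Description columns can be pictured with a minimal Python sketch. It assumes that the 0.5 score mentioned above is also the cut-off between the two descriptions, which this guide does not state explicitly. The names `NORMAL_THRESHOLD` and `describe_field` are hypothetical and are not part of the KS6800A software.

```python
# Assumption: 0.5 is the cut-off between the two descriptions.
# The guide only says that 0.5 and above indicates a strong chance of normality.
NORMAL_THRESHOLD = 0.5


def describe_field(field_name: str, score: float) -> str:
    """Return the description text that matches a Shapiro-Wilk based score."""
    if score >= NORMAL_THRESHOLD:
        return f"There is a chance that {field_name} is normally distributed."
    return (
        f"The field {field_name} is probably not normally distributed "
        "but may still be clustered around a mean."
    )


if __name__ == "__main__":
    # Hypothetical field names and scores, for illustration only.
    for name, score in [("width", 0.82), ("height", 0.07)]:
        print(f"{name}: {score:.2f} -> {describe_field(name, score)}")
```

Because the test becomes very strict with thousands of samples, a low score on a large field is a prompt to inspect the charts rather than a final verdict.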

You may view either 5, 10 (default), 25 or 50 rows per page. Use the navigation buttons to view the rows displayed on another page.

To view a graphical analysis of whether a field meets the conditions for normal distribution or not:

  1. Click the row you wish to analyze.

A page is displayed with four types of charts associated with normal distribution, along with options at the bottom of the page to consider or ignore the values of the corresponding field when scanning for outliers. You may perform some common functions across all chart types, which are explained in Modifying Display of Charts.
  • Hover the cursor over a chart to display the corresponding values.
  • Click the legend to hide or view part of the plotted chart.
  • Perform the following in-chart functions:
      • Download as .png
      • Zoom
      • Pan
      • Zoom in
      • Zoom out
      • Reset axes

The following chart types are plotted for normal distribution analysis within the KS6800A environment:

  • Values—This chart shows the values of the selected field plotted against a normal distribution using a bell curve, where the mean and the standard deviations are marked accordingly. Compare the values from the field (plotted in blue) against the mean and standard deviation values on the curve to determine any outliers, in the form of extreme values.

  • QQ Plot—This chart, also known as a Quantile-Quantile plot, shows two sets of quantiles plotted against each other. In general, the purpose of a QQ plot is to find out whether two sets of data come from the same distribution. A 45-degree reference line is drawn on the QQ plot; if the values of the field come from the same distribution, the points fall on that line. The image below shows quantiles from a theoretical normal distribution on the horizontal axis, compared against values from the field on the vertical axis. Some points are not clustered on the 45-degree line, suggesting that some values in this field do not follow the conditions for normal distribution.

  • Skewness—This chart shows the distortion or asymmetry, if any, of the selected field relative to a symmetrical bell curve, or normal distribution. If the curve is shifted to the left or to the right, it is said to be skewed. Skewness quantifies the extent to which a given distribution departs from the symmetry of a normal distribution. The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Positive Skewness means that the tail on the right side of the distribution is longer or fatter than the tail on the left side; the mean and median will be greater than the mode. Negative Skewness means that the tail on the left side of the distribution is longer or fatter than the tail on the right side; the mean and median will be less than the mode. The image below shows an example of Positive Skewness. While a normal distribution has most of the data in the center with decreasing amounts evenly distributed to the left and the right, a skewed distribution has its data gathered on one side with decreasing amounts trailing off toward the other side.

  • Kurtosis—This chart shows the measure of outliers present in the distribution. High kurtosis is an indicator that the field has fat tails or outliers, whereas low kurtosis indicates that the field has thinner tails or lacks outliers. In the image below, the actual tail is fatter than the theoretical tail, which indicates the possibility of outlier values in this field.
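
The quantities behind the QQ Plot, Skewness and Kurtosis charts can be reproduced with a short Python sketch that uses only the standard library. This is an illustration of the textbook definitions, not the KS6800A implementation. The function names, the `(i - 0.5) / n` plotting position and the use of population moments are assumptions made for the example. Kurtosis is reported here as excess kurtosis, so a normal distribution gives a value near zero.

```python
from statistics import NormalDist, fmean, pstdev


def qq_points(values):
    """Pair theoretical normal quantiles (x) with standardized field values (y).

    If the field is normally distributed, the points lie close to the
    45-degree line y = x. Requires at least two distinct values.
    """
    n = len(values)
    mean, sd = fmean(values), pstdev(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, v in enumerate(sorted(values), start=1):
        p = (i - 0.5) / n  # assumed plotting position; other conventions exist
        points.append((std_normal.inv_cdf(p), (v - mean) / sd))
    return points


def skewness(values):
    """Population skewness: zero for symmetric data, positive for a long right tail."""
    mean, sd = fmean(values), pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def excess_kurtosis(values):
    """Population kurtosis minus 3: positive values suggest fat tails or outliers."""
    mean, sd = fmean(values), pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0


if __name__ == "__main__":
    # Hypothetical sample with one large value on the right.
    sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 14.5]
    print("skewness:", round(skewness(sample), 3))
    print("excess kurtosis:", round(excess_kurtosis(sample), 3))
    for x, y in qq_points(sample):
        print(f"theoretical {x:+.2f}  observed {y:+.2f}")
```

Standardizing the field values before plotting is what lets the reference line sit at 45 degrees; with raw values the reference line would instead have a slope equal to the standard deviation.
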
  2. After analyzing the charts, you may choose one of the following options:
  • Confirm <field_name> should be clustered around <mean_value>—instructs the KS6800A software to consider the selected field and its values when scanning for extreme value outliers.
  • Ignore extreme values of <field_name>—instructs the KS6800A software to exclude the selected field and its values when scanning for extreme value outliers.
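
The effect of the two options on a later outlier scan can be sketched as follows. This guide does not describe how the KS6800A software detects extreme values, so the rule used here (a value more than three standard deviations from the mean) is purely an assumption, and `scan_extreme_values`, `decisions` and `max_sigma` are hypothetical names chosen for illustration.

```python
from statistics import fmean, pstdev


def scan_extreme_values(fields, decisions, max_sigma=3.0):
    """Flag extreme values, but only in fields that were confirmed.

    fields:    {field_name: [numeric values]}
    decisions: {field_name: "confirm" or "ignore"}
    max_sigma: assumed cut-off, in standard deviations from the mean
    """
    outliers = {}
    for name, values in fields.items():
        if decisions.get(name) != "confirm":
            continue  # ignored (or undecided) fields are excluded from the scan
        mean, sd = fmean(values), pstdev(values)
        if sd == 0:
            continue  # all values identical, so nothing can be extreme
        extreme = [v for v in values if abs(v - mean) > max_sigma * sd]
        if extreme:
            outliers[name] = extreme
    return outliers


if __name__ == "__main__":
    # Hypothetical data: the same readings in a confirmed and an ignored field.
    readings = [10] * 19 + [100]
    fields = {"width": readings, "height": readings}
    decisions = {"width": "confirm", "height": "ignore"}
    print(scan_extreme_values(fields, decisions))
```

In this sketch both fields contain the same extreme reading, but only the confirmed field is reported.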

After you select an option, the display returns to the previous page and the entry for the corresponding field appears under Confirmed Normal Characteristics. For more details, see Displaying Confirmed Normal Characteristics.

  1. To go back to the previous page without choosing any option, use the back button on your browser window.
  2. To return to the View Data window, click Home on top of this page.