Quantitative Methods I: Psy 501 University of Mississippi, Fall 2005, Prof. Kenneth O. McGraw

Using Excel for Data Analysis and Graphs

The instructions below demonstrate the use of Excel to produce descriptive statistics and graphs that are then incorporated into an APA-style data report.  The data for this demonstration are 200 adult human heights.   Nonetheless, the steps in Exel for computing descriptive statistics, creating tables, and creating figures are the same.

To compute statistics in Excel, choose the cell where you want the value to go and then type in the name of the statistical function that you want preceded by an = sign. Examples of some functions you will need to use are in the text box to the left. Each is computed on a range of values with the range name of "Heights.  It is very handy to name the range of values that contain your data.  In this way you can refer to your data by name rather than by cell references (e.g., C3:H55). To create a name, use Insert/Name/Define from the menu bar or simply type the name you want to use in the "Name" box on the toolbar.

Tabular and Graphical Displays of Data
Tables of summary statistics

The most common table to appear in journal articles is a table of descriptive statistics, as illustrated by the table below, which gives descriptive statistics for the height data.. The table after that is a model table taken from the APA style manual. It shows all of the conventions for tables. Please follow them carefully.

Frequency tables. Data are grouped into convenient intervals with upper and lower real limits. Frequencies and cumulative frequencies are computed for each interval.

Table 1

Frequency Table for Adult Heights (N=200) in Two-Inch Intervals

 Interval Midpoint f Cumulative f Upper Limit Percentile

 51-53 52 1 1 0.5 53-55 54 1 2 1 55-57 56 2 4 2 57-59 58 3 7 3.5 59-61 60 8 15 7.5 61-63 62 16 31 15.5 63-65 64 30 61 30.5 65-67 66 39 100 50 67-69 68 49 149 74.5 69-71 70 22 171 85.5 71-73 72 22 193 96.5 73-75 74 6 199 99.5 75-77 76 1 200 100

To produce the data for a table like this, enter some "bin" values in a column of the spreadsheet. The values you enter will be the upper limit of the bin. In Excel bins are defined by their upper limits (not midpoints as you might expect.) Make the start value an upper limit for the bin that includes the smallest height in the set. So if the smallest value is 57.44, you could make 58 the upper limit to the first bin. Next determine how large the interval between bin values should be. You want to have 10-15 bins total. For the height data, an interval size of 2 will probably be appropriate. If so, 60 would be the upper limit to the second bin. At this point you can select the two bins already entered and then use the drag tool to fill in the remainder of the bins (as shown in class). Once you have these bins defined, you can use the histogram function available in Tools/Data Analysis to get the data you need.

Clicking OK produces all the data you need except for the cumulative frequency, which is easily obtained by adding the current bin frequency to the cumulative frequency above it.

Stem and Leaf plots. Conventions are not authoritatively established but a stem-and-leaf plot allows you to list all data in a way that reveals its distribution, its central tendency and dispersion, without losing any of the quantitative information in individual measurements. It is like a histogram, but better. To create a stem-and-leaf plot, sort your data using the Data/Sort function. Round the data to the "leaf" digit you want to use. This will be the tenths for your height data so you will want to round the data to one decimal place using the Format function. (Select the heights and then right click to get the Format menu.) Then using your word processor, create a table that follows the conventions given by the one in the text box.
Graphical Displays
Histograms
. Frequencies are plotted in bars. Technically, midpoints should be given on the the x-axis but Excel uses limits by default.  To change the values printed on the x-axis, just type over the upper limits in the frequency table that is output by the Histogram function.
Histograms in Excel are created using Tools/Data Analysis/Histogram.
The dialog box looks the same as the one used to produce frequency counts for bin values.  To get a histogram just choose "Chart Output."

Polygons. Beginning with the first interval with a frequency of 0 and going to the last with a frequency of zero, frequencies per interval are plotted as points and the points are connected with straight lines. Midpoints are again used on the x-axis.

You can turn the histogram created earlier into a frequency polygon by clicking on the completed histogram and choose ChartType.  Change the Chart Type to "Line."  Because polygons must begin and end on the x-axis you will have to add bins below and above the existing bins.  The frequencies in these will be zero.

Cumulative histograms and polygons. Cumulative plots are used to make relative rankings more obvious. Frequently a dual y-axis will be used with one showing cumulative frequencies and the other showing percentile ranks. Put upper limits to intervals on the abscissa, not the midpoints. When using Excel, you have the option to produce a dual axis graph that shows both a histogram and a cumulative frequency polygon for the same data.  Choose the "cumulative frequency" option in the dialog box that opens up when you select Tools/Data Analysis/Histogram.

The graphical methods above are all classical methods of showing distributional information. John Tukey in 1977 introduced some newer methods that are now catching on. One of the most popular is the box plot. (He also created the stem-and-leaf plot which was introduced as a "table" earlier but obviously has graphical properties.)

Box Plot: Box and whiskers plot was the original name. Need 75th percentile (upper hinge), 25th percentile (lower hinge), median, upper fence, and lower fence. Fences are invisible limits in the sense that you compute them but do not graph them. There are different conventions for computing fences. The ones we will use are computed relative to the upper and lower "hinges" (i.e., P75 and P25). The lower fence is at a distance of 1.5*(75th percentile pt.-25th percentile pt) below the lower hinge. Upper fence is at a distance of 1.5*(75th percentile pt. - 25th percentile pt.) above the upper hinge. Draw lines (whiskers) to the last data point within the fences, then use individual markers (such as the ◦'s used here, though commonly asterices are used) for the remaining points which are termed outliers.

Whiskers cover 99.3% of values in a normal distribution. [z=.674 is at 75th, -674 at 25th, interquartile range is 1.348. Whiskers extend to last values inside the range +2.696 and -2.696, an interval that includes 99.3% of values.) Any values outside of whiskers are plotted as asterices and are labeled outliers.

In its current version, Excel does not have a box plot charts option (but many statistical packages do). But you can still use Excel to create a graph that is pretty much like what you want. There are add-ins for Excel that produce these charts (along with a host of other statistical procedures not included in the native version of Excel), but they tend to be quite expensive.  I don't recommend them.  Spend your money on SPSS, Systat, SAS, or Sigma Plot instead.

To develop a box plot in Excel, compute the following statistics in the order given in a column of a worksheet:

P25, Min,P50,Max,P75. Open the Chart wizard using Insert/Chart. Choose Line graph, data in rows, and Finish. This will produce a graph with five data points lined up, each with its own unique marker.

Now click on the graph to open up the dialog box for formatting the data series. Choose options and check the boxes for high-low bars and up-down bars. You may also want to adjust the gap size. This produces a graph like the one below, which can be further edited to eliminate unwanted markers.