Sunday, 10 July 2011

This weblog was created for the purposes of an assignment for the subject BYL 7134, Cyberlaw, offered by the Faculty of Management, Multimedia University (MMU), Malaysia. The materials posted on this weblog are for the purposes of the assignment as well as for study and non-profit research only. Appropriate acknowledgements of materials that do not belong to the weblog owner have been made publicly. If you are the author, copyright owner or trademark owner of any article, image, graphic, picture or other material posted in this weblog and you object to such posting on any grounds, including copyright or trademark infringement, please contact me and I will take your material down. I state herein that I am relying on the doctrine of fair use.

Thursday, 7 July 2011

INTERNAL CONSISTENCY RELIABILITY

Internal consistency reliability refers to the consistency of the results delivered by a test, ensuring that the various items measuring the different constructs deliver consistent scores.
For example, an English test may be divided into vocabulary, spelling, punctuation and grammar. An internal consistency reliability check provides a measure of whether each of these particular aptitudes is measured consistently and reliably.
One way of testing this is by using the test-retest method, where the same test is administered some time after the initial test and the results are compared.
However, this creates some problems and so many researchers prefer to measure internal consistency by including two versions of the same instrument within the same test. Our example of the English test might include two very similar questions about comma use, two about spelling and so on.
The basic principle is that the student should give the same answer to both – if they do not know how to use commas, they will get both questions wrong. A few nifty statistical manipulations will give the internal consistency reliability and allow the researcher to evaluate the reliability of the test.
There are three main techniques for measuring the internal consistency reliability, depending upon the degree, complexity and scope of the test.
They all check that the results and constructs measured by a test are correct, and the exact type used is dictated by subject, size of the data set and resources.

SPLIT-HALVES TEST

The split-halves test for internal consistency reliability is the easiest type, and involves dividing a test into two halves. For example, a questionnaire to measure extroversion could be divided into odd and even questions. The results from both halves are statistically analysed, and if there is weak correlation between the two, then there is a reliability problem with the test.
The split-halves test gives a measurement of between zero and one, with one meaning a perfect correlation.
The division of the questions into two sets must be random. Split-halves testing was a popular way to measure reliability because of its simplicity and speed. However, in an age where computers can take over the laborious number crunching, scientists tend to use much more powerful tests.
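As a minimal sketch of the split-halves idea, the following Python snippet divides a set of items into odd and even halves and correlates the half-scores. The 0/1 item responses are simulated, since no data set accompanies the text.

```python
# Split-half sketch: simulated 0/1 item responses (no real data accompany the text).
import numpy as np

rng = np.random.default_rng(0)
ability = rng.normal(size=(30, 1))                               # latent trait per respondent
scores = (ability + rng.normal(size=(30, 10)) > 0).astype(int)   # 30 people x 10 items

odd_half = scores[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
even_half = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_halves = np.corrcoef(odd_half, even_half)[0, 1]
print(f"Correlation between the two halves: {r_halves:.3f}")
```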

KUDER-RICHARDSON TEST

The Kuder-Richardson test for internal consistency reliability is a more advanced, and slightly more complex, version of the split-halves test. In this version, the test works out the average correlation for all the possible split-half combinations in a test. The Kuder-Richardson test also generates a correlation of between zero and one, with a more accurate result than the split-halves test. The weakness of this approach, as with split-halves, is that the answer for each question must be a simple right or wrong answer, zero or one.
For multi-scale responses, sophisticated techniques are needed to measure internal consistency reliability.
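To picture the idea described above, here is a brute-force Python sketch that simply averages the correlation between half-scores over every possible way of splitting the items into two halves. The data are simulated, and the sketch illustrates the principle rather than reproducing the exact Kuder-Richardson computation (the standard KR-20 formula appears in a later post).

```python
# Brute-force illustration of averaging over all possible split-half combinations
# (simulated 0/1 data; this shows the idea rather than the exact KR formula).
from itertools import combinations
import numpy as np

rng = np.random.default_rng(1)
ability = rng.normal(size=(50, 1))
scores = (ability + rng.normal(size=(50, 10)) > 0).astype(int)  # 50 people x 10 items

n_items = scores.shape[1]
correlations = []
for half in combinations(range(n_items), n_items // 2):
    if 0 not in half:                      # count each unordered split only once
        continue
    other = [i for i in range(n_items) if i not in half]
    r = np.corrcoef(scores[:, list(half)].sum(axis=1),
                    scores[:, other].sum(axis=1))[0, 1]
    correlations.append(r)

print(f"Average correlation over {len(correlations)} possible splits: "
      f"{np.mean(correlations):.3f}")
```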

CRONBACH'S ALPHA TEST

The Cronbach's Alpha test not only averages the correlation between every possible combination of split halves, but also allows multi-level responses. For example, a series of questions might ask the subjects to rate their response between one and five. Cronbach's Alpha gives a score of between zero and one, with 0.7 generally accepted as a sign of acceptable reliability.
The test also takes into account both the size of the sample and the number of potential responses. A 40-question test with possible ratings of 1 – 5 is seen as having more accuracy than a ten-question test with three possible levels of response.
Of course, even with Cronbach's clever methodology, which makes calculation much simpler than crunching through every possible permutation, this is still a test best left to computers and statistics spreadsheet programmes.
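As a rough illustration of what that computation might look like in practice, here is a minimal Cronbach's alpha sketch in Python. The 1-5 rating data are simulated for illustration only; none of the values come from the text.

```python
# Minimal Cronbach's alpha sketch:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
import numpy as np

def cronbach_alpha(item_scores):
    """item_scores: respondents x items array."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(2)
trait = rng.normal(size=(100, 1))
# 8 items rated 1-5, loosely driven by a common trait
ratings = np.clip(np.round(3 + trait + rng.normal(scale=0.8, size=(100, 8))), 1, 5)
print(f"Cronbach's alpha: {cronbach_alpha(ratings):.3f}")
```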

SUMMARY

Internal consistency reliability is a measure of how well a test addresses different constructs and delivers reliable scores. The test-retest method involves administering the same test after a period of time and comparing the results. By contrast, measuring internal consistency reliability involves comparing two different versions of the same item within the same test.
source: http://www.experiment-resources.com/internal-consistency-reliability.html

Interrater reliability (Kappa) Using SPSS

Interrater reliability (Kappa)
            Interrater reliability is a measure used to examine the agreement between two people (raters/observers) on the assignment of categories of a categorical variable. It is an important measure in determining how well an implementation of some coding or measurement system works.
A statistical measure of interrater reliability is Cohen's Kappa, which generally ranges from 0 to 1.0 (although negative values are possible). Larger values indicate better reliability, while values near or below zero suggest that agreement is attributable to chance alone.
Example Interrater reliability analysis
            Using an example from Fleiss (1981, p 213), suppose you have 100 subjects whose diagnosis is rated by two raters on a scale that rates the subject’s disorder as being either psychological, neurological, or organic.  The data are given below: (KAPPA.SAV)

                           Rater A
                 Psychological   Neurological   Organic
Rater B
  Psychological             75              1         4
  Neurological               5              4         1
  Organic                    0              0        10

The data set KAPPA.SAV contains the variables Rater_A, Rater_B and Count, and stores the table above in count (summarized) form.

To analyze this data follow these steps:
1.     Open the file KAPPA.SAV. Before performing the analysis on this summarized data, you must tell SPSS that the Count variable is a "weighted" variable. Select Data/Weight Cases... and select the "weight cases by" option with Count as the Frequency variable.
2.     Select Analyze/Descriptive Statistics/Crosstabs.
3.     Select Rater A as Row, Rater B as Col.
4.     Click on the Statistics button, select Kappa and Continue.
5.     Click OK to display the results for the Kappa test.
The results of the interrater analysis are Kappa = 0.676 with p < 0.001. This measure of agreement, while statistically significant, is only marginally convincing. As a rule of thumb, values of Kappa from 0.40 to 0.59 are considered moderate, 0.60 to 0.79 substantial, and 0.80 and above outstanding (Landis & Koch, 1977). Most statisticians prefer Kappa values to be at least 0.6, and most often higher than 0.7, before claiming a good level of agreement. Although not displayed in the output, you can find a 95% confidence interval using the generic formula for 95% confidence intervals:
Estimate ± 1.96 × SE
Using this formula and the results in the table, an approximate 95% confidence interval on Kappa is (0.504, 0.848). Some statisticians prefer the use of a weighted Kappa, particularly if the categories are ordered. The weighted Kappa allows "close" ratings to not simply be counted as "misses." However, SPSS does not calculate weighted Kappas.
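For readers without SPSS, here is a rough Python sketch of the same computation, using the Fleiss contingency table shown above. The standard error is a simplified large-sample approximation, so the resulting interval differs slightly from the SPSS interval quoted in the text.

```python
# Cohen's kappa computed directly from the Fleiss contingency table above.
# The standard error is a simplified large-sample approximation, so the interval
# will differ slightly from the SPSS result quoted in the text.
import numpy as np

# rows = Rater B, columns = Rater A (Psychological, Neurological, Organic)
table = np.array([[75, 1, 4],
                  [5, 4, 1],
                  [0, 0, 10]], dtype=float)

n = table.sum()
p_observed = np.trace(table) / n                                  # exact agreement
p_expected = table.sum(axis=1) @ table.sum(axis=0) / n ** 2       # chance agreement
kappa = (p_observed - p_expected) / (1 - p_expected)              # ~0.676

se = np.sqrt(p_observed * (1 - p_observed) / (n * (1 - p_expected) ** 2))
low, high = kappa - 1.96 * se, kappa + 1.96 * se                  # Estimate +/- 1.96*SE
print(f"kappa = {kappa:.3f}, approximate 95% CI ({low:.3f}, {high:.3f})")
```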
A more complete list of how Kappa might be interpreted (Landis & Koch, 1977) is given in the following table.
Kappa          Interpretation
< 0            Poor agreement
0.0 – 0.20     Slight agreement
0.21 – 0.40    Fair agreement
0.41 – 0.60    Moderate agreement
0.61 – 0.80    Substantial agreement
0.81 – 1.00    Almost perfect agreement

Reporting the results of an interrater reliability analysis
The following examples illustrate how you might report this interrater analysis in a publication.
Narrative for the methods section:
“An interrater reliability analysis using the Kappa statistic was performed to determine consistency among raters.”
Narrative for the results section:
“The interrater reliability for the raters was found to be Kappa = 0.68 (p < 0.001), 95% CI (0.504, 0.848).”

Reference
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.
source:http://www.stattutorials.com/SPSS/TUTORIAL-SPSS-Interrater-Reliability-Kappa.htm

Reliability of a Test

Test reliability (consistency) is an essential requirement for test validity. Test validity is the degree to which a test measures what it is designed to measure.
       Researchers use four methods to check the reliability of a test: the test-retest method, alternate forms, internal consistency, and inter-scorer reliability. Not all of these methods are used for all tests. Each method provides research evidence that the responses are consistent under certain circumstances. There are four distinct types of reliability.

(1.) Test-Retest - a method of estimating test reliability in which a test developer or researcher gives the same test to the same group of research participants on two different occasions. The results from the two tests are then correlated to produce a stability coefficient. Studying the coefficients for a particular test allows the assessor to see how stable the test is over time.
Example: Test-retest reliability of the WISC-IV was evaluated with data from 243 children. The WISC-IV was administered on two separate occasions, with a mean test-retest interval of 32 days. The average corrected Full Scale IQ stability coefficient was .93.
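As a minimal illustration (with hypothetical scores, not the WISC-IV data), the stability coefficient is simply the correlation between the scores from the two administrations:

```python
# Test-retest sketch: the stability coefficient is just the correlation between
# the two administrations. The scores below are hypothetical, not WISC-IV data.
import numpy as np

first_administration = np.array([98, 112, 104, 87, 120, 95, 101, 110, 92, 105])
second_administration = np.array([101, 110, 99, 90, 118, 97, 104, 112, 95, 103])

stability = np.corrcoef(first_administration, second_administration)[0, 1]
print(f"Stability coefficient: {stability:.2f}")
```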
 
(2.) Alternate Forms - This type of reliability uses a second form of a test consisting of similar, but not identical, items. Researchers administer this second “parallel” form of a test after having already administered the first form. This allows researchers to determine a reliability coefficient that reflects error due to different times and items, and allows them to control for test form. By administering form A to one group and form B to another group, and then form B to the first group and form A to the second group for the next administration of the test, researchers are able to find a coefficient of stability and equivalence. This is the correlation between scores on the two forms and takes into account error due to different times and forms.

Example: The ACT is an academic test used in the college admission process. There are four academic subtests: English, mathematics, reading, and natural science reading. A standard score scale is used to report scores on the four academic tests. There is also a composite score, the average of the standard scores on the four subtests. Scaled score equivalents are provided for each of the four tests by the equipercentile method, based on the score distribution of an anchor form of the ACT. New forms of the test are equated to older forms by giving both forms to parallel samples of students and then equating the forms by the equipercentile method (Aiken, 1985).

(3.) Internal Consistency - There are three ways to measure the consistency of a test with only one form.

          
(A) Split-Half Reliability

What is Split-Half Reliability?
The test is given and divided into halves that are scored separately; the score of one half of the test is then compared to the score of the remaining half to assess reliability (Kaplan & Saccuzzo, 2001).
Why use Split-Half?
Split-Half Reliability is a useful measure when it is impractical or undesirable to assess reliability with two tests or to have two test administrations (because of limited time or money) (Cohen & Swerdlik, 2001).
How do I use Split-Half?
1st- Divide the test into halves. The most commonly used way to do this is to assign odd-numbered items to one half of the test and even-numbered items to the other; this is called odd-even reliability.

2nd- Find the correlation of scores between the two halves by using the Pearson r formula.

3rd- Adjust or re-evaluate the correlation using the Spearman-Brown formula, which increases the reliability estimate. The longer the test, the more reliable it is, so it is necessary to apply the Spearman-Brown formula to a test that has effectively been shortened, as we do in split-half reliability (Kaplan & Saccuzzo, 2001).

Spearman-Brown formula:
corrected r = 2r / (1 + r)
where r = the estimated correlation between the two halves (Pearson r) (Kaplan & Saccuzzo, 2001).
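A tiny Python sketch applying the formula above; the half-test correlation of 0.65 is a made-up value used only for illustration.

```python
# Spearman-Brown correction applied to a made-up split-half correlation of 0.65.
def spearman_brown(r_half):
    """Estimate full-length test reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)

print(f"Corrected reliability: {spearman_brown(0.65):.3f}")  # about 0.79
```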

          
(B)  Kuder-Richardson Formula

Another way to internally evaluate a test is to use the Kuder-Richardson 20 (KR-20). This is only advisable if you have dichotomous items in a test (usually scored as right or wrong).

KR20 = r = [N / (N - 1)] × [(S² - Σpq) / S²]

KR20 = reliability estimate (r)
N = the number of items on the test
S² = the variance of the total test score
p = the proportion of people getting each item correct (found separately for each item)
q = the proportion of people getting each item incorrect; for each item, q = 1 - p
Σpq = the sum of the products of p times q for each item on the test
(Kaplan & Saccuzzo, 2001)
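Putting the pieces together, here is a short Python sketch of KR-20 that follows the definitions above. The dichotomous item data are simulated for illustration only.

```python
# KR-20 sketch following the definitions above, with simulated 0/1 item data.
import numpy as np

rng = np.random.default_rng(3)
ability = rng.normal(size=(60, 1))
items = (ability + rng.normal(size=(60, 12)) > 0).astype(int)   # 60 examinees x 12 items

N = items.shape[1]                              # number of items on the test
p = items.mean(axis=0)                          # proportion correct, per item
q = 1 - p                                       # proportion incorrect, per item
S2 = items.sum(axis=1).var(ddof=1)              # variance of the total test scores

kr20 = (N / (N - 1)) * ((S2 - (p * q).sum()) / S2)
print(f"KR-20 reliability estimate: {kr20:.3f}")
```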

         (C)  Cronbach Alpha/Coefficient Alpha
The Cronbach Alpha/Coefficient Alpha formula is a general formula for estimating the reliability of a test consisting of items on which different scoring weights may be assigned to different responses:

α = [k / (k - 1)] × (1 - Σsi² / st²)

k = the number of items
si² = the variance of scores on item i
st² = the variance of the total test scores
(Aiken, 2003)

(4.) Inter-scorer reliability - measures the degree of agreement between persons scoring a subjective test (such as an essay exam) or rating an individual. With regard to the latter, this type of reliability is most often used when scorers have to observe and rate the actions of participants in a study. This research method reveals how well the scorers agreed when rating the same set of things. Other names for this type of reliability are inter-rater reliability and inter-observer reliability.

Estimates of Inter-rater Agreement and Reliability

In addition to simple percentages of agreement, Cohen's kappa (KAPPA) was also calculated for the exact match percentages of agreement, and the results are shown in Table 2. The KAPPA coefficients indicate the extent of agreement between the raters, after removing that part of their agreement that is attributable to chance. As can be seen, the values of the KAPPA statistic are much lower than the simple percentages of agreement (Goodwin, 2001).
Simple percentages of agreement and kappa. To estimate inter-rater reliability of observational data, percentages of agreement are often calculated, especially if the number of scale points is small. Percentages of agreement can be calculated in a number of different ways, depending on the definition of agreement.
For Example:
In Table 2, the percentages of agreement between the raters for each occasion (day) are presented two ways: first, for the case in which agreement meant an exact match between raters in their assigned ratings; second, for the case in which agreement was defined more leniently as either exact agreement, or differences between the two raters' scores of not more than one point in either direction. (This latter definition of agreement has been used fairly often in the estimation of interrater agreement of some types of measures, such as parent-infant interaction scales [Goodwin & Sandall, 1988].) As would be expected, percentages of agreement are lower when agreement is defined in the more conservative way (exact match). The results shown in Table 2 demonstrate that the median percentage of agreement for the 6 days, when agreement was defined as exact match, was 20%; the median percentage of agreement for the 6 days, when the more liberal definition of agreement was used, was 80%. (Percentages of agreement were not calculated for the total scores in Table 1 because this approach to reliability estimation is rarely used if the range of scores is large; here, the total scores could range from 6 to 42.) 
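Since Table 2 is not reproduced here, the sketch below uses hypothetical ratings simply to show how the two definitions of agreement described above (exact match versus within one scale point) would be computed.

```python
# Hypothetical ratings illustrating the two agreement definitions described above:
# exact match versus agreement within one scale point.
import numpy as np

rater_1 = np.array([3, 4, 5, 2, 6, 4, 3, 5, 4, 2])
rater_2 = np.array([3, 5, 5, 1, 4, 4, 2, 5, 6, 2])

exact = np.mean(rater_1 == rater_2) * 100
within_one = np.mean(np.abs(rater_1 - rater_2) <= 1) * 100
print(f"Exact agreement: {exact:.0f}%   Within one point: {within_one:.0f}%")
```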
source: http://web.sau.edu/WaterStreetMaryA/NEW%20intro%20to%20tests%20&%20measures%20website_files/reliability.htm