Simple Usability Test

Useful Information

This test plan does not take into account all the preparation needed to perform a usability test, including writing test scripts, recruiting users, and scheduling. Also, this plan only hints at debriefing, documentation, reporting, and data analysis; these are more advanced topics and may be covered at a later time.


Prior to performing a test, the product team must define two things:

1. Users to be tested. In the case of uPortal, we have started building personas, which will act as guides for selecting test candidates.

2. Standards of quality, or success criteria. At some point, the team needs to decide when enough is enough. What level of quality does the design need to meet? I'll leave this up to the community to discuss, assuming there's an interest in testing.

To perform the test, we set out to measure two of the three ISO 9241-11 usability metrics: Effectiveness (objective) and User Satisfaction (subjective). The third metric, Efficiency, is not covered by this plan.

How We Measure Effectiveness

Effectiveness will be measured by the successful completion of a task. If a task is completed successfully, it will be marked as a "success." A success mark is given full credit (100%).

Tasks that are not completed successfully will be given a "fail" mark. Fail marks are given zero credit (0%). Unsuccessful tasks can include events such as users giving up, users requiring assistance (tech support), users completing tasks incorrectly, etc.

Partial credit will also be made available in the form of a "partial" mark, which allows for 50% credit. Partial credit will be reserved for near-miss mistakes, such as entering a calendar date other than the one instructed or deleting a different briefcase folder than intended. In these instances, it will be up to the tester's discretion to determine whether the mistake warrants partial credit rather than a fail mark.

The following table shows an example evaluation using this method:

|        | Task 1  | Task 2 | Task 3  | Task 4  | Task 5  | Task 6  |
|--------|---------|--------|---------|---------|---------|---------|
| User 1 | Fail    | Fail   | Success | Fail    | Fail    | Success |
| User 2 | Fail    | Fail   | Partial | Fail    | Partial | Fail    |
| User 3 | Success | Fail   | Success | Success | Partial | Success |
| User 4 | Success | Fail   | Success | Fail    | Partial | Success |

This example shows 6 tasks with 4 attempts per task, for a total of 24 task attempts. 9 attempts were successful and 4 were partially successful. To arrive at the overall effectiveness rating for this set of tasks, we use the following equation:

(9 + (4 × 0.5)) / 24 = 11 / 24 ≈ 46%

This method comes from "Success Rate: The Simplest Usability Metric" (Jakob Nielsen's Alertbox).
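To make the arithmetic concrete, here is a minimal sketch in Python of the success-rate calculation, using the example data from the table above. The mark labels, variable names, and data structure are illustrative assumptions, not part of any uPortal tooling:

```python
# Sketch of the success-rate calculation described above.
# Credit per task attempt: "success" = 1.0, "partial" = 0.5, "fail" = 0.0.
CREDIT = {"success": 1.0, "partial": 0.5, "fail": 0.0}

# One row of marks per user, mirroring the example table.
results = [
    ["fail", "fail", "success", "fail", "fail", "success"],          # User 1
    ["fail", "fail", "partial", "fail", "partial", "fail"],          # User 2
    ["success", "fail", "success", "success", "partial", "success"], # User 3
    ["success", "fail", "success", "fail", "partial", "success"],    # User 4
]

# Flatten all task attempts, then divide earned credit by total attempts.
attempts = [mark for user in results for mark in user]
effectiveness = sum(CREDIT[mark] for mark in attempts) / len(attempts)
print(f"Effectiveness: {effectiveness:.0%}")  # -> Effectiveness: 46%
```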

How We Measure User Satisfaction

User satisfaction will be measured with post-task and post-test questions. In other words, questions will be asked after each task in the test script and also at the end of the entire test. The following are examples of questions that can be asked. The questions and answers should be structured using a 5-point Likert scale:

1. On a scale of 1-5 (1=not confident / 5=absolutely certain), how confident are you that you completed the task correctly?

2. On a scale of 1-5 (1=not confident / 5=absolutely certain), how confident are you that you can complete the task again?

3. On a scale of 1-5 (1=very difficult / 5=very easy), how difficult was it for you to complete the task?

4. On a scale of 1-5 (1=very frustrating / 5=not frustrating), how frustrating was this task to complete?

5. On a scale of 1-5 (1=did not like it / 5=liked it very much), how would you rate the look and feel of the design?

The following table shows example responses scored with this method:

|        | Question 1 | Question 2 | Question 3 | Question 4 | Question 5 |
|--------|------------|------------|------------|------------|------------|
| User 1 | 3          | 3          | 5          | 1          | 3          |
| User 2 | 4          | 4          | 2          | 3          | 4          |
| User 3 | 3          | 5          | 5          | 2          | 5          |
| User 4 | 4          | 4          | 3          | 2          | 4          |
| Total  | 14         | 16         | 15         | 8          | 16         |

Using a 5-point Likert scale, where 1 is the negative end and 5 the positive end, each question offers a maximum possible score of 20 points (4 users × 5 points), which represents 100% satisfaction. To arrive at the satisfaction rating for each question, we divide its total score by that maximum:

Question 1 = 14/20 (70% satisfaction)
Question 2 = 16/20 (80% satisfaction)
Question 3 = 15/20 (75% satisfaction)
Question 4 = 8/20 (40% satisfaction)
Question 5 = 16/20 (80% satisfaction)
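As a minimal sketch, the same per-question arithmetic can be expressed in Python using the example responses above (the names and data layout are illustrative only):

```python
# Sketch of the per-question satisfaction calculation described above.
# Each row holds one user's 1-5 Likert responses to Questions 1-5.
scores = [
    [3, 3, 5, 1, 3],  # User 1
    [4, 4, 2, 3, 4],  # User 2
    [3, 5, 5, 2, 5],  # User 3
    [4, 4, 3, 2, 4],  # User 4
]

max_score = 5 * len(scores)  # 20 points per question with four users

# zip(*scores) regroups the responses by question instead of by user.
for question, responses in enumerate(zip(*scores), start=1):
    total = sum(responses)
    print(f"Question {question} = {total}/{max_score} "
          f"({total / max_score:.0%} satisfaction)")
```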

It is recommended that each question be rated independently, to better understand the impact of the individual questions. However, if the questions are chosen carefully so they do not conflict with each other, a final "overall" grade can be determined.

In addition to the Likert scale, it is also possible to ask freeform, essay-style questions; however, responses to these will only be used anecdotally.

How We Define Success Criteria

This is a fairly contentious subject. Ideally, success criteria are established through market research and baseline testing, comparing competitive products on a feature-for-feature basis. In the real world, however, this level of rigor is rarely achievable.

An alternative recommendation is to set an "internal" comfort level with quality. For instance, the uPortal community might decide that our user base is comfortable with a 75% success range. This figure then serves as a guiding principle throughout the design process. If the design does not meet the 75% success and satisfaction standard, it's back to the drawing board!
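As a rough illustration, a check against such an internal standard might look like the following sketch. The 75% threshold, the "both metrics must pass" policy, and the metric values plugged in are all assumptions for the example, not an agreed uPortal standard:

```python
# Illustrative check against a hypothetical internal quality bar.
SUCCESS_CRITERION = 0.75  # e.g., the community's agreed comfort level

effectiveness = 0.46  # overall effectiveness from the success-rate example
satisfaction = 0.40   # lowest per-question satisfaction from the example

if effectiveness >= SUCCESS_CRITERION and satisfaction >= SUCCESS_CRITERION:
    print("The design meets the internal quality standard.")
else:
    print("Back to the drawing board!")
```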