Correctness Tests

Correctness tests are used to thoroughly test every student's code. The mark a student receives on an assignment depends heavily on the results of the correctness tests. Testing methodology is a huge topic in computer science; the main focus of this section is tips and tricks specific to the testing software you will be working with.

Coming up with Cases

It's very helpful to always keep track of previous assignment test cases, so make sure that some sort of record of them exists, as they can often be reused.

The following is a good guideline to follow when coming up with cases:

  • If the problem is recursive, a base case, such as an empty list
  • One or two "average" cases that exercise nothing special; the examples in the question show what these look like
  • A case for each "path" in the problem. For example, if trees are being tested, a tree that only goes left and a tree that only goes right could be two good cases
  • Corner cases. These vary a lot depending on the question. If there are bounds in the question, be sure to test close to the bound (but not exactly on it, in case of hardcoding)
  • Incorrect solutions that students may plausibly come up with. Include cases that those solutions would fail.
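As a concrete sketch of these guidelines, suppose an assignment asked for a sum-list function that adds the numbers in a list (a hypothetical question; the function name and inputs are purely illustrative). Using the (result ...)/(expected ...) form that the course-account test files use, a directed suite might contain cases like:

```racket
;; Each (result ...)/(expected ...) pair below would live in its own
;; numbered test folder, so every test checks exactly one thing.
;; sum-list is a hypothetical student function.

;; Base case: the empty list.
(result (sum-list empty))
(expected 0)

;; An "average" case, deliberately different from the assignment's examples.
(result (sum-list (list 3 1 4)))
(expected 8)

;; Corner case: negative numbers mixed with positive ones.
(result (sum-list (list -99 99)))
(expected 0)
```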

Be sure not to double up on test cases. Every test should check one thing in particular: do not test two corner cases, or two possible problems, at the same time. As a general rule, students should get marks for each type of case they handle correctly.

In the end, every question is different, so do not try to follow this too strictly. Use your best judgement when creating cases. These are guidelines, but cannot be applied to every single type of problem.

Test your Test

Testing the tests you have written is by far the most important thing to do. This can be done with the rst command as in step three of the previous section; just change the pt to a 0. Make sure to test solutions other than your own. These are some useful steps to go through:

  • Use the instructor's solution and other ISA solutions if available. This will help ensure that you understood the question properly and wrote correct tests.

  • Test incorrect code against the tests you have written. Check that this code fails a few tests, and determine whether the mark received is reasonable.

  • Lastly, run your correctness tests on a few students' submissions. This will further help to catch any errors that may have been made.

It is impossible to prevent every issue that can happen. Be prepared to fix something that goes wrong and make it a learning experience for the future. If you would like to re-evaluate your tests after assignments have already been collected, you can make a copy of the test.0 folder called something like test.1 or test.0.q2_check and make changes in that folder before using it for rst/distrst.
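As a sketch, the copy step might look like this shell session (the paths are illustrative — on the course account, test.0 lives in the assignment's marking directory, so substitute the real location):

```shell
# Work in a scratch directory for this demo; on the course account you
# would instead cd into the assignment's marking directory.
cd "$(mktemp -d)"
mkdir -p test.0/Q1                 # stand-in for the existing test.0 folder
cp -r test.0 test.0.q2_check       # duplicate it under a descriptive name
# ...edit the tests inside test.0.q2_check, then run rst/distrst on it...
ls -d test.0.q2_check/Q1
```

The original test.0 stays untouched, so results already released to students are unaffected while you re-evaluate with the copy.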

If there are a lot of test cases before the question you're actually testing and rst is being slow, you can make your question the first one tested. Since rst goes through the testing folders in order by name, a command like "cp -r Q4 Q0004" makes the results for Q0004 run first. Remember to delete the temporary folder once you're done.

Keep it Fair

Testing for correctness is black box testing. This means that you should assume nothing about the implementation of the question and test based only on the assignment specification. This creates a fair and rigorous test suite for all students. With many students come many different solutions. Basing the correctness tests on a particular solution only ensures a good test suite for solutions with a similar implementation.

Regarding the struck-out paragraph below: you can use the (value ...) setting in options.rkt to adjust how much each test case is worth. To do this, create an options.rkt file in the same folder as your test.rkt, and put (value ...) inside.
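For instance, a minimal options.rkt using this setting might look like the following (a sketch — confirm the exact form against an existing question's options.rkt on the course account):

```racket
;; options.rkt — placed in the same folder as test.rkt.
;; Makes this test case worth 4 marks instead of the default 1.
(value 4)
```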

<strike>Remember that RST and MarkUs do not distinguish between your test cases; every test you add is given a weight of exactly 1. As such, it is best to make the tests as directed and as evenly spread out as possible. What this means is that one test should have a single, distinct purpose. This may be a specific edge case, a large input, or something else. The important thing is not to overlap multiple edge cases in one test; such overlaps should be split into separate tests. It may be possible that four tests fully test a question for correctness while a suite of directed tests is 16 tests; in this case a student could get 0/4 tests right when they really deserve 12/16. It is often challenging to develop directed tests, but it is crucial to giving students a fair evaluation. If you have multiple tests which consider different cases but follow the same idea, you can combine their weight into a single test case using an (and (equal? test1 expected1) (equal? test2 expected2) ...) statement, so that part of the question is not too heavily weighted.</strike>

Test valid input only. Surprisingly, invalid input is a very common source of issues. After developing a test suite, re-read the assignment and all clarifications from the instructors on Piazza, and use them to determine whether every test contains input that is valid and fair for students to consider. Doing black box testing is another good way to make sure you are testing valid input.

Another good rule is to never use the examples given in the assignment as correctness tests. Students may hardcode information from the example into their solution if they do not understand the question. This should be penalized; the easiest way to do that is to test using input that is not related to the given example in any way.

Finally, always ask first before testing efficiency. Efficiency is a messy topic in CS135. It is never explained formally, yet students are sometimes expected to have an idea of how to make something "more" efficient. By default, do not test efficiency. If you think that a problem should have one or two efficiency tests, ask the instructor before including them. Testing efficiency usually involves creating a very large input and possibly lowering the time or memory limits for a question (see the section describing options.rkt on how to do this).

Make Testing Easy

Write tests in DrRacket as you do the assignment; this will save you time in the long run. It is much easier to make thorough tests when you have your head wrapped around the problem. It is also more convenient to validate your tests in DrRacket than it is using rst on the course account. Lastly, you can easily convert your tests into the sample tests: just remove some of the harder tests you have written.

Do not be afraid to make constants and helper functions. For example, if the student has to write a solution to a card game, then a constant for every card and a few hands may be convenient. You can also make functions to compare the student's result with the expected result. This is very helpful if your check for correctness needs to be more complicated than an equality statement. Making test constants and helper functions will be helpful on some of the earlier assignments, and necessary by the last assignment. You can even make functions which generate large test constants, like a height 10 BST (make sure they're valid!). If the assignment does not involve any structures, it is very easy to make constants and helper functions by following the steps below.

  • Create a racket file, say lib.rkt, containing constants and functions you want to use in the test.rkt file.

  • Add a provide line at the top of lib.rkt and specify all the constants and functions needed.

  • Add lib.rkt under the provided directory.

  • Add "lib.rkt" in the modules option for the options.rkt file for the question.

You may want to ask instructors if they are reusing questions from previous terms. If they are, you can search the archives for the old test suites; just be sure to run them to make sure they still work.

Pairing Testcases

Always pair test cases whose expected result is trivial (empty or Boolean values) with a different case in the same test. This is to make sure students do not get marks for a trivial function. For example, suppose students were asked to write a predicate good?. Have a look below at a potential submission:

(define (good? x) false) 

This should probably receive no correctness marks. However, if half of the tests expected false, this student would get half the marks for doing almost nothing. The way to fix this is to write tests similar to the one below.


(result (and (good? input1)
             (not (good? input2))))
(expected true)

If a student always outputs the same answer they would fail a test like this. With a correct implementation the student will still get the test correct. For the "helper" test, try to stick to really obvious cases which interfere as little as possible with the actual result. The point of the second test is to make sure that there exists a case where the student's function produces a different value, and it should definitely not introduce another check which makes it harder to pass the case. A basic test is always a good candidate to add in, because you can be relatively confident that the student passed it. For more complicated data structures, you may want to pre-emptively deploy two basic tests (one for true and one for false) so you can use them in the private tests and work under the assumption that students already passed them.

Topic revision: r2 - 2020-12-22 - JimmyPham
 