Development of an Automatic Grading System Based on Energy Circular Hough Transform and Causal Median Filter

Optical mark recognition machines are used to grade multiple choice exam papers automatically. They apply mathematical operations to recognize the answers marked by the examinees. In this study, an automatic grading system built on the Hough transform and a data filtering method is proposed. The system brings a new perspective to grading multiple choice exam papers: it adapts the energy based circular Hough transform to identify the marked answer bubbles. The procedure is combined with a data filtering method known as the causal median filter. This filter, which detects and removes outliers, is commonly used by robotics and mechatronics researchers for cleaning unwanted data. The whole system is verified by testing more than 2500 exam answer sheets from the Technical English course offered to second year Mechanical Engineering students at Bulent Ecevit University in Zonguldak, Turkey. The system performance is also tested by observing the results obtained in three different case studies designed and conducted for different


Introduction
Education opportunities have increased all over the world in recent years, and this increase is expected to continue. In parallel, the number of colleges, universities, students, courses and examinations has also grown rapidly. To keep up with this growth and grade exams quickly and accurately, multiple choice exams are generally preferred and grading is performed with automatic grading systems or machines. Optical mark recognition (OMR) machines are commercially available for scoring exam answer sheets. Although these systems are among the best alternatives for grading exam papers, setting them up and maintaining them requires a budget. Moreover, the algorithms and mathematical models running inside OMR machines are not disclosed to the researchers and engineers who work on developing automatic grading systems. In the literature, researchers have proposed some alternatives to OMR machines. They have developed algorithms that detect the marks on the answer sheets and grade the exam paper. These procedures mostly use digital data obtained by scanning the answer sheets with a paper scanner, and the answer options marked by the student are detected using an image processing technique.
In this paper, a new approach for grading multiple choice exam papers is proposed. The study focuses on adapting the energy based circular Hough transform to develop an automatic grading system. The system has five main steps. In the first step, each answer sheet is scanned with a paper scanner to convert the information on it to digital format. In the second step, the answer choices, which are indicated by circles (bubbles), are detected. In the third step, a filtering process removes the unwanted data; it acts as a safety check on whether the detection system has detected the correct circles. The filter is built on the principles of the causal median filter. The fourth step uses the procedure developed from the energy based circular Hough transform to detect the circles (bubbles) marked (filled with a pencil) by the students. To succeed in this step, a line by line technique is created and combined with the developed algorithm. Once the marked circles are detected, the system passes to the final step, in which the correct answers are counted and each exam paper is graded. The automatic grading system introduced in this paper was tested using more than 2500 students' exam answer sheets from the Technical English course offered in the Mechanical Engineering Department of Bulent Ecevit University, Zonguldak, Turkey.
This paper is organized as follows: Section.2 reviews the relevant literature. Section.3 gives the details of the energy based circular Hough transform method. Section.4 presents the principles of the causal median filter and the outlier detection and removal operations. Section.5 illustrates the experimental results. Section.6 concludes the paper with an analysis of the results.

Literature Review
The literature is reviewed in two groups. The first group covers studies on grading exam papers automatically. The second group covers the circular Hough transform, energy methods and their applications.
Abdu and Mokji [1] proposed an automatic grading system based on the Hough transform. An empty answer paper is scanned and stored in the first stage. In the second stage, a region of interest algorithm integrated with the Hough transform detects the marked answers. A set of rules has to be followed when preparing the answer sheets. Chinnasarn et al. [2] built an optical mark recognition system that uses a microcomputer and a scanner. An empty answer sheet is scanned and a digital library is created; this information is then used to detect the answer marks. Fisteus et al. [3] proposed a grading system for multiple choice exam answer sheets. The system is designed to be low-cost and easily portable. In addition to detecting non-circular marks, it recognizes handwriting. It is built on classical computer vision techniques. Nguyen et al. [4] created a multiple choice test scoring system that uses data coming from a camera. An image processing technique commonly used by robotics researchers is adapted to the system. Rakesh et al. [5] developed an optical reader system in which a scanner, a computer and a simple image processing algorithm are integrated to build the grading machine. Sattayakawee [6] developed a scoring system for non-optical grid answer sheets, to which projection profile and thresholding techniques are adapted. Zampirolli et al. [7] proposed an automatic correction system for multiple choice exams. The system uses an image processing technique to detect the answer options. Each answer sheet is scanned and converted to binary format, and simple morphological operations are then used for segmentation.
Atherton and Kerbyson [8,9] proposed a novel formulation of the circular Hough transform, which they called the coherent circle Hough transform. In this approach, phase is used to code for the radii of circles. They applied their approach to an isolated circle with Gaussian noise. Smereka and Duleba [10] developed a circular object detection system that uses a modified Hough transform. The system improves the detection of low-contrast circular objects, and the modified version of the Hough transform is developed to increase efficiency and reduce computational complexity. Cherabit et al. [11] introduced a procedure for iris localization using the circular Hough transform. The system focuses on developing a robust technique for specifying the iris location and its features in frontal images. Ito et al. [12] developed an eye detection system by adapting the circular Hough transform and the histogram of gradient. Circles whose radii are not initially known are first detected using a 2-D Hough transform. The likelihood of each eye is then evaluated using the histogram of gradient and a support vector machine. Rizon et al. [13] proposed an object detection system in which the circular Hough transform is adapted to detect the presence of circular shapes in an image. Do et al. [14] presented a study on adapting the circular Hough transform for fingertip tracking. The proposed fingertip tracking system is combined with a particle filter and lets the user observe the finger movements. A multi-scale edge detection model is used to detect the edges, and circular image features are determined using a map created by the circular Hough transform.
The literature indicates that multiple choice exams, their answer sheets and automatic grading systems have become dominant for fast and accurate assessment in learning environments. Automatic grading systems are easy for educators and examiners to use and adapt; however, they are expensive and need maintenance. They also require special forms and paper formats, as well as specific marks, circles or bubbles on their answer sheets. Nor do these systems allow researchers to make enhancements to them. What distinguishes this work from the ones presented above is that it addresses a new methodology for automatic grading systems in a systematic and comprehensive way. It is systematic because the study starts from the perspective of the end user and looks for solutions that use only low-cost equipment and give fast and accurate results. It is comprehensive because the proposed system combines modeling and object detection techniques that are commonly used by robotics and mechatronics researchers. The system also uses a data filtering approach that is strongly advised and used by sensor fusion investigators. As a result, the procedure proposed in this paper offers a new approach for building an automatic grading system that can be easily adapted. It also provides some pointers for developers who wish to make improvements on this subject.
1 Mechanical Engineering Department, Bulent Ecevit University, Incivez Mah., 67100, Zonguldak, Turkey. Corresponding author email: bayar@beun.edu.tr

Circular Hough Transform
There are two methods (called the Circular Hough transform and the Elliptical Hough transform) commonly used by researchers to obtain the shapes of circles and ellipses. In image processing, video processing, computer vision, robot vision and machine vision tasks, detecting lines and circular and elliptical shapes is important since any geometry can be defined using such shapes. In the literature, there are also other methods based on converting gray scale images to binary format. They use edge detection approaches and mathematical methods for defining shapes. Hough [15,16] and Duda and Hart [17] presented the Hough transform method for detecting parametric curves in an image. This method has been used as a powerful curve definition tool since it was developed. The basic idea behind the method is a voting process. The process creates a parameter space, and the voting system maps edge points in an image into manifolds in that parameter space [10]. Each peak observed in the parameter space indicates the parameters of a detected curve. A circle with radius R and center (a, b) is defined using the following equations:

$$x = a + R\cos\theta, \qquad y = b + R\sin\theta$$

When the angle $\theta$ sweeps through a full $2\pi$, the points (x, y) trace the perimeter of a circle. Suppose an image includes many points, some of which may fall on the perimeters of circles. In order to detect each circle, the process should find the parameters a, b and R that define it. When the radius R of a circle is known, the search only needs to be performed in 2D, over (a, b). This decreases the computational time and memory usage. The search method with fixed radius R, shown in (Figure.1), runs a voting process. This process can be used for the case where multiple circles with known radius R are located in an image.
As illustrated in (Figure.2), the voting process can be run successfully for detecting the circles and their centers in the case of multiple circles.
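The fixed-radius voting process described above can be sketched in a few lines. The following Python/NumPy example is only an illustrative sketch, not the Matlab implementation used in this study: the function name `hough_circle_centers`, the angular resolution `n_angles` and the synthetic test circle are all assumptions introduced here. Each edge point (x, y) votes for every candidate center (a, b) that lies at distance R from it, and the accumulator peak is taken as the detected center.

```python
import numpy as np

def hough_circle_centers(edge_points, shape, radius, n_angles=360):
    """Accumulate votes for circle centers (a, b) when the radius is known.

    edge_points: iterable of (x, y) edge coordinates (x = column, y = row).
    shape: (rows, cols) of the accumulator, i.e. of the image.
    """
    acc = np.zeros(shape, dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for x, y in edge_points:
        # From x = a + R cos(theta), y = b + R sin(theta):
        # every center at distance R from (x, y) is a candidate.
        a = np.rint(x - radius * np.cos(thetas)).astype(int)
        b = np.rint(y - radius * np.sin(thetas)).astype(int)
        ok = (a >= 0) & (a < shape[1]) & (b >= 0) & (b < shape[0])
        if not ok.any():
            continue
        # Let each edge point vote at most once per accumulator cell.
        cells = np.unique(np.stack([b[ok], a[ok]], axis=1), axis=0)
        acc[cells[:, 0], cells[:, 1]] += 1
    return acc

# Synthetic check (assumed data): 90 points on a circle of radius 10
# centered at (a, b) = (40, 30).
R = 10
angles = np.linspace(0.0, 2.0 * np.pi, 90, endpoint=False)
points = [(40 + R * np.cos(t), 30 + R * np.sin(t)) for t in angles]
acc = hough_circle_centers(points, shape=(64, 80), radius=R)
b_hat, a_hat = np.unravel_index(np.argmax(acc), acc.shape)
print("estimated center (a, b):", (int(a_hat), int(b_hat)))
```

For several circles of the same radius, the same accumulator can be searched for multiple local maxima instead of a single global peak.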

Outlier Detection and Data Cleaning Filter
Outliers are frequently observed in measured data. This is a significant practical problem, and such data cannot always be handled by the traditional smoothing filters developed to remove unwanted data and the effects of noise [18]. In robotics, control and automation, sensor fusion and data mining applications, outlier detection and removal is one of the solutions commonly used for data cleaning. In our case, while detecting circular marks on the exam answer papers, there is always a possibility of detecting circles that are undesired or misplaced. Furthermore, the detection algorithm may detect circles that do not exist on the real answer sheet. A real example of this case is illustrated in (Figure.8). In this figure, a circle labelled "Circle -1" is detected although there is no such circle on the real answer sheet. In the same figure, another circle (bubble) labelled "Circle -2" should have been detected by the algorithm but was missed. In addition to programming and modelling deficiencies and errors, the problems shown in (Figure.8-top-right) may arise from low scanning quality, insufficient scanner light, charcoal strokes, piles of crumbs and the stained regions left on the paper after using a rubber eraser. In order to find such misdetections, an outlier detection procedure is adapted and a data cleaning filter is developed by following the principles of the causal median filter. The filtering process is also implemented in Matlab. The flowchart of the proposed procedure is shown in Figure.8-top-left, and the pseudocode of the methodology is given in the bottom image of Figure.8. The filtering system is constructed using a moving data window.
The idea behind the procedure can be summarized as follows: the latest location of the detected circle ($C_k$) is compared to the median value ($C_k^m$) of the present and past values. A threshold value is defined from the number of questions and the number of circular answer options on the answer sheet. When the distance between $C_k$ and $C_k^m$ is bigger than the predefined threshold value, $C_k$ is marked as an outlier and added to the outlier library. The detected outlier is then either removed from the system or replaced with a logical value ($C_k^L$).
The outliers can be defined by using the models presented in [19] and [20]:

$$C_k = Z_k + O_k$$

where the measured data is shown by $C_k$, the nominal value of the data set by $Z_k$, and the unwanted values in the data set by $O_k$. The value of $C_k$ and the past values $C_{k-j}$ ($k \ge j \ge 0$) are saved in a data window $W_k$, whose number of elements is specified by $N$:

$$W_k = \{C_{k-N+1}, C_{k-N+2}, \ldots, C_k\}$$

With the elements of $W_k$ sorted in ascending order as $C_{(1)} \le C_{(2)} \le \ldots \le C_{(N)}$, the median value ($C_k^m$) of the data set is found as:

$$C_k^m = \begin{cases} C_{((N+1)/2)} & \text{for } N \text{ odd} \\ \left( C_{(N/2)} + C_{(N/2+1)} \right) / 2 & \text{for } N \text{ even} \end{cases}$$

A distance between the median value and the obtained value is defined:

$$d_k = \left| C_k^m - C_k \right|$$

Whenever the distance value is bigger than a predefined threshold value $T_k$, the current value $C_k$ is flagged as an outlier. The outlier is then either removed from the data set or replaced with a previously determined value ($C_k^L$).
The result is stored in a filtered data set ($F_k$):

$$F_k = \begin{cases} C_k & \text{if } d_k \le T_k \\ C_k^L & \text{if } d_k > T_k \end{cases}$$

Interested readers who wish to see the details of the outlier detection and removal processes may follow the reference study [14].
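To make the moving-window procedure concrete, a minimal Python sketch of a causal median filter is given here. It is not the Matlab code used in this study. The function name `causal_median_filter`, the default window length and threshold, and the sample coordinates are hypothetical. The sketch also makes two assumptions that the text leaves open: a flagged value is replaced by the window median unless a replacement is supplied, and the replaced value (not the outlier) is kept in the window for later medians.

```python
from collections import deque
from statistics import median

def causal_median_filter(samples, window=5, threshold=3.0, replacement=None):
    """Flag and replace outliers using only the current and past samples.

    Returns (filtered, outlier_indices).
    """
    history = deque(maxlen=window)      # data window W_k (at most N values)
    filtered, outliers = [], []
    for k, c_k in enumerate(samples):
        history.append(c_k)
        c_med = median(history)         # C_k^m over the current window
        if abs(c_med - c_k) > threshold:            # d_k > T_k
            outliers.append(k)
            # Assumption: fall back to the median when no C_k^L is given.
            value = c_med if replacement is None else replacement
            history[-1] = value         # keep the outlier out of later medians
            filtered.append(value)
        else:
            filtered.append(c_k)
    return filtered, outliers

# Assumed data: x-coordinates of the bubbles in one option column, which
# should stay nearly constant; the value 97 mimics a misdetected circle.
xs = [52, 51, 53, 52, 97, 52, 51]
clean, flagged = causal_median_filter(xs, window=5, threshold=10)
print(clean, flagged)
```

Because the filter looks only at present and past values, it can run while the detector moves along the sheet, without waiting for the whole column to be processed.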

Experimental Studies
The automatic grading system presented in this study is tested on more than 2500 exam papers. Each exam question has 5 answer options, and the number of questions can be 20, 25 or 50. The performance of the system is presented using the results of 3 case studies designed for different goals. In order to follow the answer bubbles line by line, a guidance system is created as shown in (Figure.9). In the first stage, the answer options ("A") of the first 25 questions are followed and identified. Next, the circle detection system is steered to the ("B") bubbles, and this process goes on until the ("E") option of question 25 is reached. If there is a misdetection, the outlier detector catches it and the filtering system advises a solution. When the answer sheet has two columns, i.e., each with 25 questions, the detection system is forced to cross between the columns as indicated in (Figure.9). The procedure could easily be applied to any format with more columns and questions than the case illustrated in (Figure.9). Note that the starting and finishing points of the detection are highlighted with a red-filled dot and square, respectively, and the crossing between the columns is indicated by a red-filled triangle.
Figure 9. Illustration of the guidance system for the circle detection procedure in case the answer sheet has two columns and 2x25 questions. The starting and finishing points of the detection are indicated by a red-filled dot and square, respectively. The crossing from one column to the next is shown by a red-filled triangle.
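The visiting order produced by the guidance system can be written down explicitly. The short Python sketch below is an illustration only; the function name `traversal_order` and its parameters are hypothetical. It also assumes that every option line is followed from the first question of a column to the last, since the scan direction of each line is not stated in the text.

```python
def traversal_order(n_columns=2, questions_per_column=25, options="ABCDE"):
    """List (question, option) pairs in the order the detector visits them."""
    order = []
    for col in range(n_columns):
        first_q = col * questions_per_column + 1
        for opt in options:                 # line of "A" bubbles, then "B", ...
            for q in range(first_q, first_q + questions_per_column):
                order.append((q, opt))
        # Reaching this point corresponds to crossing to the next column.
    return order

order = traversal_order()
print(order[:3], "...", order[-1])
```

For a sheet with 2x25 questions and 5 options the list has 250 entries; it starts at option A of question 1, crosses to the second column after option E of question 25, and ends at option E of question 50.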

Case Study 1
In the first case study, the system is tested on the answer sheets of 233 students (each exam paper has 50 questions).
The answer sheet used in this case study is shown in (Figure.9) (note that the proposed system does not require a specific answer sheet format). One of the students' answer sheets is taken as an example. Once the answer choices (bubbles) are detected, the next objective is to determine which answer choice is marked in each question, using the energy based circular Hough transform. The 2-D energy plots for this answer sheet (shown in Figure 10) are presented in (Figure.11). The left and right energy plots correspond to the first and second columns of the answer sheet. 3-D energy plots are also shown in (Figure.12). These plots show more clearly whether an answer bubble is marked or not.
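The exact energy measure behind these plots is not spelled out in this section, so the following Python/NumPy sketch should be read as one plausible interpretation rather than the method used in the study. Here the energy of a bubble is assumed to be the sum of squared pixel darkness inside the detected circle, and a question is treated as answered only when the strongest bubble clearly dominates the second strongest. The names `bubble_energy` and `marked_option`, the `min_ratio` rule and the synthetic image are all hypothetical.

```python
import numpy as np

def bubble_energy(gray, center, radius):
    """Assumed energy: sum of squared darkness inside a circle.

    gray: 2-D uint8 image, 0 = black, 255 = white.
    center: (a, b) = (column, row) of the circle center.
    """
    rows, cols = np.ogrid[:gray.shape[0], :gray.shape[1]]
    a, b = center
    inside = (cols - a) ** 2 + (rows - b) ** 2 <= radius ** 2
    darkness = 1.0 - gray[inside].astype(float) / 255.0
    return float(np.sum(darkness ** 2))

def marked_option(gray, centers, radius, options="ABCDE", min_ratio=2.0):
    """Return the marked option of one question, or None if ambiguous."""
    energies = [bubble_energy(gray, c, radius) for c in centers]
    ranked = sorted(range(len(energies)), key=lambda i: energies[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Blank rows and double marks fail the dominance test.
    if energies[best] < min_ratio * max(energies[runner_up], 1e-9):
        return None
    return options[best]

# Assumed data: one question row with five bubbles, option C filled in.
sheet = np.full((40, 140), 255, dtype=np.uint8)
centers = [(20, 20), (45, 20), (70, 20), (95, 20), (120, 20)]
rows, cols = np.ogrid[:40, :140]
sheet[(cols - 70) ** 2 + (rows - 20) ** 2 <= 8 ** 2] = 0
print(marked_option(sheet, centers, radius=8))
```

Returning None for blank or doubly marked questions keeps ambiguous rows from being scored as correct by accident.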
To compare the performance and accuracy of the proposed system, three different computational platforms (i.e., standard personal computers) are used. They are labelled "CP-1", "CP-2" and "CP-3" in (Table.1), which summarizes their technical properties: operating system, processor type and speed, and RAM capacity. To show the grade distributions obtained from the computational platforms, the results of the first case study are presented via a boxplot.

Case Study 2
In the second case study, 86 students' answer sheets (each one has 50 questions) are taken into account. The results are given in (Table.3). The boxplot representation is also shown in (Figure.14).

Case Study 3
In this case study, the answer sheets of 247 students are used to test the automatic grading system. Each answer sheet has 25 questions, and the students' ID numbers are also sought. Students mark their ID numbers using the circles provided, and the automatic grading system is expected to identify these marked circles as well.
The proposed system successfully graded all the answer sheets and identified the students' ID numbers, achieving the objective of Case Study 3 with 100% accuracy. The results are given in (Table.4) and shown as a boxplot in Figure 15.
Figure 15. Boxplot presentation of the results of Case Study 3. In this case the students' ID numbers are also detected by the proposed system.
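Once the marked circles are known, the final scoring and ID reading steps reduce to simple bookkeeping. The Python sketch below illustrates this under stated assumptions and is not the implementation used in the study. The function names `grade_sheet` and `decode_student_id` are hypothetical, and the ID field is assumed to have one column of ten circles (digits 0 to 9) per digit, a layout that is not described in this section.

```python
def grade_sheet(detected, answer_key):
    """Count the questions whose detected option matches the answer key.

    detected: list of option letters, with None for blank or ambiguous rows.
    """
    if len(detected) != len(answer_key):
        raise ValueError("detected answers and answer key differ in length")
    return sum(1 for d, k in zip(detected, answer_key) if d is not None and d == k)

def decode_student_id(marked_digits):
    """Join the marked digit of each ID column into a string.

    marked_digits: one entry per ID column, an int 0-9 or None if unreadable.
    Returns None when any column could not be read.
    """
    if any(d is None for d in marked_digits):
        return None
    return "".join(str(d) for d in marked_digits)

# Assumed data for illustration only.
score = grade_sheet(["A", None, "C", "E"], ["A", "B", "C", "D"])
student_id = decode_student_id([1, 3, 0, 7])
print(score, student_id)
```

An unreadable ID is returned as None so that such a sheet can be set aside for manual checking.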

Analysis and Conclusion
Automatic grading systems for scoring multiple choice exam papers are commonly preferred in schools, colleges and universities because they grade answer sheets accurately and quickly. The commercially available grading systems, however, can only be used with special forms and answer sheets; forms other than the suggested ones cannot be graded by these machines or systems. Furthermore, the features of the answer choices (bubbles), such as shape, radius and the distance between two bubbles, have to follow the rules recommended by the manufacturers. In addition to these restrictions, no new approach, modelling strategy or mathematical procedure for grading can be adapted to these systems, since access to the algorithms running inside them is not permitted. In this study, a new and comprehensive solution for grading is introduced. It is developed by following the principles of the energy based circular Hough transform method, and it also includes a filtering system constructed to remove unwanted data. It is tested on more than 2500 students' exam answer sheets for verification, and its performance is presented using the results obtained in 3 different case studies. In conclusion, the grading system introduced in this paper offers an alternative to the systems available in the market. It also gives some important research directions for developers who work on automatic and intelligent grading systems.