Could we build a standardized testing system from crowdsourced content and open-source technology?
The two goals of this idea – to create data that teachers can use to inform their instruction and that can aid in evaluating teachers and schools – are familiar. What is different is that every element of the system I describe below stays close to the classroom, draws on ongoing community input, and allows for improvement and correction, all at relatively low cost, paid mostly at the local level.
Briefly, teachers in assessed content areas would follow approved guidelines to assemble their own quarterly assessments from a publicly created question bank. These assessments would measure students’ understanding of whatever standards the state uses. Teachers would grade their own students’ work and upload the scores, again following state guidelines. School-wide results and the tests themselves would become publicly available.
The system would run on an open-source website whose prototype was chosen by stakeholders in a competition offering a cash reward.
Sound interesting? Let’s drill deeper.
The Crowdsourced Question Bank
A crowdsourced question bank could be created to which anyone (a teacher, engineer, parent, or student) could contribute a question that he or she thinks a student should be able to answer. Contributors would go to the website, click the “Submit Questions” link, and navigate to the correct grade level and subject. There they would identify the specific standard the question addresses, estimate its difficulty, and submit the question and its answer for review.
I’d say that teachers and students should be able to submit questions for free, but anyone else should pay a fair service charge. That would help pay for the system as well as discourage nuisance questions.
A cadre of qualified, compensated teachers could review each question and judge its pedagogical and topical appropriateness. They could accept or reject a question outright, or modify it for clarity, difficulty rank, grade level, alignment to a standard, and so forth. Once a question was approved by, say, three teachers, it would become publicly available for use. I think a question should be able to withstand one rejection but not two. (Tracking how often a reviewer’s judgments agree with those of the other reviewers of the same question could also help in evaluating the reviewers themselves.)
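The approval rule above is simple enough to state precisely. Here is a minimal Python sketch, assuming the example thresholds (three approvals to publish, a second rejection to discard) and assuming that a question is closed to further votes once it has been decided; the `Question` class and its method names are hypothetical.

```python
from dataclasses import dataclass

# Example thresholds from the proposal; both are "say" numbers, not fixed rules.
APPROVALS_NEEDED = 3   # approvals required before a question is published
REJECTIONS_ALLOWED = 1  # a question survives one rejection but not two


@dataclass
class Question:
    """A submitted question and the review votes it has received so far."""
    text: str
    approvals: int = 0
    rejections: int = 0

    @property
    def status(self) -> str:
        """Return 'rejected', 'published', or 'pending'."""
        if self.rejections > REJECTIONS_ALLOWED:
            return "rejected"
        if self.approvals >= APPROVALS_NEEDED:
            return "published"
        return "pending"

    def record_vote(self, approved: bool) -> None:
        """Record one reviewer's judgment (hypothetical helper).

        Assumption: once a question is published or rejected, review closes.
        """
        if self.status != "pending":
            raise ValueError("review is closed for this question")
        if approved:
            self.approvals += 1
        else:
            self.rejections += 1
```

For example, a question voted approve, reject, approve, approve ends up published, while one rejected by its first two reviewers is discarded.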
Details such as the qualifications, compensation, and expectations of question reviewers would have to be negotiated. One option would be a fixed statewide cadre of perhaps 200 teachers, each paid a stipend of around $2,000 per year and expected to review a minimum number of questions. Another would be a flexible pool of teachers paid, say, $50 for every 100 questions they review.
Using the Question Bank
The site would have to be easy for everyone to use. Question reviewers must be able to quickly access submitted questions and return their evaluations. A useful feature would sort submitted questions into three categories: those that no one has looked at or that have one rejection, those that one teacher has approved, and those that two teachers have approved.
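One way to read those three categories, sketched in Python below, is to sort each pending question by how many approvals it has so far. This assumes that a question with one approval and one rejection belongs in the “one approval” queue, and that decided questions (three approvals or two rejections) leave the queues entirely; the function name and the tuple layout are hypothetical.

```python
# Queue labels, indexed by the number of approvals a pending question has.
LABELS = ("no approvals", "one approval", "two approvals")


def sort_into_queues(questions):
    """Group pending questions into three review queues.

    Each question is a (question_id, approvals, rejections) tuple.
    "no approvals" holds unreviewed questions and those with one rejection.
    Questions already decided (two rejections, or three approvals) are skipped.
    """
    queues = {label: [] for label in LABELS}
    for qid, approvals, rejections in questions:
        if rejections >= 2 or approvals >= 3:
            continue  # already rejected or published; no longer in review
        queues[LABELS[approvals]].append(qid)
    return queues
```

Reviewers might work the “two approvals” queue first, since a single vote there can publish a question.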
Assembling tests or assignments should be as easy as online shopping. So searching by obvious features such as grade, content and standard is a must. But we should also be able to rate questions by quality, note how we’ll use them, and add comments. Then we could incorporate “customer” feedback into our decisions about which questions to select.
I should also be able to assemble an entire document one question at a time, in batches, or all at once with a form in which I set parameters like the number of questions from a standard and their level of difficulty. For that matter, why not have an “All in One” link that assembles an entire quarterly, semester, or year-end final that complies with state requirements?
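The parameter form could work roughly like the Python sketch below, which fills a blueprint mapping each (standard, difficulty) pair to a number of questions by sampling at random from the approved bank. The `BankQuestion` fields, the numeric difficulty scale, and the `exclude` option for skipping already-used questions are all assumptions for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BankQuestion:
    qid: str
    standard: str    # a state standard code (hypothetical format)
    difficulty: int  # assumption: 1 = easiest


def assemble_test(bank, blueprint, exclude=frozenset(), seed=None):
    """Pick approved questions that satisfy a blueprint.

    blueprint maps (standard, difficulty) -> number of questions wanted.
    exclude holds ids of questions already used (e.g. in homework).
    Raises ValueError if the bank cannot fill some part of the blueprint.
    """
    rng = random.Random(seed)
    chosen = []
    for (standard, difficulty), count in blueprint.items():
        pool = [q for q in bank
                if q.standard == standard
                and q.difficulty == difficulty
                and q.qid not in exclude]
        if len(pool) < count:
            raise ValueError(
                f"only {len(pool)} unused questions for {standard}, "
                f"difficulty {difficulty}; need {count}")
        chosen.extend(rng.sample(pool, count))  # random, without repeats
    return chosen
```

A retake version could be drawn the same way, passing the first version’s question ids in `exclude` so that no question repeats.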
Regardless of how I create a document, I could then download, print, and make copies – which is about what I do for assignments and tests anyway. (Except that I usually write my own exams from scratch.)
In the name of transparency and to help students study, I think the public should have complete access to all of these features.
Creating the Open Source Website
States could get really creative in producing their websites. For example, a state could offer a BIG, one-time, cash reward (not a contract!) for the best working prototype. A judging team could be made up of compensated representatives from all stakeholder groups. Anyone could submit a prototype: companies, freelance programmers, college students working in an advanced design class, and even high school kids.
Once a winning prototype was selected, the reward would be paid and the source code would become publicly available so that talented programmers could make improvements – sometimes at the state’s request, sometimes unsolicited. Really good improvements could receive a stipend based on their quality. With so much code publicly available, states could look around at what others are doing and freely borrow the best ideas.
(I have to admit to the fantasy of a high schooler stumping the pros and becoming a millionaire with a winning design.)
The final test of whether such a publicly produced standardized testing system is feasible is whether it efficiently produces valid data that can reasonably be used. I think the technical considerations above are a good start, but what about the human element?
Measures would have to be in place to ensure consistency in delivery, grading, and uploading of results. For example, maybe I would be expected to deliver a quarterly algebra assessment during a window that extends from two weeks before the end of a quarter to two weeks after it. The test I created would align with the standards covered that quarter. It would also have a set number of questions for each standard at each level of difficulty. All of the questions would have to come, unedited, from the question bank.
I would deliver the test under approved conditions. In algebra, for instance, maybe I could use only questions I haven’t used in homework or previous tests; students could use only approved calculators; and they could use both sides of a single sheet of paper with their own notes. Students would be expected to finish the test in one sitting. This last requirement would take some coordination at the school level, because it’s unlikely that any meaningful quarterly assessment could be completed in a single class period.
I’d say that students should be able to retake a quarterly assessment once or twice for a higher score – provided they were given a different version of the test and all retakes were completed in the time window.
After the test, I would grade my students’ work using a state-approved rubric and upload the results and a copy of the test(s) by the fourth week of the next quarter. I would keep the students’ results on file for a fixed period.
To address ethical fidelity, my principal and I would sign affidavits stating that I followed the approved guidelines. I would report any accidental failure to do so (for example, using an unapproved calculator) as soon as I realized my mistake, and then take prescribed steps to remediate any consequences of the error. Any intentional failure to comply with regulations could, after due process, carry heavy penalties, up to and including the loss of my teaching credentials.
I see some kind of outside auditing as desirable, if only to maintain the public’s confidence in the process. Auditing might include having a third party, at random intervals, regrade a representative sample of my students’ completed tests to see whether my grading showed bias.
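A regrading audit could start as simply as the Python sketch below: draw a random sample of completed tests, have a third party regrade them, and compare the two sets of scores. Using a simple random sample and the mean score difference as the bias measure are assumptions; an actual audit would likely want stratified sampling and a proper statistical test. The function names are hypothetical.

```python
import random
from statistics import mean


def audit_sample(test_ids, sample_size, seed=None):
    """Draw a simple random sample of completed tests for regrading."""
    ids = list(test_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))


def grading_gap(teacher_scores, auditor_scores, sampled_ids):
    """Mean of (teacher score - auditor score) over the sampled tests.

    A positive value means the teacher graded more generously than the
    auditor on average; a value near zero suggests no systematic bias.
    """
    return mean(teacher_scores[t] - auditor_scores[t] for t in sampled_ids)
```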
Using the Data
Knowing how my students performed on previous work, and knowing what the quarterly final would cover, would help me match my teaching to their needs. How they did on the final assessment would probably not be much of a surprise, given that I work with them every day. It would be interesting, though, to see how my students fared compared with students at other schools. It’s true that the tests wouldn’t be identical, but given the constraints on how they were created and delivered, they would be close enough to support some reasonable conclusions.
And I would feel much more confident about using scores from this system as part of my evaluation and in setting my school’s grade than I have felt about scores from the current system.
If anything like this were to take hold, I hope its implementation would be slow and thoughtful. For example, the winning prototype could be introduced at a limited number of sites, one or two grades at a time, covering only one or two standards. That would let us learn early which elements of the system, both technical and human, worked best and worst, and improve them.
Questions abound at every step of my thinking. Some are mundane – How would we ensure that the submitted questions and the websites weren’t already someone’s intellectual property? Other questions are skeptical – Isn’t this still just a glorified version of teaching to the test? Still more are philosophical – Why should we have any standardized testing at all?
I don’t pretend to have definite answers to all these questions, and I’m sure you have more. But I do think a standardized system created from a crowdsourced question bank and using open-source technology is worth debating and maybe pursuing, and I would love to see many more solutions-oriented questions and comments.
After all, even if it took ten years to get something like this going, we would end up with a standardized, proven, and self-improving means of measuring student achievement.