This is a two-part essay on the role of student evaluation of teaching. This part addresses issues related to its educational merit. Next week, I will address the role of the faculty in establishing assessment and evaluation policies.
Representatives of the Office of Institutional Research, Assessment and Effectiveness visited the Faculty Senate Committee on Faculty Affairs last year to work with the committee on formulating “questions” to be listed in a university-wide online “Student Evaluation of Teaching.”
The goal of the IRAE office is to establish a university-wide uniform system of SET so that the administration can conduct comparative analysis of faculty course performance as perceived by students. We should applaud the goal of ensuring the delivery of quality instruction, but there must be a better way. Many members of the senate committee expressed strong opposition to the IRAE plan, questioning the merit of Drexel having an “online, centrally administered, across-the-board, uniform, one-size-fits-all, centralized depository system” of evaluations.
Instead, I would like to propose that we 1) establish a faculty-based University Advisory Committee, per Article 13 of the Board of Trustees-approved Charter of Faculty Governance, to formulate a single holistic university-wide assessment and evaluation policy for our educational programs, integrating all current assessment activities, including a professional evaluation of instructional effectiveness and student evaluations; and 2) convert back to in-class evaluations, to be conducted periodically for selected courses and analyzed by the course instructor, who submits a summary to the department head during the faculty annual review. Student evaluations should be used to help instructors improve their teaching and, when necessary, to improve the quality of instruction (e.g., by instituting a faculty renewal program), rather than to impose economic consequences or the other ineffective disciplinary measures currently practiced.
The SET concept was introduced into academic life in the United States in the early 1970s, as a result of heavy pressure from parents and legislators. With the ever-rising cost of tuition, academic institutions were expected to demonstrate their responsibility and accountability toward taxpayers and parents. Academic administrators often adopted the approach of simply asking the ‘customer.’
SET quickly morphed into Student Satisfaction Surveys. In the process, the distinction between student learning and student satisfaction became blurred. Treating students as ‘customers’ is an academic misnomer: I always declare to my students, publicly, that I consider them my junior colleagues.
The advent of the online toolbox led administrators to use online-generated student surveys. Digitized information is the most cost-effective and consumes the least time to procure and process. It also provides the least nuanced reports, which managers like, but this undermines the basic values of academia.
Although most academics and scholars consider SET a flawed instrument for improving instructional quality, SETs are routinely used in academia in lieu of evaluating student learning — unlike the teaching evaluation approach implemented in the K-12 system.
Many scholarly studies show that SET is influenced by numerous intrinsic and extrinsic factors. Among the intrinsic factors one may include, for example, the various course categories: mandatory vs. elective; demanding vs. relaxed course requirements; course level; lecture vs. recitation; course structure; the grading rubric; student expectations; students’ general academic preparedness; etc.
SET also depends upon many extrinsic factors such as course schedule (day of week and time of day), cramped vs. spacious classrooms, instructor’s dress, humor, gender, race, age, instructor-student “chemistry,” instructional resources, etc. Many more variables could be added to both categories.
Arguably, SET diminishes course quality, damages academic programs and harms students’ professional prospects. At times, SET affects instructors’ morale. Too often, instructors adjust courses merely to reduce irrelevant negative comments, and SETs become instructor popularity contests that dumb down course quality. This is unprofessional, irresponsible and un-academic.
Many question the appropriateness of student evaluations for measuring teaching effectiveness. Harvard professor Harvey Mansfield wrote that “course evaluation by students … undermines the authority of professors. … What at first might be justified as useful feedback from students ends up distorting the relationship between professors and students.”
Each of the many studies addresses a particular facet of the complex issue of SET. Yet most reach the same conclusion, questioning its merit and legitimacy. Taken in the aggregate, they lead to the clear conclusion that SET, as proposed by Drexel’s IRAE, will have a negative effect.
There are “tricks” to attain excellent SETs, which can be found (via the Internet) in “how to” articles.
Are other professional communities subjected to such non-professional, customer-based scrutiny? Are lawyers, physicians, architects, artists, economists, engineers or scientists? Teaching is not merely a vocation –– it is a profession, and it should be treated as such.
Therefore, to defend the integrity of this profession, instructional performance should be assessed and evaluated primarily by professionals, as is recommended by the American Association of University Professors.
Such an approach is implemented in Pennsylvania throughout the K-12 educational system, where there are well-codified processes (such as the Danielson Framework for Teaching) for evaluating teaching effectiveness and quality, and where students and parents have little input. Why is it that a 17- or 18-year-old student has no say as a high school senior but is qualified to pass judgment on faculty teaching effectiveness as a university freshman?
The literature also shows that the introduction of SET into academia has directly caused grade inflation. For example, Henry Rosovsky, former Dean of Faculty of the Arts and Sciences at Harvard, wrote that: “Research has shown that grades were significantly correlated with student ratings of faculty performance. … Thus … good evaluations could be partially ‘bought’ by assigning good grades.”
Grade inflation is detrimental to academic quality. Several universities have taken actions against grade inflation, e.g., instituting rigorous grading rubrics. Drexel remains dormant.
My own analysis of GPAs in my department, of many thousands of mechanical engineering seniors graduating in the 1980-2012 period, indicates significant grade inflation. Arguably, this is partly a result of implementing online SET in the College of Engineering, via the Academic Evaluation, Feedback and Intervention System.
Beyond the direct effect of SET on grade inflation, having SET as the sole measure of teaching effectiveness often compels instructors to lower their expectations of student performance, thus dumbing down course content. Instead of challenging our students to the best of their ability, we too often succumb to their pressure to lower the bar. It is my experience that students often adjust their efforts to just clear the bar at the height set by the instructor.
It is presumptuous of any untrained person, who studies a given topic for the first time, to evaluate an instructor’s qualifications, clarity, and expectations, or to evaluate the germaneness of course content, structure, number and level of examinations, grading policy and more. Such SETs are merely superficial evaluations that should be considered student satisfaction surveys, like the customer surveys my car dealer kept sending me.
Further, SETs submitted online are often of questionable value and statistically meaningless. The average rate of return in my department is 25-33 percent. The distribution of many of the SETs is bimodal. Those who prefer the easy way are dissatisfied with a challenging instructor, while those who expect to be challenged are dissatisfied by the same instructor for not challenging them enough.
Those who dropped the class before week seven or whose grade is below their expectations are also dissatisfied; they all have to blame someone — who but the instructor? Do students who plagiarize or those who use their smartphone during final exams, or those who multitask with their laptops during lectures, deserve “voting rights”?
Therefore, in too many cases the distribution is bimodal: the conventional wisdom among faculty and students is that it is primarily those who either “hate” or “love” the instructor –– for whatever reason –– who respond to online SET. What conclusion should be drawn from the SET returns of a class of 25 students with an average response rate of 33 percent, i.e., about eight students?
No wonder that most tenured professors do not care much about their SET scores, while most students do not care to respond, and the teaching faculty (approximately half of full-time faculty) either have to shut up or be shipped out.
Young and inexperienced students are simply not equipped to evaluate the basic aspects of course quality, just as I am not qualified to determine if my physician administered the correct treatment. Similarly, it would be presumptuous of me to evaluate the effectiveness of my lawyer, accountant, or financial advisor based simply on my dilettante expectations or perceptions.
Such surveys, filed online, outside the classroom, often while multi-tasking, constitute an affront to the professoriate. Professors are experts who have spent over 12 years in extensive post-secondary learning with five, 10, 20 or more years of instructional experience.
I do not oppose SET per se: in fact, in each class I conduct an extensive and comprehensive in-class SET. The response rate is normally 98 percent, compared with my 15 percent average for the college-administered online SET. I analyze these SETs myself, manually. It takes a few hours but is highly instructive. I often conduct midterm SETs as well.
After so many years, I am perplexed by the fact that my in-class SETs are consistently markedly different from my online results. Although we are enslaved to digital technology, I still prefer the in-class surveys.
What I oppose is having a university-wide, online, centrally administered, centralized-depository, uniform, ‘one-size-fits-all,’ flawed instrument as the only measure for evaluating my teaching effectiveness, reviewed at will by any academic administrator who has the access key to that SET depository.
There may be a few instructors who are poorly prepared, do not care, are dismissive, miss classes and who prioritize their research over teaching. These are valid points and such cases should be rapidly identified and corrected. However, the elaborate SET system proposed by IRAE, shaking the entire system, is like shooting a fly with a howitzer. It causes significant collateral damage to the educational enterprise, as described earlier. You cannot see, smell or feel the “damage” –– but the rumbling underneath will ultimately surface.
Next week, I will explain the role of the academic administrator in teaching evaluation and why faculty should be the primary driving force in formulating and implementing a comprehensive and holistic university-wide assessment and evaluation program, in which teaching performance is one of many components.
Jonathan Awerbuch is a professor of mechanical engineering at Drexel University. He can be contacted at [email protected].