Hybrid analysis: How human and computer analysis of log data can work together to optimize results

To prepare students for the 21st-century workforce, we must teach them to work effectively in teams, keeping in mind that team members may be in the same room or on different continents. Although working collaboratively is widely recognized as an effective and efficient way to use a company’s workforce, most classroom work continues to stress individual performance rather than tapping the collective synergy to be found in teamwork.

The Teaching Teamwork: Electronics Instruction in a Collaborative Environment project, funded by the National Science Foundation, addressed the mismatch between the importance of teamwork in the modern STEM workplace and the difficulty of teaching students to collaborate while also evaluating them individually.

We developed a distributed computer simulation that presented a problem to teams of students and recorded their actions as they attempted to solve it. The simulation took the form of an electronic circuit divided into subcircuits. Working on a separate computer, each member of the team was presented with a portion of the circuit, along with test equipment, an onboard calculator, and a chat window for communicating with teammates. The subcircuits were connected over the Internet in such a way that changes made to one subcircuit affected the state of the others. The students could observe, but not alter, their teammates’ subcircuits. Their collective goal was to place the circuit into a particular goal state. There were four levels of increasing difficulty, with the difficulty determined by how much information was available to each team member. The challenges were designed so that they could not be met by individual students working alone, but required the team members to collaborate.

The computer program logged all student actions as a time-stamped record: their changes to the circuit, calculations, measurements, and communications. We analyzed each team’s performance to evaluate the overall content knowledge of each member as well as their collaborative problem-solving skills. We were interested not only in whether the team succeeded or failed at each level, but also in how they did so: what strategies they employed, and what role each member of the team played in implementing that strategy.
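For concreteness, a single log item can be thought of as a small record along the lines sketched below. The field and action names here are illustrative stand-ins, not the project’s actual schema, and later sketches in this article build on the same assumed format.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LogItem:
    """One time-stamped student action. Field and action names are illustrative only."""
    timestamp: float                 # seconds since the start of the session
    team_id: str
    student_id: str
    action: str                      # e.g., "resistor_change", "calculation", "measurement",
                                     #       "chat", "submit_voltages", "submit_e_r0"
    payload: dict[str, Any] = field(default_factory=dict)   # action-specific details


# Example: a student setting her resistor to 150 ohms 42.7 seconds into the session
example = LogItem(42.7, "team-01", "alice", "resistor_change", {"resistance": 150.0})
```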

Algorithms designed to detect strategies

A typical challenge generated between 10,000 and 20,000 individual log items, so our full dataset comprises hundreds of thousands of log items. Since hand analysis of such a large dataset would be impractical, we created and refined a data management tool to filter, analyze, and visualize the data, and we implemented algorithms designed to detect the presence of specific strategies.

It is easy to determine whether or not a team succeeded at a given level simply by observing the subset of actions whereby team members submitted answers. These actions are of two kinds: the submission of voltage results (in which students click “We got it!” when they believe that they have achieved the goal voltages) and, at the higher levels, the submission of numerical values for the external voltage, E, and the external resistance, R0.
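A check of this kind is easy to automate. The sketch below assumes log items like the one sketched earlier and invented action names (submit_voltages for the “We got it!” click, submit_e_r0 for the numerical submissions); the payload keys and tolerance are likewise illustrative assumptions.

```python
def submission_results(log, goal_voltages, e_true, r0_true, tol=0.05):
    """Check a team's submissions against the targets.

    `log` is a list of LogItem.  On the lower levels only the voltage check applies.
    """
    voltages_ok = any(
        item.action == "submit_voltages"
        and all(abs(item.payload["voltages"][member] - goal) <= tol
                for member, goal in goal_voltages.items())
        for item in log
    )
    e_r0_ok = any(
        item.action == "submit_e_r0"
        and abs(item.payload["E"] - e_true) <= tol
        and abs(item.payload["R0"] - r0_true) <= tol
        for item in log
    )
    return {"goal_voltages": voltages_ok, "E_and_R0": e_r0_ok}
```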

Obviously, correct submissions indicate success, but further analysis revealed the strategy that the team had employed to achieve their goal. In the case of E and R0 submissions, for example, we frequently observed a “guess and check” strategy: aided by the fact that E, though random, was always chosen to be an integer, students would repeatedly submit different guesses for that variable until they hit on the correct value, and some then repeated the process for R0, guessing only among the permitted resistor values. Such an efficient-but-suboptimal strategy was easily discernible from the log data. Alternatively, if the answers for E and R0 were submitted only once, the inference was that their values had been determined in some other way. How, exactly, that had been accomplished could then be determined by examining the chats, resistor changes, calculations, and measurements performed by the team.
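As a rough illustration, a detector for this pattern can simply count how many wrong guesses for E preceded the correct one. The action and payload names below are the same hypothetical ones used above, and the thresholds are arbitrary.

```python
def classify_e_strategy(log, e_true, tol=0.01):
    """Heuristic: several wrong guesses before the correct E suggests guess-and-check,
    while a single correct submission suggests the value was calculated."""
    guesses = [item.payload["E"]
               for item in sorted(log, key=lambda it: it.timestamp)
               if item.action == "submit_e_r0"]
    if not guesses:
        return "no submission"
    wrong_before_correct = 0
    for guess in guesses:
        if abs(guess - e_true) <= tol:
            return ("guess and check" if wrong_before_correct >= 2
                    else "calculated or first-try guess")
        wrong_before_correct += 1
    return "never correct"
```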

The nature and degree of collaboration between team members could also be inferred from an analysis of their actions. For instance, a common strategy for achieving the goal voltages was for one member of the team to calculate, for each member, the goal resistance that would result in the desired outcome and then to communicate that information to his or her teammates. This strategy can be inferred from a well-defined chain of actions: first, a series of chats that contain the voltage goals of each team member, followed by a set of calculations by one member that compute the goal resistances of each member, followed by one or more chats that communicate those goal resistances, and finally by the appropriate resistance changes. Each of these actions can be identified automatically by the computer, simply by comparing the numbers involved to the appropriate targets.
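A simplified sketch of such a detector is shown below. It walks the log in time order and advances through the four stages of the chain whenever a chat, calculation, or resistor change contains a number close to the relevant target. It requires only one matching item per stage, and the log format, action names, and payload keys are our assumptions rather than the project’s actual implementation.

```python
import re

NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def close(value, target, tol=0.02):
    """True if value is within a relative tolerance of target."""
    return abs(value - target) <= tol * max(abs(target), 1.0)


def detect_goal_resistance_chain(log, goal_voltages, goal_resistances):
    """Look for the chain: goal voltages chatted, goal resistances calculated,
    those resistances chatted, and finally the resistors changed accordingly."""
    def chat_mentions(item, targets):
        numbers = [float(x) for x in NUMBER_RE.findall(item.payload.get("text", ""))]
        return any(close(n, t) for n in numbers for t in targets)

    stages = [
        lambda it: it.action == "chat"
                   and chat_mentions(it, goal_voltages.values()),
        lambda it: it.action == "calculation"
                   and any(close(it.payload.get("result", float("inf")), r)
                           for r in goal_resistances.values()),
        lambda it: it.action == "chat"
                   and chat_mentions(it, goal_resistances.values()),
        lambda it: it.action == "resistor_change"
                   and any(close(it.payload["resistance"], r)
                           for r in goal_resistances.values()),
    ]
    stage = 0
    for item in sorted(log, key=lambda it: it.timestamp):
        if stage < len(stages) and stages[stage](item):
            stage += 1
    return stage == len(stages)
```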

But the nominal strategy described above for achieving the goal voltages was not always followed by the teams. In fact, we frequently observed in the data an alternate strategy in which team members would iteratively change their resistance in an attempt to achieve their individual goal voltage. This behavior, which we dubbed the “voltage regulator strategy,” actually converges rather quickly to the desired state of the circuit, provided that the team members don’t all change their resistances at once. So a certain amount of coordination, designed to impose a turn-taking strategy, was required for a team to be assured of success. This kind of regulation between team members could be difficult to infer automatically, since doing so unambiguously would involve parsing natural language chats such as “Leave yours alone while I change mine!” However, it could sometimes be inferred from the pattern of resistor changes.
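One rough, purely structural indicator is how often resistor changes by different students crowd together in time: orderly turn-taking leaves changes spread apart, while uncoordinated teams pile them on top of one another. The window size and scoring below are arbitrary illustrative choices, not the project’s actual metric.

```python
def turn_taking_score(log, window=10.0):
    """Fraction of consecutive resistor changes that are NOT made by a different
    student within `window` seconds of the previous change.  Values near 1.0 suggest
    orderly turn-taking; values near 0.0 suggest everyone adjusting at once."""
    changes = sorted((it for it in log if it.action == "resistor_change"),
                     key=lambda it: it.timestamp)
    if len(changes) < 2:
        return 1.0
    crowded = sum(1 for a, b in zip(changes, changes[1:])
                  if b.timestamp - a.timestamp < window and b.student_id != a.student_id)
    return 1.0 - crowded / (len(changes) - 1)
```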

Analyzing chats that contain text and numbers

The most difficult student actions to analyze, of course, are their chats—unconstrained strings of natural language that are resistant to automatic parsing. To address this problem we made use of the fact that the chats take place in a specific context: either the achievement of a specific goal voltage across a resistor or the determination of the values of the variables E and R0. Each of these tasks necessarily involves the communication of numeric quantities, and (assuming that students don’t resort to spelling out the numbers!) those quantities can be identified automatically. Having isolated one or more numbers within a given chat, the system then compares them to known quantities, such as the instantaneous or goal values of resistances, voltages, or currents, and when the match succeeds it adds that instance to a growing database of variable references. This enables us, in post-processing of the log data, to determine automatically whether or not all the members of a team have communicated their goal voltages.
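The heart of that matching step can be quite simple: pull every numeric token out of a chat with a regular expression and compare it, within a tolerance, against the quantities the system already knows about. The quantity names and tolerance below are illustrative choices.

```python
import re

NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def find_variable_references(chat_text, known_quantities, tol=0.02):
    """Match numbers in a chat against known circuit quantities.

    `known_quantities` maps a descriptive name (e.g., "goal_voltage_alice") to its
    value; returns a list of (number, matched_name) pairs.
    """
    references = []
    for token in NUMBER_RE.findall(chat_text):
        value = float(token)
        for name, target in known_quantities.items():
            if abs(value - target) <= tol * max(abs(target), 1.0):
                references.append((value, name))
    return references


# Example usage
known = {"goal_voltage_alice": 2.5, "goal_voltage_bob": 1.25, "R0": 50.0}
print(find_variable_references("mine needs to be 2.5 V, yours 1.25", known))
# -> [(2.5, 'goal_voltage_alice'), (1.25, 'goal_voltage_bob')]
```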

This hybrid method of analyzing data is in contrast with traditional methods for parsing natural language utterances, whether written or spoken, that use Bayesian statistical techniques to detect patterns without explicitly relying on knowledge of the context. Such techniques are powerful and generalizable, but they require a volume of data far in excess of what we were able to gather on this project. By directing our attention to the portion of the chats that we are able to parse unambiguously, we have traded generalizability for analytic power, a tradeoff that we have found worthwhile and that we believe other researchers may find worthwhile as well.

As an example of how we used this technique, we developed a scoring rubric for chats, assigning one point to a variable that is known to all team members, two points if the variable is known only to the team member who referred to it, three points if the variable is not known to that team member but is known to another member, and four points if the variable was not previously known to anyone (typically, the result of a calculation). This rubric rewards collaboration as well as content knowledge, and we consider it a first step toward characterizing both of these constructs and attributing them both to individual students and to the team as a whole.
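Encoded as a function, the rubric might look something like the sketch below. It assumes some bookkeeping structure (here called known_by) that records which students have already seen each quantity; that bookkeeping, and the handling of the one case the rubric leaves open, are our assumptions.

```python
def score_chat_reference(variable, speaker, known_by, team_members):
    """Score one variable reference in a chat according to the rubric described above.

    `known_by` maps each variable name to the set of students already known to
    have seen it.
    """
    knowers = known_by.get(variable, set())
    if not knowers:
        return 4  # not previously known to anyone (typically a fresh calculation)
    if knowers >= set(team_members):
        return 1  # already known to the whole team
    if knowers == {speaker}:
        return 2  # known only to the member who referred to it
    if speaker not in knowers:
        return 3  # known to another member, but not to the speaker
    return 2      # partially shared; the rubric leaves this case open, treated like case 2 here


# Example: Alice mentions her own goal voltage, which only she has seen so far
print(score_chat_reference("goal_voltage_alice", "alice",
                           {"goal_voltage_alice": {"alice"}},
                           ["alice", "bob", "carol"]))  # -> 2
```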

In the long run, we found that to optimize our information gathering we had to rely on a combination of human and computer-driven analysis of the log data. (Learn more in the case study analyzed in “What Happens When Students Try to Work Collaboratively?” in the spring @Concord.) Computer-based analysis proved very useful, for example, for recognizing the significance of a number embedded in a calculation or a chat, or for characterizing resistor changes in the context of voltage goals, but it was inadequate to replace the kind of insights that human evaluators well versed in the requisite content knowledge could bring to bear.

Next steps

We are continuing to revise our data analysis software for the NSF-funded Measuring Collaboration in Complex Computerized Performance Assessments project with collaborators at ETS and CORD. In particular, development will focus on creating the ability to search log files for transactions, that is, sequences of two or more coordinated actions (e.g., a message sent with an R value followed, within 30 seconds, by a teammate changing a resistor to that value; see the sketch below), and on creating a user-friendly interface that will enable researchers to specify such patterns as targets for retrieval. We expect that this will be an important contribution to the handling of large log files, unleashing the power of automated analysis without ignoring the human element, and providing researchers with a powerful new tool with which to pose and answer questions as they arise from thoughtful and perceptive interaction with the data.
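To make the idea of a transaction concrete, the example above (a chat containing an R value, followed within 30 seconds by a teammate changing a resistor to that value) might be retrieved by a matcher along these lines, again assuming the illustrative log format sketched earlier rather than the project’s final design.

```python
import re

NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def find_chat_then_change(log, window=30.0, tol=0.02):
    """Find transactions: a chat mentioning a resistance value, followed within
    `window` seconds by a different student changing a resistor to that value.
    Returns a list of (chat_item, change_item) pairs."""
    transactions = []
    chats = [it for it in log if it.action == "chat"]
    changes = [it for it in log if it.action == "resistor_change"]
    for chat in chats:
        mentioned = [float(x) for x in NUMBER_RE.findall(chat.payload.get("text", ""))]
        for change in changes:
            if (0 < change.timestamp - chat.timestamp <= window
                    and change.student_id != chat.student_id
                    and any(abs(change.payload["resistance"] - m) <= tol * max(abs(m), 1.0)
                            for m in mentioned)):
                transactions.append((chat, change))
    return transactions
```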