Monthly Archives: September 2013

National Science Foundation funds research that puts engineering design processes under a big data "microscope"

The National Science Foundation has awarded us $1.5 million to advance big data research on engineering design. In collaboration with Professors Şenay Purzer and Robin Adams at Purdue University, we will conduct a large-scale study involving over 3,000 students in Indiana and Massachusetts in the next five years.

This research will be based on our Energy3D CAD software, which can automatically collect large amounts of process data behind the scenes while students work on their designs. Fine-grained CAD logs possess all four characteristics of big data as defined by IBM:
  1. High volume: Students can generate a large amount of process data in a complex open-ended engineering design project that involves many building blocks and variables; 
  2. High velocity: The data can be collected, processed, and visualized in real time to provide students and teachers with rapid feedback; 
  3. High variety: The data encompass virtually every type of information a rich CAD system provides, such as learner actions, events, components, properties, parameters, simulation data, and analysis results; 
  4. High veracity: The data must be accurate and comprehensive to ensure fair and trustworthy assessments of student performance.
These big data provide a powerful "microscope" that can reveal direct, measurable evidence of learning with extremely high resolution and at a statistically significant scale. Automation will make this research approach highly cost-effective and scalable. Automatic process analytics will also pave the way for building adaptive and predictive software systems for teaching and learning engineering design. Such systems, if successful, could become useful assistants to K-12 science teachers.
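To make the "high variety" point concrete, here is a minimal sketch of what a single fine-grained log record might contain. The field names and values are hypothetical illustrations, not the actual Energy3D log format.

```python
# A hypothetical fine-grained CAD log record (illustrative only;
# not the actual Energy3D log schema).
log_record = {
    "timestamp": "2013-09-05T10:32:17.408",    # high velocity: events are time-stamped as they happen
    "student_id": "S1024",
    "action": "EditWall",                       # what the learner did
    "target": {"type": "Wall", "id": 38},       # which building block was touched
    "parameters": {"height": 6.5, "u_value": 0.28},   # properties/parameters at the time of the action
    "simulation": {"annual_energy_kwh": 4210},  # attached analysis result, if any
}

# Thousands of such records per student per project add up to the
# "high volume" that makes computational analysis necessary.
```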

Why is big data needed in educational research and assessment? Because we all want students to learn more deeply, and deep learning generates big data.

In the context of K-12 science education, engineering design is a complex cognitive process in which students learn and apply science concepts to solve open-ended problems with constraints to meet specified criteria. The complexity, open-endedness, and length of an engineering design process often create a large quantity of learner data that makes learning difficult to discern using traditional assessment methods. Engineering design assessment thus requires big data analytics that can track and analyze student learning trajectories over a significant period of time.
Deep learning generates big data.

This differs from research that does not require sophisticated computation to understand the data. For example, in a typical pre/post-test using multiple-choice items, each student's selections are used directly as performance indices -- there is basically no depth in these self-evident data. I call this kind of data usage "data picking" -- analyzing the data is like picking up apples that have already fallen to the ground (as opposed to data mining, which requires some computational effort).

Process data, on the other hand, contain many details that may be opaque to researchers at first glance. In raw form, they often appear stochastic. Yet any seasoned teacher will tell you that they can judge learning by carefully watching how students solve problems. So here is the challenge: How can computer-based assessment accomplish what experienced teachers (human intelligence plus disciplinary knowledge plus some patience) can do based on observation? This is the thesis of computational process analytics, an emerging subject that we are spearheading to transform educational research and assessment through computation. Thanks to NSF, we are now able to advance this subject.
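As a contrast to data picking, here is a minimal sketch (with made-up event names, not our actual analysis pipeline) of the kind of first step that mining process data involves: collapsing a raw event stream into per-interval counts of action types, which can then be read as a learning trajectory.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical raw events: (timestamp, action type). In practice these
# would come from the learning software's log files.
events = [
    ("2013-09-05T10:30:05", "AddWall"),
    ("2013-09-05T10:31:40", "AddWindow"),
    ("2013-09-05T10:47:12", "RunSolarAnalysis"),
    ("2013-09-05T10:52:03", "EditWindow"),
    ("2013-09-05T11:05:30", "RunSolarAnalysis"),
]

def action_trajectory(events, minutes_per_bin=15):
    """Count each action type within consecutive time bins of a work session."""
    t0 = datetime.fromisoformat(events[0][0])
    bins = defaultdict(Counter)
    for timestamp, action in events:
        elapsed = (datetime.fromisoformat(timestamp) - t0).total_seconds() / 60
        bins[int(elapsed // minutes_per_bin)][action] += 1
    return dict(bins)

print(action_trajectory(events))
# Bin 0 (first 15 min): construction actions; bins 1-2: editing and analysis actions.
```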

Scoring explanation-certainty items in High-Adventure Science

One of the item types unique to the High-Adventure Science project is what we call the explanation-certainty item set. Each set consists of four separate questions:

  1. Claim
  2. Explanation
  3. Rating of certainty
  4. Certainty rationale

In the first High-Adventure Science project, we developed these items as a reliable way to assess student argumentation and developed rubrics to score the items, which I’ll explain below. (You can also look at Exploring the Unknown, our first publication in The Science Teacher. Check out our Publications tab for a list of (and links to) all of the publications generated from High-Adventure Science.)

Scoring the Claim item:

The scoring for this portion of the explanation-certainty item set is fairly straightforward. Where there is a correct answer, the correct answer gets a point and an incorrect answer gets zero points. Where there is no correct answer (because the problem is nuanced and/or there is not enough information to determine a definitively correct answer), we score responses into categories. For instance, in this question from the water module, there is no definitively correct answer:

A farmer drills a well to irrigate some nearby fields.
Could the well provide a consistent supply of water for irrigation?

No one knows whether the answer is yes or no until the well runs dry!

Scoring the Explanation item:

Explain your answer.

The scoring for this portion of the explanation-certainty item set follows the generic rubric seen below. Basically, we’re assessing whether (and to what extent) the student is able to make scientific claims.

What’s a scientific claim?

Scientific claims are backed by evidence. The more links a student is able to make between the evidence and the argument, the higher on the scale s/he scores.

[Screenshot: generic rubric for scoring Explanation items]

It’s helpful to look at a couple of examples to really understand how this works.

Here are some “student” responses to the explanation portion following the claim question about irrigation of fields.  (Note that I made these up to be illustrative; they are not actually from students.)

  • Student A: I don’t know.
  • Student B: The well could supply irrigation easily for many years.
  • Student C: The farmer might be drilling into a confined aquifer so the well wouldn’t last forever.
  • Student D: After the water is used, it will sink back into the ground and be ready to pump up again.
  • Student E: If the well is pumping from a confined aquifer, it won’t be recharged by precipitation. That means that the well won’t last forever.
  • Student F: If the farmer had a limited amount of crops to irrigate, and the well was drilled into an unconfined aquifer so that it could be recharged by the rainfall, then the well might last forever. But if the well went into a confined aquifer, it would eventually run out.
  • Student G: If the farmer drilled into an unconfined aquifer, the well might last forever. But that depends on how much water is being pumped out vs. how much can be recharged by precipitation. If the sediments above the aquifer are very permeable, then the aquifer will recharge quickly, but if they are not super-permeable, the aquifer will take some time to recharge, so it’s possible to pump the well dry, if only temporarily. If the farmer drills into a confined aquifer, the water might last a really long time (if the aquifer is huge), but since it can’t be recharged because the sediments above it are impermeable, it would eventually run out of water.

How would you score these responses?

The first thing to think about is what the “best answer” looks like. Some of the sample answers are pretty good. But how do you distinguish between good, pretty good, and excellent?

The answer lies in the number of ideas in the answer and whether those ideas are linked. For instance, the main ideas to consider in the “best answer” are:

  • wells pump from aquifers
  • aquifers can be confined or unconfined
  • unconfined aquifers can be recharged by precipitation
  • confined aquifers are not recharged by precipitation
  • recharge happens more quickly when the sediments overlying the aquifer are more permeable (and more slowly when sediments are less permeable)
  • the amount of water in the aquifer is a limiting constraint (you can’t pump more than exists!)

Making links between these ideas is the key to a good scientific argument. The ideas for the “best answer” vary between explanation items, but the scoring idea is the same across all High-Adventure Science explanation items.

  • Score 0: no links, no scientific claim present
  • Score 1: a claim is present, but it contains none of the key ideas
  • Score 2: any one idea
  • Score 3: one link between ideas
  • Score 4: two or more links between ideas

So, scoring the student responses:

  • Student A: This one is easy. The student did not make any claim or provide any evidence. This response scores a 0.
  • Student B: There’s a claim, but does it contain any key ideas? No, so this scores a 1.
  • Student C: This student brings up the idea of a confined aquifer. That’s one idea, so it’s a score of 2.
  • Student D: This student recognizes that water flows in a cycle and that water sprayed on the crops will percolate down through the soil. That’s only one of the main ideas, so this response also gets a score of 2.
  • Student E: This student makes the link between confined aquifers and recharge (a confined aquifer is not recharged by precipitation). This response scores a 3.
  • Student F: This student brings in three links: unconfined aquifers are recharged; confined aquifers are not recharged; and rainfall provides recharge. This response scores a 4.
  • Student G: This student brings in all of the ideas. There is a discussion of why unconfined aquifers are recharged by precipitation while confined aquifers are not. There is a discussion about how the permeability of sediments affects the rate of recharge. There is a discussion about the size of the aquifer. This response also scores a 4.
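For readers who like to see the rubric stated compactly, here is a sketch of the 0-4 mapping as a small function, applied to my hand-tallied idea and link counts for the made-up responses above. The tallies are my own reading of the examples; automatically extracting ideas and links from free text is a much harder problem and is not what this function does.

```python
def score_explanation(claim_present, num_ideas, num_links):
    """Map idea/link tallies to the 0-4 explanation rubric."""
    if num_links >= 2:
        return 4          # two or more links between ideas
    if num_links == 1:
        return 3          # one link between ideas
    if num_ideas >= 1:
        return 2          # any one idea, no links
    if claim_present:
        return 1          # a claim, but no key ideas
    return 0              # no claim, no ideas

# Hand-tallied counts (claim?, ideas, links) for the illustrative responses:
responses = {
    "A": (False, 0, 0),  # no claim
    "B": (True, 0, 0),   # claim only
    "C": (True, 1, 0),   # one idea (confined aquifer)
    "D": (True, 1, 0),   # one idea (water cycles back)
    "E": (True, 2, 1),   # one link (confined aquifer -> no recharge)
    "F": (True, 3, 3),   # several linked ideas
    "G": (True, 6, 4),   # all ideas, multiple links
}
for student, tally in responses.items():
    print(student, score_explanation(*tally))
# A 0, B 1, C 2, D 2, E 3, F 4, G 4 -- matching the scores discussed above
```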

Scoring the Certainty Rating item:

How certain are you about your claim based on your explanation?

We use this as an indication of how confident a student is in his/her argument. There are no right or wrong answers here.

Scoring the Certainty Rationale item:

Explain what influenced your certainty rating.

As with the Explanation item, the scoring for this item follows a rubric. Unlike the Explanation rubric, however, this one is easy to generalize across all topic areas.

[Screenshot: rubric for scoring Certainty Rationale items]

Students give many different reasons for their certainty ratings, but the reasons can be broadly categorized as personal, scientific within the investigation, and scientific beyond the investigation.

Personal reasons include:

  • My teacher told me.
  • I learned it from a science show.
  • I read it in a magazine.
  • I’m not really good at this topic.  (or conversely, I’m really good at this topic.)
  • I haven’t learned this yet. (or conversely, we learned this last year.)

Scientific within the investigation reasons include evidence from within the question or specific knowledge directly related to the question.

Scientific beyond the investigation reasons include:

  • questioning the data or evidence presented in the question
  • recognizing limitations in scientific knowledge about the topic
  • recognizing the inherent uncertainty in the phenomenon (in this case, the uncertainty of knowing the type of aquifer from which the well is pulling its water)

So there you have it – a nutshell view of how we score the explanation-certainty items in High-Adventure Science.

If you want to use parts of this rubric to score your students’ responses for your own grading, that’s great! Feel free to ask questions as they come up. The scoring is not always easy!  🙂

Measuring the effects of an intervention using computational process analytics

"At its core, scientific inquiry is the same in all fields. Scientific research, whether in education, physics, anthropology, molecular biology, or economics, is a continual process of rigorous reasoning supported by a dynamic interplay among methods, theories, and findings. It builds understanding in the form of models or theories that can be tested."  —— Scientific Research in Education, National Research Council, 2002
[Figure: Actions caused by the intervention]
Computational process analytics (CPA) is a research method that we are developing in the spirit of the above quote from the National Research Council report. It is a class of data mining methods for quantitatively studying the learning dynamics of complex scientific inquiry or engineering design projects carried out in digital environments. CPA views performance assessment as detecting signals from the noisy background often present in large learner datasets because of the many uncontrollable and unpredictable factors in classrooms. It borrows many computational techniques from engineering fields such as signal processing and pattern recognition. Some of these analytics can be considered the computational counterparts of traditional assessment methods based on student articulation, classroom observation, or video analysis.
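To make the signal-from-noise framing concrete, here is a toy example of my own (not a method taken from this project): a simple moving average, one of the most basic tools borrowed from signal processing, can make an underlying trend visible in a noisy per-session action count.

```python
def moving_average(series, window=3):
    """Smooth a noisy sequence of per-session counts with a simple moving average."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        out.append(sum(series[lo:i + 1]) / (i + 1 - lo))
    return out

# Hypothetical per-session counts of "analysis" actions for one student:
# noisy, but with an underlying upward trend in the later sessions.
analysis_counts = [2, 0, 3, 1, 6, 4, 8, 7, 9, 6]
print([round(x, 1) for x in moving_average(analysis_counts)])
# [2.0, 1.0, 1.7, 1.3, 3.3, 3.7, 6.0, 6.3, 8.0, 7.3] -- the rise stands out from the noise
```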

[Figure: Actions unaffected by the intervention]
Computational process analytics has wide applications in educational assessment. High-quality assessments of deep learning hold a critical key to improving learning and teaching. Their strategic importance was highlighted in President Obama’s remarks in March 2009: “I am calling on our nation’s Governors and state education chiefs to develop standards and assessments that don’t simply measure whether students can fill in a bubble on a test, but whether they possess 21st century skills like problem-solving and critical thinking, entrepreneurship, and creativity.” However, the kinds of assessments the President wished for often require careful human scoring, which is far more expensive to administer than multiple-choice tests. Computer-based assessments, which rely on the learning software to automatically collect and sift learner data through unobtrusive logging, are viewed as a promising solution for assessing increasingly prevalent digital learning.

While there has been a lot of work on computer-based assessment for STEM education, one foundational question has rarely been explored: How sensitive can the logged learner data be to instruction?

[Figure: Actions caused by the intervention]
According to the assessment guru Popham, there are two main categories of evidence for determining the instructional sensitivity of an assessment tool: judgmental evidence and empirical evidence. Computer logs provide empirical evidence based on recorded user data: the logs themselves supply data for assessment, and how they differ before and after instruction supplies data for evaluating instructional sensitivity. Like any other assessment tool, computer logs must be instructionally sensitive if they are to provide reliable data for gauging student learning under an intervention.


[Figure: Actions unaffected by the intervention]
Earlier studies have used CAD logs to capture designers’ operational knowledge and reasoning processes. Those studies were not designed to understand the learning dynamics occurring within a CAD system and therefore did not need to assess students’ acquisition and application of knowledge and skills through CAD activities. By contrast, we are studying the instructional sensitivity of CAD logs, which describes how students react to interventions with CAD actions. Although interventions can be carried out either by humans (such as teacher instruction or group discussion) or by the computer (such as adaptive feedback or intelligent tutoring), we have focused on human interventions in this phase of our research. Studying the instructional sensitivity to human interventions will inform the development of effective computer-generated interventions for teaching engineering design in the future (which is another reason, besides cost effectiveness, why research on automatic assessment using learning software logs is so promising).

The study of instructional effects on design behavior and performance is particularly important when viewed from the perspective of teaching science through engineering design, a practice now mandated by the newly established Next Generation Science Standards of the United States. A problem commonly observed in K-12 engineering projects, however, is that students often reduce engineering design challenges to construction or craft activities that may not truly involve the application of science. This suggests that other driving forces acting on learners, such as hunches and desires for how the design artifacts should look, may overwhelm the effects of instruction on how to use science in design work. Hence, research on the sensitivity of design behavior to science instruction requires careful analyses using innovative data analytics such as CPA to detect the changes, however slight they might be. The insights gained from studying this instructional sensitivity may yield actionable knowledge for developing effective instruction that can reproduce or amplify those changes.

[Figure: Distribution of the intervention effect across 65 students]

Our preliminary CPA results show that the CAD logs created with our Energy3D tool are instructionally sensitive. The first four figures embedded in this post show two pairs of opposite cases, with one type of action sensitive to an instruction that occurred outside the CAD tool and the other not. This is because the instruction was related to one type of action and had nothing to do with the other. The last figure shows the distribution of instructional sensitivity across 65 students. In this figure, larger numbers mean higher instructional sensitivity, and a number close to one means that the instruction had no effect. From the graph, you can see that the three types of actions unrelated to the instruction fluctuate around one, whereas the fourth type of action is strongly sensitive to the instruction.
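Here is a minimal sketch of one plausible way to compute such a sensitivity index: the ratio of an action type’s per-minute rate after the intervention to its rate before. The action names and counts are made up, and the actual analysis may normalize the data differently; this is only meant to show why unrelated actions hover around one while the targeted action stands out.

```python
def sensitivity_ratio(count_before, count_after, minutes_before, minutes_after):
    """Ratio of an action's per-minute rate after an intervention to its rate before.

    A value near 1 suggests the intervention had little effect on that action type;
    values well above 1 suggest the action became much more frequent afterwards.
    """
    rate_before = count_before / minutes_before
    rate_after = count_after / minutes_after
    return rate_after / rate_before if rate_before else float("inf")

# Hypothetical counts for one student: 40 minutes of design work before
# the intervention and 40 minutes after (action names are illustrative).
actions = {
    "EditWall":         (25, 27),   # unrelated to the intervention
    "AddWindow":        (12, 10),   # unrelated
    "MoveBuilding":     (8, 9),     # unrelated
    "RunSolarAnalysis": (2, 11),    # the action the intervention targeted
}
for name, (before, after) in actions.items():
    print(name, round(sensitivity_ratio(before, after, 40, 40), 2))
# EditWall 1.08, AddWindow 0.83, MoveBuilding 1.12, RunSolarAnalysis 5.5
```

Repeating this computation per student and per action type yields a distribution like the one in the last figure: ratios near one for unrelated actions, and a clear outlier for the action the instruction targeted.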

These results demonstrate that software logs can not only record what students do with the software but also capture the effects of what happens outside the software.