
Energy3D uses intelligent agents to create adaptive feedback based on analyzing the "DNA of design"

Fig. 1: A simple case of teaching thermal insulation.
Energy3D is a "smart" CAD tool because it can monitor the designer's behavior in real time and, based on what it observes, generate feedback that helps regulate the design behavior. This capacity has tremendous implications for learning and teaching scientific inquiry and engineering design, whose open-ended nature ideally requires one-to-one tutoring so intense that no teacher can easily provide it in a real classroom.

The computational mechanism for generating feedback in Energy3D is based on intelligent agents, which consist of sensors and actuators (in very generic terms). In Energy3D, all events are logged behind the scenes. These events provide the raw data stream from which various sensors produce signals based on subsets of the raw data. For instance, a sensor can be created to monitor any event related to the solar panels of a house. An agent then uses a decision tree model to determine which actuators should be called to provide feedback to the user or to direct Energy3D to change its state. For instance, if a solar panel is detected on a north-facing roof, the agent can remind the designer to rethink the decision. Just as a teacher might, the agent can even suggest a comparative analysis between a solar panel on the north-facing roof and one on the south-facing or west-facing roof. Although this type of inquiry and design can also be taught using directly scaffolded instruction that guides students to explore step by step, in practice we have found that the effect of this approach often diminishes because many students do not read the instructions carefully enough or remember them long enough. It is also challenging for teachers to guide a whole class through this kind of long learning process, as students often work at different paces. Adaptive feedback provides a way to help students only when they need it, just when a need is detected, thus providing a better chance to deliver effective instruction.
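The sensor, agent, and actuator roles described above can be illustrated with a minimal Python sketch. This is not Energy3D's actual code: the event names, the `roof_orientation` field, and the wording of the reminder are all hypothetical, and the "decision tree" is reduced to a couple of hand-written branches.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    kind: str                                    # hypothetical event name, e.g. "add_solar_panel"
    details: dict = field(default_factory=dict)  # hypothetical event attributes


def solar_panel_sensor(events: List[Event]) -> List[Event]:
    """Sensor: reduce the raw event log to the subset about solar panels."""
    return [e for e in events if "solar_panel" in e.kind]


def remind_about_orientation() -> str:
    """Actuator: produce feedback for the designer (wording is made up)."""
    return ("This solar panel is on a north-facing roof. "
            "Would you compare its output with a panel on the south-facing roof?")


def agent(events: List[Event]) -> Optional[str]:
    """Agent: a tiny hand-written decision tree over the sensor's signal."""
    signal = solar_panel_sensor(events)
    if not signal:
        return None                      # nothing to react to
    latest = signal[-1]
    if latest.details.get("roof_orientation") == "north":
        return remind_about_orientation()
    return None                          # no need detected, stay silent


log = [
    Event("change_wall_color"),
    Event("add_solar_panel", {"roof_orientation": "north"}),
]
feedback = agent(log)
```

The point of the separation is that the agent never reads the raw log directly; it only sees what its sensors report, so new sensors and actuators can be added without touching the logging code.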

Let's look at a very simple example. Figure 1 shows a learning activity whose goal is to teach how the thermal property of a wall, called the U-value, affects the energy use of a house. Many students may walk away with a shallow understanding that the higher the U-value is, the more energy a house uses. The challenge is to help them deepen their understanding. For example, how can we make sure that students will collect enough data points to discover that the energy a house uses is proportional to the U-value? How can we help them find out that the relationship is independent of seasonal change, wall orientation, and solar radiation (e.g., a lower U-value is good in both summer and winter, irrespective of whether or not the wall faces the sun)? Helping students reach this level of understanding through inquiry-based activities is by no means a trivial task, even in this seemingly simple example. Let's explore what we may do in Energy3D now that we have a way to monitor students' interactions with it.

Fig. 2: An event sequence coded like a DNA sequence.
In nearly all software that supports learning and teaching, the events during a process can be coded as a string, with characters representing the events ordered by their timestamps, as in Figure 2. In this case, A represents an analysis event in the Energy3D CAD tool, U represents an event of changing the U-value of a wall, C represents an event of changing the date for the energy simulation, a question mark (?) represents an event of requesting help from the software, an underscore (_) represents an inactive time period longer than a certain threshold, and an asterisk (*) is a wildcard that represents any other event "silenced" in this expression in order to reduce the dimensionality of the problem. For those who know a bit about bioinformatics, this resembles a DNA sequence. In the context of Energy3D, we may also call it the DNA of a design, if that helps your imagination.
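As a rough illustration of this coding scheme, the following Python sketch turns a timestamped log into such a string. The event names in `EVENT_CODES` and the 60-second idle threshold are assumptions made for the example; the post does not specify either.

```python
# Hypothetical event names mapped to the single-character codes described above.
EVENT_CODES = {
    "analysis": "A",
    "change_u_value": "U",
    "change_date": "C",
    "request_help": "?",
}
IDLE_THRESHOLD_SECONDS = 60.0  # assumed value; the actual threshold is not stated


def encode(events):
    """Encode (timestamp_seconds, event_name) pairs as an event string.

    Events are ordered by timestamp, a gap longer than the idle threshold
    becomes "_", and any event without its own code becomes the wildcard "*".
    """
    codes = []
    previous_time = None
    for time, name in sorted(events, key=lambda e: e[0]):
        if previous_time is not None and time - previous_time > IDLE_THRESHOLD_SECONDS:
            codes.append("_")
        codes.append(EVENT_CODES.get(name, "*"))
        previous_time = time
    return "".join(codes)


log = [
    (0, "change_u_value"),
    (5, "analysis"),
    (100, "change_wall_color"),  # not in EVENT_CODES, so it is "silenced" as "*"
    (110, "request_help"),
    (112, "change_date"),
]
print(encode(log))  # UA_*?C
```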

Now that we have converted the sequence of events into a string, we can analyze these events with all sorts of techniques that have been developed for strings, including those from bioinformatics, such as sequence alignment, and those from natural language processing. In this article, I am going to show how widely supported regular expressions (regex) can be used to detect whether a certain type of event or a certain combination of events occurred, and how many times it occurred. I feel that regex, in our case, may be more accurate than edit distances such as the Levenshtein distance in matching the pattern. For example, a single substituted event may represent a very different process despite the short edit distance.
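To make the comparison with edit distance concrete, here is a small Python sketch. The two event strings are invented for illustration; they differ by a single substitution, so their Levenshtein distance is 1, yet a regex separates them cleanly.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def count_events(sequence: str, code: str) -> int:
    """Count how many times a single event code occurs in the string."""
    return len(re.findall(re.escape(code), sequence))


def is_strict_alternation(sequence: str) -> bool:
    """True if the whole string is U, A, U, A, ... with nothing else."""
    return re.fullmatch(r"(?:UA)+", sequence) is not None


systematic = "UAUAUA"    # three change-then-analyze cycles
help_seeking = "UAUAU?"  # the last analysis is replaced by a help request

print(levenshtein(systematic, help_seeking))   # 1
print(is_strict_alternation(systematic))       # True
print(is_strict_alternation(help_seeking))     # False
print(count_events(help_seeking, "?"))         # 1
```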

Fig. 3: A sequence that shows high usage of feedback
We know that a fundamental skill of inquiry is to keep everything else fixed, change only one variable at a time, and then test how the system's output depends on that variable. Through this process of inquiry, we learn the meaning of that variable, as explained by Bruce Alberts, former president of the National Academy of Sciences and former Editor-in-Chief of Science magazine. In the example discussed here, that variable is the U-value of a selected wall of the house and the test is the simulation-based analysis. A pattern of alternating U and A characters in the event string suggests a high probability of inquiry, which can be captured using a simple regex such as (U[_\*\?]*A)+ (written with doubled backslashes, (U[_\\*\\?]*A)+, inside a string literal in a language such as Java). Between U and A, however, there may be other types of events that weaken that probability or compromise the rigor. For example, changing the color of the wall between U and A may result in an additional difference in the energy use of the house that originates from the absorption of solar radiation by the external surface of the wall and has nothing to do with its U-value. Changing multiple variables at a time in this way violates the aforementioned inquiry principle, and the agent should call it out by using another regex to analyze the substring between U and A.
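The two checks described in this paragraph might look as follows in Python. This is only a sketch: the code K for a wall color change is hypothetical (in the coding above such an event would be hidden behind the wildcard unless it is given its own character), and treating a date change (C) as a confound is likewise an assumption. Inside a character class, * and ? need no escaping in Python's re module.

```python
import re

# One or more U ... A cycles, allowing only idle time, silenced events,
# or help requests between the U-value change and the analysis.
INQUIRY_RUN = re.compile(r"(?:U[_*?]*A)+")

# A single U ... A cycle, capturing whatever happened in between.
CYCLE = re.compile(r"U([^UA]*)A")

# Hypothetical codes for events that change a second variable:
# K = wall color change (made up for this sketch), C = date change.
CONFOUND = re.compile(r"[KC]")


def longest_inquiry_run(sequence: str) -> int:
    """Number of U ... A cycles in the longest uninterrupted run."""
    longest = 0
    for match in INQUIRY_RUN.finditer(sequence):
        # Each cycle inside a run ends with exactly one A.
        longest = max(longest, match.group().count("A"))
    return longest


def confounded_cycles(sequence: str):
    """Return the U ... A cycles in which another variable was also changed."""
    return [m.group() for m in CYCLE.finditer(sequence)
            if CONFOUND.search(m.group(1))]


print(longest_inquiry_run("UAU_AU*?A"))  # 3
print(longest_inquiry_run("UAUKAUA"))    # 1
print(confounded_cycles("UAUKAUA"))      # ['UKA']
```

An agent could treat a long run as evidence of systematic inquiry and a confounded cycle as a cue to remind the student to change one variable at a time.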

An interesting feature of Energy3D is that feedback itself is also logged. Figure 3 shows a sequence that has an alternation pattern similar to that of Figure 2, but it records a type of behavior in which the user may rely too heavily on feedback from the system to learn (the question marks in the string stand for feedback requests made by the user) and avoid deep thinking on their own. This may be a common problem in many intelligent tutors (sometimes this behavior is called "gaming the system").
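One simple way to flag this behavior is to compare the number of feedback requests with the number of analyses in the event string. The sketch below does that in Python; the measure itself and the 0.5 cutoff are assumptions chosen for illustration, not something Energy3D is known to use.

```python
RELIANCE_THRESHOLD = 0.5  # hypothetical cutoff: one help request per two analyses


def help_ratio(sequence: str) -> float:
    """Feedback requests ("?") per analysis ("A") in an event string."""
    analyses = sequence.count("A")
    requests = sequence.count("?")
    if analyses == 0:
        # No analysis at all: any help request counts as unbounded reliance.
        return float("inf") if requests else 0.0
    return requests / analyses


def relies_heavily_on_feedback(sequence: str) -> bool:
    """True if the help ratio reaches the assumed threshold."""
    return help_ratio(sequence) >= RELIANCE_THRESHOLD


print(help_ratio("U?AU?AU?A"))                  # 1.0
print(help_ratio("UAUAUAU?A"))                  # 0.25
print(relies_heavily_on_feedback("U?AU?AU?A"))  # True
```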

The development of data mining and intelligent agents in Energy3D is opening interesting opportunities for research that will only grow more important in the era of artificial intelligence (AI). We are excited to be part of this wave of AI innovation.

Artificial intelligence research for engineering design

Have you ever thought about what a pity it is when a senior engineer with 40 years of problem-solving experience retires? Have you ever thought about what a loss it is when a senior teacher with 40 years of teaching experience retires? Imagine what we could do for humanity if we found a way to somehow preserve their experience, expertise, and intelligence automatically before these incredible treasures are taken to the graveyard...

Heat map visualizations of different patterns of design task transition
Funded by the National Science Foundation, I have been working on the research and development of artificial intelligence (AI) for engineering design for a number of years and have been developing the Visual Process Analytics for visualizing and analyzing engineering design process data. This exciting intersection among AI (basically everything about how intelligence can be realized), engineering (basically a generative and creative discipline), and cognitive science (basically everything about how humans acquire intelligence) is full of tremendous challenges, but it also creates unprecedented opportunities that constantly entice and enlighten me.

I have recently written a short article to explain my research to lay readers (mostly educators, though the implications are not limited to education). Check it out at http://energy.concord.org/~xie/papers/aired.pdf

What’s new in Visual Process Analytics Version 0.3


Visual Process Analytics (VPA) is a data mining platform that supports research on how students learn by using complex tools to solve complex problems. The complexity of these learning activities entails complex process data (e.g., event logs) that cannot be easily analyzed. This difficulty calls for data visualization that can at least give researchers a glimpse of the data before they conduct in-depth analyses. To this end, the VPA platform provides many different types of visualization that represent many different aspects of complex processes. These graphic representations should help researchers develop an intuition for the data. We believe VPA is an essential tool for data-intensive research, which will only grow more important in the future as data mining, machine learning, and artificial intelligence play critical roles in effective, personalized education.

Several new features were added to Version 0.3, described as follows:

1) Interactions are provided through context menus. Context menus can be invoked by right-clicking on a visualization. Depending on where the user clicks, a context menu provides the available actions applicable to the selected objects. This allows a complex tool such as VPA to still have a simple, pleasant user interface.

2) Result collectors allow users to gather analysis results and export them in the CSV format. VPA is a data browser that allows users to navigate in the ocean of data from the repositories it connects to. Each step of navigation invokes some calculations behind the scenes. To collect the results of these calculations in a mining session, VPA now has a simple result collector that automatically keeps track of the user's work. A more sophisticated result manager is also being conceptualized and developed to make it possible for users to manage their data mining results in a more flexible way. These results can be exported if needed to be analyzed further using other software tools.

3) Cumulative data graphs are available to render a more dramatic view of time series. It is sometimes easier to spot patterns and trends in cumulative graphs. This cumulative analysis applies to all levels of granularity of data supported by VPA (currently, the three granular levels are Top, Medium, and Fine, corresponding to three different ways to categorize action data). VPA also provides a way for users to select variables from a list to be highlighted in cumulative graphs.
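As a reminder of what a cumulative graph plots, the short Python sketch below converts per-interval counts into a running total. The numbers are made up, and this says nothing about how VPA itself (which is written in JavaScript) computes or renders its graphs.

```python
from itertools import accumulate


def cumulative_series(counts_per_interval):
    """Turn per-interval event counts into a running total."""
    return list(accumulate(counts_per_interval))


# Made-up example: number of analysis events in five consecutive time intervals.
analysis_counts = [0, 3, 1, 0, 4]
print(cumulative_series(analysis_counts))  # [0, 3, 4, 4, 8]
```

Flat stretches in the running total correspond to intervals with no activity, and steep stretches to bursts of activity, which is often why trends are easier to spot in the cumulative view.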

Many other new features were also added in this version. For example, additional information about classes and students is provided to contextualize each data set. In the coming weeks, the repository will incorporate data from more than 1,200 students in Indiana who have undertaken engineering design projects using our Energy3D software. This unprecedented, large-scale database will potentially provide a goldmine of research data for the study of engineering design.

For more information about VPA, see my AERA 2016 presentation.

Visual Process Analytics (VPA) launched


Visual Process Analytics (VPA) is an online analytical processing (OLAP) program that we are developing for visualizing and analyzing student learning from complex, fine-grained process data collected by interactive learning software such as computer-aided design tools. We envision a future in which every classroom is powered by informatics and infographics tools such as VPA to support day-to-day learning and teaching at a highly responsive level. In a future in which every business person relies on visual analytics every day to stay in business, it would be a shame if teachers still had to read through tons of paper-based work from students to make instructional decisions. The research we are conducting with the support of the National Science Foundation is paving the road to a future in which our educational systems receive support comparable to what business analytics and intelligence provide to businesses.

This is the mission of VPA. Today we are announcing the launch of this cyberinfrastructure. We decided that its first version number should be 0.1. This is just a way to indicate that the research and development on this software system will continue as a very long-term effort and what we have done is a very small step towards a very ambitious goal.


VPA is written in plain JavaScript/HTML/CSS. It should run within most browsers -- best on Chrome and Firefox -- but it looks and works like a typical desktop app. This means that while you are in the middle of mining the data, you can save what we call "the perspective" as a file onto your disk (or in the cloud) so that you can keep track of what you have done. Later, you can load the perspective back into VPA. Each perspective opens the datasets that you have worked on, with your latest settings and results. So if you are halfway through your data mining, your work can be saved for further analyses.

So far Version 0.1 has seven analysis and visualization tools, each of which shows a unique aspect of the learning process with a unique type of interactive visualization. We admit that, compared with the daunting high dimension of complex learning, this is a tiny collection. But we will be adding more and more tools as we go. At this point, only one repository -- our own Energy3D process data -- is connected to VPA. But we expect to add more repositories in the future. Meanwhile, more computational tools will be added to support in-depth analyses of the data. This will require a tremendous effort in designing a smart user interface to support various computational tasks that researchers may be interested in defining.

Eventually, we hope that VPA will grow into a versatile platform of data analytics for cutting-edge educational research. As such, VPA represents a critically important step towards marrying learning science with data science and computational science.

Learning analytics is the "crystallography" for educational research

To celebrate 100 years of the dazzling history of crystallography, UNESCO has declared 2014 the International Year of Crystallography. To date, 29 Nobel Prizes have been awarded for scientific achievements related to crystallography. On March 7th, Science magazine honored crystallographers with a special issue.

Why is crystallography such a big deal? Because it enables scientists to "see" atoms and molecules and discover the molecular structures of substances. One of the most famous examples is the discovery of the DNA helix by Rosalind Franklin in 1952, followed by Crick, Watson, and Wilkins' double helix model. Enough ink has been spilled on the importance of this discovery.

Science fundamentally relies on techniques such as crystallography for detecting and visualizing invisible things. Educational research needs techniques of this kind, too, to decode students' minds, which are opaque to researchers. Up to this point, educational researchers have depended on methods such as pre/post-tests, observations, and interviews. But these traditional methods are either insufficient or inefficient for measuring learning in complex processes such as scientific inquiry and engineering design. To achieve a level of truly "no child left behind," we will need to develop a research technique that can monitor every student for every minute in the classroom.

Such a technique has to be based on an integrated informatics system that can engage students with meaningful learning tasks, tease out what is in their minds, and capture every bit of information that may be indicative of learning. This involves development in all areas of the learning sciences, including technology, curriculum, pedagogy, and assessment. Eventually, what we have is a comprehensive set of data through which we will sift to find patterns of learning or evaluate the effectiveness of an intervention.

The whole process is not unlike crystallography. In the end, it is the learning analytics that concludes the research. Today we are seeing a lot of learner data, but we probably have no idea what they actually mean. We can either say there is no significance in those data and shrug them off, or we can try to figure out the right kind of data analytics to decipher them. Which attitude to choose probably depends on which universe we live in. But the history of crystallography can give us a clue. It was Max von Laue who created the first X-ray diffraction pattern in 1912. He couldn't interpret it, however. It wasn't until William Henry Bragg and William Lawrence Bragg's groundbreaking work later in the same year that scientists became able to infer molecular structures from those patterns. In educational research, the equivalent of this is learning analytics -- a critical piece that will give data meaning.

For more information, read my new article "Visualizing Student Learning."