
Energy3D makes designing realistic buildings easy

The annual yield and cost-benefit analyses of rooftop solar panels, based on sound scientific and engineering principles, are critical steps to the financial success of building solarization. Google's Project Sunroof provides a way for millions of property owners to get recommendations for the right solar solutions.



Another way to conduct accurate scientific analysis of solar panel outputs based on their layout on the rooftop is to use a computer-aided engineering (CAE) tool to do a three-dimensional, full-year analysis based on ab initio scientific simulation. With the support of the National Science Foundation since 2010, we have been developing Energy3D, a piece of CAE software that aims to bring the power of sophisticated scientific and engineering simulations to children and laypersons. To achieve this goal, a key step is to help users rapidly sketch their own buildings and the surrounding objects that may affect their solar potential. We feel that most CAD tools out there are probably too difficult for average users to create realistic models of their own houses. This has forced us to invent new solutions.

We have recently added many new features to Energy3D to progress towards this goal. The latest version allows many common architectural styles found in most parts of the US to be created and their solar potential to be studied. The screenshots embedded in this article demonstrate this capability. With the current version, each of these designs took me approximately an hour to create from scratch. But we will continue to push the limit.

The 3D construction user interface has been developed based on the tenet of letting users create any structure using a minimal set of building blocks and operations. Once users master a relatively small set of rules, they are empowered to create almost any shape of building they wish.

Solar yield analysis of the first house
The truly time-consuming part is getting the right dimensions and orientation of a real building and the surrounding tall objects such as trees. Google's 3D map may provide a way to extract these data. Once the approximate geometry of a building is determined, users can easily put solar panels anywhere on the roof to check their energy yield. They can then try as many different layouts as they wish to compare the yields and select an optimal layout. This is especially important for buildings that may have partial shading and sub-optimal orientations. CAE tools such as Energy3D can be used to do spatial and temporal analysis and report the daily output of each panel in the array, allowing users to obtain fine-grained, detailed results and thus providing a good simulation of solar panels in day-to-day operation.

The science-based engineering principles behind this solar design, assessment, and optimization process are exactly what the Next Generation Science Standards require K-12 students in the US to learn and practice. So why not ask children for help to solarize their own homes, schools, and communities, at least virtually? There has never been a better time to do this. And we have paved the road for this vision by creating one of the easiest 3D interfaces with compelling scientific visualizations that can potentially entice and engage a lot of students. It is time for us to test the idea.

To see more designs, visit this page.

Personal thermal vision could turn millions of students into the cleantech workforce of today

So we have signed the Paris Agreement and cheered about it. Now what?

More than a year ago, I wrote a proposal to the National Science Foundation to test the feasibility of empowering students to help combat the energy issues of our nation. There are hundreds of millions of buildings in our country and some of them are pretty big energy losers. The home energy industry currently employs probably 100,000 people at most. It would take them a few decades to weatherize and solarize all these residential and commercial buildings (let alone educating home owners so that they would take such actions).

But there are millions of students in schools who are probably more likely to be concerned about the world that they are about to inherit. Why not ask them to help?

You probably know a lot of projects on this very same mission. But I want to do something different. Enough messaging has been done. We don't need to hand out more brochures and flyers about the environmental issues that we may be facing. It is time to call for actions!

For a number of years, I have been working on infrared thermography and building energy simulation to knock down the technical barriers that these techniques may pose to children. With NSF awarding us a $1.2M grant last year and FLIR releasing a series of inexpensive thermal cameras, the time of bringing these tools to large-scale applications in schools has finally arrived.

For more information, see our poster that will be presented at an NSF meeting next week. Note that this project has just begun, so we haven't had a chance to test the solarization part. But the results from the weatherization part based on infrared thermography have been extremely encouraging!

Listen to the data with the Visual Process Analytics


Visual analytics provides a powerful way for people to see patterns and trends in data by visualizing them. In real life, we use both our eyes and ears. So can we hear patterns and trends if we listen to the data?

I spent a few days studying the JavaScript Sound API and adding simple data sonification to our Visual Process Analytics (VPA) to explore this question. I don't know where adding the auditory sense to the analytics toolkit may lead us, but you never know. It is always good to experiment with various ideas.


Note that the data sonification capabilities of VPA are very experimental at this point. To make matters worse, I am not a musician by any stretch of the imagination. So the generated sounds in the latest version of VPA may sound horrible to you. But this represents a step forward to better interactions with complex learner data. As my knowledge about music improves, the data should sound less terrifying.

The first test feature added to VPA is very simple: It just converts a time series into a sequence of notes and rests. To adjust the sound, you can change a number of parameters such as pitch, duration, attack, decay, and oscillator types (sine, square, triangle, sawtooth, etc.). All these options are available through the context menu of a time series graph.
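To make the idea of turning a time series into notes and rests concrete, here is a minimal Python sketch. It is not VPA's actual code (VPA is written in JavaScript), and the mapping it uses is an assumption made for illustration: samples equal to zero become rests, and the remaining values are scaled linearly onto a range of MIDI note numbers. The function names and the sample data are hypothetical.

```python
def series_to_notes(values, low_midi=60, high_midi=72, rest_value=0):
    """Return one MIDI note number per sample, or None where the sample is a rest.

    Assumption for this sketch: a sample equal to rest_value means nothing
    happened in that time bin, so it becomes a rest. All other samples are
    scaled linearly between low_midi and high_midi.
    """
    active = [v for v in values if v != rest_value]
    if not active:
        return [None] * len(values)
    lo, hi = min(active), max(active)
    span = hi - lo
    notes = []
    for v in values:
        if v == rest_value:
            notes.append(None)          # rest
        elif span == 0:
            notes.append(low_midi)      # all active samples are equal: one pitch
        else:
            fraction = (v - lo) / span  # 0.0 for the smallest value, 1.0 for the largest
            notes.append(low_midi + round(fraction * (high_midi - low_midi)))
    return notes


def midi_to_hz(note):
    """Equal-temperament frequency of a MIDI note (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


if __name__ == "__main__":
    actions_per_minute = [0, 3, 6, 0, 12]  # made-up data
    for note in series_to_notes(actions_per_minute):
        print("rest" if note is None else f"{midi_to_hz(note):.1f} Hz")
```

The resulting frequencies could then be handed to any oscillator, with duration, attack, and decay chosen separately.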

At the same time as the sound plays, you can also see a synchronized animation of VPA (as demonstrated by the embedded videos). This means that from now on VPA is a multimodal analytic tool. But I have no plan to rename it, as data visualization is still, and will remain, dominant for the data mining platform.

The next step is to figure out how to synthesize better sounds from multiple types of actions as multiple sources or instruments (much like the Song from Pi). I will start with sonifying the scatter plot in VPA. Stay tuned.

What’s new in Visual Process Analytics Version 0.3


Visual Process Analytics (VPA) is a data mining platform that supports research on how students learn by using complex tools to solve complex problems. The complexity of this kind of learning activity entails complex process data (e.g., event logs) that cannot be easily analyzed. This difficulty calls for data visualization that can at least give researchers a glimpse of the data before they conduct in-depth analyses. To this end, the VPA platform provides many different types of visualization that represent many different aspects of complex processes. These graphic representations should help researchers develop some sort of intuition. We believe VPA is an essential tool for data-intensive research, which will only grow more important in the future as data mining, machine learning, and artificial intelligence play critical roles in effective, personalized education.

Several new features were added to Version 0.3, described as follows:

1) Interactions are provided through context menus. Context menus can be invoked by right-clicking on a visualization. Depending on where the user clicks, a context menu provides the available actions applicable to the selected objects. This allows a complex tool such as VPA to still have a simple, pleasant user interface.

2) Result collectors allow users to gather analysis results and export them in the CSV format. VPA is a data browser that allows users to navigate in the ocean of data from the repositories it connects to. Each step of navigation invokes some calculations behind the scenes. To collect the results of these calculations in a mining session, VPA now has a simple result collector that automatically keeps track of the user's work. A more sophisticated result manager is also being conceptualized and developed to make it possible for users to manage their data mining results in a more flexible way. These results can be exported if needed to be analyzed further using other software tools.

3) Cumulative data graphs are available to render a more dramatic view of time series. It is sometimes easier to spot patterns and trends in cumulative graphs. This cumulative analysis applies to all levels of granularity of data supported by VPA (currently, the three granular levels are Top, Medium, and Fine, corresponding to three different ways to categorize action data). VPA also provides a way for users to select variables from a list to be highlighted in cumulative graphs.
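As a small illustration of why cumulative graphs can make trends easier to spot, the sketch below turns a sparse series of per-minute action counts into its running total. The data and the function name are made up for this example and are not taken from VPA.

```python
from itertools import accumulate


def cumulative(series):
    """Running total of a time series: element i is the sum of series[0..i]."""
    return list(accumulate(series))


if __name__ == "__main__":
    # Hypothetical counts of one action type per minute.
    counts = [2, 0, 0, 1, 0, 3, 0, 0]
    print(cumulative(counts))
```

In the raw series the activity appears as isolated spikes, while in the cumulative series the same activity shows up as steps whose overall slope indicates how fast the actions accumulate.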

Many other new features were also added in this version. For example, additional information about classes and students is provided to contextualize each data set. In the coming weeks, the repository will incorporate data from more than 1,200 students in Indiana who have undertaken engineering design projects using our Energy3D software. This unprecedented large-scale database will potentially provide a goldmine of research data in the area of engineering design study.

For more information about VPA, see my AERA 2016 presentation.

Infrared imaging evidence of geothermal energy in a basement

Geothermal energy is the thermal energy generated or stored in the Earth. Six meters (20 feet) below the surface, the ground maintains a nearly constant temperature, which is roughly equal to the average annual air temperature at the location. In Boston, this is about 13 °C (55 °F).
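A rough way to see why the temperature is nearly constant at that depth is the textbook solution of one-dimensional heat conduction into the ground under a sinusoidal surface temperature: the seasonal swing decays as exp(-z/d), where d = sqrt(αP/π), α is the thermal diffusivity of the soil, and P is the period. The sketch below evaluates this under stated assumptions: the diffusivity value is a ballpark figure for soil chosen only for illustration, and real ground is neither uniform nor dry.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def damping_depth(diffusivity, period=SECONDS_PER_YEAR):
    """Depth (m) at which a periodic surface temperature swing falls to 1/e."""
    return math.sqrt(diffusivity * period / math.pi)


def amplitude_ratio(depth, diffusivity, period=SECONDS_PER_YEAR):
    """Fraction of the surface temperature swing that survives at a given depth (m)."""
    return math.exp(-depth / damping_depth(diffusivity, period))


if __name__ == "__main__":
    alpha = 5e-7  # m^2/s, an assumed ballpark diffusivity for soil
    print(f"annual damping depth: {damping_depth(alpha):.2f} m")
    print(f"swing remaining at 6 m: {amplitude_ratio(6.0, alpha):.1%}")
```

With this assumed diffusivity, less than a tenth of the seasonal swing remains at six meters, which is consistent with the ground there staying close to the annual mean.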

You can feel the effect of geothermal energy in a basement, particularly on a hot summer day, when the basement can be significantly cooler. But IR imaging provides a unique visualization of this effect.

I happen to have a sub-basement that is partially buried in the ground. When I did an IR inspection of my basement in an attempt to identify places where heat escapes on a cold night, something that I did not expect struck me: As I scanned the basement, the whole basement floor appeared to be 4-6 °F warmer than the walls. Both the floor and walls of my basement are simply concrete -- there is no insulation, but the walls are partially or fully exposed to the outside air, which was about 24 °F at that time.

This temperature distribution pattern is opposite to the typical temperature gradient observed in a heated room where the top of a wall is usually a few degrees warmer than the bottom of a wall or the floor as hot air rises to warm up the upper part.

The only explanation for this warming of the basement floor is geothermal energy, caught by the IR camera.

Visualizing thermal equilibration: IR imaging vs. Energy2D simulation

Figure 1
A classic experiment to show thermal equilibration is to put a small Petri dish filled with some hot or cold water into a larger one filled with tap water around room temperature, as illustrated in Figure 1. Then stick one thermometer in the inner dish and another in the outer dish and take their readings over time.

With a low-cost IR camera like the FLIR C2 camera or FLIR ONE camera, this experiment becomes much more visual (Figure 2). As an IR camera provides a full-field view of the experiment in real time, you get much richer information about the process than a graph of two converging curves from the temperature data read from the two thermometers.
Figure 2

The complete equilibration process typically takes 10-30 minutes, depending on the initial temperature difference between the water in the two dishes and the amount of water in the inner dish. A larger temperature difference or a larger amount of water in the inner dish will require more time to reach the thermal equilibrium.
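The two dependencies mentioned above can be checked with a back-of-the-envelope lumped model, sketched below. It treats each dish as a single well-mixed body of water, ignores heat loss to the room and evaporation, and assumes a constant thermal conductance between the dishes. The conductance value and the water masses are hypothetical numbers chosen for illustration, not measurements. In this model the temperature difference decays exponentially, so the time to get within a tolerance is ln(ΔT0/tolerance) divided by the decay rate.

```python
import math

C_WATER = 4186.0  # J/(kg K), specific heat of water


def equilibrium_temperature(m_inner, t_inner, m_outer, t_outer):
    """Mass-weighted mean temperature (same liquid in both dishes, no losses)."""
    return (m_inner * t_inner + m_outer * t_outer) / (m_inner + m_outer)


def time_to_equilibrate(m_inner, t_inner, m_outer, t_outer,
                        conductance, tolerance=0.5):
    """Seconds until |T_inner - T_outer| drops to `tolerance` (in kelvin).

    Two-body lumped model: the difference decays as exp(-rate * t) with
    rate = conductance * (1/C_inner + 1/C_outer), C being heat capacities.
    """
    c_inner = m_inner * C_WATER
    c_outer = m_outer * C_WATER
    rate = conductance * (1.0 / c_inner + 1.0 / c_outer)
    delta0 = abs(t_inner - t_outer)
    if delta0 <= tolerance:
        return 0.0
    return math.log(delta0 / tolerance) / rate


if __name__ == "__main__":
    # Hypothetical setup: 30 g of 60 °C water inside 150 g of 22 °C water,
    # with an assumed conductance of 0.5 W/K between the dishes.
    t = time_to_equilibrate(0.03, 60.0, 0.15, 22.0, conductance=0.5)
    print(f"final temperature: {equilibrium_temperature(0.03, 60.0, 0.15, 22.0):.1f} °C")
    print(f"time to equilibrate: {t / 60:.0f} min")
```

In this simplified picture, a larger initial difference raises the logarithm and a larger inner mass lowers the decay rate, so both lengthen the equilibration time.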

Another way to quickly show this process is to use our Energy2D software to create a computer simulation (Figure 3). Such a simulation provides a visualization that resembles the IR imaging result. The advantage is that it runs very fast -- only 10 seconds or so are needed to reach the thermal equilibrium. This allows you to test various conditions rapidly, e.g., changing the initial temperature of the water in the inner dish or the outer dish or changing the diameters of the dishes.

Figure 3
Both real-world experiments and computer simulations have their own pros and cons. Exactly which one to use depends on your situation. As a scientist, I believe nothing beats real-world experiments in supporting authentic science learning and we should always favor them whenever possible. However, conducting real-world experiments requires a lot of time and resources, which makes it impractical to implement throughout a course. Computer simulations provide an alternative solution that allows students to get a sense of real-world experiments without entailing the time and cost. But the downside is that a computer simulation, most of the time, is an overly simplified scientific model that does not have the many layers of complexity and the many types of interactions that we experience in reality. In a real-world experiment, there are always unexpected factors and details that need to be attended to. It is these unexpected factors and details that create genuinely profound and exciting teachable moments. This important nature of science is severely missing in computer simulations, even with a sophisticated computational fluid dynamics tool such as Energy2D.

Here is my balancing of this trade-off equation: It is essential for students to learn simplified scientific models before they can explore complex real-world situations. The models will give students the frameworks needed to make sense of real-world observation. A fair strategy is to use simulations to teach simplified models and then make some time for students to conduct experiments in the real world and learn how to integrate and apply their knowledge about the models to solve real problems.

A side note: You may be wondering how well the Energy2D result agrees with the IR result on a quantitative basis. This is an important question -- if the simulation is not a good approximation of the real-world process, it is not a good simulation and one may challenge its usefulness, even for learning purposes. Figure 4 shows a comparison of a test run. As you can see, while the result predicted by Energy2D agrees in trend with the results observed through IR imaging, there are some details in the real data that may be caused by either human errors in taking the data or thermal fluctuations in the room. What is more, after the thermal equilibrium was reached, the water in both dishes continued to cool down to room temperature and then below due to evaporative cooling. The cooling to room temperature was modeled in the Energy2D simulation through a thermal coupling to the environment but evaporative cooling was not.

Figure 4

Time series analysis tools in Visual Process Analytics: Cross correlation

Two time series and their cross-correlation functions
In a previous post, I showed you what the autocorrelation function (ACF) is and how it can be used to detect temporal patterns in student data. The ACF is the correlation of a signal with itself. We are certainly also interested in exploring the correlations among different signals.

The cross-correlation function (CCF) is a measure of similarity of two time series as a function of the lag of one relative to the other. The CCF can be imagined as a procedure of overlaying two series printed on transparency films and sliding them horizontally to find possible correlations. For this reason, it is also known as a "sliding dot product."
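The "sliding dot product" can be written down in a few lines. The sketch below is a plain-Python illustration, not VPA's implementation: it uses one common normalization (deviations from the full-series means, divided by the product of the full-series norms), and the two short series are made up so that the second one repeats the first with a delay of two steps.

```python
import math


def cross_correlation(a, b, max_lag):
    """Normalized CCF of b relative to a: correlation of a[t] with b[t + lag].

    Returns a list indexed by lag = 0..max_lag. Swapping the arguments gives
    the other direction (a relative to b), which is generally different.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("series must have the same length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    norm = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if norm == 0:
        raise ValueError("a constant series has no defined correlation")
    return [
        sum(dev_a[t] * dev_b[t + lag] for t in range(n - lag)) / norm
        for lag in range(max_lag + 1)
    ]


if __name__ == "__main__":
    # Made-up example: every burst in the first series is followed
    # two steps later by a burst in the second series.
    first = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    second = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
    forward = cross_correlation(first, second, 5)
    backward = cross_correlation(second, first, 5)
    print("second relative to first:", [round(v, 2) for v in forward])
    print("first relative to second:", [round(v, 2) for v in backward])
```

The forward curve peaks at a lag of two steps, while the backward curve has only a weaker peak, which is why the two directions are worth plotting as separate curves.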

The upper graph in the figure to the right shows two time series from a student's engineering design process, representing about 45 minutes of her construction (white line) and analysis (green line) activities while trying to design an energy-efficient house with the goal to cut down the net energy consumption to zero. At first glance, you probably have no clue about what these lines represent and how they may be related.

But their CCFs reveal something that appears to be more outstanding. The lower graph shows two curves that peak at some points. I know you have a lot of questions at this point. Let me try to see if I can provide more explanations below.

Why are there two curves for depicting the correlation of two time series, say, A and B? This is because there is a difference between "A relative to B" and "B relative to A." Imagine that you print the series on two transparency films and slide one on top of the other. Which one is on the top matters. If you are looking for cause-effect relationships using the CCF, you can treat the antecedent time series as the cause and the subsequent time series as the effect.

What does a peak in the CCF mean, anyway? It guides you to where more interesting things may lie. In the figure of this post, the construction activities of this particular student were significantly followed by analysis activities about four times (two of them within 10 minutes), but the analysis activities were significantly followed by construction activities only once (after 10 minutes).

Time series analysis tools in Visual Process Analytics: Autocorrelation

Autocorrelation reveals a three-minute periodicity
Digital learning tools such as computer games and CAD software emit a lot of temporal data about what students do when they are deeply engaged in the learning tools. Analyzing these data may shed light on whether students learned, what they learned, and how they learned. In many cases, however, these data look so messy that many people are skeptical about their meaning. As optimists, we believe that there are likely learning signals buried in these noisy data. We just need to use or invent some mathematical tricks to figure them out.

In Version 0.2 of our Visual Process Analytics (VPA), I added a few techniques that can be used to do time series analysis so that researchers can find ways to characterize a learning process from different perspectives. Before I show you these visual analysis tools, be aware that the purpose of these tools is to reveal the temporal trends of a given process so that we can better describe the behavior of the student at that time. Whether these traits are "good" or "bad" for learning likely depends on the context, which often necessitates the analysis of other co-variables.

Correlograms reveal similarity of two time series.
The first tool for time series analysis added to VPA is the autocorrelation function (ACF), a mathematical tool for finding repeating patterns obscured by noise in the data. The shape of the ACF graph, called the correlogram, is often more revealing than just looking at the shape of the raw time series graph. In the extreme case when the process is completely random (i.e., white noise), the ACF will be a Dirac delta function that peaks at zero time lag. In the extreme case when the process is completely sinusoidal, the ACF will be a cosine wave with the same period; estimated from a finite record, it looks like a damped oscillatory cosine wave with a vanishing tail.
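For readers who want to see the computation itself, here is a minimal plain-Python sketch of a sample ACF. It is an illustration rather than VPA's code, it uses the common estimator that divides by the zero-lag sum of squared deviations, and the test signal (one burst of activity every three steps) is made up.

```python
def autocorrelation(series, max_lag):
    """Sample ACF for lag = 0..max_lag, normalized so that the lag-0 value is 1."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    total = sum(d * d for d in dev)
    if total == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [
        sum(dev[t] * dev[t + lag] for t in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]


if __name__ == "__main__":
    periodic = [1, 0, 0] * 10  # made-up signal repeating every three steps
    for lag, value in enumerate(autocorrelation(periodic, 9)):
        print(lag, round(value, 2))
```

The correlogram of this signal has positive peaks at lags 3, 6, and 9 and negative values in between, and the peaks shrink as the lag grows because fewer overlapping samples remain in a finite record.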

An interesting question relevant to learning science is whether the process is autoregressive (or under what conditions the process can be autoregressive). The quality of being autoregressive means that the current value of a variable is influenced by its previous values. This could be used to evaluate whether the student learned from past experience -- in the case of engineering design, whether the student's design action was informed by previous actions. Learning becomes more predictable if the process is autoregressive (just to be careful, note that I am not saying that more predictable learning is necessarily better learning). Different autoregression models, denoted as AR(n) with n indicating the memory length, may be characterized by their ACFs. For example, the ACF of an AR(2) process can decay more slowly than that of an AR(1) process, as AR(2) depends on more previous points. (In practice, the partial autocorrelation function, or PACF, is often used to detect the order of an AR model.)
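The theoretical ACFs of low-order AR models follow from the Yule-Walker equations, and a short sketch makes the comparison concrete. For AR(1) with coefficient φ, the ACF at lag k is φ^k. For a stationary AR(2) with coefficients φ1 and φ2, the ACF starts at ρ(1) = φ1/(1 - φ2) and then obeys ρ(k) = φ1·ρ(k-1) + φ2·ρ(k-2). The coefficients below are hypothetical values chosen to satisfy the stationarity conditions; with other coefficients an AR(2) ACF can oscillate or decay faster.

```python
def ar1_acf(phi, max_lag):
    """Theoretical ACF of a stationary AR(1) process: rho(k) = phi**k."""
    return [phi ** k for k in range(max_lag + 1)]


def ar2_acf(phi1, phi2, max_lag):
    """Theoretical ACF of a stationary AR(2) process via the Yule-Walker recursion."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[: max_lag + 1]


if __name__ == "__main__":
    # Hypothetical coefficients; both processes share the same first coefficient.
    print("AR(1):", [round(r, 3) for r in ar1_acf(0.5, 5)])
    print("AR(2):", [round(r, 3) for r in ar2_acf(0.5, 0.3, 5)])
```

For these particular coefficients, the AR(2) values stay above the AR(1) values at every positive lag, which is the slower decay described above.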

The two figures in this post show the ACF in action within VPA, revealing temporal periodicity and similarity in students' action data that are otherwise obscure. The upper graphs of the figures plot the original time series for comparison.

Visual Process Analytics (VPA) launched


Visual Process Analytics (VPA) is an online analytical processing (OLAP) program that we are developing for visualizing and analyzing student learning from complex, fine-grained process data collected by interactive learning software such as computer-aided design tools. We envision a future in which every classroom would be powered by informatics and infographics such as VPA to support day-to-day learning and teaching at a highly responsive level. In a future when every business person relies on visual analytics every day to stay in business, it would be a shame if teachers still had to read through tons of paper-based work from students to make instructional decisions. The research we are conducting with the support of the National Science Foundation is paving the road to a future in which our educational systems receive support comparable to what business analytics and intelligence provide to businesses.

This is the mission of VPA. Today we are announcing the launch of this cyberinfrastructure. We decided that its first version number should be 0.1. This is just a way to indicate that the research and development on this software system will continue as a very long-term effort and what we have done is a very small step towards a very ambitious goal.


VPA is written in plain JavaScript/HTML/CSS. It should run within most browsers -- best on Chrome and Firefox -- but it looks and works like a typical desktop app. This means that while you are in the middle of mining the data, you can save what we call "the perspective" as a file onto your disk (or in the cloud) so that you can keep track of what you have done. Later, you can load the perspective back into VPA. Each perspective opens the datasets that you have worked on, with your latest settings and results. So if you are halfway through your data mining, your work can be saved for further analyses.

So far Version 0.1 has seven analysis and visualization tools, each of which shows a unique aspect of the learning process with a unique type of interactive visualization. We admit that, compared with the daunting high dimension of complex learning, this is a tiny collection. But we will be adding more and more tools as we go. At this point, only one repository -- our own Energy3D process data -- is connected to VPA. But we expect to add more repositories in the future. Meanwhile, more computational tools will be added to support in-depth analyses of the data. This will require a tremendous effort in designing a smart user interface to support various computational tasks that researchers may be interested in defining.

Eventually, we hope that VPA will grow into a versatile platform of data analytics for cutting-edge educational research. As such, VPA represents a critically important step towards marrying learning science with data science and computational science.

The National Science Foundation funds large-scale applications of infrared cameras in schools


We are pleased to announce that the National Science Foundation has awarded the Concord Consortium, Next Step Living, and Virtual High School a grant of $1.2M to put innovative technologies such as infrared cameras into the hands of thousands of secondary students. This education-industry collaborative will create a technology-enhanced learning pathway from school to home and then to cognate careers, establishing thereby a data-rich testbed for developing and evaluating strategies for translating innovative technology experiences into consistent science learning and career awareness in different settings. While there have been studies on connecting science to everyday life or situating learning in professional scenarios to increase the relevance or authenticity of learning, the strategies of using industry-grade technologies to strengthen these connections have rarely been explored. In many cases, often due to the lack of experiences, resources, and curricular supports, industry technologies are simply used as showcases or demonstrations to give students a glimpse of how professionals use them to solve problems in the workplace.


Over the last few years, however, quite a number of industry technologies have become widely accessible to schools. For example, Autodesk has announced that their software products will be freely available to all students and teachers around the world. Another example is infrared cameras, which I have been experimenting with and blogging about since 2010. Due to the continuous development of electronics and optics, what used to be a very expensive scientific instrument now costs only a few hundred dollars, with the most affordable infrared camera falling below $200.

The funded project, called Next Step Learning, will be the largest-scale application of infrared cameras in secondary schools -- in terms of the number of students that will be involved in the three-year project. We estimate that dozens of schools and thousands of students in Massachusetts will participate in this project. These students will use infrared cameras provided by the project to thermally inspect their own homes. The images in this blog post are some of the curious images I took in my own house using the FLIR ONE camera that is attached to an iPhone.

In the broader context, the Next Generation Science Standards (NGSS) envisions “three-dimensional learning” in which the learning of disciplinary core ideas and crosscutting concepts is integrated with science and engineering practices. A goal of the NGSS is to make science education more closely resemble the way scientists and engineers actually think and work. To accomplish this goal, an abundance of opportunities for students to practice science and engineering through solving authentic real-world problems will need to be created and researched. If these learning opportunities are meaningfully connected to current industry practices using industry-grade technologies, they can also increase students’ awareness of cognate careers, help them construct professional identities, and prepare them with knowledge and skills needed by employers, attaining thereby the goals of both science education and workforce development simultaneously. The Next Step Learning project will explore, test, and evaluate this strategy.