
Time series analysis tools in Visual Process Analytics: Autocorrelation

Autocorrelation reveals a three-minute periodicity
Digital learning tools such as computer games and CAD software emit a lot of temporal data about what students do while they are engaged with them. Analyzing these data may shed light on whether students learned, what they learned, and how they learned. In many cases, however, the data look so messy that many people are skeptical about their meaning. As optimists, we believe that learning signals are likely buried in these noisy data. We just need to use, or invent, some mathematical tricks to tease them out.

In Version 0.2 of our Visual Process Analytics (VPA), I added a few techniques for time series analysis so that researchers can find ways to characterize a learning process from different perspectives. Before I show you these visual analysis tools, be aware that their purpose is to reveal the temporal traits of a given process so that we can better describe the behavior of the student at that time. Whether these traits are "good" or "bad" for learning likely depends on the context, which often necessitates the analysis of other covariates.

Correlograms reveal similarity of two time series.
The first tool for time series analysis added to VPA is the autocorrelation function (ACF), a mathematical tool for finding repeating patterns obscured by noise in the data. The shape of the ACF graph, called the correlogram, is often more revealing than the shape of the raw time series graph. In the extreme case when the process is completely random (i.e., white noise), the ACF will be a delta function that peaks at zero time lag and vanishes everywhere else. In the other extreme case when the process is completely sinusoidal, the ACF will be an oscillatory cosine wave with the same period; the damping and vanishing tail seen in practice come from estimating the ACF on a finite sample.
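For readers who want to try this themselves, the standard estimator behind a correlogram takes only a few lines of JavaScript (the language VPA itself is written in). This is an illustrative sketch, not VPA's actual code:

```javascript
// Sample autocorrelation of a time series x at lags 0..maxLag, using the
// standard biased estimator: r(k) = sum_t (x[t]-m)(x[t-k]-m) / sum_t (x[t]-m)^2
function autocorrelation(x, maxLag) {
  const n = x.length;
  const mean = x.reduce((a, b) => a + b, 0) / n;
  const dev = x.map(v => v - mean);
  const c0 = dev.reduce((a, d) => a + d * d, 0); // sum of squared deviations
  const acf = [];
  for (let k = 0; k <= maxLag; k++) {
    let ck = 0;
    for (let t = k; t < n; t++) ck += dev[t] * dev[t - k];
    acf.push(ck / c0);
  }
  return acf;
}

// Example: a noisy sine wave produces ACF peaks at multiples of its period.
const series = Array.from({ length: 300 }, (_, t) =>
  Math.sin((2 * Math.PI * t) / 30) + 0.5 * (Math.random() - 0.5));
console.log(autocorrelation(series, 60));
```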

An interesting question relevant to learning science is whether the process is autoregressive (or under what conditions it can be). Being autoregressive means that the current value of a variable is influenced by its previous values. This could be used to evaluate whether the student learned from past experience -- in the case of engineering design, whether the student's design actions were informed by previous ones. Learning becomes more predictable if the process is autoregressive (just to be careful, note that I am not saying that more predictable learning is necessarily better learning). Different autoregression models, denoted as AR(n) with n indicating the memory length, may be characterized by their ACFs. For example, the ACF of an AR(2) process decays more slowly than that of an AR(1) process, as AR(2) depends on more previous points. (In practice, the partial autocorrelation function, or PACF, is often used to detect the order of an AR model.)
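To see this concretely, here is a quick way to generate an AR(1) series and check that its correlogram decays roughly geometrically, reusing the autocorrelation() function from the sketch above (again, an illustration, not VPA code):

```javascript
// Simulate an AR(1) process x[t] = phi * x[t-1] + noise; its theoretical
// ACF at lag k is phi^k, so the estimate should decay roughly geometrically.
function simulateAR1(phi, n) {
  const x = [0];
  for (let t = 1; t < n; t++) {
    x.push(phi * x[t - 1] + (Math.random() - 0.5)); // uniform white noise suffices here
  }
  return x;
}

const ar1 = simulateAR1(0.8, 2000);
console.log(autocorrelation(ar1, 10).map(r => r.toFixed(3))); // ≈ 1, 0.8, 0.64, ...
```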

The two figures in this post show the ACF in action within VPA, revealing temporal periodicity and similarity in students' action data that would otherwise remain obscure. The upper graphs of the figures plot the original time series for comparison.

Visual Process Analytics (VPA) launched


Visual Process Analytics (VPA) is an online analytical processing (OLAP) program that we are developing for visualizing and analyzing student learning from the complex, fine-grained process data collected by interactive learning software such as computer-aided design tools. We envision a future in which every classroom is powered by informatics and infographics such as VPA to support day-to-day learning and teaching at a highly responsive level. In a future when every business person relies on visual analytics every day to stay in business, it would be a shame if teachers still had to read through piles of paper-based student work to make instructional decisions. The research we are conducting with the support of the National Science Foundation is paving the road to a future in which our educational systems enjoy the same kind of support that analytics and business intelligence now provide to commerce.

This is the mission of VPA. Today we are announcing the launch of this cyberinfrastructure. We decided that its first version number should be 0.1 -- a way of indicating that the research and development of this software system will continue as a long-term effort, and that what we have done so far is a small step towards an ambitious goal.


VPA is written in plain JavaScript/HTML/CSS. It should run in most browsers -- best on Chrome and Firefox -- and it looks and works like a typical desktop app. This means that while you are in the middle of mining the data, you can save what we call "the perspective" as a file on your disk (or in the cloud) so that you can keep track of what you have done. Later, you can load the perspective back into VPA. Each perspective reopens the datasets that you have worked on, with your latest settings and results. So if you are halfway through your data mining, your work can be saved for further analysis.
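For the curious, the mechanics of saving and loading such a file from a browser app can be sketched as follows. The function names and file format here are hypothetical stand-ins, not VPA's actual API:

```javascript
// Hypothetical sketch: serialize a "perspective" (open datasets plus current
// settings and results) to a JSON file that the user downloads.
function savePerspective(perspective) {
  const blob = new Blob([JSON.stringify(perspective, null, 2)],
                        { type: "application/json" });
  const a = document.createElement("a");
  a.href = URL.createObjectURL(blob);
  a.download = "my-analysis.vpa.json"; // hypothetical file name
  a.click();
  URL.revokeObjectURL(a.href);
}

// Load the file back with the standard File API.
function loadPerspective(file, onLoaded) {
  const reader = new FileReader();
  reader.onload = () => onLoaded(JSON.parse(reader.result));
  reader.readAsText(file);
}
```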

So far Version 0.1 has seven analysis and visualization tools, each of which shows a unique aspect of the learning process with a unique type of interactive visualization. We admit that, compared with the dauntingly high dimensionality of complex learning, this is a tiny collection. But we will be adding more tools as we go. At this point, only one repository -- our own Energy3D process data -- is connected to VPA, but we expect to add more repositories in the future. Meanwhile, more computational tools will be added to support in-depth analyses of the data. This will require a tremendous effort in designing a smart user interface to support the various computational tasks that researchers may want to define.

Eventually, we hope that VPA will grow into a versatile platform for data analytics in cutting-edge educational research. As such, VPA represents a critically important step towards marrying learning science with data science and computational science.

The National Science Foundation funds large-scale applications of infrared cameras in schools


We are pleased to announce that the National Science Foundation has awarded the Concord Consortium, Next Step Living, and Virtual High School a $1.2M grant to put innovative technologies such as infrared cameras into the hands of thousands of secondary students. This education-industry collaborative will create a technology-enhanced learning pathway from school to home and then to cognate careers, thereby establishing a data-rich testbed for developing and evaluating strategies for translating innovative technology experiences into consistent science learning and career awareness in different settings. While there have been studies on connecting science to everyday life or situating learning in professional scenarios to increase the relevance or authenticity of learning, strategies that use industry-grade technologies to strengthen these connections have rarely been explored. In many cases, often due to a lack of experience, resources, and curricular support, industry technologies are simply used as showcases or demonstrations to give students a glimpse of how professionals use them to solve problems in the workplace.


Over the last few years, however, quite a number of industry technologies have become widely accessible to schools. For example, Autodesk has announced that its software products will be freely available to all students and teachers around the world. Another example is the infrared camera, which I have been experimenting with and blogging about since 2010. Thanks to the continuous development of electronics and optics, what used to be a very expensive scientific instrument now costs only a few hundred dollars, with the most affordable infrared camera falling below $200.

The funded project, called Next Step Learning, will be the largest-scale application of infrared cameras in secondary schools, in terms of the number of students that will be involved in the three-year project. We estimate that dozens of schools and thousands of students in Massachusetts will participate. These students will use infrared cameras provided by the project to thermally inspect their own homes. The images in this blog post are some of the curious images I took in my own house using the FLIR ONE camera attached to an iPhone.

In the broader context, the Next Generation Science Standards (NGSS) envision “three-dimensional learning,” in which the learning of disciplinary core ideas and crosscutting concepts is integrated with science and engineering practices. A goal of the NGSS is to make science education more closely resemble the way scientists and engineers actually think and work. To accomplish this goal, an abundance of opportunities for students to practice science and engineering by solving authentic real-world problems will need to be created and researched. If these learning opportunities are meaningfully connected to current industry practices using industry-grade technologies, they can also increase students’ awareness of cognate careers, help them construct professional identities, and prepare them with the knowledge and skills needed by employers, thereby attaining the goals of both science education and workforce development. The Next Step Learning project will explore, test, and evaluate this strategy.

Seeing student learning with visual analytics

Technology allows us to record almost everything happening in the classroom. The fact that students' interactions with learning environments can be logged in every detail raises the interesting question of whether there is significant meaning and value in those data, and how we can make use of them to help students and teachers, as pointed out in a report sponsored by the U.S. Department of Education:
“New technologies thus bring the potential of transforming education from a data-poor to a data-rich enterprise. Yet while an abundance of data is an advantage, it is not a solution. Data do not interpret themselves and are often confusing — but data can provide evidence for making sound decisions when thoughtfully analyzed.” — Expanding Evidence Approaches for Learning in a Digital World, Office of Educational Technology, U.S. Department of Education, 2013
A radar chart of design space exploration.
A histogram of action intensity.
Here we are not talking about just analyzing students' answers to some multiple-choice questions, their scores on quizzes and tests, or how often they log into a learning management system. We are talking about something much more fundamental, something that runs deep in cognition and learning, such as how students conduct a scientific experiment, solve a problem, or design a product. As learning goes deeper in those directions, the data produced by students grow bigger. It is by no means an easy task to analyze large volumes of learner data, which contain many noisy elements that cast uncertainty on assessment. The validity of an assessment inference rests on the strength of evidence, and evidence construction often relies on the search for relations, patterns, and trends in student data. With a lot of data, this mandates sophisticated computation similar to cognitive computing.

Data gathered from highly open-ended inquiry and design activities, which are key to the authentic science and engineering practices we want students to learn, are often intensive and “messy.” Without analytic tools that can discern systematic learning from a random walk, what is handed to researchers and teachers is nothing but a DRIP (“data rich, information poor”) problem.

A scatter plot of action timeline.
Recognizing the difficulty of analyzing this sheer volume of messy student data, we turned to visual analytics, a category of techniques used extensively in cutting-edge business intelligence systems such as software developed by SAS, IBM, and others. We see interactive, visual process analytics as key to accelerating the analysis procedures so that researchers can adjust mining rules easily, view results rapidly, and identify patterns clearly. This kind of visual analytics optimally combines the computational power of the computer, the graphical user interface of the software, and the pattern recognition power of the brain to support complex data analyses in data-intensive educational research.

A digraph of action transition.
So far, I have written four interactive graphs and charts that can be used to study four different aspects of the design action data we collected from our Energy3D CAD software. Recording several weeks of student work on complex engineering design challenges, these datasets are high-dimensional, meaning that it is improper to treat them from a single point of view. For each question we want the student data to answer, we usually need a different representation to capture the salient features specific to that question. In many cases, multiple representations are needed to address a single question.

In the long run, our objective is to add as many graphical representations as needed as we move along in answering more and more research questions based on our datasets. Given time, this growing library of visual analytics may become powerful enough to be useful for teachers to monitor their students' work and thereby conduct formative assessment. To guarantee that our visual analytics runs on all devices, the library is written in JavaScript/HTML/CSS. A number of touch gestures are also supported so that the library can be used on a multi-touch screen. A neat feature is that multiple graphs and charts can be grouped together so that when you interact with one of them, the linked ones change at the same time. As the datasets are temporal in nature, you can also animate these graphs to reconstruct and track exactly what students did throughout.
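The grouping idea can be sketched roughly as follows. The class and method names are hypothetical stand-ins, not the library's actual API:

```javascript
// Hypothetical sketch of linked charts: charts register with a shared group,
// and any interaction on one chart is broadcast to the others.
class ChartGroup {
  constructor() { this.charts = []; }
  add(chart) { this.charts.push(chart); chart.group = this; }
  broadcast(event, source) {
    for (const c of this.charts) {
      if (c !== source) c.update(event); // linked charts redraw with the new state
    }
  }
}

class Chart {
  constructor(name) { this.name = name; }
  select(timeRange) { if (this.group) this.group.broadcast({ timeRange }, this); }
  update(event) { console.log(`${this.name} redraws for`, event); }
}

const group = new ChartGroup();
const timeline = new Chart("timeline"), histogram = new Chart("histogram");
group.add(timeline);
group.add(histogram);
timeline.select([120, 480]); // -> "histogram redraws for { timeRange: [120, 480] }"
```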

Book review: "Simulation and Learning: A Model-Centered Approach" by Franco Landriscina

Interactive science (Image credit: Franco Landriscina)
If future historians were to write a book about the most important contributions of technology to improving science education, it would be hard for them to skip computer modeling and simulation.

Much of our intelligence as humans originates from our ability to run mental simulations, or thought experiments, in our minds to decide whether it would be a good idea to do something or not. We are able to do this because we have already acquired basic ideas or mental models that can be applied to new situations. But how do we get those ideas in the first place? Sometimes we learn from our experiences. Sometimes we learn from listening to someone. Now, we can also learn from a computer simulation: a program carefully written by someone who knows the subject matter well, typically expressed through interactive visualization based on some sort of calculation. In cases where the subject matter, such as atoms and molecules, is entirely alien to students, computer simulation is perhaps the most effective form of instruction. Given the importance of mental simulation in scientific reasoning, there is no doubt that computer simulation, bearing some similarity to mental simulation, should have great potential for fostering learning.

Constructive science (Image credit: Franco Landriscina)
Although enough ink has been spilled on this topic and many of the ideas have existed in various forms for decades, I found the book "Simulation and Learning: A Model-Centered Approach" by Dr. Franco Landriscina, an experimental psychologist in Italy, to be a masterpiece that I must keep on my desk and chew over from time to time. What Dr. Landriscina has accomplished in a book of fewer than 250 pages is amazingly deep and wide. He starts with fundamental questions in cognition and learning related to simulation-based instruction. He then gradually builds a solid theoretical foundation for understanding why computer simulation can help people learn and think, by grounding cognition in the interplay between mental simulation (internal) and computer simulation (external). This intimate coupling of internalization and externalization leads to insights into how the effectiveness of computer simulation as an instructional tool can be maximized in various cases. For example, Landriscina's two illustrations, embedded in this blog post, represent how two ways of using simulations in learning, for which I coined the terms "Interactive Science" and "Constructive Science," differ in the relationships among the foundational components of cognition and simulation.

This book is not only useful to researchers; developers should benefit from reading it, too. Developers tend to create educational tools and materials based on the learning goals set by education standards, with less consideration of how complex learning actually happens through interaction and cognition. This succinct book provides a comprehensive, insightful, and intriguing guide for developers who would like to understand simulation-based learning more deeply in order to create more effective educational simulations.

SimBuilding on iPad

SimBuilding (alpha version) is a 3D simulation game that we are developing to provide a more accessible and fun way to teach building science. One reason we are working on this game is that we want to teach building science concepts and practices to home energy professionals without having to invade someone's house or risk ruining it (well, we would have to create or maintain some awful cases for teaching purposes, but what sane property owner would allow us to do so?). We also believe that computer graphics can create cool effects that demonstrate the ideas more clearly, providing complementary experiences to hands-on learning. The project is funded by the National Science Foundation to support technical education and workforce development.

SimBuilding is based on three.js, a powerful JavaScript graphics library that renders 3D scenes within the browser using WebGL. This allows it to run on a variety of devices, including the iPad (though not on smartphones, which have less horsepower). The photos in this blog post show how it looks on an iPad Mini, with multi-touch support for navigation and interaction.
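For readers new to three.js, a minimal scene of the kind SimBuilding builds on looks like this. It is the standard three.js starter pattern, not SimBuilding's actual code, and assumes three.js has been loaded via a script tag that exposes the global THREE:

```javascript
// Minimal three.js setup: a scene, a camera, a WebGL renderer, and one mesh.
var scene = new THREE.Scene();
var camera = new THREE.PerspectiveCamera(60, window.innerWidth / window.innerHeight, 0.1, 100);
camera.position.set(0, 2, 5);

var renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

// A stand-in for a wall: a thin box with a simple material.
var wall = new THREE.Mesh(new THREE.BoxGeometry(4, 2.5, 0.2),
                          new THREE.MeshNormalMaterial());
scene.add(wall);

(function animate() {
  requestAnimationFrame(animate); // render loop, one frame per screen refresh
  wall.rotation.y += 0.005;       // slow spin so the rendering is visible
  renderer.render(scene, camera);
})();
```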

In its current version, SimBuilding only supports virtual infrared thermography. The player walks around in a virtual house, challenged to correctly identify home energy problems using a virtual IR camera. The virtual IR camera shows false-color IR images of a large number of sites when the player inspects them, from which the player must diagnose the causes of any problems, such as missing insulation, thermal bridges, air leakage, or water damage. In addition to the IR camera, a set of diagnostic tools is also provided, such as a blower-door system used to depressurize a house for identifying infiltration. We will also provide links to our Energy2D simulations should players become interested in deepening their understanding of heat transfer concepts such as conduction, convection, and radiation.
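A false-color image is, at its core, just a mapping from temperature to color. A toy version of such a palette lookup might look like this (a hypothetical sketch, not SimBuilding's actual rendering code):

```javascript
// Hypothetical false-color mapping: interpolate linearly through an
// "iron"-like palette after normalizing the temperature to [0, 1].
const palette = [
  { t: 0.0, rgb: [0, 0, 0] },      // coldest -> black
  { t: 0.4, rgb: [128, 0, 128] },  // purple
  { t: 0.7, rgb: [255, 128, 0] },  // orange
  { t: 1.0, rgb: [255, 255, 255] } // hottest -> white
];

function falseColor(temp, min, max) {
  const t = Math.min(1, Math.max(0, (temp - min) / (max - min)));
  for (let i = 1; i < palette.length; i++) {
    if (t <= palette[i].t) {
      const a = palette[i - 1], b = palette[i];
      const f = (t - a.t) / (b.t - a.t); // position between the two stops
      return a.rgb.map((c, j) => Math.round(c + f * (b.rgb[j] - c)));
    }
  }
  return palette[palette.length - 1].rgb;
}

console.log(falseColor(22.5, 15, 30)); // an interior-wall temperature in °C -> [170, 43, 85]
```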

SimBuilding is a collaborative project with the New Mexico EnergySmart Academy at Santa Fe. A number of industry partners, such as FLIR Systems and Building Science Corporation, are also involved in the project. Our special thanks go to Jay Bowen of FLIR, who generously provided, free of charge, most of the IR images used to create the game scenes.

Comparing two smartphone-based infrared cameras

Figure 1
With the release of two competitively priced IR cameras for smartphones, 2014 became a milestone year for IR imaging. Early in 2014, FLIR unveiled the $349 FLIR ONE, the first IR camera that can be attached to an iPhone. Months later, a startup company, Seek Thermal, released a $199 IR camera that has an even higher resolution and attaches to most smartphones. In addition, another company, Therm-App, released an Android mobile thermal camera that specializes in long-range night vision and high-resolution thermography, priced at $1,600. The race is on... Into 2015, FLIR has announced a new version of the FLIR ONE that supports both Android and iOS and will probably be even more aggressively priced.

Figure 2
All these game changers can take impressive IR images just like taking conventional photos, record IR videos just like recording conventional videos, and then share them online through an app. The companies also provide software development kits (SDKs) for third parties to create apps linked to their cameras. Excited by these new developments, researchers at several Swedish universities and I have embarked on an international collaboration towards the vision that IR cameras will one day become as necessary as microscopes in science labs.

Figure 3
To test these new IR cameras, I did a simple experiment (Figure 1) that shows a paradoxical warming effect on a piece of paper placed on top of a cup of (slightly cooler than) room-temperature water. This seemingly simple experiment actually leads to very deep science at the molecular level, as blogged before.

I took images using the FLIR ONE (Figure 2) and the SEEK (Figure 3). These images are shown to the right for comparison. As you can see, both cameras are sensitive enough to capture the small temperature rise caused by water absorption and condensation on the underside of the paper.

The FLIR ONE has a nice feature that contextualizes the false-color IR image by overlaying on it the edges (where brightness changes sharply) of the true-color image taken at the same time by the smartphone's conventional camera. With this feature, you can see the sharp edges of the paper in Figure 2.
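The details of FLIR's processing are proprietary, but the core idea of finding "where brightness changes sharply" is classic edge detection. A bare-bones Sobel filter over a grayscale image illustrates the principle (a sketch of the general technique, not FLIR's algorithm):

```javascript
// Sobel edge magnitude for a grayscale image stored row-major in `gray`
// (w columns, h rows). Large values mark pixels where brightness changes
// sharply; thresholded edges could then be blended into the IR image.
function sobelMagnitude(gray, w, h) {
  const mag = new Float32Array(w * h);
  for (let y = 1; y < h - 1; y++) {
    for (let x = 1; x < w - 1; x++) {
      const i = y * w + x;
      const gx = -gray[i - w - 1] - 2 * gray[i - 1] - gray[i + w - 1]
               +  gray[i - w + 1] + 2 * gray[i + 1] + gray[i + w + 1];
      const gy = -gray[i - w - 1] - 2 * gray[i - w] - gray[i - w + 1]
               +  gray[i + w - 1] + 2 * gray[i + w] + gray[i + w + 1];
      mag[i] = Math.hypot(gx, gy);
    }
  }
  return mag;
}
```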

Beautiful Chemistry


It is hard for students to associate chemistry with beauty. The image of chemistry in schools is mostly linked to something dangerous, dirty, or smelly. Yet Dr. Yan Liang, a collaborator and a materials scientist with a Ph.D. degree from the University of Minnesota, is launching a campaign to change that image. The result of his work is now online at beautifulchemistry.net.

To bring the beauty of chemistry to the general public, Dr. Liang uses 4K UltraHD cameras and special lenses to capture chemical reactions in astonishing detail and advanced computer graphics to render stunning images of molecular structures.

Using the beauty of science to interest students has rarely been taken seriously by educators. The federal government has invested billions of dollars in instructional materials development. But from a layman's point of view, it is hard to imagine how children can be engaged in science if they do not fall in love with it. Beautiful Chemistry represents an attempt that could inspire a whole new genre of high-quality educational materials based on breathtaking scientific visualizations. How about Beautiful Physics and Beautiful Biology?

Our work is well aligned with this vision. Our interactive, visual Energy2D simulations bring a beautiful world of heat and mass flow to students as never seen before; our Energy3D software creates splendid 3D scenes based on scientific calculations; and our infrared visualization of the real world has uncovered a beautiful hidden universe through an IR lens. These materials demonstrate computational and experimental ways to marry science and beauty, and they have proven highly enticing in science classrooms.

BTW, Dr. Liang is the artist who designed the splash panes of Energy2D and Energy3D.

Accurate prediction of solar radiation using Energy3D: Part II

About a week ago, I reported our progress in modeling worldwide solar radiation with our Energy3D software. While our calculated insolation data for a horizontal surface agreed quite well with the data provided by the National Solar Radiation Data Base, those for a south-facing vertical surface did not work out as well. I suspected that the discrepancy was partly caused by neglecting the reflection of short-wave radiation: not all sunlight is absorbed by the Earth; a certain portion is reflected. The ability of a surface to reflect sunlight is known as its albedo. For example, fresh snow can reflect up to 90% of incident solar energy. People who live in the northern part of the country often experience strong reflection from snow or ice in the winter.

Figure 1. Calculated and measured insolation on a south-facing surface.
In the summer, the Sun is high in the sky, so a south-facing vertical plate doesn't get as much direct energy as in other seasons, especially near the Equator where the Sun is almost directly overhead (as in Honolulu, included in the figures above). The ambient reflection, however, can be significant. After incorporating this component into our equations following the convention of the ASHRAE solar radiation model, the agreement between the calculated and measured results improves significantly -- you can see this big improvement by comparing Figure 1 (new algorithm) with Figure 2 (old algorithm).
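For readers who want the formula: in ASHRAE-style models, the ground-reflected component on a tilted surface is proportional to the total horizontal irradiance, the albedo, and a view factor set by the tilt angle. A sketch of that one term (Energy3D's actual implementation may differ in its details):

```javascript
// Ground-reflected irradiance on a tilted surface (isotropic assumption):
//   E_reflected = E_horizontal * albedo * (1 - cos(tilt)) / 2
// where E_horizontal is the total (beam + diffuse) irradiance on the ground,
// albedo is the ground reflectance, and tilt is measured from horizontal.
function groundReflectedIrradiance(eHorizontal, albedo, tiltDegrees) {
  const tilt = (tiltDegrees * Math.PI) / 180;
  return eHorizontal * albedo * (1 - Math.cos(tilt)) / 2;
}

// A vertical wall (tilt = 90°) over fresh snow (albedo ≈ 0.9) picks up
// nearly half of the reflected horizontal irradiance:
console.log(groundReflectedIrradiance(500, 0.9, 90)); // 225 W/m²
```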

Figure 2. Results without considering reflected short-wave radiation.
This degree of accuracy is critically important for supporting meaningful engineering design projects on renewable energy that might be conducted by students across the country. We are working to refine our computational algorithms further based on 50 years of research in solar science. This work will lend Energy3D the scientific integrity needed for rational design, be it of sustainable architecture, urban planning, or solar parks.

Go to Part I and Part III.

Accurate prediction of solar radiation using Energy3D: Part I

Solar engineering and building design rely on accurate prediction of solar radiation at any given location. This is a core functionality of our Energy3D CAD software. We are proud to announce that, through continuous improvements to our mathematical model, Energy3D is now capable of modeling solar radiation with impressive precision.

Figure 1. Comparison of measured and calculated solar radiation on a horizontal plate at 10 US locations.
Figure 1 shows that Energy3D's calculated solar energy density on a horizontal plate agrees remarkably well with the National Solar Radiation Database, which houses 30 years of data measured by the National Renewable Energy Laboratory of the U.S. Department of Energy, for 10 cities across the US. One striking success is the prediction of a dip in solar radiation in June for Miami, FL (see the second image of the first row). Overall, the predicted results are slightly smaller than the measured ones.

Note that these results are theoretical calculations, not numerical fits (such as using an artificial neural network to predict from previous data). It is pretty amazing if you think about it: through some complex calculations, the numbers for each month and each city come very close to data measured over three decades at weather stations scattered around the country! This is the holy grail of computer simulation. This success lays a solid foundation for Energy3D to be relevant to both science and engineering.
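To give a flavor of what such a theoretical calculation involves, the classic ASHRAE clear-sky model estimates direct normal irradiance from the solar altitude and tabulated monthly coefficients. Here is a simplified sketch (not Energy3D's actual code, which is far more detailed):

```javascript
// Simplified ASHRAE-style clear-sky estimate of irradiance on a horizontal
// plate. A, B, C are tabulated monthly coefficients:
//   directNormal = A * exp(-B / sin(altitude))
//   horizontal   = directNormal * sin(altitude) + C * directNormal  // beam + diffuse
function clearSkyHorizontal(altitudeDegrees, A, B, C) {
  const beta = (altitudeDegrees * Math.PI) / 180;
  if (beta <= 0) return 0; // sun below the horizon
  const directNormal = A * Math.exp(-B / Math.sin(beta));
  return directNormal * (Math.sin(beta) + C);
}

// Rough mid-summer coefficients (A ≈ 1085 W/m², B ≈ 0.21, C ≈ 0.135)
// with the sun at a 60° altitude:
console.log(clearSkyHorizontal(60, 1085, 0.21, 0.135)); // ≈ 850 W/m²
```

Integrating such instantaneous values over each day of the month, for the sun path at a given latitude, yields the kind of monthly insolation curves compared in the figures.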

Figure 2. Comparison of measured and calculated solar radiation on a south-facing plate at 10 US locations.
The National Renewable Energy Laboratory also measured solar radiation on surfaces tilted at different angles. The predicted trends for the solar energy density on an upright south-facing plate agree reasonably well with the measured data (Figure 2). For example, both measured and calculated data show that solar radiation on a south-facing plate peaks in the spring and fall for most northern locations and in the winter for tropical locations. It is amazing that Energy3D also correctly predicts the exception -- Anchorage, Alaska, where the solar data peak only in the spring!

Quantitatively, however, Energy3D seems to underestimate the solar radiation more than in the horizontal case shown in Figure 1, especially for the summer months. We suspect that this is because a vertical plate receives a larger contribution from ambient radiation and reflection than a horizontal plate (which faces the sky). We are now working towards a better model to correct this problem.

For Energy3D to serve a global audience, we have collected geographical and climate data for more than 150 domestic and foreign locations and integrated them into the software (Version 3.2). If you live in the US, you are guaranteed to find at least one location in your state.

Go to Part II and Part III.