We often rely on shapes and patterns when navigating the world. Poison ivy or an innocent plant? A nasty rash or the imprint of the textured wall you were leaning against? Similarly, scientists often use shapes and patterns to interpret datasets. Do the points follow a straight line? Appear in clusters? On the street and in the lab, shapes help us organize information, interpret data, and even make predictions.
While some sets of data are relatively straightforward to interpret, others get messy quickly. It can be difficult to extract useful information from maps of complicated situations like the relationship between diseases and their associated genes. This is because the structures that emerge often depend on parameters chosen by researchers through a somewhat arbitrary process, making it difficult to tell when a structure is really significant. In new research recently published in the American Physical Society’s journal Physical Review E, a team of scientists from Université Laval in Canada, the Politecnico di Torino and the ISI Foundation in Italy introduce a valuable tool for determining whether the shape of a complex dataset is actually significant.
Diseases and their associated genes are just one example of what scientists classify as a complex system. Many other systems fall into this category too—the Earth’s climate, living cells, the human brain, social structures—really any system that is difficult to describe because it contains so many moving, interacting pieces. Of course, understanding these same systems can have a profound effect on our quality of life, enabling early warning systems, targeted treatments, and effective interventions.
A common way of studying these systems is with complex networks, which represent a system’s components and their interactions so that researchers can look at the structures that emerge. In the traditional network approach, each component is a node, and each interaction between two components is a line (or edge) linking them. Research shows that the network approach is effective in helping us understand many systems. However, it can discard important information when applied to a complex system whose interactions can’t be broken down into clean pairs, such as a group of three or more components acting together.
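To make that information loss concrete, here is a minimal Python sketch using hypothetical components A, B, and C (not data from the study). The hypothetical helper `project_to_network` collapses each recorded interaction into pairwise links, the way a traditional network does. A single three-way interaction and three separate two-way interactions end up as exactly the same network.

```python
from itertools import combinations


def project_to_network(interactions):
    """Hypothetical helper: collapse each interaction (a set of components) into pairwise edges."""
    edges = set()
    for group in interactions:
        for u, v in combinations(sorted(group), 2):
            edges.add((u, v))
    return edges


# Hypothetical data: one three-way interaction vs. three separate pairwise ones.
group_interaction = [{"A", "B", "C"}]
pairwise_interactions = [{"A", "B"}, {"B", "C"}, {"A", "C"}]

print(sorted(project_to_network(group_interaction)))
print(sorted(project_to_network(pairwise_interactions)))
print(project_to_network(group_interaction) == project_to_network(pairwise_interactions))
```

Both calls yield the same three edges, so once the data are stored as a network, nothing records whether A, B, and C ever interacted as a group.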
An alternative that works better for systems like the brain and social structures is the simplicial complex approach. This approach uses mathematical objects called simplices (points, line segments, filled triangles, and their higher-dimensional counterparts) to represent interactions among any number of components at once. Once you encode the data from a complex system into a simplicial complex, you can extract information by looking at its shape. For example, you can ask questions like: What is the shape of this dataset from a brain? Does the shape of this dataset tell me anything about the health of the brain?
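As an illustrative sketch (not the researchers’ code), the Python below builds a simplicial complex from a hypothetical list of group interactions and computes one coarse measure of its shape, the Euler characteristic: vertices minus edges plus triangles, and so on up the dimensions. A filled triangle, recorded as one three-way interaction, and a hollow triangle, recorded as three pairwise interactions, give different values, so the complex keeps a distinction that a plain network loses. The helper names are hypothetical, and topological data analysis usually relies on richer shape summaries such as Betti numbers.

```python
from itertools import combinations


def simplices_from_facets(facets):
    """Hypothetical helper: every simplex (non-empty subset) contained in some facet."""
    simplices = set()
    for facet in facets:
        items = sorted(facet)
        for k in range(1, len(items) + 1):
            for face in combinations(items, k):
                simplices.add(face)
    return simplices


def euler_characteristic(facets):
    """Alternating count: vertices - edges + triangles - tetrahedra + ..."""
    return sum((-1) ** (len(s) - 1) for s in simplices_from_facets(facets))


# Hypothetical data: the same three components, organized two different ways.
filled = [{"A", "B", "C"}]                 # one three-way interaction
hollow = [{"A", "B"}, {"B", "C"}, {"A", "C"}]  # three pairwise interactions

print(euler_characteristic(filled))  # 3 - 3 + 1 = 1: one solid piece, no hole
print(euler_characteristic(hollow))  # 3 - 3 = 0: one piece enclosing a loop
```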
According to Jean-Gabriel Young, a researcher at Université Laval who co-led the project with Alice Patania (Politecnico di Torino and ISI Foundation), researchers have demonstrated that this approach is effective, but it has lacked a statistical foundation. In other words, when you organize information into simplicial complexes, it can be hard to tell whether a shape is surprising or random: whether a pattern can be explained by chance or carries some significant meaning. Making this determination requires a precise model that lets you compare your observed data with randomized versions of it, and such a model did not exist until now.
Young and Patania met at the Santa Fe Institute in the summer of 2015, as attendees at a summer school for PhD students focused on complexity. Young was a physics student studying network science and Patania a math student studying topological data analysis, an area focused on finding and quantifying the shape of datasets. The two worked together over the summer and kept in touch. Over the course of two later visits, one by Young to Italy and one by Patania to Canada (the latter trip funded by a grant from the Young Researcher Network on Complex Systems), the researchers combined their expertise in different areas to address the question of how to quantify randomness in the shapes of complex systems.
Working with Patania’s advisors Giovanni Petri and Francesco Vaccarino, the two developed what they call the Simplicial Configuration Model (SCM). The SCM describes all of the ways in which the components of a simplicial complex can be arranged. The team also developed an algorithm for generating many random versions of a real dataset. Together, these tools let a researcher compare the real data with the simplicial complexes a random system would be most likely to produce. In this way, you can determine whether an observed pattern is statistically different from one produced by chance.
The team investigated three real datasets using their model:
• the relationship between flower-visiting insects and plants,
• the relationship between human diseases and genes, as linked by known disorder-gene associations, and
• the relationship between the individuals involved in crimes in St. Louis (suspects, victims, and witnesses).
The test results show that the observed shape of the pollinator dataset is consistent with chance and shows no higher-order organization: its structure can be explained by random interactions among the insects and plants. In the other two cases, however, the observed structures are highly statistically significant; the patterns are very different from what chance alone would produce. In the case of the crime data, this significance comes at least in part from the way the data was collected, by looking up ties between suspects, victims, and witnesses already in a database. In the disease case, the significance doesn’t come from data collection, implying that the system self-organizes in some way.
The SCM can already be applied to real situations, as shown in the case studies, although there are some technical questions that still need to be addressed. The model sets the simplicial complexes approach on a more solid, objective footing and the researchers say it could lead to new insights on the emergence of patterns and higher-order structural properties in complex systems. Given the complex systems and massive datasets that make our world go ’round, that’s a good thing for all of us.