Data Mining for Dark Matter

They say that seeing is believing. That is certainly true for astronomers who, though they still do not know what dark matter is, have witnessed countless examples of the elusive stuff in action.

So far, astronomers have found that dark matter reveals itself only through its gravitational influence on visible matter. To better understand dark matter, they are therefore mapping where in the universe it disturbs the visible matter around it.

Because dark matter makes up about 26 percent of the universe's total mass and energy (roughly 85 percent of all its matter), it leaves its gravitational mark in a great many places, far too many for any one person to count.

Computers can make quick work of large data sets, but some of the software used to spot evidence of dark matter is less efficient than astronomers would like and is by no means ready for the deluge of data that will come from telescopes like the Large Synoptic Survey Telescope. To tackle this problem, astronomer David Harvey approached a company called Kaggle.

The long streaks of light stretched across the photo are images of galaxies whose light has been bent by gravity. This bending of light, which distorts how distant objects appear from Earth, is called gravitational lensing. Dark matter is one of the causes of gravitational lensing. Credit: NASA Goddard Space Flight Center

Kaggle specializes in data mining and crowdsourcing. It runs competitions that challenge data scientists around the globe to build better software than what companies like Allstate, Ford, Facebook, GE and NASA currently use to analyze certain data sets.

The process works like this: a company approaches Kaggle with a data set and a description of what it wants to extract from that data. Kaggle then posts the data and instructions on its website, where anyone can download them.

Kaggle awards a cash prize, which can reach tens of thousands of dollars, to whoever develops a model that beats the current software. In one sense, Kaggle competitions resemble Galaxy Zoo, since both rely on crowdsourcing. But in other ways they are different, explained Harvey, a postgraduate student at the University of Edinburgh in Edinburgh, UK.

“It’s like citizen crowdsourcing science but it’s with experts,” Harvey said. “Not…everyone can do it, it requires [computer literacy] to do it. The person who won it had a PhD in physics and the person who was second was an expert in computer science.”

Harvey, with help from colleagues in the US, the Netherlands and Portugal, launched the Kaggle competition “Observing Dark Worlds” in October 2012. The competitor who submitted the best overall code was handsomely rewarded with a $20,000 prize.

Competitors mined synthetic images of galaxy clusters that Harvey had simulated. The images showed more than just galaxy clusters, however: they also contained the gravitational footprints of dark matter halos.

Dark matter halos are large concentrations of dark matter that astronomers think envelop galaxies and galaxy clusters, much like a bubble surrounding a pocket of air. If light from a distant object passes close to a galaxy or cluster of galaxies, the gravitational tug of the dark matter halo surrounding it (as well as of the visible matter) bends the light’s path.

This bending, called gravitational lensing, distorts how objects appear from Earth. A similar trick of the eye occurs when light bends, or refracts, as it enters water, making submerged objects look larger.
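To give a sense of how strong this bending is, the short sketch below evaluates the standard weak-field deflection angle from general relativity, α = 4GM/(c²b), for light passing a mass M at impact parameter b. For an axially symmetric lens, M is the projected mass enclosed within b. The cluster mass of 10^15 solar masses and the 1 Mpc impact parameter are illustrative assumptions, not values taken from Harvey's simulations.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
C = 299_792_458.0      # speed of light, m/s
M_SUN = 1.989e30       # solar mass, kg
R_SUN = 6.957e8        # solar radius, m
MPC = 3.0857e22        # one megaparsec, m
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0


def deflection_arcsec(mass_kg: float, impact_m: float) -> float:
    """Weak-field deflection angle 4GM/(c^2 b), converted from radians to arcseconds."""
    return 4.0 * G * mass_kg / (C**2 * impact_m) * RAD_TO_ARCSEC


# Starlight grazing the edge of the Sun
sun = deflection_arcsec(M_SUN, R_SUN)
# Assumed example: a massive cluster (1e15 solar masses enclosed), light passing 1 Mpc from its centre
cluster = deflection_arcsec(1e15 * M_SUN, 1.0 * MPC)

print(f"Sun: {sun:.2f} arcsec, cluster: {cluster:.1f} arcsec")
```

This prints `Sun: 1.75 arcsec, cluster: 39.5 arcsec`. Deflections of tens of arcseconds are small on the sky, but they are enough to stretch and shear the images of background galaxies measurably.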

Examples of images in the Kaggle competition “Observing Dark Worlds.” Captions and images courtesy of David Harvey of the University of Edinburgh. 

Sometimes gravitational lensing makes galaxies appear more elliptical than they already are, and it was these distortions that Kaggle competitors were after. Their goal was to develop an algorithm that could reconstruct the positions of dark matter halos from simulated images of galaxy clusters. A scoring system let competitors track the accuracy and progress of their models.
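The article does not describe how the winning entries worked. As a rough illustration of the basic idea behind the task, the sketch below builds a toy field of galaxies whose ellipticities are stretched tangentially around a single hypothetical halo, then searches a grid of candidate positions for the point about which the galaxies are, on average, most tangentially aligned. The shear profile, field size, noise level and the names `simulate_field` and `locate_halo` are assumptions made for this example; this simple heuristic is not LENSTOOL, Harvey's simulation, or any competitor's method.

```python
import numpy as np


def simulate_field(n=2000, halo=(620.0, 380.0), strength=20.0, core=50.0,
                   noise=0.05, size=1000.0, seed=0):
    """Toy field (hypothetical): n galaxies sheared tangentially around one halo."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, size, n)
    y = rng.uniform(0.0, size, n)
    dx, dy = x - halo[0], y - halo[1]
    phi = np.arctan2(dy, dx)                   # position angle of each galaxy about the halo
    g = strength / (np.hypot(dx, dy) + core)   # assumed shear profile, falling off ~1/r far out
    # Tangential alignment (e1 + i*e2 = -g * exp(2i*phi)) plus random intrinsic shapes
    e1 = -g * np.cos(2 * phi) + rng.normal(0.0, noise, n)
    e2 = -g * np.sin(2 * phi) + rng.normal(0.0, noise, n)
    return x, y, e1, e2


def locate_halo(x, y, e1, e2, size=1000.0, step=10.0):
    """Return the grid point about which the mean tangential ellipticity is largest."""
    grid = np.arange(0.0, size + step, step)
    best_score, best = -np.inf, (0.0, 0.0)
    for cx in grid:
        # Position angles of every galaxy about every candidate (cx, cy): shape (len(grid), n)
        phi = np.arctan2(y[None, :] - grid[:, None], x[None, :] - cx)
        e_t = -(e1 * np.cos(2 * phi) + e2 * np.sin(2 * phi))  # tangential ellipticity
        scores = e_t.mean(axis=1)
        j = int(np.argmax(scores))
        if scores[j] > best_score:
            best_score, best = scores[j], (float(cx), float(grid[j]))
    return best


x, y, e1, e2 = simulate_field()
print(locate_halo(x, y, e1, e2))  # estimated (x, y) of the halo
```

Without noise, the score peaks exactly at the true halo: about any other point, each galaxy's tangential ellipticity is its shear multiplied by cos 2Δφ, where Δφ is the difference between its position angles about the two points, so it can only be smaller. The random intrinsic shapes of real galaxies blur this peak, and fields with several overlapping halos need a more elaborate model than a single grid search.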

The two-month competition, which was popular compared with other Kaggle challenges, attracted 357 competitors, who worked from synthetic images of 120 simulated galaxy clusters. The top three algorithms, which Harvey and his colleagues discuss in a recent paper in the journal Astronomy and Computing, improved on LENSTOOL, one of the benchmark gravitational lensing codes, by 30 percent.

Harvey said that although the top codes from the Kaggle competition are an improvement, they are not yet ready to replace tools like LENSTOOL. One reason is that the codes were developed on simulated images rather than real ones. Moreover, the codes sometimes reach the desired result through a process that would not necessarily work on real data.

“The extra work between the end of a competition and getting algorithms through takes lots of time and effort and collaboration… I’m not sure if the algorithm will turn into something,” said Harvey, who explained that the contacts he made through the competition were just as important as the code he and his colleagues gained. “It’s about getting contacts as well as algorithms,” he said.

Harvey recently finished another crowdsourcing project with Galaxy Zoo, the data from which he hopes to use to develop algorithms that classify galaxies.
