Automating big-data analysis: MIT Research

System that replaces human intuition with algorithms outperforms 615 of 906 human teams.

By Larry Hardesty

Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
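To make the promotions example concrete, here is a minimal Python sketch of deriving those two features by hand. The table layout, field names, dates, and profit figures are all hypothetical, invented purely for illustration; the article does not describe the actual database.

```python
from datetime import date

# Hypothetical promotion record and weekly profit table (illustrative values).
promotion = {"start": date(2015, 3, 2), "end": date(2015, 3, 22)}
weekly_profits = {
    date(2015, 3, 2): 1200.0,
    date(2015, 3, 9): 1500.0,
    date(2015, 3, 16): 1800.0,
    date(2015, 3, 23): 900.0,  # falls after the promotion ends
}

def promotion_features(promo, profits):
    """Derive the span of a promotion and the average weekly profit within it."""
    # Feature 1: the span between the dates, not the dates themselves.
    span_days = (promo["end"] - promo["start"]).days
    # Feature 2: the average profit across the span, not the total.
    in_span = [
        profit
        for week_start, profit in profits.items()
        if promo["start"] <= week_start <= promo["end"]
    ]
    avg_profit = sum(in_span) / len(in_span) if in_span else None
    return {"span_days": span_days, "avg_weekly_profit": avg_profit}

features = promotion_features(promotion, weekly_profits)
# features == {"span_days": 20, "avg_weekly_profit": 1500.0}
```

Choosing which derived quantities like these are worth computing is the step that usually depends on an analyst's intuition.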

MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615.

In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.

“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

Between the lines

Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.

Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.

“What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”

In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITx doesn’t record either of those statistics, but it does collect data from which they can be inferred.

Featured composition

Kanter and Veeramachaneni use a couple of tricks to manufacture candidate features for data analyses. One is to exploit structural relationships inherent in database design. Databases typically store different types of data in different tables, indicating the correlations between them using numerical identifiers. The Data Science Machine tracks these correlations, using them as a cue to feature construction.

For instance, one table might list retail items and their costs; another might list items included in individual customers’ purchases. The Data Science Machine would begin by importing costs from the first table into the second. Then, taking its cue from the association of several different items in the second table with the same purchase number, it would execute a suite of operations to generate candidate features: total cost per order, average cost per order, minimum cost per order, and so on. As numerical identifiers proliferate across tables, the Data Science Machine layers operations on top of each other, finding minima of averages, averages of sums, and so on.
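The retail example can be sketched in a few lines of Python. This is an illustration of the general idea only, not the researchers' implementation: the tables, the column names, and the helper functions `order_features` and `stack` are all assumptions made for the sketch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tables; every name and value here is illustrative.
item_costs = {"apple": 1.0, "bread": 2.5, "milk": 1.5}
purchases = [
    {"order_id": 1, "item": "apple"},
    {"order_id": 1, "item": "bread"},
    {"order_id": 2, "item": "milk"},
    {"order_id": 2, "item": "apple"},
    {"order_id": 2, "item": "bread"},
    {"order_id": 3, "item": "milk"},
]
order_customer = {1: "ann", 2: "ann", 3: "bob"}

def order_features(purchases, item_costs):
    """Import costs into the purchases table, then aggregate per order."""
    costs_by_order = defaultdict(list)
    for row in purchases:
        # The shared item identifier links the two tables.
        costs_by_order[row["order_id"]].append(item_costs[row["item"]])
    aggregations = {"total": sum, "average": mean, "minimum": min}
    return {
        order_id: {name: fn(costs) for name, fn in aggregations.items()}
        for order_id, costs in costs_by_order.items()
    }

def stack(features, order_customer, inner, outer):
    """Layer a second aggregation on top, e.g. the minimum of averages."""
    by_customer = defaultdict(list)
    for order_id, feats in features.items():
        by_customer[order_customer[order_id]].append(feats[inner])
    return {customer: outer(values) for customer, values in by_customer.items()}

per_order = order_features(purchases, item_costs)
min_of_avg = stack(per_order, order_customer, inner="average", outer=min)
```

Here `per_order[1]` holds a total of 3.5, an average of 1.75, and a minimum of 1.0, while `min_of_avg` gives each customer's lowest average order cost. Each additional identifier linking one table to another allows one more layer of this kind.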

It also looks for so-called categorical data, which appear to be restricted to a limited range of values, such as days of the week or brand names. It then generates further feature candidates by dividing up existing features across categories.
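A rough sketch of this step, under stated assumptions: the article does not say how the system decides that a column is categorical, so the distinct-value threshold below is a stand-in heuristic, and the rows, column names, and helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical order rows (illustrative values).
orders = [
    {"day": "Mon", "cost": 3.5},
    {"day": "Tue", "cost": 5.0},
    {"day": "Mon", "cost": 2.0},
    {"day": "Tue", "cost": 8.0},
    {"day": "Mon", "cost": 1.25},
    {"day": "Tue", "cost": 4.0},
]

def is_categorical(values, max_distinct):
    """Assumed heuristic: few distinct values suggests a categorical column."""
    return len(set(values)) <= max_distinct

def split_by_category(rows, category_key, value_key):
    """Divide an existing feature (here, total cost) across the categories."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category_key]] += row[value_key]
    return dict(totals)

day_is_categorical = is_categorical([r["day"] for r in orders], max_distinct=3)
cost_is_categorical = is_categorical([r["cost"] for r in orders], max_distinct=3)
cost_by_day = split_by_category(orders, "day", "cost")
# cost_by_day == {"Mon": 6.75, "Tue": 17.0}
```

In this toy table the day column passes the check and the cost column does not, so total cost is split by day to yield one new candidate feature per category.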

Once it’s produced an array of candidates, it reduces their number by identifying those whose values seem to be correlated. Then it starts testing its reduced set of features on sample data, recombining them in different ways to optimize the accuracy of the predictions they yield.
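One simple way to thin out correlated candidates is sketched below. The article does not specify the method the Data Science Machine uses, so the Pearson correlation, the 0.95 threshold, the greedy keep-first strategy, and the feature names are all assumptions made for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / (spread_x * spread_y)

def prune_correlated(features, threshold=0.95):
    """Greedily keep a feature only if it is not strongly correlated
    with any feature that has already been kept."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical candidate features, one value per order.
candidates = {
    "total_cost": [3.5, 5.0, 2.0, 8.0],
    "total_cost_cents": [350, 500, 200, 800],  # redundant rescaling
    "item_count": [2, 3, 2, 1],
}
kept = prune_correlated(candidates)
# kept == ["total_cost", "item_count"]
```

The rescaled column carries no new information and is dropped, while the item count, only moderately correlated with total cost, survives to the testing stage.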

“The Data Science Machine is one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem,” says Margo Seltzer, a professor of computer science at Harvard University who was not involved in the work. “I think what they’ve done is going to become the standard quickly — very quickly.”

Source: MIT News Office


Wrinkle predictions: New mathematical theory may explain patterns in fingerprints, raisins, and microlenses.

By Jennifer Chu

CAMBRIDGE, Mass. – As a grape slowly dries and shrivels, its surface creases, ultimately taking on the wrinkled form of a raisin. Similar patterns can be found on the surfaces of other dried materials, as well as in human fingerprints. While these patterns have long been observed in nature, and more recently in experiments, scientists have not been able to come up with a way to predict how such patterns arise in curved systems, such as microlenses.

Now a team of MIT mathematicians and engineers has developed a mathematical theory, confirmed through experiments, that predicts how wrinkles on curved surfaces take shape. From their calculations, they determined that one main parameter — curvature — rules the type of pattern that forms: The more curved a surface is, the more its surface patterns resemble a crystal-like lattice.

The researchers say the theory, reported this week in the journal Nature Materials, may help to generally explain how fingerprints and wrinkles form.

“If you look at skin, there’s a harder layer of tissue, and underneath is a softer layer, and you see these wrinkling patterns that make fingerprints,” says Jörn Dunkel, an assistant professor of mathematics at MIT. “Could you, in principle, predict these patterns? It’s a complicated system, but there seems to be something generic going on, because you see very similar patterns over a huge range of scales.”

The group sought to develop a general theory to describe how wrinkles on curved objects form — a goal that was initially inspired by observations made by Dunkel’s collaborator, Pedro Reis, the Gilbert W. Winslow Career Development Associate Professor in Civil Engineering.

In past experiments, Reis manufactured ping pong-sized balls of polymer in order to investigate how their surface patterns may affect a sphere’s drag, or resistance to air. Reis observed a characteristic transition of surface patterns as air was slowly sucked out: As the sphere’s surface became compressed, it began to dimple, forming a pattern of regular hexagons before giving way to a more convoluted, labyrinthine configuration, similar to fingerprints.

“Existing theories could not explain why we were seeing these completely different patterns,” Reis says.

Denis Terwagne, a former postdoc in Reis’ group, mentioned this conundrum in a Department of Mathematics seminar attended by Dunkel and postdoc Norbert Stoop. The mathematicians took up the challenge, and soon contacted Reis to collaborate.

Ahead of the curve

Reis shared data from his past experiments, which Dunkel and Stoop used to formulate a generalized mathematical theory. According to Dunkel, there exists a mathematical framework for describing wrinkling, in the form of elasticity theory — a complex set of equations one could apply to Reis’ experiments to predict the resulting shapes in computer simulations. However, these equations are far too complicated to pinpoint exactly when certain patterns start to morph, let alone what causes such morphing.

Combining ideas from fluid mechanics with elasticity theory, Dunkel and Stoop derived a simplified equation that accurately predicts the wrinkling patterns found by Reis and his group.

“What type of stretching and bending is going on, and how the substrate underneath influences the pattern — all these different effects are combined in coefficients so you now have an analytically tractable equation that predicts how the patterns evolve, depending on the forces that act on that surface,” Dunkel explains.

In computer simulations, the researchers confirmed that their equation was indeed able to reproduce correctly the surface patterns observed in experiments. They were therefore also able to identify the main parameters that govern surface patterning.

As it turns out, curvature is one major determinant of whether a wrinkling surface becomes covered in hexagons or a more labyrinthine pattern: The more curved an object, the more regular its wrinkled surface. The thickness of an object’s shell also plays a role: If the outer layer is very thin compared to its curvature, an object’s surface will likely be convoluted, similar to a fingerprint. If the shell is a bit thicker, the surface will form a more hexagonal pattern.

Dunkel says the group’s theory, although based primarily on Reis’ work with spheres, may also apply to more complex objects. He and Stoop, together with postdoc Romain Lagrange, have used their equation to predict the morphing patterns in a donut-shaped object, which they have now challenged Reis to reproduce experimentally. If these predictions can be confirmed in future experiments, Reis says the new theory will serve as a design tool for scientists to engineer complex objects with morphable surfaces.

“This theory allows us to go and look at shapes other than spheres,” Reis says. “If you want to make a more complicated object wrinkle — say, a Pringle-shaped area with multiple curvatures — would the same equation still apply? Now we’re developing experiments to check their theory.”

This research was funded in part by the National Science Foundation, the Swiss National Science Foundation, and the MIT Solomon Buchsbaum Fund.

Source: MIT News Office