Tag Archives: machine

Automating big-data analysis : MIT Research

System that replaces human intuition with algorithms outperforms 615 of 906 human teams.

By Larry Hardesty


Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.

MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615.

In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.

“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

Between the lines

Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.

Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk fordropping out of online courses.

“What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”

In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITxdoesn’t record either of those statistics, but it does collect data from which they can be inferred.

Featured composition

Kanter and Veeramachaneni use a couple of tricks to manufacture candidate features for data analyses. One is to exploit structural relationships inherent in database design. Databases typically store different types of data in different tables, indicating the correlations between them using numerical identifiers. The Data Science Machine tracks these correlations, using them as a cue to feature construction.

For instance, one table might list retail items and their costs; another might list items included in individual customers’ purchases. The Data Science Machine would begin by importing costs from the first table into the second. Then, taking its cue from the association of several different items in the second table with the same purchase number, it would execute a suite of operations to generate candidate features: total cost per order, average cost per order, minimum cost per order, and so on. As numerical identifiers proliferate across tables, the Data Science Machine layers operations on top of each other, finding minima of averages, averages of sums, and so on.

It also looks for so-called categorical data, which appear to be restricted to a limited range of values, such as days of the week or brand names. It then generates further feature candidates by dividing up existing features across categories.

Once it’s produced an array of candidates, it reduces their number by identifying those whose values seem to be correlated. Then it starts testing its reduced set of features on sample data, recombining them in different ways to optimize the accuracy of the predictions they yield.

“The Data Science Machine is one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem,” says Margo Seltzer, a professor of computer science at Harvard University who was not involved in the work. “I think what they’ve done is going to become the standard quickly — very quickly.”

Source: MIT News Office

 

Characteristics of a universal simulator|Study narrows the scope of research on quantum computing

Despite a lot of work being done by many research groups around the world, the field of Quantum computing is still in its early stages. We still need to cover a lot of grounds to achieve the goal of developing a working Quantum computer capable of doing the tasks which are expected or predicted. Recent research by a SISSA led team has tried to give the future research in the area of Quantum computing some direction based on the current state of research in the area.


“A quantum computer may be thought of as a ‘simulator of overall Nature,” explains Fabio Franchini, a researcher at the International School for Advanced Studies (SISSA) of Trieste, “in other words, it’s a machine capable of simulating Nature as a quantum system, something that classical computers cannot do”. Quantum computers are machines that carry out operations by exploiting the phenomena of quantum mechanics, and they are capable of performing different functions from those of current computers. This science is still very young and the systems produced to date are still very limited. Franchini is the first author of a study just published in Physical Review Xwhich establishes a basic characteristic that this type of machine should possess and in doing so guides the direction of future research in this field.

The study used analytical and numerical methods. “What we found” explains Franchini, “is that a system that does not exhibit ‘Majorana fermions’ cannot be a universal quantum simulator”. Majorana fermions were hypothesized by Ettore Majorana in a paper published 1937, and they display peculiar characteristics: a Majorana fermion is also its own antiparticle. “That means that if Majorana fermions meet they annihilate among themselves,” continues Franchini. “In recent years it has been suggested that these fermions could be found in states of matter useful for quantum computing, and our study confirms that they must be present, with a certain probability related to entanglement, in the material used to build the machine”.

Entanglement, or “action at a distance”, is a property of quantum systems whereby an action done on one part of the system has an effect on another part of the same system, even if the latter has been split into two parts that are located very far apart. “Entanglement is a fundamental phenomenon for quantum computers,” explains Franchini.

“Our study helps to understand what types of devices research should be focusing on to construct this universal simulator. Until now, given the lack of criteria, research has proceeded somewhat randomly, with a huge consumption of time and resources”.

The study was conducted with the participation of many other international research institutes in addition to SISSA, including the Massachusetts Institute of Technology (MIT) in Boston, the University of Oxford and many others.

More in detail…

“Having a quantum computer would open up new worlds. For example, if we had one today we would be able to break into any bank account,” jokes Franchini. “But don’t worry, we’re nowhere near that goal”.

At the present time, several attempts at quantum machines exist that rely on the properties of specific materials. Depending on the technology used, these computers have sizes varying from a small box to a whole room, but so far they are only able to process a limited number of information bits, an amount infinitely smaller than that processed by classical computers.

However, it’s not correct to say that quantum computers are, or will be, more powerful than traditional ones, points out Franchini. “There are several things that these devices are worse at. But, by exploiting quantum mechanics, they can perform operations that would be impossible for classical computers”.

Source: International School of Advanced Studies (SISSA)