Researchers devise efficient power converter for internet of things

By Larry Hardesty


 

CAMBRIDGE, Mass. – The “internet of things” is the idea that vehicles, appliances, civil structures, manufacturing equipment, and even livestock will soon have sensors that report information directly to networked servers, aiding with maintenance and the coordination of tasks.

Those sensors will have to operate at very low powers, in order to extend battery life for months or make do with energy harvested from the environment. But that means that they’ll need to draw a wide range of electrical currents. A sensor might, for instance, wake up every so often, take a measurement, and perform a small calculation to see whether that measurement crosses some threshold. Those operations require relatively little current, but occasionally, the sensor might need to transmit an alert to a distant radio receiver. That requires much larger currents.

Generally, power converters, which take an input voltage and convert it to a steady output voltage, are efficient only within a narrow range of currents. But at the International Solid-State Circuits Conference last week, researchers from MIT’s Microsystems Technology Laboratories (MTL) presented a new power converter that maintains its efficiency at currents ranging from 500 picoamps to 1 milliamp, a span that encompasses a 2,000,000-fold increase in current levels.

“Typically, converters have a quiescent power, which is the power that they consume even when they’re not providing any current to the load,” says Arun Paidimarri, who was a postdoc at MTL when the work was done and is now at IBM Research. “So, for example, if the quiescent power is a microamp, then even if the load pulls only a nanoamp, it’s still going to consume a microamp of current. My converter is something that can maintain efficiency over a wide range of currents.”

Paidimarri, who also earned doctoral and master’s degrees from MIT, is first author on the conference paper. He’s joined by his thesis advisor, Anantha Chandrakasan, the Vannevar Bush Professor of Electrical Engineering and Computer Science at MIT.

Packet perspective

The researchers’ converter is a step-down converter, meaning that its output voltage is lower than its input voltage. In particular, it takes input voltages ranging from 1.2 to 3.3 volts and reduces them to between 0.7 and 0.9 volts.

“In the low-power regime, the way these power converters work, it’s not based on a continuous flow of energy,” Paidimarri says. “It’s based on these packets of energy. You have these switches, and an inductor, and a capacitor in the power converter, and you basically turn on and off these switches.”

The control circuitry for the switches includes a circuit that measures the output voltage of the converter. If the output voltage is below some threshold — in this case, 0.9 volts — the controllers throw a switch and release a packet of energy. Then they perform another measurement and, if necessary, release another packet.
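
To make that mechanism concrete, here is a minimal simulation sketch of this kind of packet-based regulation. The capacitor size, packet charge, and measurement interval are illustrative assumptions, not figures from the MIT design.

```python
# Minimal simulation of packet-based ("packet of energy") regulation.
# All component values are illustrative, not taken from the MIT chip.

V_REF = 0.9        # output threshold (volts)
C_OUT = 1e-6       # output capacitor (farads), hypothetical
Q_PACKET = 1e-9    # charge delivered per packet (coulombs), hypothetical
DT = 1e-6          # time between voltage measurements (seconds)

def simulate(load_current, steps=2000, v_start=0.9):
    """Release a charge packet whenever the measured output droops below V_REF."""
    v = v_start
    packets = 0
    for _ in range(steps):
        # The load drains the output capacitor between measurements.
        v -= load_current * DT / C_OUT
        # The controller measures the output; below threshold -> release a packet.
        if v < V_REF:
            v += Q_PACKET / C_OUT
            packets += 1
    return packets

# A sleepy sensor draws almost nothing, so a single packet suffices;
# a radio-scale load needs a packet on nearly every measurement.
print(simulate(load_current=1e-9))   # ~nanoamp load
print(simulate(load_current=1e-3))   # ~milliamp load
```

In this sketch the measurement rate is fixed; the point of the MIT chip, described below, is that the measurement rate itself varies with the load.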

If no device is drawing current from the converter, or if the current is going only to a simple, local circuit, the controllers might release between 1 and a couple hundred packets per second. But if the converter is feeding power to a radio, it might need to release a million packets a second.

To accommodate that range of outputs, a typical converter — even a low-power one — will simply perform 1 million voltage measurements a second; on that basis, it will release anywhere from 1 to 1 million packets. Each measurement consumes energy, but for most existing applications, the power drain is negligible. For the internet of things, however, it’s intolerable.

Clocking down

Paidimarri and Chandrakasan’s converter thus features a variable clock, which can run the switch controllers at a wide range of rates. That, however, requires more complex control circuits. The circuit that monitors the converter’s output voltage, for instance, contains an element called a voltage divider, which siphons off a little current from the output for measurement. In a typical converter, the voltage divider is just another element in the circuit path; it is, in effect, always on.

But siphoning current lowers the converter’s efficiency, so in the MIT researchers’ chip, the divider is surrounded by a block of additional circuit elements, which grant access to the divider only for the fraction of a second that a measurement requires. The result is a 50 percent reduction in quiescent power over even the best previously reported experimental low-power, step-down converter and a tenfold expansion of the current-handling range.

“This opens up exciting new opportunities to operate these circuits from new types of energy-harvesting sources, such as body-powered electronics,” Chandrakasan says.

The work was funded by Shell and Texas Instruments, and the prototype chips were built by the Taiwan Semiconductor Manufacturing Company, through its University Shuttle Program.

Source: MIT News Office

Automating big-data analysis: MIT Research

System that replaces human intuition with algorithms outperforms 615 of 906 human teams.

By Larry Hardesty


Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.

MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615.

In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.

“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

Between the lines

Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.

Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.

“What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”

In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITx doesn’t record either of those statistics, but it does collect data from which they can be inferred.

Featured composition

Kanter and Veeramachaneni use a couple of tricks to manufacture candidate features for data analyses. One is to exploit structural relationships inherent in database design. Databases typically store different types of data in different tables, indicating the correlations between them using numerical identifiers. The Data Science Machine tracks these correlations, using them as a cue to feature construction.

For instance, one table might list retail items and their costs; another might list items included in individual customers’ purchases. The Data Science Machine would begin by importing costs from the first table into the second. Then, taking its cue from the association of several different items in the second table with the same purchase number, it would execute a suite of operations to generate candidate features: total cost per order, average cost per order, minimum cost per order, and so on. As numerical identifiers proliferate across tables, the Data Science Machine layers operations on top of each other, finding minima of averages, averages of sums, and so on.
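
The article describes this process only in prose; the pandas sketch below, with made-up tables and column names, illustrates the kind of relational feature composition described above.

```python
import pandas as pd

# Hypothetical tables standing in for the retail example in the text.
items = pd.DataFrame({
    "item_id": [1, 2, 3],
    "cost":    [4.99, 12.50, 3.25],
})
purchases = pd.DataFrame({
    "order_id": [100, 100, 101, 102, 102, 102],
    "item_id":  [1, 2, 2, 1, 2, 3],
})

# Step 1: follow the numerical identifier (item_id) to import costs
# from the items table into the purchases table.
purchases = purchases.merge(items, on="item_id")

# Step 2: rows that share an order_id are grouped, and a suite of
# aggregations generates candidate features: total, average, minimum, ...
order_features = purchases.groupby("order_id")["cost"].agg(
    total_cost="sum", average_cost="mean", min_cost="min", max_cost="max"
)
print(order_features)

# If orders were in turn linked to a customers table, the same trick
# layered once more would yield averages of sums, minima of averages, etc.
```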

It also looks for so-called categorical data, which appear to be restricted to a limited range of values, such as days of the week or brand names. It then generates further feature candidates by dividing up existing features across categories.

Once it’s produced an array of candidates, it reduces their number by identifying those whose values seem to be correlated. Then it starts testing its reduced set of features on sample data, recombining them in different ways to optimize the accuracy of the predictions they yield.

“The Data Science Machine is one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem,” says Margo Seltzer, a professor of computer science at Harvard University who was not involved in the work. “I think what they’ve done is going to become the standard quickly — very quickly.”

Source: MIT News Office

 

Longstanding problem put to rest: Proof that a 40-year-old algorithm is the best possible will come as a relief to computer scientists.

By Larry Hardesty


CAMBRIDGE, Mass. – Comparing the genomes of different species — or different members of the same species — is the basis of a great deal of modern biology. DNA sequences that are conserved across species are likely to be functionally important, while variations between members of the same species can indicate different susceptibilities to disease.

The basic algorithm for determining how much two sequences of symbols have in common — the “edit distance” between them — is now more than 40 years old. And for more than 40 years, computer science researchers have been trying to improve upon it, without much success.

At the ACM Symposium on Theory of Computing (STOC) next week, MIT researchers will report that, in all likelihood, that’s because the algorithm is as good as it gets. If a widely held assumption about computational complexity is correct, then the problem of measuring the difference between two genomes — or texts, or speech samples, or anything else that can be represented as a string of symbols — can’t be solved more efficiently.

In a sense, that’s disappointing, since a computer running the existing algorithm would take 1,000 years to exhaustively compare two human genomes. But it also means that computer scientists can stop agonizing about whether they can do better.

“This edit distance is something that I’ve been trying to get better algorithms for since I was a graduate student, in the mid-’90s,” says Piotr Indyk, a professor of computer science and engineering at MIT and a co-author of the STOC paper. “I certainly spent lots of late nights on that — without any progress whatsoever. So at least now there’s a feeling of closure. The problem can be put to sleep.”

Moreover, Indyk says, even though the paper hasn’t officially been presented yet, it’s already spawned two follow-up papers, which apply its approach to related problems. “There is a technical aspect of this paper, a certain gadget construction, that turns out to be very useful for other purposes as well,” Indyk says.

Squaring off

Edit distance is the minimum number of edits — deletions, insertions, and substitutions — required to turn one string into another. The standard algorithm for determining edit distance, known as the Wagner-Fischer algorithm, assigns each symbol of one string to a column in a giant grid and each symbol of the other string to a row. Then, starting in the upper left-hand corner and flooding diagonally across the grid, it fills in each square with the number of edits required to turn the string ending with the corresponding column into the string ending with the corresponding row.
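
For reference, the grid-filling procedure described above is the standard dynamic program, and it fits in a few lines. The Python below is a textbook version of Wagner-Fischer, not code from the MIT paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Wagner-Fischer dynamic program: grid[i][j] holds the number of edits
    needed to turn the first j symbols of `a` into the first i symbols of `b`."""
    rows, cols = len(b) + 1, len(a) + 1
    grid = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        grid[0][j] = j          # delete all j symbols
    for i in range(rows):
        grid[i][0] = i          # insert all i symbols
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if a[j - 1] == b[i - 1] else 1
            grid[i][j] = min(
                grid[i - 1][j] + 1,              # insertion
                grid[i][j - 1] + 1,              # deletion
                grid[i - 1][j - 1] + substitution,
            )
    return grid[-1][-1]

print(edit_distance("kitten", "sitting"))  # 3
```

The nested loops fill every square of the grid, which is exactly why the running time grows with the product of the two string lengths, as the next paragraph explains.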

Computer scientists measure algorithmic efficiency as computation time relative to the number of elements the algorithm manipulates. Since the Wagner-Fischer algorithm has to fill in every square of its grid, its running time is proportional to the product of the lengths of the two strings it’s considering. Double the lengths of the strings, and the running time quadruples. In computer parlance, the algorithm runs in quadratic time.

That may not sound terribly efficient, but quadratic time is much better than exponential time, which means that running time is proportional to 2^N, where N is the number of elements the algorithm manipulates. If on some machine a quadratic-time algorithm took, say, a hundredth of a second to process 100 elements, an exponential-time algorithm would take roughly 40 quadrillion years.

Theoretical computer science is particularly concerned with a class of problems known as NP-complete. Most researchers believe that NP-complete problems take exponential time to solve, but no one’s been able to prove it. In their STOC paper, Indyk and his student Artūrs Bačkurs demonstrate that if it’s possible to solve the edit-distance problem in less-than-quadratic time, then it’s possible to solve an NP-complete problem in less-than-exponential time. Most researchers in the computational-complexity community will take that as strong evidence that no subquadratic solution to the edit-distance problem exists.

Can’t get no satisfaction

The core NP-complete problem is known as the “satisfiability problem”: Given a host of logical constraints, is it possible to satisfy them all? For instance, say you’re throwing a dinner party, and you’re trying to decide whom to invite. You may face a number of constraints: Either Alice or Bob will have to stay home with the kids, so they can’t both come; if you invite Cindy and Dave, you’ll have to invite the rest of the book club, or they’ll know they were excluded; Ellen will bring either her husband, Fred, or her lover, George, but not both; and so on. Is there an invitation list that meets all those constraints?

In Indyk and Bačkurs’ proof, they propose that, faced with a satisfiability problem, you split the variables into two groups of roughly equivalent size: Alice, Bob, and Cindy go into one, but Walt, Yvonne, and Zack go into the other. Then, for each group, you solve for all the pertinent constraints. This could be a massively complex calculation, but not nearly as complex as solving for the group as a whole. If, for instance, Alice has a restraining order out on Zack, it doesn’t matter, because they fall in separate subgroups: It’s a constraint that doesn’t have to be met.

At this point, the problem of reconciling the solutions for the two subgroups — factoring in constraints like Alice’s restraining order — becomes a version of the edit-distance problem. And if it were possible to solve the edit-distance problem in subquadratic time, it would be possible to solve the satisfiability problem in subexponential time.
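
The toy sketch below, with invented constraints, illustrates the split: each half is solved by brute force over its own 2^(n/2) assignments, and only the reconciliation step has to look at constraints that cross the divide. The actual reduction in the paper is far more delicate.

```python
from itertools import product

# Toy guest-list constraints over six people, split into two halves as in the text.
left_vars  = ["Alice", "Bob", "Cindy"]
right_vars = ["Walt", "Yvonne", "Zack"]

def left_ok(a):
    # Either Alice or Bob must stay home with the kids, so not both can come.
    return not (a["Alice"] and a["Bob"])

def right_ok(a):
    # Yvonne will only come if Zack does (an invented constraint).
    return (not a["Yvonne"]) or a["Zack"]

def assignments(names, check):
    """Enumerate all 2^(n/2) assignments for one half and keep the valid ones."""
    out = []
    for bits in product([False, True], repeat=len(names)):
        a = dict(zip(names, bits))
        if check(a):
            out.append(a)
    return out

left_solutions  = assignments(left_vars, left_ok)
right_solutions = assignments(right_vars, right_ok)

# Cross-group constraints (e.g., Alice's restraining order against Zack)
# only matter when the two half-solutions are reconciled -- the step the
# paper relates to edit distance.
def compatible(l, r):
    return not (l["Alice"] and r["Zack"])

satisfiable = any(compatible(l, r) for l in left_solutions for r in right_solutions)
print(satisfiable)
```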

Source: MIT News Office

Better debugger

System to automatically find a common type of programming bug significantly outperforms its predecessors.

By Larry Hardesty


CAMBRIDGE, Mass. – Integer overflows are one of the most common bugs in computer programs — not only causing programs to crash but, even worse, potentially offering points of attack for malicious hackers. Computer scientists have devised a battery of techniques to identify them, but all have drawbacks.

This month, at the Association for Computing Machinery’s International Conference on Architectural Support for Programming Languages and Operating Systems, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) will present a new algorithm for identifying integer-overflow bugs. The researchers tested the algorithm on five common open-source programs, in which previous analyses had found three bugs. The new algorithm found all three known bugs — and 11 new ones.

The variables used by computer programs come in a few standard types, such as floating-point numbers, which can contain decimals; characters, like the letters of this sentence; or integers, which are whole numbers. Every time the program creates a new variable, it assigns it a fixed amount of space in memory.

If a program tries to store too large a number at a memory address reserved for an integer, the operating system will simply lop off the bits that don’t fit. “It’s like a car odometer,” says Stelios Sidiroglou-Douskos, a research scientist at CSAIL and first author on the new paper. “You go over a certain number of miles, you go back to zero.”

In itself, an integer overflow won’t crash a program; in fact, many programmers use integer overflows to perform certain types of computations more efficiently. But if a program tries to do something with an integer that has overflowed, havoc can ensue. Say, for instance, that the integer represents the number of pixels in an image the program is processing. If the program allocates memory to store the image, but its estimate of the image’s size is off by several orders of magnitude, the program will crash.
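
Standard Python integers never overflow, so the sketch below emulates a 32-bit size computation by masking, to show how the “odometer” wrap-around can turn a multi-gigabyte image into a tiny allocation. The dimensions are invented for illustration.

```python
# Python integers never overflow, so we mimic a 32-bit unsigned integer
# the way a C program would behave, by keeping only the low 32 bits.
MASK32 = 0xFFFFFFFF

def u32(x):
    return x & MASK32   # "car odometer" wrap-around

width, height, bytes_per_pixel = 65_536, 65_536, 4

# The true size of the image buffer...
true_size = width * height * bytes_per_pixel

# ...versus what a 32-bit size computation actually stores.
allocated = u32(u32(width * height) * bytes_per_pixel)

print(true_size, allocated)
# 17179869184 0 -- the program would allocate a zero-byte buffer
# for a 16-gigabyte image, and crash as soon as it writes pixels.
```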

Charting a course

Any program can be represented as a flow chart — or, more technically, a graph, with boxes that represent operations connected by line segments that represent the flow of data between operations. Any given program input will trace a single route through the graph. Prior techniques for finding integer-overflow bugs would start at the top of the graph and begin working through it, operation by operation.

For even a moderately complex program, however, that graph is enormous; exhaustive exploration of the entire thing would be prohibitively time-consuming. “What this means is that you can find a lot of errors in the early input-processing code,” says Martin Rinard, an MIT professor of computer science and engineering and a co-author on the new paper. “But you haven’t gotten past that part of the code before the whole thing poops out. And then there are all these errors deep in the program, and how do you find them?”

Rinard, Sidiroglou-Douskos, and several other members of Rinard’s group — researchers Eric Lahtinen and Paolo Piselli and graduate students Fan Long, Deokhwan Kim, and Nathan Rittenhouse — take a different approach. Their system, dubbed DIODE (for Directed Integer Overflow Detection), begins by feeding the program a single sample input. As that input is processed, however — as it traces a path through the graph — the system records each of the operations performed on it by adding new terms to what’s known as a “symbolic expression.”

“These symbolic expressions are complicated like crazy,” Rinard explains. “They’re bubbling up through the very lowest levels of the system into the program. This 32-bit integer has been built up of all these complicated bit-level operations that the lower-level parts of your system do to take this out of your input file and construct those integers for you. So if you look at them, they’re pages long.”

Trigger warning

When the program reaches a point at which an integer is involved in a potentially dangerous operation — like a memory allocation — DIODE records the current state of the symbolic expression. The initial test input won’t trigger an overflow, but DIODE can analyze the symbolic expression to calculate an input that will.

The process still isn’t over, however: Well-written programs frequently include input checks specifically designed to prevent problems like integer overflows, and the new input, unlike the initial input, might fail those checks. So DIODE seeds the program with its new input, and if it fails such a check, it imposes a new constraint on the symbolic expression and computes a new overflow-triggering input. This process continues until the system either finds an input that can pass the checks but still trigger an overflow, or it concludes that triggering an overflow is impossible.
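
DIODE’s symbolic expressions and solver machinery are far more elaborate, but the flavor of “find an input that passes the checks yet still overflows” can be sketched with an off-the-shelf SMT solver. The input checks, variable names, and allocation expression below are hypothetical.

```python
# A toy version of the "pass the checks but still overflow" search,
# using the z3 SMT solver (pip install z3-solver).
from z3 import BitVec, Solver, ZeroExt, ULE, sat

width  = BitVec("width", 32)
height = BitVec("height", 32)

s = Solver()

# Hypothetical input checks the program itself performs: each dimension
# must look "reasonable" on its own.
s.add(ULE(width, 100_000), ULE(height, 100_000))

# Overflow condition at the memory-allocation site: the 32-bit product
# disagrees with the same product computed with 64 bits of headroom.
prod32 = ZeroExt(32, width * height)
prod64 = ZeroExt(32, width) * ZeroExt(32, height)
s.add(prod32 != prod64)

if s.check() == sat:
    m = s.model()
    print("overflow-triggering input:",
          m[width].as_long(), "x", m[height].as_long())
else:
    print("no input can both pass the checks and overflow")
```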

If DIODE does find a trigger value, it reports it, providing developers with a valuable debugging tool. Indeed, since DIODE doesn’t require access to a program’s source code but works on its “binary” — the executable version of the program — a program’s users could run it and then send developers the trigger inputs as graphic evidence that they may have missed security vulnerabilities.

Source: MIT News Office

Teaching programming to preschoolers: MIT Research

System that lets children program a robot using stickers embodies new theories about programming languages.

By Larry Hardesty


Researchers at the MIT Media Laboratory are developing a system that enables young children to program interactive robots by affixing stickers to laminated sheets of paper.

Not only could the system introduce children to programming principles, but it could also serve as a research tool, to help determine which computational concepts children can grasp at what ages, and how interactive robots can best be integrated into educational curricula.

Last week, at the Association for Computing Machinery and Institute of Electrical and Electronics Engineers’ International Conference on Human-Robot Interaction, the researchers presented the results of an initial study of the system, which investigated its use by children ages 4 to 8.

“We did not want to put this in the digital world but rather in the tangible world,” says Michal Gordon, a postdoc in media arts and sciences and lead author on the new paper. “It’s a sandbox for exploring computational concepts, but it’s a sandbox that comes to the children’s world.”

In their study, the MIT researchers used an interactive robot called Dragonbot, developed by the Personal Robots Group at the Media Lab, which is led by associate professor of media arts and sciences Cynthia Breazeal. Dragonbot has audio and visual sensors, a speech synthesizer, a range of expressive gestures, and a video screen for a face that can assume a variety of expressions. The programs that children created dictated how Dragonbot would react to stimuli.

“It’s programming in the context of relational interactions with the robot,” says Edith Ackermann, a developmental psychologist and visiting professor in the Personal Robots Group, who with Gordon and Breazeal is a co-author on the new paper. “This is what children do — they’re learning about social relations. So taking this expression of computational principles to the social world is very appropriate.”

Lessons that stick

The root components of the programming system are triangular and circular stickers — which represent stimuli and responses, respectively — and arrow stickers, which represent relationships between them. Children can first create computational “templates” by affixing triangles, circles, and arrows to sheets of laminated paper. They then fill in the details with stickers that represent particular stimuli — like thumbs up or down — and responses — like the narrowing or widening of Dragonbot’s eyes. There are also blank stickers on which older children can write their own verbal cues and responses.

Researchers in the Personal Robots Group are developing a computer vision system that will enable children to convey new programs to Dragonbot simply by holding pages of stickers up to its camera. But for the purposes of the new study, the system’s performance had to be perfectly reliable, so one of the researchers would manually enter the stimulus-and-response sequences devised by the children, using a tablet computer with a touch-screen interface that featured icons depicting all the available options.

To introduce a new subject to the system, the researchers would ask him or her to issue an individual command, by attaching a single response sticker to a small laminated sheet. When presented with the sheet, Dragonbot would execute the command. But when it’s presented with a program, it instead nods its head and says, “I’ve got it.” Thereafter, it will execute the specified chain of responses whenever it receives the corresponding stimulus.

Even the youngest subjects were able to distinguish between individual commands and programs, and interviews after their sessions suggested that they understood that programs, unlike commands, modified the internal state of the robot. The researchers plan additional studies to determine the extent of their understanding.

Paradigm shift

The sticker system is, in fact, designed to encourage a new way of thinking about programming, one that may be more consistent with how computation is done in the 21st century.

“The systems we’re programming today are not sequential, as they were 20 or 30 years back,” Gordon says. “A system has many inputs coming in, complex state, and many outputs.” A cellphone, for instance, might be monitoring incoming transmissions over both Wi-Fi and the cellular network while playing back a video, transmitting the audio over Bluetooth, and running a timer that’s set to go off when the rice on the stove has finished cooking.

As a graduate student in computer science at the Weizmann Institute of Science in Israel, Gordon explains, she worked with her advisor, David Harel, on a new programming paradigm called scenario-based programming. “The idea is to describe your code in little scenarios, and the engine in the back connects them,” she explains. “You could think of it as rules, with triggers and actions.” Gordon and her colleagues’ new system could be used to introduce children to the principles of conventional, sequential programming. But it’s well adapted to scenario-based programming.
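
A rough idea of that trigger-and-action rule structure, sketched in Python with invented stimulus and response names (the actual system runs on Dragonbot, not in code like this):

```python
# A minimal stimulus/response interpreter in the spirit of the sticker system:
# triangles (stimuli) map to chains of circles (responses).
# All stimulus and response names here are made up for illustration.

program = {
    "thumbs_up":   ["widen_eyes", "smile", "wag_tail"],
    "thumbs_down": ["narrow_eyes", "whimper"],
}

def load_program(rules):
    """A program changes the robot's internal state; it is acknowledged,
    not executed immediately -- unlike a single command."""
    print("I've got it.")
    return rules

def on_stimulus(rules, stimulus):
    """Thereafter, each stimulus triggers its stored chain of responses."""
    for response in rules.get(stimulus, []):
        print(f"Dragonbot performs: {response}")

rules = load_program(program)
on_stimulus(rules, "thumbs_up")
on_stimulus(rules, "thumbs_down")
```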

“It’s actually how we think about how programs are written before we try to integrate it into a whole programming artifact,” she says. “So I was thinking, ‘Why not try it earlier?’”

Source: MIT News Office


Software that knows the risks

Planning algorithms evaluate probability of success, suggest low-risk alternatives.

By Larry Hardesty


CAMBRIDGE, Mass. – Imagine that you could tell your phone that you want to drive from your house in Boston to a hotel in upstate New York, that you want to stop for lunch at an Applebee’s at about 12:30, and that you don’t want the trip to take more than four hours. Then imagine that your phone tells you that you have only a 66 percent chance of meeting those criteria — but that if you can wait until 1:00 for lunch, or if you’re willing to eat at TGI Friday’s instead, it can get that probability up to 99 percent.

That kind of application is the goal of Brian Williams’ group at MIT’s Computer Science and Artificial Intelligence Laboratory — although the same underlying framework has led to software that both NASA and the Woods Hole Oceanographic Institution have used to plan missions.

At the annual meeting of the Association for the Advancement of Artificial Intelligence (AAAI) this month, researchers in Williams’ group will present algorithms that represent significant steps toward what Williams describes as “a better Siri” — the user-assistance application found in Apple products. But they would be just as useful for any planning task — say, scheduling flights or bus routes.

Together with Williams, Peng Yu and Cheng Fang, who are graduate students in MIT’s Department of Aeronautics and Astronautics, have developed software that allows a planner to specify constraints — say, buses along a certain route should reach their destination at 10-minute intervals — and reliability thresholds, such as that the buses should be on time at least 90 percent of the time. Then, on the basis of probabilistic models — which reveal data such as that travel time along this mile of road fluctuates between two and 10 minutes — the system determines whether a solution exists: For example, perhaps the buses’ departures should be staggered by six minutes at some times of day, 12 minutes at others.

If, however, a solution doesn’t exist, the software doesn’t give up. Instead, it suggests ways in which the planner might relax the problem constraints: Could the buses reach their destinations at 12-minute intervals? If the planner rejects the proposed amendment, the software offers an alternative: Could you add a bus to the route?

Short tails

One aspect of the software that distinguishes it from previous planning systems is that it assesses risk. “It’s always hard working directly with probabilities, because they always add complexity to your computations,” Fang says. “So we added this idea of risk allocation. We say, ‘What’s your budget of risk for this entire mission? Let’s divide that up and use it as a resource.’”

The time it takes to traverse any mile of a bus route, for instance, can be represented by a probability distribution — a bell curve, plotting time against probability. Keeping track of all those probabilities and compounding them for every mile of the route would yield a huge computation. But if the system knows in advance that the planner can tolerate a certain amount of failure, it can, in effect, assign that failure to the lowest-probability outcomes in the distributions, lopping off their tails. That makes them much easier to deal with mathematically.
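
As a rough illustration of risk allocation (not the researchers’ actual algorithm), the sketch below divides a 10 percent risk budget evenly across the miles of a hypothetical bus route and lops off each travel-time distribution’s tail at its share of the budget.

```python
from statistics import NormalDist

# Hypothetical per-mile travel-time models for a 10-mile bus route:
# each mile's time is roughly normal, with its own mean and spread (minutes).
miles = [NormalDist(mu=4.0, sigma=1.0) for _ in range(6)] + \
        [NormalDist(mu=6.0, sigma=2.0) for _ in range(4)]

risk_budget = 0.10                          # planner tolerates a 10% chance of lateness
risk_per_mile = risk_budget / len(miles)    # divide the budget up as a resource

# "Lop off the tail" of each distribution at its share of the risk,
# leaving one easy-to-handle number per mile instead of a full distribution.
per_mile_bounds = [d.inv_cdf(1 - risk_per_mile) for d in miles]
schedule_bound = sum(per_mile_bounds)

# By a union bound, the whole trip exceeds schedule_bound with probability
# at most risk_budget, and no compounding of distributions was needed.
print(f"Plan for {schedule_bound:.1f} minutes per trip")
```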

At AAAI, Williams and another of his students, Andrew Wang, have a paper describing how to evaluate those assignments efficiently, in order to find quick solutions to soluble planning problems. But the paper with Yu and Fang — which appears at the same conference session — concentrates on identifying those constraints that prevent a problem’s solution.

There’s the rub

Both procedures are rooted in graph theory. In this context, a graph is a data representation that consists of nodes, usually depicted as circles, and edges, usually depicted as line segments connecting the nodes. Any scheduling problem can be represented as a graph. Nodes represent events, and the edges indicate the sequence in which events must occur. Each edge also has an associated weight, indicating the cost of progressing from one event to the next — the time it takes a bus to travel between stops, for instance.

Yu, Williams, and Fang’s algorithm first represents a problem as a graph, then begins adding edges that represent the constraints imposed by the planner. If the problem is soluble, the weights of the edges representing constraints will everywhere be greater than the weights representing the costs of transitions between events. Existing algorithms, however, can quickly home in on loops in the graph where the weights are imbalanced. The MIT researchers’ system then calculates the lowest-cost way of rebalancing the loop, which it presents to the planner as a modification of the problem’s initial constraints.
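
The consistency test underlying that description can be sketched with a standard negative-cycle check; the encoding and numbers below are illustrative, and the MIT algorithm’s contribution lies in what it does after a conflicting loop is found.

```python
# Sketch of the standard consistency check for a scheduling problem encoded
# as a weighted graph: the plan is infeasible exactly when some loop's
# weights are "imbalanced," i.e. the graph contains a negative cycle.
# Encoding: edge (u, v, w) means "event v happens at most w minutes after u";
# a lower bound of m minutes becomes the reverse edge (v, u, -m).

def has_negative_cycle(num_nodes, edges):
    """Bellman-Ford from a virtual source connected to every node."""
    dist = [0.0] * num_nodes
    for _ in range(num_nodes - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # One more relaxation pass: any further improvement means a negative cycle.
    return any(dist[u] + w < dist[v] for u, v, w in edges)

# Two bus stops: travel takes at least 8 minutes (edge 1->0, weight -8),
# but the planner demands arrival within 6 minutes (edge 0->1, weight 6).
print(has_negative_cycle(2, [(0, 1, 6), (1, 0, -8)]))    # True: constraints conflict

# Relaxing the constraint to 10 minutes rebalances the loop.
print(has_negative_cycle(2, [(0, 1, 10), (1, 0, -8)]))   # False: a schedule exists
```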

Source: MIT News Office

More-flexible digital communication

New theory could yield more-reliable communication protocols.

By Larry Hardesty


Communication protocols for digital devices are very efficient but also very brittle: They require information to be specified in a precise order with a precise number of bits. If sender and receiver — say, a computer and a printer — are off by even a single bit relative to each other, communication between them breaks down entirely.

Humans are much more flexible. Two strangers may come to a conversation with wildly differing vocabularies and frames of reference, but they will quickly assess the extent of their mutual understanding and tailor their speech accordingly.

Madhu Sudan, an adjunct professor of electrical engineering and computer science at MIT and a principal researcher at Microsoft Research New England, wants to bring that type of flexibility to computer communication. In a series of recent papers, he and his colleagues have begun to describe theoretical limits on the degree of imprecision that communicating computers can tolerate, with very real implications for the design of communication protocols.

“Our goal is not to understand how human communication works,” Sudan says. “Most of the work is really in trying to abstract, ‘What is the kind of problem that human communication tends to solve nicely, [and] designed communication doesn’t?’ — and let’s now see if we can come up with designed communication schemes that do the same thing.”

One thing that humans do well is gauging the minimum amount of information they need to convey in order to get a point across. Depending on the circumstances, for instance, one co-worker might ask another, “Who was that guy?”; “Who was that guy in your office?”; “Who was that guy in your office this morning?”; or “Who was that guy in your office this morning with the red tie and glasses?”

Similarly, the first topic Sudan and his colleagues began investigating is compression, or the minimum number of bits that one device would need to send another in order to convey all the information in a data file.

Uneven odds

In a paper presented in 2011, at the ACM Symposium on Innovations in Computer Science (now known as Innovations in Theoretical Computer Science, or ITCS), Sudan and colleagues at Harvard University, Microsoft, and the University of Pennsylvania considered a hypothetical case in which the devices shared an almost infinite codebook that assigned a random string of symbols — a kind of serial number — to every possible message that either might send.

Of course, such a codebook is entirely implausible, but it allowed the researchers to get a statistical handle on the problem of compression. Indeed, it’s an extension of one of the concepts that longtime MIT professor Claude Shannon used to determine the maximum capacity of a communication channel in the seminal 1948 paper that created the field of information theory.

In Sudan and his colleagues’ codebook, a vast number of messages might have associated strings that begin with the same symbol. But fewer messages will have strings that share their first two symbols, fewer still strings that share their first three symbols, and so on. In any given instance of communication, the question is how many symbols of the string one device needs to send the other in order to pick out a single associated message.

The answer to that question depends on the probability that any given interpretation of a string of symbols makes sense in context. By way of analogy, if your co-worker has had only one visitor all day, asking her, “Who was that guy in your office?” probably suffices. If she’s had a string of visitors, you may need to specify time of day and tie color.
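
A toy version of the shared-codebook idea, with invented messages, shows how the number of transmitted symbols can shrink when context narrows the plausible interpretations.

```python
import random

# Both sides hold the same table of random "serial numbers," one per message.
messages = ["guy in your office", "guy in the lobby", "delivery person",
            "new intern", "IT contractor"]

random.seed(42)   # both devices derive the same codebook from a shared seed
codebook = {m: "".join(random.choice("0123456789abcdef") for _ in range(16))
            for m in messages}

def shortest_unambiguous_prefix(message, plausible):
    """Send only as many symbols as needed to single the message out
    among the interpretations the receiver considers plausible."""
    code = codebook[message]
    for k in range(1, len(code) + 1):
        prefix = code[:k]
        matches = [m for m in plausible if codebook[m].startswith(prefix)]
        if matches == [message]:
            return prefix
    return code

# Fewer plausible candidates generally means a shorter prefix suffices;
# more candidates in play may force a longer one.
print(shortest_unambiguous_prefix("guy in your office", plausible=messages[:2]))
print(shortest_unambiguous_prefix("guy in your office", plausible=messages))
```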

Existing compression schemes do, in fact, exploit statistical regularities in data. But Sudan and his colleagues considered the case in which sender and receiver assign different probabilities to different interpretations. They were able to show that, so long as protocol designers can make reasonable assumptions about the ranges within which the probabilities might fall, good compression is still possible.

For instance, Sudan says, consider a telescope in deep-space orbit. The telescope’s designers might assume that 90 percent of what it sees will be blackness, and they can use that assumption to compress the image data it sends back to Earth. With existing protocols, anyone attempting to interpret the telescope’s transmissions would need to know the precise figure — 90 percent — that the compression scheme uses. But Sudan and his colleagues showed that the protocol could be designed to accommodate a range of assumptions — from, say, 85 percent to 95 percent — that might be just as reasonable as 90 percent.

Buggy codebook

In a paper being presented at the next ITCS, in January, Sudan and colleagues at Columbia University, Carnegie Mellon University, and Microsoft add even more uncertainty to their compression model. In the new paper, not only do sender and receiver have somewhat different probability estimates, but they also have slightly different codebooks. Again, the researchers were able to devise a protocol that would still provide good compression.

They also generalized their model to new contexts. For instance, Sudan says, in the era of cloud computing, data is constantly being duplicated on servers scattered across the Internet, and data-management systems need to ensure that the copies are kept up to date. One way to do that efficiently is by performing “checksums,” or adding up a bunch of bits at corresponding locations in the original and the copy and making sure the results match.

That method, however, works only if the servers know in advance which bits to add up — and if they store the files in such a way that data locations correspond perfectly. Sudan and his colleagues’ protocol could provide a way for servers using different file-management schemes to generate consistency checks on the fly.

“I shouldn’t tell you if the number of 1’s that I see in this subset is odd or even,” Sudan says. “I should send you some coarse information saying 90 percent of the bits in this set are 1’s. And you say, ‘Well, I see 89 percent,’ but that’s close to 90 percent — that’s actually a good protocol. We prove this.”
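
A toy version of that coarse check (ignoring the mismatched-codebook machinery the researchers’ protocol actually addresses) might look like this:

```python
import random

def coarse_consistency_check(original, copy, sample_size=1000, tolerance=0.02, seed=7):
    """Compare the fraction of 1-bits at a shared random sample of positions.
    The two servers only need to agree approximately, not bit-for-bit."""
    rng = random.Random(seed)                  # both sides derive the same sample
    positions = rng.sample(range(len(original)), sample_size)
    frac_a = sum(original[i] for i in positions) / sample_size
    frac_b = sum(copy[i] for i in positions) / sample_size
    return abs(frac_a - frac_b) <= tolerance   # "89 percent is close to 90 percent"

# Toy data: a bit vector and a copy with a sprinkling of stale bits.
rng = random.Random(0)
original = [rng.random() < 0.9 for _ in range(100_000)]
copy = [b if rng.random() > 0.001 else not b for b in original]

print(coarse_consistency_check(original, copy))   # passes despite small differences
```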

“This sequence of works puts forward a general theory of goal-oriented communication, where the focus is not on the raw data being communicated but rather on its meaning,” says Oded Goldreich, a professor of computer science at the Weizmann Institute of Science in Israel. “I consider this sequence a work of fundamental nature.”

“Following a dominant approach in 20th-century philosophy, the work associates the meaning of communication with the goal achieved by it and provides a mathematical framework for discussing all these natural notions,” he adds. “This framework is based on a general definition of the notion of a goal and leads to a problem that is complementary to the problem of reliable communication considered by Shannon, which established information theory.”

 

Source: MIT News Office

Recommendation theory

Model for evaluating product-recommendation algorithms suggests that trial and error get it right.

By Larry Hardesty

Devavrat Shah’s group at MIT’s Laboratory for Information and Decision Systems (LIDS) specializes in analyzing how social networks process information. In 2012, the group demonstrated algorithms that could predict what topics would trend on Twitter up to five hours in advance; this year, they used the same framework to predict fluctuations in the prices of the online currency known as Bitcoin.

Next month, at the Conference on Neural Information Processing Systems, they’ll present a paper that applies their model to the recommendation engines that are familiar from websites like Amazon and Netflix — with surprising results.

“Our interest was, we have a nice model for understanding data-processing from social data,” says Shah, the Jamieson Associate Professor of Electrical Engineering and Computer Science. “It makes sense in terms of how people make decisions, exhibit preferences, or take actions. So let’s go and exploit it and design a better, simple, basic recommendation algorithm, and it will be something very different. But it turns out that under that model, the standard recommendation algorithm is the right thing to do.”

The standard algorithm is known as “collaborative filtering.” To get a sense of how it works, imagine a movie-streaming service that lets users rate movies they’ve seen. To generate recommendations specific to you, the algorithm would first assign the other users similarity scores based on the degree to which their ratings overlap with yours. Then, to predict your response to a particular movie, it would aggregate the ratings the movie received from other users, weighted according to similarity scores.

To simplify their analysis, Shah and his collaborators — Guy Bresler, a postdoc in LIDS, and George Chen, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS) who is co-advised by Shah and EECS associate professor Polina Golland — assumed that the ratings system had two values, thumbs-up or thumbs-down. The taste of every user could thus be described, with perfect accuracy, by a string of ones and zeroes, where the position in the string corresponds to a particular movie and the number at that location indicates the rating.
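
A bare-bones version of thumbs-up/thumbs-down collaborative filtering, with made-up users and ratings, follows the description above.

```python
# Minimal collaborative filter. Ratings: 1 = thumbs-up, 0 = thumbs-down, None = unseen.
users = {
    "you":   [1, 0, 1, None, 1],
    "ann":   [1, 0, 1, 1,    1],
    "ben":   [0, 1, 0, 0,    None],
    "cathy": [1, 0, None, 1, 1],
}

def similarity(a, b):
    """Fraction of co-rated movies on which two users agree."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def predict(target, movie, ratings):
    """Aggregate other users' ratings of `movie`, weighted by similarity to `target`."""
    mine = ratings[target]
    num = den = 0.0
    for name, theirs in ratings.items():
        if name == target or theirs[movie] is None:
            continue
        w = similarity(mine, theirs)
        num += w * theirs[movie]
        den += w
    return num / den if den else None

print(predict("you", 3, users))   # closer to 1.0 means "likely thumbs-up"
```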

Birds of a feather

The MIT researchers’ model assumes that large groups of such strings can be clustered together, and that those clusters can be described probabilistically. Rather than ones and zeroes at each location in the string, a probabilistic cluster model would feature probabilities: an 80 percent chance that the members of the cluster will like movie “A,” a 20 percent chance that they’ll like movie “B,” and so on.

The question is how many such clusters are required to characterize a population. If half the people who like “Die Hard” also like “Shakespeare in Love,” but the other half hate it, then ideally, you’d like to split “Die Hard” fans into two clusters. Otherwise, you’d lose correlations between their preferences that could be predictively useful. On the other hand, the more clusters you have, the more ratings you need to determine which of them a given user belongs to. Reliable prediction from limited data becomes impossible.

In their new paper, the MIT researchers show that so long as the number of clusters required to describe the variation in a population is low, collaborative filtering yields nearly optimal predictions. But in practice, how low is that number?

To answer that question, the researchers examined data on 10 million users of a movie-streaming site and identified 200 who had rated the same 500 movies. They found that, in fact, just five clusters — five probabilistic models — were enough to account for most of the variation in the population.

Missing links

While the researchers’ model corroborates the effectiveness of collaborative filtering, it also suggests ways to improve it. In general, the more information a collaborative-filtering algorithm has about users’ preferences, the more accurate its predictions will be. But not all additional information is created equal. If a user likes “The Godfather,” the information that he also likes “The Godfather: Part II” will probably have less predictive power than the information that he also likes “The Notebook.”

Using their analytic framework, the LIDS researchers show how to select a small number of products that carry a disproportionate amount of information about users’ tastes. If the service provider recommended those products to all its customers, then, based on the resulting ratings, it could much more efficiently sort them into probability clusters, which should improve the quality of its recommendations.

Sujay Sanghavi, an associate professor of electrical and computer engineering at the University of Texas at Austin, considers this the most interesting aspect of the research. “If you do some kind of collaborative filtering, two things are happening,” he says. “I’m getting value from it as a user, but other people are getting value, too. Potentially, there is a trade-off between these things. If there’s a popular movie, you can easily show that I’ll like it, but it won’t improve the recommendations for other people.”

That trade-off, Sanghavi says, “has been looked at in an empirical context, but there’s been nothing that’s principled. To me, what is appealing about this paper is that they have a principled look at this issue, which no other work has done. They’ve found a new kind of problem. They are looking at a new issue.”

Source: MIT News


Visual control of big data

Data-visualization tool identifies sources of aberrant results and recomputes visualizations without them.

By Larry Hardesty


 

CAMBRIDGE, Mass. – In the age of big data, visualization tools are vital. With a single glance at a graphic display, a human being can recognize patterns that a computer might fail to find even after hours of analysis.

But what if there are aberrations in the patterns? Or what if there’s just a suggestion of a visual pattern that’s not distinct enough to justify any strong inferences? Or what if the pattern is clear, but not what was to be expected?

The Database Group at MIT’s Computer Science and Artificial Intelligence Laboratory has released a data-visualization tool that lets users highlight aberrations and possible patterns in the graphical display; the tool then automatically determines which data sources are responsible for which.

It could be, for instance, that just a couple of faulty sensors among dozens are corrupting a very regular pattern of readings, or that a few underperforming agents are dragging down a company’s sales figures, or that a clogged vent in a hospital is dramatically increasing a few patients’ risk of infection.

Big data is big business

Visualizing big data is big business: Tableau Software, which sells a suite of visualization tools, is a $4 billion company. But in creating attractive, informative graphics, most visualization software discards a good deal of useful data.

“If you look at the way people traditionally produce visualizations of any sort, they would have some big, rich data set — that has maybe hundreds of millions of data points, or records — and they would do some reduction of the set to a few hundred or thousands of records at most,” says Samuel Madden, a professor of computer science and engineering and one of the Database Group’s leaders. “The problem with doing that sort of reduction is that you lose information about where those output data points came from relative to the input data set. If one of these data points is crazy — is an outlier, for example — you don’t have any real ability to go back to the data set and ask, ‘Where did this come from and what were its properties?’”

That’s one of the problems solved by the new visualization tool, dubbed DBWipes. For his thesis work, Eugene Wu, a graduate student in electrical engineering and computer science who developed DBWipes with Madden and adjunct professor Michael Stonebraker, designed a novel “provenance tracking” system for large data sets.

If a visualization system summarizes 100 million data entries into 100 points to render on the screen, then each of the 100 points will in some way summarize — perhaps by averaging — 1 million data points. Wu’s provenance-tracking system provides a compact representation of the source of the summarized data so that users can easily trace visualized data back to the source — and conversely, track source data to the pixels that are rendered by it.

The idea of provenance tracking is not new, but Wu’s system is particularly well suited to the task of tracking down outliers in data visualizations. Rather than simply handing the user the million data entries that were used to compute an outlier, it first identifies those that most influenced the outlier values and summarizes them in human-readable terms.

Best paper

Wu and Madden’s work on their “Scorpion” algorithm was selected as one of the best papers of the Very Large Data Bases (VLDB) conference last year. The algorithm tracks down the records responsible for particular aspects of a DBWipes visualization and then efficiently recalculates the visualization to either exclude or emphasize the data they contain.

If some of the points in the visualization suggest a regular pattern, the user can highlight them and mark them as “normal data”; if some of the points disrupt that pattern, the user can highlight them and mark them as “outlier data”; and if the pattern is surprising, the user can draw the anticipated pattern on-screen.

Scorpion then tracks down the provenance of the highlighted points and filters it down to the subset that most influenced the outliers. The paper introduces several properties of the underlying computation that can be used to develop more efficient algorithms for finding these subsets.
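
A toy version of the influence question Scorpion answers (far simpler than the actual algorithm, which searches candidate predicates efficiently over large data sets) might look like this in pandas, using invented hospital-style records.

```python
import pandas as pd

# Hypothetical records: treatment costs tagged with the attending doctor.
records = pd.DataFrame({
    "doctor": ["A", "A", "B", "B", "C", "C", "C", "D"],
    "cost":   [900, 1100, 950, 1000, 4800, 5200, 5100, 980],
})

overall = records["cost"].mean()   # the aggregate the visualization renders

# For each candidate explanation (here: each doctor), recompute the aggregate
# without that doctor's records and rank explanations by how far it moves.
influence = {
    doc: abs(overall - records.loc[records["doctor"] != doc, "cost"].mean())
    for doc in records["doctor"].unique()
}

for doc, delta in sorted(influence.items(), key=lambda kv: -kv[1]):
    print(f"doctor {doc}: removing their records shifts the mean cost by {delta:.0f}")
```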

Scorpion, Madden says, was partly motivated by a study conducted by a researcher at a Boston hospital, who noticed that a subset of patients in one of the hospital’s wards was incurring much higher treatment costs than the rest. Any number of factors could have been responsible: the patients’ age and fitness, the severity of their conditions, their particular constellations of symptoms, their health plans, or perhaps something as banal as their proximity to the hospital — nothing could be ruled out.

After six months of work, the researcher concluded that most of the variance in patients’ treatment costs could be explained by a single variable: their doctors. It turned out that three doctors on the hospital staff, in an effort to leave no stone unturned, simply prescribed more interventions than their peers.

As an experiment, Wu and Madden turned Scorpion loose on the researcher’s data. Within five minutes, it had concluded that the data point most strongly correlated with the increase in patients’ treatment costs was the names of their doctors. Because it was combing through a massive data set and, like all big-data search algorithms, had to sacrifice some precision for efficiency, it couldn’t pinpoint just the three doctors identified by the six-month study. But it did produce a list of 10 doctors most likely to be responsible for cost variance, and those three were among them. “You would at least know where to begin looking,” Madden says.

Source: MIT News Office