Visual control of big data

Data-visualization tool identifies sources of aberrant results and recomputes visualizations without them.

By Larry Hardesty


 

CAMBRIDGE, Mass. – In the age of big data, visualization tools are vital. With a single glance at a graphic display, a human being can recognize patterns that a computer might fail to find even after hours of analysis.

But what if there are aberrations in the patterns? Or what if there’s just a suggestion of a visual pattern that’s not distinct enough to justify any strong inferences? Or what if the pattern is clear, but not what was to be expected?

The Database Group at MIT’s Computer Science and Artificial Intelligence Laboratory has released a data-visualization tool that lets users highlight aberrations and possible patterns in the graphical display; the tool then automatically determines which data sources are responsible for which.

It could be, for instance, that just a couple of faulty sensors among dozens are corrupting a very regular pattern of readings, or that a few underperforming agents are dragging down a company’s sales figures, or that a clogged vent in a hospital is dramatically increasing a few patients’ risk of infection.

Big data is big business

Visualizing big data is big business: Tableau Software, which sells a suite of visualization tools, is a $4 billion company. But in creating attractive, informative graphics, most visualization software discards a good deal of useful data.

“If you look at the way people traditionally produce visualizations of any sort, they would have some big, rich data set — that has maybe hundreds of millions of data points, or records — and they would do some reduction of the set to a few hundred or thousands of records at most,” says Samuel Madden, a professor of computer science and engineering and one of the Database Group’s leaders. “The problem with doing that sort of reduction is that you lose information about where those output data points came from relative to the input data set. If one of these data points is crazy — is an outlier, for example — you don’t have any real ability to go back to the data set and ask, ‘Where did this come from and what were its properties?’”

That’s one of the problems solved by the new visualization tool, dubbed DBWipes. For his thesis work, Eugene Wu, a graduate student in electrical engineering and computer science who developed DBWipes with Madden and adjunct professor Michael Stonebraker, designed a novel “provenance tracking” system for large data sets.

If a visualization system summarizes 100 million data entries into 100 points to render on the screen, then each of the 100 points will in some way summarize — perhaps by averaging — 1 million data points. Wu’s provenance-tracking system provides a compact representation of the source of the summarized data so that users can easily trace visualized data back to the source — and conversely, track source data to the pixels that are rendered by it.
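
The bookkeeping described above can be sketched in a few lines of Python. This is an illustration only, not the DBWipes implementation: the names `Point`, `summarize`, `rows_for_point` and `point_for_row` are hypothetical, and it assumes the simplest possible case, in which each rendered point averages one equal-size, contiguous run of input rows.

```python
from dataclasses import dataclass


@dataclass
class Point:
    value: float  # the average drawn on screen
    start: int    # first input row summarized (inclusive)
    stop: int     # end of the summarized rows (exclusive)


def summarize(values, n_points):
    """Average `values` into `n_points` equal-size buckets, keeping provenance."""
    size = len(values) // n_points
    if size == 0 or size * n_points != len(values):
        raise ValueError("sketch assumes len(values) is a multiple of n_points")
    points = []
    for i in range(n_points):
        start, stop = i * size, (i + 1) * size
        points.append(Point(sum(values[start:stop]) / size, start, stop))
    return points


def rows_for_point(points, index):
    """Backward trace: the input rows behind rendered point `index`."""
    return range(points[index].start, points[index].stop)


def point_for_row(points, row):
    """Forward trace: the rendered point that input row `row` feeds into."""
    size = points[0].stop - points[0].start
    return row // size


# Hypothetical data: 1,000 steady readings with one corrupt value.
readings = [20.0] * 1000
readings[537] = 900.0
points = summarize(readings, 10)

# The rendered point with the highest average is the visible outlier.
outlier = max(range(len(points)), key=lambda i: points[i].value)
```

Storing only a start and stop per point keeps the provenance record compact however many rows each point summarizes, which is the property the article attributes to Wu’s system. A real system would have to handle arbitrary groupings rather than contiguous ranges.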

The idea of provenance tracking is not new, but Wu’s system is particularly well suited to the task of tracking down outliers in data visualizations. Rather than simply handing the user the million data entries that were used to compute an outlier, it first identifies the entries that most influenced the outlier’s value and summarizes them in human-readable terms.

Best paper

Wu and Madden’s work on their “Scorpion” algorithm was selected as one of the best papers of the Very Large Data Bases (VLDB) conference last year. The algorithm tracks down the records responsible for particular aspects of a DBWipes visualization and then efficiently recalculates the visualization to either exclude or emphasize the data they contain.

If some of the points in the visualization suggest a regular pattern, the user can highlight them and mark them as “normal data”; if some of the points disrupt that pattern, the user can highlight them and mark them as “outlier data”; and if the pattern is surprising, the user can draw the anticipated pattern on-screen.

Scorpion then tracks down the provenance of the highlighted points and filters that provenance down to the subset that most influenced the outliers. Wu and Madden’s paper introduces several properties of the specific computation that can be used to develop more efficient algorithms for finding these subsets.
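
To make the notion of “influence” concrete, here is a deliberately simplified Python sketch. It is not the Scorpion algorithm: it is a brute-force stand-in that scores each value of one attribute by how far the average of the user-flagged outlier points drops when that value’s records are left out, and it ignores the points marked as normal. The function `rank_by_influence`, the record layout and the sensor readings are all hypothetical.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def rank_by_influence(records, outlier_groups, attribute):
    """Rank values of `attribute` by how much removing their records
    lowers the average of each flagged outlier point."""
    by_group = defaultdict(list)
    for rec in records:
        if rec["group"] in outlier_groups:
            by_group[rec["group"]].append(rec)

    candidates = {rec[attribute] for recs in by_group.values() for rec in recs}
    scores = {}
    for cand in candidates:
        total = 0.0
        for recs in by_group.values():
            remaining = [r["value"] for r in recs if r[attribute] != cand]
            if not remaining:
                continue  # removing cand would erase the whole point; skip it
            total += mean([r["value"] for r in recs]) - mean(remaining)
        scores[cand] = total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical readings: three sensors, two hourly points; "s3" misreads in hour 1.
records = [
    {"group": 0, "sensor": "s1", "value": 20.0},
    {"group": 0, "sensor": "s2", "value": 20.0},
    {"group": 0, "sensor": "s3", "value": 20.0},
    {"group": 1, "sensor": "s1", "value": 20.0},
    {"group": 1, "sensor": "s2", "value": 20.0},
    {"group": 1, "sensor": "s3", "value": 110.0},
]
ranking = rank_by_influence(records, outlier_groups={1}, attribute="sensor")
top_sensor = ranking[0][0]

# Recompute the flagged point without the most influential source.
recomputed = mean([r["value"] for r in records
                   if r["group"] == 1 and r["sensor"] != top_sensor])
```

Testing every candidate against every flagged point, as this sketch does, would be far too slow on hundreds of millions of records. That cost is why properties that allow more efficient search matter.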

Scorpion, Madden says, was partly motivated by a study conducted by a researcher at a Boston hospital, who noticed that a subset of patients in one of the hospital’s wards was incurring much higher treatment costs than the rest. Any number of factors could have been responsible: the patients’ age and fitness, the severity of their conditions, their particular constellations of symptoms, their health plans, or perhaps something as banal as their proximity to the hospital — nothing could be ruled out.

After six months of work, the researcher concluded that most of the variance in patients’ treatment costs could be explained by a single variable: their doctors. It turned out that three doctors on the hospital staff, in an effort to leave no stone unturned, simply prescribed more interventions than their peers.

As an experiment, Wu and Madden turned Scorpion loose on the researcher’s data. Within five minutes, it had concluded that the attribute most strongly correlated with the increase in patients’ treatment costs was the name of each patient’s doctor. Because it was combing through a massive data set and, like all big-data search algorithms, had to sacrifice some precision for efficiency, it couldn’t pinpoint just the three doctors identified by the six-month study. But it did produce a list of the 10 doctors most likely to be responsible for the cost variance, and those three were among them. “You would at least know where to begin looking,” Madden says.

Source:  MIT News Office

The art of translating science into business

“There are many things which can go wrong when starting a company; but the worst thing that can go wrong is to not do it,” said Prof. Karl Leo, Director of KAUST’s Solar & Photovoltaics Engineering Research Center, when speaking at an Entrepreneurship Center speaker series event this past spring. Wearing the dual hats of scientist and entrepreneur, Prof. Leo is the author of 440 publications, holds more than 50 patents, and has co-created 8 companies which have generated over 300 jobs.

A physicist by training, Prof. Leo highlighted the point that he is primarily a scientist who stumbled onto business by chance. “For me it’s always started with and been about the science,” he says. All his spin-off companies came about as a result of basic research he and his group conducted on organic semiconductors. Speaking specifically to the young KAUST researchers hoping to emulate his success as academics and entrepreneurs, Prof. Leo said: “The message I want to pass along is if you really want to do things, just be curious. Don’t say I want to do research to make a company. Do very basic research and the spin-off ideas will come along.”

The Growing Influence of Organic Semiconductors

Prof. Karl Leo started doing research on organic semiconductors about 20 years ago and has been passionate about the field’s development and future potential ever since. He was skeptical at first because organic semiconductors had such short lifetimes in the ’90s, but the useful life of LED devices, for instance, has gone from just a few minutes then to virtually no aging today. “In the long-term, as in 20 to 30 years from now, almost everything will be organics,” he believes. “Silicon has dominated electronics for a long time but organic is something new.” Organic products have evolved into a variety of applications, such as small OLED displays, OLED televisions, OLED lighting, organic photovoltaics (OPV) and organic electronics.

Organics, as opposed to traditional silicon-based semiconductors, are by nature essentially lousy semiconductors. Mobility, or the speed at which electrons move through a material, is a really important property. However, the electronic properties of carbon leave room for organics to improve. For instance, graphene, which is a carbon-based organic material, has even higher mobility than silicon.

One of the companies Prof. Karl Leo co-founded, Novaled, began operating out of Dresden, Germany, in 2003 and became a leader in the organic light-emitting diode (OLED) field. OLEDs are made up of multiple thin layers of organic materials, known as OLED stacks, which emit light when electricity is applied to them. Novaled became a pioneer in developing highly efficient and long-lifetime OLED structures, and it currently holds the world record in power efficiency. The key to Novaled’s success, as Prof. Leo explains, is “the simple discovery that you can dope organics.” This was a major breakthrough, achieved simply by adding a very small amount of another molecule.

This organic conductivity doping technology, used to enhance the performance of OLED devices, was the main factor leading to the company being purchased by Samsung in 2013.

Organic Photovoltaics: Technology of the Future

Following the successful commercial penetration of OLED displays in the consumer electronics market, Prof. Karl Leo has since turned his focus to organic photovoltaics. “I think organic PV is something that can change the world,” said Leo. Among the many advantages of organic photovoltaics is that they consist of thin organic layers that can be applied to flexible plastic substrates. They consume little energy, can be made transparent, and are compatible with low-cost large-area production technologies. Because they can be made transparent, they can be built into windows, for instance, and they can also be manufactured in virtually any color. All these characteristics make organic PV ideal for consumer products.

Again building on basic research conducted by his group, Prof. Leo also started a company, Heliatek, which is now a world leader in the production of organic solar film. Heliatek holds the current world record for the efficiency of transparent solar cells. The company also holds the record for efficiency of opaque cells at 12 percent. Leo believes that it’s possible to achieve up to 20 percent efficiency in the near future, which will be necessary to compete with silicon and become commercially viable.

Don’t Believe Business Plans

Prof. Leo explained that the experience he and his team gained from launching a successful company like Novaled helped them to both define the objectives and obtain funding from investors for his solar cell company, Heliatek. “Once you create a successful company, things get much easier,” he said. But Leo also cautioned the budding entrepreneurs in the audience to be willing to adapt as they present and implement their ideas.

“If you have a good idea and you are convinced you have a good idea, never give up,” he said. But being able to adapt to market needs is also crucial. For instance, Leo’s original business plan for Novaled focused on manufacturing displays. But the realities of the market, and the prohibitive cost of manufacturing displays, convinced his team that the smarter way to go was to supply materials. At the end of the day, what really succeeded in getting a venture capital firm’s attention, after having been told no 49 times, was his team’s ability to demonstrate the value of the technology.

“Business plans are useful but they must not be overestimated,” said Prof. Leo. Business plans are a good indicator of how entrepreneurs are able to structure their thoughts, identify markets and create a roadmap, but “nobody is able to predict the future in a business plan; it’s not possible.”

Source: KAUST