High-performance low-cost PC hardware and high-speed LAN/WAN technologies make distributed database (DDB) systems an attractive research area where query optimization and DDB design are the two important and related problems. Since dynamic programming is not feasible for optimizing queries in a DDB, we propose a new genetic algorithm (GA)-based query optimizer (new genetic algorithm (NGA)) and compare its performance with random and optimal (exhaustive) algorithms. We perform experiments on a synthetic database with replicated relations, but no horizontal or vertical fragmentation. Network links are assumed to be gigabit ethernet. Comparisons with optimal results show that our NGA formulation performs only 20% of the optimal results and we have achieved 50% improvement over a previous GA-based algorithm.
January 18, 2010
Prospects for Bandit Solutions in Sensor Management
Sensor management in information-rich and dynamic environments can be posed as a sequential action selection problem with side information. To study such problems we employ the dynamic multi-armed bandit with covariates framework. In this generalization of the multi-armed bandit, the expected rewards are time-varying linear functions of the covariate vector. The learning goal is to associate the covariate with the optimal action at each instance, essentially learning to partition the covariate space adaptively. Applications of sensor management are frequently in environments in which the precise nature of the dynamics is unknown. In such settings, the sensor manager tracks the evolving environment by observing only the covariates and the consequences of the selected actions. This creates difficulties not encountered in static problems, and changes the exploitation–exploration dilemma. We study the relationship between the different factors of the problem and provide interesting insights. The impact of the environment dynamics on the action selection problem is influenced by the covariate dimensionality. We present the surprising result that strategies that perform very little or no exploration perform surprisingly well in dynamic environments.
Internet Failures: an Emergent Sea of Complex Systems and Critical Design Errors?
Complex systems researchers have looked to the Internet as a possible source of interesting emergent behaviour. Indeed, some high-profile failures, and some low-level phenomena, might easily be construed as evidence of a complex system. In this paper, I look at the local and global consequences of the Internet design, and show that few, if any, of these problems are actually consequences of emergent properties in the pure technical sense. However, there are lessons for network architecture from these problems. The influence of local decisions on global behaviour of the network is a source of some of the difficulties that protocol designers must cope with, but it is also a source of great wealth and innovation, and as such should be regarded in a positive light.
Generic Text Summarization for Turkish
In this paper, we propose a generic text summarization method that generates summaries of Turkish texts by ranking sentences according to their scores. Sentence scores are calculated using their surface-level features, and summaries are created by extracting the highest ranked sentences from the original documents. To extract sentences which form a summary with an extensive coverage of the main content of the text and less redundancy, we use features such as term frequency, key phrase (KP), centrality, title similarity and sentence position. The sentence rank is computed using a score function that uses its feature values and the weights of the features. The best feature weights are learned using machine-learning techniques with the help of human-constructed summaries. Performance evaluation is conducted by comparing summarization outputs with manual summaries of two newly created Turkish data sets. This paper presents one of the first Turkish summarization systems, and its results are promising. We introduce the usage of KP as a surface-level feature in text summarization, and we show the effectiveness of the centrality feature in text summarization. The effectiveness of the features in Turkish text summarization is also analyzed in detail.
NVR-BIP: Nuclear Vector Replacement using Binary Integer Programming for NMR Structure-Based Assignments
Nuclear magnetic resonance (NMR) spectroscopy is an important experimental technique that allows one to study protein structure and dynamics in solution. An important bottleneck in NMR protein structure determination is the assignment of NMR peaks to the corresponding nuclei. Structure-based assignment (SBA) aims to solve this problem with the help of a template protein which is homologous to the target and has applications in the study of structure–activity relationship, protein–protein and protein–ligand interactions. We formulate SBA as a linear assignment problem with additional nuclear overhauser effect constraints, which can be solved within nuclear vector replacement’s (NVR) framework (Langmead, C., Yan, A., Lilien, R., Wang, L. and Donald, B. (2003) A Polynomial-Time Nuclear Vector Replacement Algorithm for Automated NMR Resonance Assignments. Proc. the 7th Annual Int. Conf. Research in Computational Molecular Biology (RECOMB), Berlin, Germany, April 10–13, pp. 176–187. ACM Press, New York, NY. J. Comp. Bio., (2004), 11, pp. 277–298; Langmead, C. and Donald, B. (2004) An expectation/maximization nuclear vector replacement algorithm for automated NMR resonance assignments. J. Biomol. NMR, 29, 111–138). Our approach uses NVR’s scoring function and data types and also gives the option of using CH and NH residual dipolar coupling (RDCs), instead of NH RDCs which NVR requires. We test our technique on NVR’s data set as well as on four new proteins. Our results are comparable to NVR’s assignment accuracy on NVR’s test set, but higher on novel proteins. Our approach allows partial assignments. It is also complete and can return the optimum as well as near-optimum assignments. Furthermore, it allows us to analyze the information content of each data type and is easily extendable to accept new forms of input data, such as additional RDCs.
Implementing a Thermal-Aware Scheduler in Linux Kernel on a Multi-Core Processor
As power dissipation causes thermal issues in cooling costs, lifetime and reliability, thermal management has become an important issue in today’s OS and processor design. Early OS-level thermal management schemes were proposed and evaluated mainly with simulators or analytical models. In this paper, we implement a thermal-aware round-robin scheduling algorithm in the Linux kernel, and compare its performance with the ‘Heat-and-Run’ algorithm and the default Linux baseline scheduler on an Intel Core 2 Duo processor using representative benchmarks from SPEC2000, MiBench and NetBench. Our results indicate that the current Linux scheduler can easily be enhanced with thermal-awareness to show improved performance in terms of both the on-chip temperature condition and application throughput.
Networks of Concepts and Ideas
We present the results of an experiment designed to investigate the way information is organized and stored in the human brain. In particular, we are using controlled stimuli to reverse engineer the networks of ideas and concepts in order to answer the following questions. (1) Are the networks of ideas and concepts in the human brain invoked by verbal and visual stimuli distinct from each other? The answer appears to be no for the network of ideas and inconclusive for the network of concepts. (2) What is the topology of these networks? Our experimental results show that both are small-world networks, with the network of ideas being random and the network of concepts scale-free.
A Necessary and Sufficient Condition for the Liveness of Normal Nets
This paper gives a necessary and sufficient condition for the liveness of normal nets, i.e. a normal net with a given initial marking is live if and only if it is structurally repetitive and each minimal siphon is marked in any reachable marking. Furthermore, it is proved that a normal net is structurally live if and only if it is structurally repetitive. Finally, we prove that a weakly persistent net, which is a special normal net, is live for a given initial marking if and only if it is structurally repetitive and each minimal siphon is marked in the initial marking. That is to say, the liveness of weakly persistent nets can be decided by the net structure and the initial marking only.
An Analysis of Associated Dividends in the DBM Algorithm for Division by Constants Using Multiplication
When a compiler encounters a fixed integer divisor, which is not a power of 2, it can calculate its inverse to be multiplied by the run-time integer dividends to obtain the quotients, using our very efficient, recently published [Cavagnino, D. and Werbrouck, A.E. (2008) Efficient algorithms for integer division by constants using multiplication. Comp. J., 51, 470–480] division by multiplication algorithms. Essentially our algorithms permit a complete partition of a defined number space into non-adverse and adverse divisors on the basis of whether a dividend associated with each divisor is, or is not, greater than the maximum dividend size. In this paper, we demonstrate useful relations between the dividends associated with all divisors and with their multiples by positive powers of 2 leading to rapid iterative algorithms for calculating full sets of associated dividends.