binary search tree research paper

binary search tree Recently Published Documents

Total documents.

Latest Documents
Most Cited Documents
Contributed Authors
Related Sources
Related Keywords

Breadth-First Search Multi-Dimensional Binary Search Tree based Algorithms for Structural

This research proposes a set of novel algorithms for structural reliability estimation based on muti-dimensional binary search tree and breadth-first search, namely the reliability accuracy supervised searching algorithm, the limit-state surface resolution supervised searching algorithm and the reliability index precision supervised fast searching algorithm. The proposed algorithms have the following strengths: 1, all the proposed algorithms have satisfactory computational efficiency by reducing redundant samplings; 2, their computational costs are stable and computable; 3, performance functions of high non-linearity can be will handled; 4, the reliability accuracy supervised searching algorithm can adapt its computational cost according to a prescribed accuracy; 5, the limit-state surface resolution supervised searching algorithm is able to probe sharp changes on limit-state surfaces; 6, the reliability index precision supervised fast searching algorithm computes the reliability index with sufficient precision in a fast way.

Improving the Red-Black tree delete algorithm

Abstract Today, Red-Black trees are becoming a popular data structure typically used to implement dictionaries, associative arrays, symbol tables within some compilers (C++, Java …) and many other systems. In this paper, we present an improvement of the delete algorithm of this kind of binary search tree. The proposed algorithm is very promising since it colors differently the tree while reducing color changes by a factor of about 29%. Moreover, the maintenance operations re-establishing Red-Black tree balance properties are reduced by a factor of about 11%. As a consequence, the proposed algorithm saves about 4% on running time when insert and delete operations are used together while conserving search performance of the standard algorithm.

A novel RSW&TST framework of MCPs detection for abnormal pattern recognition on large-scale time series and pathological signals in epilepsy

To quickly and efficiently recognize abnormal patterns from large-scale time series and pathological signals in epilepsy, this paper presents here a preliminary RSW&TST framework for Multiple Change-Points (MCPs) detection based on the Random Slide Window (RSW) and Trigeminal Search Tree (TST) methods. To avoid the remaining local optima, the proposed framework applies a random strategy for selecting the size of each slide window from a predefined collection, in terms of data feature and experimental knowledge. For each data segment to be diagnosed in a current slide window, an optimal path towards a potential change point is detected by TST methods from the top root to leaf nodes with O(log3(N)). Then, the resulting MCPs vector is assembled by means of TST-based single CP detection on data segments within each of the slide windows. In our experiments, the RSW&TST framework was tested by using large-scale synthetic time series, and then its performance was evaluated by comparing it with existing binary search tree (BST), Kolmogorov-Smirnov (KS)-statistics, and T-test under the fixed slide window (FSW) approach, as well as the integrated method of wild binary segmentation and CUSUM test (WBS&CUSUM). The simulation results indicate that our RSW&TST is both more efficient and effective, with a higher hit rate, shorter computing time, and lower missed, error and redundancy rates. When the proposed RSW&TST framework is executed for MCPs detection on pathological ECG (electrocardiogram)/EEG (electroencephalogram) recordings of people in epileptic states, the abnormal patterns are roughly recognized in terms of the number and position of the resultant MCPs. Furthermore, the severity of epilepsy is roughly analyzed based on the strength and period of signal fluctuations among multiple change points in the stage of a sudden epileptic attack. The purpose of our RSW&TST framework is to provide an encouraging platform for abnormal pattern recognition through MCPs detection on large-scale time series quickly and efficiently.

Allocation of phasor measuring unit using an admissible searching‐based algorithm A‐star and binary search tree for full interconnected power network observability

Guidance system based on dijkstra-ant colony algorithm with binary search tree for indoor parking system.

A common algorithm to solve the single-source shortest path (SSSP) is the Dijkstra algorithm. However, the traditional Dijkstra’s is not accurate and need more time to perform the path in order it should visit all the nodes in the graph. In this paper, the Dijkstra-ant colony algorithm (ACO) with binary search tree (BST) has been proposed. Dijkstra and ACO are integrated to produce the smart guidance algorithm for the indoor parking system. Dijkstra algorithm initials the paths to finding the shortest path while ACO optimizes the paths. BST has been used to store the paths that Dijkstra algorithm initialled. The proposed algorithm is aimed to control the shortest path as well as guide the driver towards the nearest vacant available space near the entrance. This solution depending on applying the optimization on an optimal path while the traditional ACO is optimizing the random path based on the greedy algorithm hence we get the most optimal path. Moreover, the reason behind using the BST is to make the generation of the path by Dijkstra’s algorithm more accurate and less time performance. The results show a range of 8.3% to 26.8% improvement in the proposed path compared to the traditional Dijkstra’s algorithm.

We introduce the zip tree , 1 a form of randomized binary search tree that integrates previous ideas into one practical, performant, and pleasant-to-implement package. A zip tree is a binary search tree in which each node has a numeric rank and the tree is (max)-heap-ordered with respect to ranks, with rank ties broken in favor of smaller keys. Zip trees are essentially treaps [8], except that ranks are drawn from a geometric distribution instead of a uniform distribution, and we allow rank ties. These changes enable us to use fewer random bits per node. We perform insertions and deletions by unmerging and merging paths ( unzipping and zipping ) rather than by doing rotations, which avoids some pointer changes and improves efficiency. The methods of zipping and unzipping take inspiration from previous top-down approaches to insertion and deletion by Stephenson [10], Martínez and Roura [5], and Sprugnoli [9]. From a theoretical standpoint, this work provides two main results. First, zip trees require only O (log log n ) bits (with high probability) to represent the largest rank in an n -node binary search tree; previous data structures require O (log n ) bits for the largest rank. Second, zip trees are naturally isomorphic to skip lists [7], and simplify Dean and Jones’ mapping between skip lists

Production optimization of a network of multiple wells with each well using a combination of Electrical Submersible Pump and Gas lift

AbstractArtificial lift methods such as ESP and GL are commonly used in oil wells around the world, especially in offshore wells. However, these two methods are normally used separately, and this paper therefore aimed to study the possible combination of ESP and GL by analyzing its effects on energy saving using equivalent depth method and on production rate as well as on ESP life cycle using nodal analysis. The paper also performed the production optimization for a network of wells using each well a combination of GL and ESP. The optimization process consists of selecting the appropriate operation frequency for the ESP system and the injection gas lift distributed to each well with the aim of maximizing the total production of the network. In addition, this optimization process was conducted in two cases: unlimited and limited volume of injection gas lift. In case the GL flow is limited, the BST (Binary Search Tree) algorithm was used to determine the suitable gas rates injected into each well to maximize the total network production. The optimization workflow proposed in this study was applied to the field X in Cuu Long basin of Vietnam and was calibrated from the real data of this field. The results demonstrated the advantage of the combination of ESP and GL in energy saving and in application for small diameter wells. In addition, the workflow and source code will allow engineers to replicate the results and to apply this method for future studies in order to determine optimum operating parameters of this hybrid artificial lift to achieve the highest production rate from a network of multiple wells.

Testing Performance (Time Analysis) of Nearest Neighbour (NN) Search Algorithms on K-d Trees

K-d tree (k-dimensional tree) is a space partitioning data structure for organizing points in a k-dimensional space. K-d tree, or Multidimensional Binary Search Tree is a useful data structure for several applications such as searches involving a multidimensional search key (e.g., Range Search and Nearest Neighbour Search). K-d trees are a special case of binary space partitioning trees.KNN Search is a searching algorithm with complexity O(N log N) {N= no. of data points}. This search algorithm is relatively better than brute force search {Complexity= O(n*k); where k=No. of neighbours searched, N=No. of Data Points in Kd tree} for dimensions N>>2D {N=No. of Points, D=Dimensionality of Tree}.Furthermore, Parallel KNN Search is much more efficient and performs better than KNN Search, as it harnesses parallel processing capabilities of computers and thus, results in better search time.This paper tests the time performance of KNN Search and Parallel KNN Search and compares them by plotting it on a 3D graph. A more comprehensive comparison is done by use of 2D graphs for each dimension(from 2 to 20).

Modeling of Wireless Traffic Load in Next Generation Wireless Networks

Software defined WiFi network (SD-WiFi) is a new paradigm that addresses issues such as mobility management, load management, route policies, link discovery, and access selection in traditional WiFi networks. Due to the rapid growth of wireless devices, uneven load distribution among the network resources still remains a challenging issue in SD-WiFi. In this paper, we design a novel four-tier software defined WiFi edge architecture (FT-SDWE) to manage load imbalance through an improved handover mechanism, enhanced authentication technique, and upgraded migration approach. In the first tier, the handover mechanism is improved by using a simple AND operator and by shifting the association control to WAPs. Unauthorized user load is mitigated in the second tier, with the help of base stations (BSs) which act as edge nodes (ENs), using elliptic ElGamal digital signature algorithm (EEDSA). In the third tier, the load is balanced in the data plane among the OpenFlow enabled switches by using the whale optimization algorithm (WOA). Moreover, the load in the fourth tier is balanced among the multiple controllers. The global controller (GC) predicts the load states of local controllers (LCs) from the Markov chain model (MCM) and allocates packets to LCs for processing through a binary search tree (BST). The performance evaluation of FT-SDWE is demonstrated using extensive OMNeT++ simulations. The proposed framework shows effectiveness in terms of bandwidth, jitter, response time, throughput, and migration time in comparison to SD-WiFi, EASM, GAME-SM, and load information strategy schemes.

Bloom Filter-Based Parallel Architecture for Accelerating Equi-Join Operation on FPGA

As one of the most important operations in relational databases, the join is data-intensive and time-consuming. Thus, offloading this operation using field-programmable gate arrays (FPGAs) has attracted much interest and has been broadly researched in recent years. However, the available SRAM-based join architectures are often resource-intensive, power-consuming, or low-throughput. Besides, a lower match rate does not lead to a shorter operation time. To address these issues, a Bloom filter (BF)-based parallel join architecture is presented in this paper. This architecture first leverages the BF to discard the tuples that are not in the join result and classifies the remaining tuples into different channels. Second, a binary search tree is used to reduce the number of comparisons. The proposed method was implemented on a Xilinx FPGA, and the experimental results show that under a match rate of 50%, our architecture achieved a high join throughput of 145.8 million tuples per second and a maximum acceleration factor of 2.3 compared to the existing SRAM-based join architectures.

Export Citation Format

Share document.

IEEE Account

Change Username/Password
Update Address

Purchase Details

Payment Options
Order History
View Purchased Documents

Profile Information

Communications Preferences
Profession and Education
Technical Interests
US & Canada: +1 800 678 4333
Worldwide: +1 732 981 0060
Contact & Support
About IEEE Xplore
Accessibility
Terms of Use
Nondiscrimination Policy
Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity. © Copyright 2024 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.

Grab your spot at the free arXiv Accessibility Forum

Help | Advanced Search

Quantitative Biology > Populations and Evolution

Title: a complete characterization of pairs of binary phylogenetic trees with identical $a_k$-alignments.

Abstract: Phylogenetic trees play a key role in the reconstruction of evolutionary relationships. Typically, they are derived from aligned sequence data (like DNA, RNA, or proteins) by using optimization criteria like, e.g., maximum parsimony (MP). It is believed that the latter is able to reconstruct the \enquote{true} tree, i.e., the tree that generated the data, whenever the number of substitutions required to explain the data with that tree is relatively small compared to the size of the tree (measured in the number $n$ of leaves of the tree, which represent the species under investigation). However, reconstructing the correct tree from any alignment first and foremost requires the given alignment to perform differently on the \enquote{correct} tree than on others. A special type of alignments, namely so-called $A_k$-alignments, has gained considerable interest in recent literature. These alignments consist of all binary characters (\enquote{sites}) which require precisely $k$ substitutions on a given tree. It has been found that whenever $k$ is small enough (in comparison to $n$), $A_k$-alignments uniquely characterize the trees that generated them. However, recent literature has left a significant gap between $k\leq 2k+2$ -- namely the cases in which no such characterization is possible -- and $k\geq 4k$ -- namely the cases in which this characterization works. It is the main aim of the present manuscript to close this gap, i.e., to present a full characterization of all pairs of trees that share the same $A_k$-alignment. In particular, we show that indeed every binary phylogenetic tree with $n$ leaves is uniquely defined by its $A_k$-alignments if $n\geq 2k+3$. By closing said gap, we also ensure that our result is optimal.

Subjects:	Populations and Evolution (q-bio.PE); Combinatorics (math.CO)
Cite as:	[q-bio.PE]
	(or [q-bio.PE] for this version)
	Focus to learn more arXiv-issued DOI via DataCite

Submission history

Access paper:.

HTML (experimental)
Other Formats

References & Citations

Google Scholar
Semantic Scholar

BibTeX formatted citation

Bibliographic and Citation Tools

Code, data and media associated with this article, recommenders and search tools.

Institution

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs .

Academia.edu no longer supports Internet Explorer.

To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser .

Enter the email address you signed up with and we'll email you a reset link.

We're Hiring!
Help Center

Binary Search Trees

Related Papers

Fernanda Argota

By applying object-oriented design to the definition of a binary search tree, Berman and Duvall [1] designed a data structure comprised of three classes: (i) an EmptyBST class to model empty binary search trees, (ii) a NonEmptyBST class to model non-empty binary search trees, and (iii) a BST base class for common attributes of EmptyBST and NonEmptyBST objects. That paper noted the problem of inserting new values into such a structure: since insertions occur at an EmptyBST object, an EmptyBST would have to "turn into " a NonEmptyBST; a behavior beyond the capabilities of the classes in most languages. This paper presents three C++ solutions to the insertion problem in their order of development. The first solution uses a procedural programming technique, with the second and third solutions shifting to a more object-oriented approach. This chronology illustrates the author's ongoing battle to shift from procedural to object-oriented thinking.

ACM SIGCSE Bulletin

Information Processing Letters

Sukhamay Kundu

South African Computer Journal

Brink Van Der Merwe

We consider two ways of inserting a key into a binary search tree: leaf insertion which is the standard method, and root insertion which involves additional rotations. Although the respective cost of constructing leaf and root insertion binary search trees trees, in terms of comparisons, are the same in the average case, we show that in the worst case the construction of a root insertion binary search tree needs approximately 50% of the number of comparisons required by leaf insertion.

Robert Duvall

The Binary Search Tree serves as an important example when teaching data structures. We explore new approaches to understanding the implementation of a Binary Search Tree, using concepts from Object-Oriented Programming and C++. The Binary Search Tree illustrates how adopting a new approach and a new language can lead to a new way of thinking about a familiar problem.

jagannadha varma Pinnamaraju

Srinidhi Deshpande

A data structure is proposed to maintain a collection of vertex-disjoint trees under a sequence of two kinds of operations: a link operation that combines two trees into one by adding an edge, and a cut operation that divides one tree into two by deleting an edge. The tree is one of the most powerful of the advanced data structures and it often pops up in even more advanced subjects such as AI and compiler design. Surprisingly though the tree is important in a much more basic application-namely the keeping of an efficient index. This research paper gives us brief description of importance of tree in data structure, types of trees, implementation with their examples.

Discrete Mathematics & Theoretical Computer Science

Alois Panholzer

There are three classical algorithms to visit all the nodes of a binary tree - preorder, inorder and postorder traversal. From this one gets a natural labelling of the internal nodes of a binary tree by the numbers , indicating the sequence in which the nodes are visited. For given (size of the tree) and (a number between 1 and

Loading Preview

Sorry, preview is currently unavailable. You can download the paper by clicking the button above.

RELATED PAPERS

Henk Koppelaar , Peyman Nasehpour

David Maier

Ramesh Singh Rawat

Erkki Makinen

Journal of Algorithms and Computation

Peyman Nasehpour , Henk Koppelaar

International Journal of Computer Science and Mobile Computing

pijush kumar

Robert Logozar

Representations for Genetic and Evolutionary Algorithms

Franz Rothlauf

International Journal of Computer Applications

Pradeep Kaushik

Computer Science Education

Stephen Edwards

International Journal of Advanced Computer Science and Applications

Emilia Todorova , Ivaylo Donchev

Journal of Computer Science IJCSIS

Sashka Davis

Journal of the ACM

Robert Tarjan

Aldon Walker

Proceedings of the 1971 international ACM SIGIR …

The Computer Journal

Adrijan Božinovski , Nevena Ackovska

Dr. Inayat ur-Rehman

isromania .in

Amitava Bagchi

Binary Search Tree Data Structure Explained with Examples

A binary search tree (BST) is a rooted binary tree data structure used to store data in a way that allows quick lookup, addition, and removal of items. The key idea is that each node‘s value is greater than all the values in its left subtree, but smaller than the values in its right subtree. This ordering of nodes allows binary search trees to operate with improved efficiency compared to other data structures like linked lists or plain arrays.

In this comprehensive guide, we‘ll cover all aspects of binary search trees including definition, implementation, special types, traversal algorithms, performance analysis, and real-world applications from the perspective of an experienced full-stack developer.

BST Properties and Definition

More formally, a binary search tree has the following defining structure:

Each node has a data element, left child, and right child pointers
The data element of a node is greater than all nodes in left subtree
The data element of a node is less than all nodes in right subtree
Both left and right subtrees must also be valid BSTs

This hierarchy of nodes ranked by value enables very fast lookup times. Given a target value, you traverse left or right based on value comparison until located. This makes lookup time on average O(log n) rather than O(n) for sequential structures. Insert and delete operations also utilize the hierarchy for improved efficiency in most cases.

Now let‘s explore BST functionality and properties in more depth across a variety of areas.

Basic BST Operations

The three primary operations that can be performed efficiently on a binary search tree data structure are:

Searching for a node value in a BST is straightforward. Starting at the root, compare the target value against node data. If less, recurse left. If greater, recurse right. Continue until match is found or recursion terminates.

To insert a new node, search down to locate insertion point, then add new leaf node in appropriate subtree preserving BST order.

Deleting has multiple cases to handle based on node degree. Delete leaf nodes directly. For internal node, swap with predecessor or successor before removing leaf node.

Deleting correctly while preserving tree structure is more complex, but can be done efficiently by utilizing the BST ordering.

The efficiency of BST ops depends directly on the height of the tree.

For balanced BSTs, we have guaranteed O(log n) behavior. But in pathological cases, skewed trees can have O(n).

Operation	Balanced	Worst Case
Search	O(log n)	O(n)
Insert	O(log n)	O(n)
Delete	O(log n)	O(n)

Space complexity is O(n) since we need to set aside pointer space for each node.

These performance limits motivate more advanced self-balancing binary search tree data structures…

Self-Balancing BSTs

While the basic binary search tree structure enables efficient operations, real-world usage requires dealing with edge cases. For example, inserting sequential data will generate a linked list shape losing efficiency benefits.

Self-balancing trees resolve this by automatically rearranging structure to keep the tree balanced after insertions and deletions. This guarantees O(log n) height limiting worst case behavior.

Common examples include:

AVL Trees – After inserts/deletes, restores balance by rotating subtrees so node difference is under 1.

Red-Black Trees – Colors nodes red/black to ensure subtrees root heights differ by at most 2. Recolors/rotates on insertion.

Splay Trees – Splay recently accessed element to root using series of rotations to improve access locality. Amortizes to O(log n) per operation.

For example, here is splay tree insertion in C++:

By incorporating self-adjustment, these BST variants offer better guarantees and prevent the pathological slow cases. This makes them staple data structures in real systems.

Specialized BST Variants

Many more exotic BST offshoots exist for niche use cases:

Treaps – Maintain order property of BSTs along with randomized heap priority ordering for improved balancing.

Tango Trees – Support faster predecessor/successor access.

Scapegoat Trees – Partial rebuilding to ensure balance and good behavior.

BW Trees – Achieve high concurrency support alongside efficient queries via lock coupling.

For example, here is a Treap insert function for the specialized structure:

The variability of binary search trees enables many custom variants tailored for specific system requirements in industry settings.

BST Implementation of ADTs

Many abstract data type (ADT) interfaces popular across languages like sets and maps are commonly implemented via binary search trees under the hood.

This is because BST operations like search, minimum, maximum, successor can be used provide efficient ADT operations without reimplementation by client code. The ordering also enables range queries.

For example:

This simplicity of mapping an ADT interface onto a BST implementation makes them very attractive for reuse across domains and languages.

Tree Traversal Algorithms

Since binary search trees arrange data in a hierarchical structure, visiting all nodes systematically is essential for utilities like data backups, graphing, serialization, ecology analysis of subtree interconnections, etc.

Traversal has 3 main flavours:

Inorder Traversal : Left subtree, current node, right subtree. Outputs nodes from min to max.

Preorder Traversal : Current node, left subtree, right subtree. Useful for copying structure.

Postorder Traversal : Left subtree, right subtree, current node. Good for destructively querying then freeing nodes.

Implementing recursively in C++:

Can also traverse iteratively using stack, queues, or specialized iterator classes encapsulating state. This allows cleaner client code:

Mastering various traversal schemes is key for properly analyzing binary tree data sets.

All traversals visit n nodes exactly once, so are O(n) time.

Space complexity varies depending on recursive call stack, but can be O(h) where h is tree height. Iterative traversal via explicit stack costs O(n) space.

So time is output size dependent, while space depends on shape and constraints. Good to optimize based on system memory limits.

BST Analysis

Now let‘s explore some key characteristics to analyze for optimizing binary search tree behavior.

Tree Shape Analysis

The shape of a BST drastically affects performance of operations. We can capture this mathematically by modeling expected behaviors under randomized data insertion.

Theorem : For random BST insertion, expected height E[h] ~ 1.44log(n+2) – 1.328

Proof . By probabilistic argument on likelihood of each node depth.

Some distributions based on common insertion patterns:

Tree Type	Height	Notes
Perfect	log(n)	Optimally balanced
Random	1.44log(n) – 1.328	Average case
Linked List	n	Worst case

Expected tree shapes for n=500 nodes

So perfectly uniform data gets log(n) efficiency, but random insertion can approach linear behavior for skewed distributions without explicit balancing!

Probabilistic Analysis

We can also mathematically model expected run times via probabilistic analysis:

Theorem : For a BST with n nodes and height h, probability that a search query has to traverse i nodes = (h+1-i)/(h+1)

Proof . Based on level likelihood.

This enables precise calculation of average search cost modelled via cumulative probability over traversal distance:

Where H is harmonic number function. Asymptotic bounded by:

O(log(n)) ≤ E[Search Length] ≤ O(n)

So BSTs give at least logarithmic efficiency for search in the average case. But could degrade to linear without explicit balancing. Mathematical analysis empowers precise fine-tuning.

Benchmarking Different BSTs

To highlight the performance variation across BST shapes and types empirically, I implemented a variety of structures and plotted out runtimes across insertion patterns:

Code: https://github.com/kstainer/BST-Compare

We clearly see the worst case linked list shape causing inefficient usage. Meanwhile red-black and AVL tree rebalancing counters this. Splay‘s locality adaptation also offers insert optimizations.

This showcases how structural configurations can vastly affect real world usage – not just asymptotic notations. So implementing advanced self-balancing BST variants pays dividends avoiding edge case degradation.

Usage in Database Indexes

Binary search trees play a huge role behind-the-scenes in database systems and file organization due to efficient search and update characteristics.

Log-structured merge trees are crucial for handling inserts and stale data. B+ trees store sorted records in leaf pages by leveraging ordered hierarchy.

In B+ tree leaves:

This builds an access structure allowing quick record retrieval via inner nodes akin to BST search guiding path.

Database indexes also embed tree structures for rapid filtering without expensive table scans:

Here an index BST on the price column enables fast range queries.

So while naive BST usage has pitfalls, the fundamental hierarchy concept transforms database performance when applied properly.

Binary search trees enable a host of high-performance operations useful across problem domains by arranging data in hierarchical structure for access efficiency.

However, subtleties emerge in advanced analysis and real-world usage without proper balancing. Self-adjusting variants resolve these issues by keeping optimal shapes dynamically. So the BST and augmentations remain staples in every seasoned computer scientist‘s toolbox.

I hope this tour from basic API usage to mathematical analysis and benchmarking conveys a comprehensive yet practical picture of employing binary search trees effectively to empower data applications. The techniques contained form crucial building blocks for state-of-the-art system design and scale.

Let me know in the comments if any aspects need more elaboration! I would be glad to expand my treatment of the venerable binary search tree with any insights you might have.

Dr. Alex Mitchell is a dedicated coding instructor with a deep passion for teaching and a wealth of experience in computer science education. As a university professor, Dr. Mitchell has played a pivotal role in shaping the coding skills of countless students, helping them navigate the intricate world of programming languages and software development.

Beyond the classroom, Dr. Mitchell is an active contributor to the freeCodeCamp community, where he regularly shares his expertise through tutorials, code examples, and practical insights. His teaching repertoire includes a wide range of languages and frameworks, such as Python, JavaScript, Next.js, and React, which he presents in an accessible and engaging manner.

Dr. Mitchell’s approach to teaching blends academic rigor with real-world applications, ensuring that his students not only understand the theory but also how to apply it effectively. His commitment to education and his ability to simplify complex topics have made him a respected figure in both the university and online learning communities.

A Computer Science Degree: Worthwhile or Waste of Time?

As technology continues its rapid growth, software engineering and computer science have become two of the…

How to Authenticate your Elixir/Phoenix APIs using Guardian: An Expert Guide

Protecting access to Phoenix application programming interfaces (APIs) is a crucial security task. This comprehensive guide…

A recipe for website automated tests with Python Selenium & Headless Chrome in Docker

The growing need for test automation Developing quality software requires continually testing code changes. But repetitive…

Between the Wires: An Exclusive 3300+ Word Interview with React Core Team Member Sebastian Markbåge

If you‘ve done any web development in the past decade, chances are you‘ve used React, the…

A Quick Introduction to Server-Side Blazor Apps

Blazor is a popular new web framework from Microsoft that allows building interactive web UIs using…

A More Seamless Workflow: Style Guides for Better Design and Development

Style guides enhance consistency in the visual identity and user experience of complex interfaces while streamlining…

Research on bearing fault diagnosis method based on cjbm with semi-supervised and imbalanced data

Original Paper
Published: 13 August 2024

Cite this article

Yanfeng Peng ORCID: orcid.org/0000-0002-6658-3750 1 ,
Guangfu Bin 1 ,
Yiping Shen 1 ,
Yong Guo 1 ,
Baoqing Li 1 ,
Yongzheng Jiang 1 &
Chao Fan 1

21 Accesses

Explore all metrics

Data-driven intelligent methods have been widely used in bearing fault diagnosis. However, it is observed that previous studies on bearing fault diagnosis always assume that the label samples are sufficient and that the number of normal and fault samples is the same or similar, which is challenging to meet in practical engineering applications. This assumption reduces the accuracy and stability of the semi-supervised imbalanced bearing data fault diagnosis model in practical working conditions. The complex training and weak interpretation problems of transfer learning methods are analyzed, and a center jumping boosting machine method for bearing intelligent fault recognition with semi-supervised and imbalanced data is proposed. First, a modified density peak clustering (DPC) algorithm is used to classify unlabeled samples and select subsamples, and aiming at the DPC problem, a $\gamma $ DPC algorithm based on the $\gamma $ jumping phenomenon is proposed to determine the number of clusters and intercept distance automatically. Second, combined with the synthetic minority oversampling technique, some minority class samples are added to achieve a balanced bearing dataset. Then, a few known faults are used to assign pseudo-labels to unknown samples. Finally, to diagnose the new data and reduce the amount of calculation in actual production, the balanced data after processing are used to train the bottom light gradient boosting machine model to solve intelligent classification and recognition of bearing vibration data. In addition, by using three bearing datasets with different balance ratios and comparing them with other methods, the superiority of the proposed method is verified in bearing condition identification.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save.

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Price includes VAT (Russian Federation)

Instant access to the full article PDF.

Rent this article via DeepDyve

Institutional subscriptions

Imbalanced fault diagnosis of rotating machinery via multi-domain feature extraction and cost-sensitive learning

A train bearing imbalanced fault diagnosis method based on extended CCR and multi-scale feature fusion network

Imbalanced fault diagnosis based on semi-supervised ensemble learning

Explore related subjects.

Artificial Intelligence
Automotive Engineering

Data availability

All data generated or analyzed during this study are included in this article.

Bin, G.F., Jiang, Z.N., Li, X.J., Dhillon, B.S.: Weighted multi-sensor data level fusion method of vibration signal based on correlation function. Chin. J. Mech. Eng. 5 , 025 (2011)

Google Scholar

Wu, Z., Zhang, H., Guo, J., Ji, Y., Pecht, M.: Imbalanced bearing fault diagnosis under variant working conditions using cost-sensitive deep domain adaptation network. Expert Syst. Appl. 193 , 116459 (2022)

Article Google Scholar

Tama, B.A., Vania, M., Lee, S., Lim, S.: Recent advances in the application of deep learning for fault diagnosis of rotating machinery using vibration signals. Artif. Intell. Rev. 56 (5), 4667–4709 (2023)

An, Y., Zhang, K., Chai, Y., Liu, Q., Huang, X.: Domain adaptation network base on contrastive learning for bearings fault diagnosis under variable working conditions. Expert Syst. Appl. 212 , 118802 (2023)

Li, S., Peng, Y.F., Shen, Y.P., Zhao, S.B., Shao, H.D., Bin, G.F., Yang, X.K., Fan, C.: Rolling bearing fault diagnosis under data imbalance and variable speed based on adaptive clustering weighted oversampling. Reliab. Eng. Syst. Saf. 244 , 109938 (2024)

Bin, G.F., Gao, J.J., Li, X.J., Dhillon, B.S.: Early fault diagnosis of rotating machinery based on wavelet packets-Empirical mode decomposition feature extraction and neural network. Mech. Syst. Signal Process. 27 , 696–711 (2012)

Zong, X., Yang, R., Wang, H., Du, M., You, P., Wang, S., Su, H.: Semi-supervised transfer learning method for bearing fault diagnosis with imbalanced data. Machines 10 (7), 515 (2022)

Wang, Q., Taal, C., Fink, O.: Integrating expert knowledge with domain adaptation for unsupervised fault diagnosis. IEEE Trans. Instrum. Meas. 71 , 1–12 (2021)

Chen, Z., Chen, J., Xie, Z., Xu, E., Feng, Y., Liu, S.: Multi-expert attention network with unsupervised aggregation for long-tailed fault diagnosis under speed variation. Knowl.-Based Syst. 252 , 109393 (2022)

Jiang, Z., Zhao, L., Lu, Y., Zhang, Y., Mao, Q.: A semi-supervised resampling method for class-imbalanced learning. Expert Syst. Appl. 221 , 119733 (2023)

Arshad, A., Riaz, S., Jiao, L.: Semi-supervised deep fuzzy C-mean clustering for imbalanced multi-class classification. IEEE Access 7 , 28100–28112 (2019)

An, Y., Zhang, K., Chai, Y., Zhu, Z., Liu, Q.: Gaussian mixture variational based transformer domain adaptation fault diagnosis method and its application in bearing fault diagnosis. IEEE Trans. Industr. Inf. 20 (1), 615–625 (2023)

Xu, Z., Shen, D., Nie, T., Kou, Y., Yin, N., Han, X.: A cluster-based oversampling algorithm combining SMOTE and k-means for imbalanced medical data. Inf. Sci. 572 , 574–589 (2021)

Article MathSciNet Google Scholar

Tao, X., Chen, W., Zhang, X., Guo, W., Qi, L., Fan, Z.: SVDD boundary and DPC clustering technique-based oversampling approach for handling imbalanced and overlapped data. Knowl.-Based Syst. 234 , 107588 (2021)

Tang, M., Meng, C., Wu, H., Zhu, H., Yi, J., Tang, J., Wang, Y.: Fault Detection for Wind Turbine Blade Bolts Based on GSG Combined with CS-LightGBM. Sensors 22 (18), 6763 (2022)

Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344 (6191), 1492–1496 (2014)

Liu, R., Wang, H., Yu, X.: Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Inf. Sci. 450 , 200–226 (2018)

Liu, Y., Yu, Z., Chen, C., Han, Y., Yu, B.: Prediction of protein crotonylation sites through LightGBM classifier based on SMOTE and elastic net. Anal. Biochem. 609 , 113903 (2020)

Wang, S., Liu, S., Zhang, J., Che, X., Yuan, Y., Wang, Z., Kong, D.: A new method of diesel fuel brands identification: SMOTE oversampling combined with XGBoost ensemble learning. Fuel 282 , 118848 (2020)

Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16 , 321–357 (2002)

Sun, W., Chen, J., Li, J.: Decision tree and PCA-based fault diagnosis of rotating machinery. Mech. Syst. Signal Process. 21 (3), 1300–1317 (2007)

Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29 , 1189–1232 (2001)

Hwehih: Data-set. Githup. (2023). https://github.com/HWEHIH/Data-set

E62505. (2022). Fault_diognosis_dataset. Githup. https://github.com/e62505/Fault_diognosis_dataset .

Srinivas, S., Rajendran, C.: Community detection and influential node identification in complex networks using mathematical programming. Expert Syst. Appl. 135 , 296–312 (2019)

Xu, C., Dai, Y., Lin, R., Wang, S.: Deep clustering by maximizing mutual information in variational auto-encoder. Knowl.-Based Syst. 205 , 106260 (2020)

Agrawal, U., Rohatgi, V., Katarya, R.: Normalized mutual information-based equilibrium optimizer with chaotic maps for wrapper-filter feature selection. Expert Syst. Appl. 207 , 118107 (2022)

Khan, M.S., Lohani, Q.D.: Topological analysis of intuitionistic fuzzy distance measures with applications in classification and clustering. Eng. Appl. Artif. Intell. 116 , 105415 (2022)

De Bruin, S., Brus, D.J., Heuvelink, G.B., van Ebbenhorst Tengbergen, T., Wadoux, A.M.C.: Dealing with clustered samples for assessing map accuracy by cross-validation. Eco. Inform. 69 , 101665 (2022)

Omuya, E.O., Okeyo, G.O., Kimwele, M.W.: Feature selection for classification using principal component analysis and information gain. Expert Syst. Appl. 174 , 114765 (2021)

Wei, G., Mu, W., Song, Y., Dou, J.: An improved and random synthetic minority oversampling technique for imbalanced data. Knowl.-Based Syst. 248 , 108839 (2022)

Li, W., Zhong, X., Shao, H., Cai, B., Yang, X.: Multi-mode data augmentation and fault diagnosis of rotating machinery using modified ACGAN designed with new framework. Adv. Eng. Inform. 52 , 101552 (2022)

Download references

This work was funded by the National Natural Science Foundation of China (Grant Nos. 52275107), the Hunan Provincial Science and Technology Innovation Talent Project (Grant No. 2022RC1135), the Hunan Provincial Natural Science Foundation of China (Grant No. 2021JJ30260 and 2021JJ30253).

Author information

Authors and affiliations.

Hunan Provincial Key Laboratory of Health Maintenance for Mechanical Equipment, Hunan University of Science and Technology, Xiangtan, 411201, People’s Republic of China

Sai Li, Yanfeng Peng, Guangfu Bin, Yiping Shen, Yong Guo, Baoqing Li, Yongzheng Jiang & Chao Fan

You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yanfeng Peng .

Ethics declarations

Conflict of interest.

The authors declare that they have no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Li, S., Peng, Y., Bin, G. et al. Research on bearing fault diagnosis method based on cjbm with semi-supervised and imbalanced data. Nonlinear Dyn (2024). https://doi.org/10.1007/s11071-024-10073-4

Download citation

Received : 18 July 2023

Accepted : 22 July 2024

Published : 13 August 2024

DOI : https://doi.org/10.1007/s11071-024-10073-4

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Fault diagnosis
Semi-supervised learning
Imbalanced data
Find a journal
Publish with us
Track your research

IMAGES

Binary Search Tree (BST).
Solved The following diagram shows a binary search tree.
Binary Search Tree (BST) with Example
Data Structures 101: Binary Search Tree
(PDF) Binary search tree implementation with C++ / Python on Visual
Data Structures 101: Binary Search Tree

COMMENTS

A Survey on Balanced Binary Search Trees methods
Binary Search Trees (BSTs) are fundamental data structures in computer science; because of their implicit key sorting and linked node structure, they provide effective sorting and simple update operations. They achieve their highest performance when they are balanced. A non-balanced Binary Search Tree is no more efficient than a regular linked list. In the present literature, multiple methods ...
(PDF) New Ways to Construct Binary Search Trees
Using Theorem 5, it is easy to construct a nearly optimal binary search tree. of small height. Theorem 6. Given an optimal binary search tree Ton nnodes and a posi-. tive value <1, we can ...
binary search tree Latest Research Papers
Abstract Today, Red-Black trees are becoming a popular data structure typically used to implement dictionaries, associative arrays, symbol tables within some compilers (C++, Java …) and many other systems. In this paper, we present an improvement of the delete algorithm of this kind of binary search tree.
PDF Performance Analysis of BSTs in System Software
Binary search tree (BST) based data structures, such as AVL trees, red-black trees, and splay trees, are of-ten used in system software, such as operating system kernels. Choosing the right kind of tree can impact performance signiﬁcantly, but the literature oﬀers few empirical studies for guidance. We compare 20 BST
(PDF) Randomized Binary Search Trees
Abstract. In this paper we present randomized algorithms over binary search trees such that: a) the insertion of a set of keys, in any fixed order, into an initially empty tree always produces a ...
PDF A Practical Concurrent Binary Search Tree
An AVL tree [1] is a self-balancing binary search tree in which the heights of the left and right child branches of a node differ by no more than one. If an insertion to or deletion from the tree causes this balance condition to be violated then one or more rotations are performed to restore the AVL invariant.
(PDF) Nearly Optimal Binary Search Trees
Main idea. Our improved attack method is on the basis of Mehlhorn's Rule II in Nearly Optimal Binary Search Tree [42], i.e., for every given probability range, we always select the root in a way ...
PDF A Fast Parallelizable Algorithm for Constructing Balanced Binary Search
We present a new non-recursive algorithm for construct-ing a binary search tree. The algorithm has O(N) time and O(1) auxiliary memory complexity if the given array of N numbers is sorted. We use an array-based representation of the BST. The O(1) auxiliary memory complexity means that, except for the resulting arrays used to store the tree, we.
An improved algorithm to maintain the Binary Search Tree dynamically
Binary Search Tree (BST) is an acyclic graph that is widely used to arrange the data for optimal search. In order to maintain the binary search tree in optimal shape several algorithms have been proposed. A recently proposed technique [1] applies single and double rotations to balance the binary search tree. However, due to double rotation it takes almost double time as compared to single ...
[2206.12110] Learning Augmented Binary Search Trees
Honghao Lin, Tian Luo, David P. Woodruff. View a PDF of the paper titled Learning Augmented Binary Search Trees, by Honghao Lin and 2 other authors. A treap is a classic randomized binary search tree data structure that is easy to implement and supports O (\log n) expected time access. However, classic treaps do not take advantage of the input ...
PDF Binary Trees
Binary Search Tree Niche Basically, binary search trees are fast at insert and lookup. The next section presents the code for these two algorithms. On average, a binary search tree algorithm can locate a node in an N node tree in order lg(N) time (log base 2). Therefore, binary search trees are good for "dictionary" problems where the code ...
PDF MIT Open Access Articles Combining Binary Search Trees
Binary search trees (BSTs) are one of the most fundamental and well-studied data structures in computer science. Yet, many fundamental questions about ... y Research supported by GAANN Grant P200A090157 from the US Department of Education. ... In this paper we present a structural tool to com-
PDF The Functional Essence of Imperative Binary Search Trees
168:4 Anton Lorenzen, Daan Leijen, Wouter Swierstra, and Sam Lindley typetree Node( left:tree, key:key, right:tree) Leaf We use an abstract type key for the keys stored in the tree but this is usually instantiated to be an int.The main operation on binary trees is the insert function that takes a tree and a key as its arguments.
PDF Randomized Binary Search Trees
Let T be a binary search tree of size n. —If n 5 0, then T 5 h and it is a random binary search tree; —If n. 0, the tree T is a random binary search tree if and only if both its left subtree L and its right subtree R are independent random binary search trees, and Pr{size~L! 5 iusize~T! 5 n} 5 1 n, i 5 0,...,n 2 1, n. 0. (1) An immediate ...
PDF Binary Search Trees
Binary Search Trees The data structure we have just seen is called a binary search tree (or BST). The tree consists of a number of nodes, each of which stores a value and has zero, one, or two children. All values in a node's left subtree are smaller than the node's value, and all values in a node's right subtree are greater than
PDF 1 Binary search trees
CS 161 Lecture 8 - Binary Search Trees Jessica Su (some parts copied from CLRS) Even though 2 7 9 and 3 5 7, this tree does not satisfy the binary search tree property, because 2 is in the right subtree of 5, despite being smaller than 5. 1.0.1 Runtime Binary search trees support several operations, including Search, Minimum, Maximum, Pre-
Binary Search Tree Research Papers
We present a detailed study of left-right-imbalance measures for random binary search trees under the random permutation model, i.e., where binary search trees are generated by random permutations of f1;2; : : : ; ng. For random binary search trees of size n we study (i) the difierence between the left and the right depth of a randomly chosen node,
PDF The Functional Essence of Imperative Binary Search Trees
The functional algorithms sufer from asymptotically worse heap allocation (due to copying immutable data) and stack usage (due to non-tail calls). This paper presents novel functional algorithms on binary search trees with absolute performance on par with the best known imperative implementations.
A complete characterization of pairs of binary phylogenetic trees with
Phylogenetic trees play a key role in the reconstruction of evolutionary relationships. Typically, they are derived from aligned sequence data (like DNA, RNA, or proteins) by using optimization criteria like, e.g., maximum parsimony (MP). It is believed that the latter is able to reconstruct the \\enquote{true} tree, i.e., the tree that generated the data, whenever the number of substitutions ...
(PDF) ASH Search: Binary Search Optimization
International Journal of Computer Applications (0975 - 8887) Volume 178 - No. 15, May 2019. 10. ASH Search: Binary Search Optimization. Ashar Mehmood. School of Electrical Engineeri n g and ...
(PDF) Binary Search Trees
This research paper gives us brief description of importance of tree in data structure, types of trees, implementation with their examples. ... Binary Search Trees • Definition: A binary search tree T is a binary tree; either it is empty or each node in the tree contains an identifier and: - all keys in the left subtree of T are less ...
PDF Binary Search Trees
A binary search tree is balanced if its height is O(log n), where n is the number of nodes in the tree (i.e. left/right subtrees don't differ in height by more than 1). Lookup, insertion, and deletion with balanced BSTs all operate in O(log n) runtime. There are multiple valid BSTs for the same set of data.
Binary Search Tree Data Structure Explained with Examples
A binary search tree (BST) is a rooted binary tree data structure used to store data in a way that allows quick lookup, addition, and removal of items. The key idea is that each node's value is greater than all the values in its left subtree, but smaller than the values in its right subtree.
Research on bearing fault diagnosis method based on cjbm ...
2.1 Density peak clustering (DPC). This algorithm can identify data of any shape, intuitively find the number of clusters, and easily identify outliers. In addition, this algorithm assumes that cluster centers are surrounded by neighboring points with low local densities and are far away from any points with high local densities [].For each data point ${x}_{i}$, we calculate two essential ...
PDF Lecture 22: Binary Search Trees
Every node in a tree has exactly one parent node (except for the root node). A path between two nodes traverses edges between parents and their children, and length of a path is the number of edges between the two nodes. The depth of a node is the length of the path (# of edges) between the root and that node. e. the number of levels in the tree).
PDF CS 106B, Lecture 20 Binary Search Trees
A tree is a data structure where each element (parent) stores two or more pointers to other elements (its children) A doubly-linked list doesn't count because, just like outside of computer science, a child can not be its own ancestor. Each node in a binary tree has two pointers.