Solving Find Median from Data Stream
To see the question, click here.
Naive Approach
The idea is to store the elements in a list. At each stage of adding an element, we will sort the list because the elements may not be added to the list in a sorted fashion.
// TC: O(nlogn) for addNum and O(1) for findMedian
// SC: O(n)
import java.util.List;
import java.util.ArrayList;
import java.util.Collections;
class FindMedianfromDataStream {
List<Integer> list;
public FindMedianfromDataStream() {
list = new ArrayList<>();
}
public void addNum(int num) {
list.add(num);
Collections.sort(list);
}
public double findMedian() {
int size = list.size();
if (size % 2 != 0) {
return list.get(size / 2);
}
Double ans = (list.get(size / 2) + list.get(size / 2 - 1)) / 2.0;
return ans;
}
public static void main(String args[]) {
FindMedianfromDataStream fmd = new FindMedianfromDataStream();
fmd.addNum(1);
fmd.addNum(2);
System.out.println(fmd.findMedian());
fmd.addNum(3);
System.out.println(fmd.findMedian());
}
}
Performance
The time complexity of the addNum
method is O(nlogn) due to the sorting operation performed after each addition. The time complexity of the findMedian
method is O(1) because it only involves accessing elements by index and performing arithmetic operations. The space complexity is O(n) because we store all the numbers in the list.
Refined Approach
The idea is to maintain a balanced and sorted data stream structure in real time, allowing for efficient median retrieval. The data stream is divided into two halves: a lower half (low
) and an upper half (high
). The low
half contains the smaller elements stored in descending order. The high
half contains the larger elements stored in ascending order. The TreeMap
data structure is used to maintain order and allow quick access to the smallest and largest elements. Each TreeMap
also tracks the count of each element to handle duplicates. After every insertion, the sizes of the two halves are balanced to ensure that the difference in size is at most 1. This balancing ensures that the median can be easily calculated based on the top elements of the two halves.
// TC: O(logn) for addNum and O(1) for findMedian
// SC: O(n) for TreeMap
import java.util.TreeMap;
import java.util.Collections;
class FindMedianfromDataStream {
private TreeMap<Integer, Integer> low;
private TreeMap<Integer, Integer> high;
private int lowSize;
private int highSize;
public FindMedianfromDataStream() {
low = new TreeMap<>(Collections.reverseOrder());
high = new TreeMap<>();
lowSize = 0;
highSize = 0;
}
public void addNum(int num) {
if (lowSize == 0 || num <= low.firstKey()) {
low.put(num, low.getOrDefault(num, 0) + 1);
lowSize++;
} else {
high.put(num, high.getOrDefault(num, 0) + 1);
highSize++;
}
balance();
}
public double findMedian() {
if (lowSize > highSize) {
return low.firstKey();
} else if (lowSize < highSize) {
return high.firstKey();
} else {
return (low.firstKey() + high.firstKey()) / 2.0;
}
}
private void balance() {
if (lowSize > highSize + 1) {
moveFirstElement(low, high);
lowSize--;
highSize++;
} else if (highSize > lowSize) {
moveFirstElement(high, low);
highSize--;
lowSize++;
}
}
private void moveFirstElement(TreeMap<Integer, Integer> from, TreeMap<Integer, Integer> to) {
int firstKey = from.firstKey();
int count = from.get(firstKey);
if (count == 1) {
from.remove(firstKey);
} else {
from.put(firstKey, count - 1);
}
to.put(firstKey, to.getOrDefault(firstKey, 0) + 1);
}
public static void main(String args[]) {
FindMedianfromDataStream fmd = new FindMedianfromDataStream();
fmd.addNum(1);
fmd.addNum(2);
System.out.println(fmd.findMedian());
fmd.addNum(3);
System.out.println(fmd.findMedian());
}
}
Performance
The time complexity for addNum
is O(log n) due to the insertion and potential rebalancing of the TreeMap
, and O(1) for findMedian
because it is implemented efficiently using TreeMap's firstKey/lastKey. The space complexity is O(n) as it stores each input number in the TreeMap
, with n
being the number of elements added. The TreeMap
ensures ordered keys, allowing efficient median finding and insertion operations.
Efficient Approach 1
Instead of two treemaps, we can utilize two heaps. The concept remains the same i.e., maxHeap
stores the smaller half of the numbers while minHeap
stores the larger half of the numbers. When a new number is added, it is compared with the root of maxHeap
. If the new number is smaller than or equal to the root of maxHeap
, it is added to maxHeap
. Otherwise, it is added to minHeap
. After adding the number, the sizes of the two heaps should balance. If maxHeap
has more than one extra element compared to minHeap
, the root of maxHeap
is moved to minHeap
. If minHeap
has more elements than maxHeap
, the root of minHeap
is moved to maxHeap
. If maxHeap
has more elements, the median is the root of maxHeap
. If minHeap
has more elements, the median is the root of minHeap
. If both heaps have the same size, the median is the average of the roots of maxHeap
and minHeap
.
// TC: O(logn) for addNum and O(1) for findMedian
// SC: O(n) for maxHeap and minHeap
import java.util.PriorityQueue;
import java.util.Collections;
class FindMedianfromDataStream {
private PriorityQueue<Integer> maxHeap; // For the smaller half
private PriorityQueue<Integer> minHeap; // For the larger half
public FindMedianfromDataStream() {
maxHeap = new PriorityQueue<>(Collections.reverseOrder()); // Max-Heap
minHeap = new PriorityQueue<>(); // Min-Heap
}
public void addNum(int num) {
if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
maxHeap.add(num);
} else {
minHeap.add(num);
}
// Balance the heaps
if (maxHeap.size() > minHeap.size() + 1) {
minHeap.add(maxHeap.poll());
} else if (minHeap.size() > maxHeap.size()) {
maxHeap.add(minHeap.poll());
}
}
public double findMedian() {
if (maxHeap.size() > minHeap.size()) {
return maxHeap.peek();
} else if (maxHeap.size() < minHeap.size()) {
return minHeap.peek();
} else {
return (maxHeap.peek() + minHeap.peek()) / 2.0;
}
}
public static void main(String args[]) {
FindMedianfromDataStream fmd = new FindMedianfromDataStream();
fmd.addNum(1);
fmd.addNum(2);
System.out.println(fmd.findMedian());
fmd.addNum(3);
System.out.println(fmd.findMedian());
}
}
The time complexity for adding a number is O(logn) because we add the number to one of the heaps and then potentially balance the heaps, which involves removing and adding elements to the heaps. The time complexity for finding the median is O(1) because we can directly access the top elements of the heaps. The space complexity is O(n) because we store all the heaps' elements.
Thank you for reading!
Check out other DSA patterns here.
You can support me by buying me a book.