Solving Find Median from Data Stream

To see the question, click here.

Naive Approach

The idea is to store the elements in a list. At each stage of adding an element, we will sort the list because the elements may not be added to the list in a sorted fashion.

// TC: O(nlogn) for addNum and O(1) for findMedian
// SC: O(n)

import java.util.List;
import java.util.ArrayList;
import java.util.Collections;

class FindMedianfromDataStream {
    List<Integer> list;

    public FindMedianfromDataStream() {
        list = new ArrayList<>();
    }

    public void addNum(int num) {
        list.add(num);
        Collections.sort(list);
    }

    public double findMedian() {
        int size = list.size();
        if (size % 2 != 0) {
            return list.get(size / 2);
        }
        Double ans = (list.get(size / 2) + list.get(size / 2 - 1)) / 2.0;
        return ans;
    }

    public static void main(String args[]) {
        FindMedianfromDataStream fmd = new FindMedianfromDataStream();
        fmd.addNum(1);
        fmd.addNum(2);
        System.out.println(fmd.findMedian());
        fmd.addNum(3);
        System.out.println(fmd.findMedian());
    }
}

Performance

The time complexity of the addNum method is O(nlogn) due to the sorting operation performed after each addition. The time complexity of the findMedian method is O(1) because it only involves accessing elements by index and performing arithmetic operations. The space complexity is O(n) because we store all the numbers in the list.

Refined Approach

The idea is to maintain a balanced and sorted data stream structure in real time, allowing for efficient median retrieval. The data stream is divided into two halves: a lower half (low) and an upper half (high). The low half contains the smaller elements stored in descending order. The high half contains the larger elements stored in ascending order. The TreeMap data structure is used to maintain order and allow quick access to the smallest and largest elements. Each TreeMap also tracks the count of each element to handle duplicates. After every insertion, the sizes of the two halves are balanced to ensure that the difference in size is at most 1. This balancing ensures that the median can be easily calculated based on the top elements of the two halves.

// TC: O(logn) for addNum and O(1) for findMedian
// SC: O(n) for TreeMap

import java.util.TreeMap;
import java.util.Collections;

class FindMedianfromDataStream {
    private TreeMap<Integer, Integer> low;
    private TreeMap<Integer, Integer> high;
    private int lowSize;
    private int highSize;

    public FindMedianfromDataStream() {
        low = new TreeMap<>(Collections.reverseOrder());
        high = new TreeMap<>();
        lowSize = 0;
        highSize = 0;
    }

    public void addNum(int num) {
        if (lowSize == 0 || num <= low.firstKey()) {
            low.put(num, low.getOrDefault(num, 0) + 1);
            lowSize++;
        } else {
            high.put(num, high.getOrDefault(num, 0) + 1);
            highSize++;
        }
        balance();
    }

    public double findMedian() {
        if (lowSize > highSize) {
            return low.firstKey();
        } else if (lowSize < highSize) {
            return high.firstKey();
        } else {
            return (low.firstKey() + high.firstKey()) / 2.0;
        }
    }

    private void balance() {
        if (lowSize > highSize + 1) {
            moveFirstElement(low, high);
            lowSize--;
            highSize++;
        } else if (highSize > lowSize) {
            moveFirstElement(high, low);
            highSize--;
            lowSize++;
        }
    }

    private void moveFirstElement(TreeMap<Integer, Integer> from, TreeMap<Integer, Integer> to) {
        int firstKey = from.firstKey();
        int count = from.get(firstKey);
        if (count == 1) {
            from.remove(firstKey);
        } else {
            from.put(firstKey, count - 1);
        }
        to.put(firstKey, to.getOrDefault(firstKey, 0) + 1);
    }

    public static void main(String args[]) {
        FindMedianfromDataStream fmd = new FindMedianfromDataStream();
        fmd.addNum(1);
        fmd.addNum(2);
        System.out.println(fmd.findMedian());
        fmd.addNum(3);
        System.out.println(fmd.findMedian());
    }
}

Performance

The time complexity for addNum is O(log n) due to the insertion and potential rebalancing of the TreeMap, and O(1) for findMedian because it is implemented efficiently using TreeMap's firstKey/lastKey. The space complexity is O(n) as it stores each input number in the TreeMap, with n being the number of elements added. The TreeMap ensures ordered keys, allowing efficient median finding and insertion operations.

Efficient Approach 1

Instead of two treemaps, we can utilize two heaps. The concept remains the same i.e., maxHeap stores the smaller half of the numbers while minHeap stores the larger half of the numbers. When a new number is added, it is compared with the root of maxHeap. If the new number is smaller than or equal to the root of maxHeap, it is added to maxHeap. Otherwise, it is added to minHeap. After adding the number, the sizes of the two heaps should balance. If maxHeap has more than one extra element compared to minHeap, the root of maxHeap is moved to minHeap. If minHeap has more elements than maxHeap, the root of minHeap is moved to maxHeap. If maxHeap has more elements, the median is the root of maxHeap. If minHeap has more elements, the median is the root of minHeap. If both heaps have the same size, the median is the average of the roots of maxHeap and minHeap.

// TC: O(logn) for addNum and O(1) for findMedian
// SC: O(n) for maxHeap and minHeap

import java.util.PriorityQueue;
import java.util.Collections;

class FindMedianfromDataStream {
    private PriorityQueue<Integer> maxHeap; // For the smaller half
    private PriorityQueue<Integer> minHeap; // For the larger half

    public FindMedianfromDataStream() {
        maxHeap = new PriorityQueue<>(Collections.reverseOrder()); // Max-Heap
        minHeap = new PriorityQueue<>(); // Min-Heap
    }

    public void addNum(int num) {
        if (maxHeap.isEmpty() || num <= maxHeap.peek()) {
            maxHeap.add(num);
        } else {
            minHeap.add(num);
        }

        // Balance the heaps
        if (maxHeap.size() > minHeap.size() + 1) {
            minHeap.add(maxHeap.poll());
        } else if (minHeap.size() > maxHeap.size()) {
            maxHeap.add(minHeap.poll());
        }
    }

    public double findMedian() {
        if (maxHeap.size() > minHeap.size()) {
            return maxHeap.peek();
        } else if (maxHeap.size() < minHeap.size()) {
            return minHeap.peek();
        } else {
            return (maxHeap.peek() + minHeap.peek()) / 2.0;
        }
    }

    public static void main(String args[]) {
        FindMedianfromDataStream fmd = new FindMedianfromDataStream();
        fmd.addNum(1);
        fmd.addNum(2);
        System.out.println(fmd.findMedian());
        fmd.addNum(3);
        System.out.println(fmd.findMedian());
    }
}

The time complexity for adding a number is O(logn) because we add the number to one of the heaps and then potentially balance the heaps, which involves removing and adding elements to the heaps. The time complexity for finding the median is O(1) because we can directly access the top elements of the heaps. The space complexity is O(n) because we store all the heaps' elements.

Thank you for reading!

Check out other DSA patterns here.

You can support me by buying me a book.