Suppose you have a random list of people standing in a queue. Each person is described by a pair of integers `(h, k)`, where `h` is the height of the person and `k` is the number of people in front of this person who have a height greater than or equal to `h`. Write an algorithm to reconstruct the queue.

**Note:**

The number of people is less than 1,100.

**Example**

Input: [[7,0], [4,4], [7,1], [5,0], [6,1], [5,2]]

Output: [[5,0], [7,0], [5,2], [6,1], [4,4], [7,1]]

When I started to solve this problem, I had no idea what the solution would look like.

My first thought was to sort the array, because it is very hard to work with an unsorted array whose structure we know nothing about. But each element has two values, so how should we sort them? Sorting by $h$ in descending order is a good idea. The reason is that if we started with a small height, we would know nothing about the taller elements, which need to be placed in front of the smaller ones to make $k$ meet the requirement.

What if $h$ is the same? Then sort by $k$ in ascending order, because if $k$ is larger, the element must come after the element with the same $h$ but a smaller $k$.

So for the example input, after sorting, we will have: [[7,0], [7,1], [6,1], [5,0], [5,2], [4,4]]

We can start processing this sorted array.

- For the first element [7,0], everything looks fine. We can add it to the result.
- For the second element [7,1], we already have one element in the result, so it also meets its requirement; adding it to the result is fine.
- But for the third element [6,1], we cannot add it directly. The result already holds two elements, yet [6,1] says there should be exactly one element in front of it. So we take [7,1] out and set it aside somewhere (we don't know yet what structure to use for this), and add [6,1] to the result.
- For the fourth element [5,0], it looks like we need to remove everything from the result. However, we must preserve the original order of the removed values.
- For element [5,2], the result now holds only one element, but this element requires two elements in front of it. We need to bring some removed elements back. Which one? As mentioned in the previous step, we must keep the order of the removed values, so we take [7,0] back, since it WAS the very first element in the result.
- For element [4,4], the result currently holds [5,0], [7,0] and [5,2], so we need to bring one more element back. To maintain the order, it has to be [6,1]: if we took [7,1] back instead, where would [6,1] go? So obviously we take [6,1] back.
- All elements are processed. Finally, we move the elements we set aside back into the result.

So to maintain the order, we save the elements removed from the result on a stack. When we process an element, the heights of the elements already processed are always larger than or equal to the current element's height, so we only need to adjust the current size of the result to meet the condition.

We also need to sort the array like I mentioned above.

Alternatively, suppose we have a binary tree whose node value is the [h, k] pair. Each node also keeps a left count that tracks how many nodes are in its left subtree, including the node itself. Since we process elements in sorted order, the heights of the nodes already in the tree are always larger than or equal to the height of a new node, and with the left count we can insert the new node into the correct position:

- If we insert the new node into the left subtree of the current node, the new node comes before the current node, and we need to increment the current node's left count.
- If we insert the new node into the right subtree, the new node comes "after" the current node, and there are already "left count of current node" elements before it. So we subtract the current node's left count from $k$ as we recursively insert to the right.

```java
import java.util.Arrays;
import java.util.LinkedList;

public class Solution {
    public int[][] reconstructQueue(int[][] people) {
        // Sort by height descending; for equal heights, by k ascending.
        Arrays.sort(people, (int[] p1, int[] p2) -> {
            if (p1[0] != p2[0]) {
                return p2[0] - p1[0];
            } else {
                return p1[1] - p2[1];
            }
        });
        LinkedList<int[]> ret = new LinkedList<>();
        LinkedList<int[]> stack = new LinkedList<>();
        for (int i = 0; i < people.length; i++) {
            int[] p = people[i];
            // Too many elements in front: move the extras onto the stack.
            while (p[1] < ret.size()) {
                stack.push(ret.pop());
            }
            // Too few elements in front: bring elements back from the stack.
            while (p[1] > ret.size()) {
                ret.push(stack.pop());
            }
            ret.push(p);
        }
        // Move any remaining set-aside elements back into the result.
        while (!stack.isEmpty()) {
            ret.push(stack.pop());
        }
        int[][] result = new int[people.length][2];
        for (int i = 0; i < people.length; i++) {
            result[i] = ret.pollLast();
        }
        return result;
    }
}
```

```java
import java.util.Arrays;
import java.util.LinkedList;

public class Solution {
    public int[][] reconstructQueue(int[][] people) {
        if (people.length == 0) {
            return people;
        }
        // Sort by height descending; for equal heights, by k ascending.
        Arrays.sort(people, (int[] p1, int[] p2) -> {
            if (p1[0] != p2[0]) {
                return p2[0] - p1[0];
            } else {
                return p1[1] - p2[1];
            }
        });
        TreeNode root = new TreeNode(people[0]);
        for (int i = 1; i < people.length; i++) {
            insert(root, new TreeNode(people[i]), people[i][1]);
        }
        // In-order traversal yields the reconstructed queue.
        int[][] ret = new int[people.length][2];
        LinkedList<TreeNode> stack = new LinkedList<>();
        TreeNode p = root;
        while (p != null) {
            stack.push(p);
            p = p.left;
        }
        int i = 0;
        while (!stack.isEmpty()) {
            TreeNode current = stack.pop();
            ret[i++] = current.people;
            p = current.right;
            while (p != null) {
                stack.push(p);
                p = p.left;
            }
        }
        return ret;
    }

    private void insert(TreeNode root, TreeNode insertNode, int currentK) {
        if (currentK < root.leftCount) {
            // The new node goes before the current node.
            root.leftCount++;
            if (root.left == null) {
                root.left = insertNode;
            } else {
                insert(root.left, insertNode, currentK);
            }
        } else {
            // The new node goes after the current node; adjust k by the
            // number of nodes already in front of it.
            if (root.right == null) {
                root.right = insertNode;
            } else {
                insert(root.right, insertNode, currentK - root.leftCount);
            }
        }
    }

    private class TreeNode {
        int[] people;
        int leftCount;
        TreeNode left;
        TreeNode right;

        public TreeNode(int[] people) {
            this.people = people;
            this.leftCount = 1;
            this.left = null;
            this.right = null;
        }
    }
}
```

Sorting is $O(n \log n)$.

**For stack solution:**

Processing each element is potentially $O(n)$, since we may need to remove all elements from the result, or move all elements placed before the current one back into the result.

So total complexity should be $O(n^2)$.

**For binary tree solution:**

If the tree is balanced, inserting a node takes $O(\log n)$, which is $O(n \log n)$ in total, and the in-order traversal is $O(n)$. So the average complexity is $O(n \log n)$.

However, the tree can become imbalanced, so the worst case is still $O(n^2)$.


That solution seems easy to understand. However, it turns out to be quite difficult to implement; the hardest part is handling the boundary cases correctly.

After reading some posts, I found another way to find the Kth number in two sorted arrays, with running time $O(\log(\min(m, n)))$ instead of $O(\log(m + n))$, which is even better than the requirement.

Let’s go through the question again.

There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Assuming that m is always smaller than or equal to n, to find the Kth number in these two arrays, we can divide array A and array B into two parts like the following:

A[0], A[1], …, A[i – 1] / A[i], A[i + 1], …, A[m – 1]

B[0], B[1], …, B[j – 1] / B[j], B[j + 1], …, B[n – 1]

The length of the first part of A is i, and the length of the first part of B is j. We can choose any i from 0 to m: if i is 0, the left part of A is empty; if i is m, the right part is empty. The same holds for j.

Assuming $i + j = k$, if at the same time we can make sure that the maximum value of the left parts is smaller than or equal to the minimum value of the right parts, the Kth value must be A[i – 1] or B[j – 1] (whichever is larger).

But how can we make sure that all values in left parts are always smaller than any value of right parts?

We already know that the left part of A is smaller than the right part of A, and the left part of B is smaller than the right part of B. The only things we need to ensure are that the left part of A is smaller than the right part of B, and the left part of B is smaller than the right part of A. Since both arrays are sorted, this reduces to the following conditions:

- A[i – 1] <= B[j]
- B[j – 1] <= A[i]

Note that when i == 0 or j == 0, the indices i – 1 and j – 1 are invalid. However, if i == 0, the left part of A is empty, so the first condition is vacuously satisfied. The same holds for j == 0.

We are looking for the Kth number, so once i is decided, j must equal k – i, which makes the combined length of the left parts exactly k. Binary search can then be used to find the correct value of i.

- If the first condition does not hold, i.e. A[i – 1] > B[j], we must decrease i. Why? As i decreases, A[i – 1] gets smaller; at the same time j = k – i increases, so B[j] gets larger. Decreasing i therefore moves us toward satisfying the first condition.
- If the second condition does not hold, i.e. B[j – 1] > A[i], we must increase i. The reasoning is symmetric.

```java
public class Solution {
    public double findMedianSortedArrays(int[] nums1, int[] nums2) {
        int m = nums1.length;
        int n = nums2.length;
        // Always binary search over the shorter array.
        if (m > n) {
            return findMedianSortedArrays(nums2, nums1);
        }
        if ((m + n) % 2 != 0) {
            return (double) findKth(nums1, nums2, (m + n + 1) / 2);
        } else {
            return ((double) findKth(nums1, nums2, (m + n) / 2)
                    + findKth(nums1, nums2, (m + n) / 2 + 1)) / 2;
        }
    }

    private int findKth(int[] nums1, int[] nums2, int k) {
        // i must be at least k - nums2.length; otherwise even taking all of
        // nums2 cannot make the left parts reach length k.
        int iMin = Math.max(0, k - nums2.length);
        int iMax = Math.min(nums1.length, k);
        while (iMin <= iMax) {
            int i = (iMin + iMax) / 2;
            int j = k - i;
            if (i > 0 && j < nums2.length && nums1[i - 1] > nums2[j]) {
                iMax = i - 1;   // first condition violated: decrease i
            } else if (j > 0 && i < nums1.length && nums2[j - 1] > nums1[i]) {
                iMin = i + 1;   // second condition violated: increase i
            } else {
                if (i == 0) {
                    return nums2[k - 1];
                }
                if (j == 0) {
                    return nums1[k - 1];
                }
                return Math.max(nums1[i - 1], nums2[k - i - 1]);
            }
        }
        return 0; // unreachable for valid input
    }
}
```

There is an optimization here. In the analysis part, I mentioned that we can choose i from 0 to m. However, when k is large, we need a minimum length for the left part of array A: otherwise, even using all values from array B, the left parts cannot reach length k. If you change iMin and iMax back to 0 and m, the test case A = [1], B = [1] will fail.

As mentioned above, the time complexity is $O(\log(\min(m, n)))$.


There is no single type of data management system that meets every need. In most cases we have a primary source-of-truth system plus some other data systems, and we need to maintain consistency between the primary system and the others. There are two possible types of solutions:

- Application-driven dual writes: The application writes to both systems. But it is hard to handle the error cases: if the application succeeds in writing to the main system but fails to write to the other one, it must contain logic to roll back the change in the main system. This pushes complexity into applications.
- Database log mining: We make the database the single source of truth and extract changes from its transaction or commit log.

Databus follows the “log mining” option.

- No additional point of failure.
- Source consistency preservation.
- Capture transaction boundaries.
- Capture Commit order.
- Capture Consistent state: We can miss changes but we can’t miss the last update.

- User-space processing: the change stream is processed on the consumer side rather than inside the database. Benefits are as follows:
- Reduces the load on the database server.
- Avoids affecting the stability of the primary data store.
- Decouples the subscriber implementation from the specifics of the database server implementation.
- Enables independent scaling of the subscribers.

- No assumption about consumer uptime.
- Isolation between Data-source and consumers.
- Allow multiple subscribers.
- Support different types of partitioning for computation tasks.
- Isolate the source database from the number of subscribers. (Increasing number of subscribers does not impact the performance of database.)
- Isolate the source database from slow or failing subscribers.
- Isolate the subscribers from the operational aspects of the source database: database system choice, partitioning, schema evolution.

- Low latency of the pipeline.
- Scalable and Highly available.

- A fetcher which extracts changes from the data source or another Databus component.
- A log store which caches this change stream.
- A snapshot store which stores a moving snapshot of the stream.
- A subscription client which pulls change events seamlessly across the various components and surfaces them up to the application.

- Support guaranteed at-least once delivery semantics.
- An event may be delivered several times only in the case of failures in the communication channel between the relay and the client, or in case of a hard failure in the consumer application.
- Consumers need to be idempotent in the application of the delivered events.

- Each change set is annotated with a monotonically increasing system change number (SCN), which is assigned by the data source and is typically system specific.
- State is kept on the consumer side, since we want to support a large number of consumers.

- Mapping from change set to SCN is immutable and assigned at commit time by the data source.

- Changes extracted by the fetcher are serialized into a binary format that is independent of the data source. They are grouped together within transaction window boundaries, annotated with the clock value or SCN associated with the transaction and buffered in the transient log.
- Relay does not maintain consumer-related state. Consumer application progress is tracked through a consumer checkpoint maintained by the subscription client and passed on each pull request. The checkpoint is portable across relays.
- The relay does not know whether a given change set has been processed by all interested consumers. A time- or size-based retention policy at the relay tier is used to age out old change sets. Even if some consumers are in a bad state for a long time, they can still pull the changes from the bootstrap service.
- Relay Cluster Deployment
- Connect all relays to the source: 100% availability as long as at least one relay is up, but this increases the load on the data source.
- One leader connected to the source, with the other followers connecting to the leader: very small impact on the source, but a small downtime when the leader fails. (When the leader fails, a follower is elected as the new leader and connects to the source; the other followers disconnect from the failed leader and connect to the new one.)

- New relay servers can be added: Some streams are transferred from the old relay. Managed by Helix.

- Dumping all data from the database greatly increases the load on a database that is serving online traffic.
- Getting a consistent snapshot of all rows via one long-running query is difficult.
- It is much more efficient to catch up using a snapshot store, which is a compacted representation of the changes.

- If we read the snapshot in one pass, it is too big to process.
- We should allow batched reads while new changes are being applied to the snapshot store.
- After reading the snapshot, we can simply apply the new changes.

Three primary categories of partitioning scenarios

- Single consumer: A consumer subscribing to the change stream from a logical database must be able to do so independently of the physical partitioning of the database. This is supported by simply merging the streams.
- Partition-aware consumer: A consumer can choose which partitions it is interested in.
- Consumer groups: When the change stream is too fast for a single consumer to process, a group of consumers can consume the change events together.

Oracle 10g and later versions provide the ora_rowscn pseudo column, which contains the internal Oracle clock at transaction commit time. However, this column is not indexable. To capture transactions spanning multiple tables, we add a **txn** column to all the tables that we wish to capture changes from, plus a table called TxLog that a trigger updates on every transaction.

The TxLog table has the following columns: txn, scn, ts, mask, ora_rowscn.

Changes can be pulled with the query:

```sql
select src.*
from T src, TxLog
where scn > lastScn
  and ora_rowscn > lastScn
  and src.txn = TxLog.txn;
```

Drawbacks of trigger-based approach:

- It can miss intermediate changes to a row, because it is only guaranteed to return the latest state of every changed row. Not ideal, but acceptable.
- Triggers and the associated tables that they update cause additional read and write load on the source database.

- RDBMSs have shortcomings and cost a lot in terms of both licensing and hardware.
- A relational database installation requires costly, specialized hardware and extensive caching to meet scale and latency requirements.
- Adding capacity requires a long planning cycle and cannot be done with 100% uptime.
- The data model (or schema) doesn't readily map to relational normalized forms, and schema changes on the production database cost a lot of DBA time and machine time when the datasets are large.

- The Voldemort store (inspired by Dynamo) is an eventually consistent key-value store. It was initially used for soft-state and derived data sets, and it is increasingly being used for primary data that does not require a timeline-consistent change capture stream.
- Essential requirement for Espresso
- Scale and Elasticity.
- Consistency
- Integration: ability to consume a timeline-consistent change stream directly from a source-of-truth system
- Bulk Operations: ability to load/copy all or part of a database from/to other instances, Hadoop and other datacenters, without downtime
- Secondary Indexing: keyword search, relational predicates
- Schema Evolution: forward and backward compatible schema evolution
- Cost to serve: RAM provisioning proportional to active data rather than total data size

- Transaction support: MySQL does not support transactions beyond a single record. Espresso supports a hierarchical data model and provides transaction support on related entities.
- Consistency model: Reads and writes are served by the master node. Replication between master and slaves is either asynchronous or semi-synchronous to make it *timeline consistent*. When a master failure happens, the cluster manager promotes one of the slave replicas to be the new master to maintain the availability of the system.
- Integration with the complete data ecosystem: Providing out-of-the-box access to the change stream, both online and offline.
- Schema awareness and rich functionality: Espresso is not schema-free like other NoSQL stores. Enforcing a schema keeps the data consistent, and also enables key features like secondary indexing and search, partial updates to a document, and projections of fields in a document.

- Common use case
- Nested Entities.
- Example: All messages that belong to a mailbox and any statistics associated with the mailbox. All comments that belong to a discussion and the metadata associated with the discussion.
- Primary write pattern: Creating new entities and/or updating existing entities. Mutations often happen in a group and atomicity guarantees here are helpful in simplifying the application logic.
- Read pattern: Unique-key based lookups of the entities, filtering queries on a collection of like entities, or consistent reads of related entities.

- Independent Entities.
- Example: People and Jobs.
- Write patterns tend to be independent inserts/updates. The application is more forgiving about atomicity but needs a guarantee that updates to both entities eventually happen.

- Data hierarchy
- Document: Smallest unit of data represented in Espresso. Just like row in SQL table.
- Table: Collection of like-schema-ed documents. Just like table in SQL world.
- Document Group: A collection of documents that live within the same database and share a common partitioning key. It’s not explicitly represented. Document groups span across tables and form the largest unit of transactionality.
- Database: Largest unit of data management. Just like databases in any RDBMS.

- Read
- Write
- Conditionals: Rarely used.
- Multi Operations: Batch operations.
- Change Stream Listener: Databus is used here to allow observers to see all mutations happening on the database while preserving the commit order of the mutations within a document group.

- Load from Hadoop job
- The Hadoop job outputs a specialized format
- The Hadoop job notifies the Espresso cluster to load the data
- Progress can be monitored

- Data Export
- Databus is used to provide real-time stream of updates which we persist in HDFS.
- Periodic jobs additionally compact these incremental updates to provide snapshots for downstream consumption.

- Clients and Routers
- A client sends a request to an Espresso endpoint by sending an HTTP request to a router.
- The router forwards the request to the appropriate storage nodes and assembles a response.
- The routing logic uses the partitioning method specified in the database schema and applies the appropriate partitioning function.
- If the request does not contain a partitioning key, such as an index search query over the whole data set, the router queries all storage nodes and sends the merged result set back.

- Storage Nodes
- Replicas are maintained using a change log stream.
- Consistency checking is performed between master and slave partitions and backups.

- Databus relays
- Low replication latency and high throughput.

- Cluster Managers
- Apache Helix is used. Given a cluster state model and system constraints as input, it computes an ideal state of resource distribution, monitors the cluster health and redistributes resources upon node failure.
- Helix assigns partitions to storage nodes with these constraints:
- Only one master per partition.
- Master and slave partitions are assigned evenly across all storage nodes.
- No two replicas of the same partition may be located on the same node or rack.
- Minimize partition migration during cluster expansion.

- Use case: Selecting a set of documents from a document group based on matching certain predicates on the fields of the documents.
- In key-value model, there are two ways to achieve this:
- Fetch all rows and perform filtering: Slow.
- Maintain the primary relationship and reverse mappings for every secondary key: creates potential divergence.

- Key requirement
- Real-time indexes.
- Ease of Schema Evolution.
- Query flexibility.
- Text search.

- First attempt with Apache Lucene:
- Fulfills requirements 2, 3 and 4.
- Drawbacks
- Not designed for realtime indexing requirements.
- Entire index needs to be memory-resident to support low latency query response times.
- Updates to documents require deleting the old document and re-indexing the new one.

- Second attempt with Prefix Index.

- Load balancing during request processing time
- Efficient and predictable cluster expansion

- Each change set is annotated with a monotonically increasing system change number (SCN).
- The SCN has two parts: a generation number and a sequence number.
- On each mastership transfer, the generation number increases by one.
- On each new transaction, the sequence number increases by one.
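The two-part SCN can be sketched as a small comparable value type (a minimal sketch with illustrative names, not Espresso's actual implementation):

```java
// Illustrative sketch of a (generation, sequence) SCN: the generation
// increments on mastership transfer, the sequence on each transaction,
// and comparing SCNs orders change sets across both.
public class Scn implements Comparable<Scn> {
    final int generation;
    final long sequence;

    Scn(int generation, long sequence) {
        this.generation = generation;
        this.sequence = sequence;
    }

    // A new transaction on the same master bumps only the sequence number.
    Scn nextTransaction() {
        return new Scn(generation, sequence + 1);
    }

    // A mastership transfer starts a new generation and resets the sequence.
    Scn nextGeneration() {
        return new Scn(generation + 1, 1);
    }

    @Override
    public int compareTo(Scn other) {
        if (generation != other.generation) {
            return Integer.compare(generation, other.generation);
        }
        return Long.compare(sequence, other.sequence);
    }
}
```

This matches the failover behavior described later, where a promoted slave moves the SCN from (g, s) to (g + 1, 1).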

- Replication layer is designed to address MySQL problems.
- Consistency checker: Calculates the checksum of a certain number of rows of the master partition and compares it to the checksum of the slave partition. On detection of errors, recovery mechanisms are applied.
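The checksum comparison can be sketched roughly like this (a minimal sketch with illustrative names; the paper does not specify the checksum algorithm, so CRC32 here is an assumption):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative sketch of window-based consistency checking: compute a
// checksum over a window of serialized rows on the master, do the same
// for the corresponding rows on the slave, and compare.
public class ConsistencyCheck {
    static long checksum(Iterable<String> rows) {
        CRC32 crc = new CRC32();
        for (String row : rows) {
            crc.update(row.getBytes(StandardCharsets.UTF_8));
        }
        return crc.getValue();
    }

    static boolean windowsMatch(Iterable<String> masterRows,
                                Iterable<String> slaveRows) {
        return checksum(masterRows) == checksum(slaveRows);
    }
}
```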

- When a storage node fails, each master partition on the failed node has to be failed over. A slave partition on a healthy node is selected to take over; it drains any outstanding change events from Databus and then transitions into a master partition. The SCN changes from (g, s) to (g + 1, 1).
- How to detect storage failures?
- Zookeeper heartbeat.
- Monitor performance.

- There is transient unavailability for the partitions mastered on the failed node. Helix always promotes the slave partition that is closest in the timeline to the failed master. The router can optionally enable slave reads to eliminate the read unavailability. After the master transition finishes for a partition, Helix updates the routing table in Zookeeper, which allows the router to direct requests accordingly.
- Databus is fault-tolerant. Each relay has several replicas. One relay is designated as leader and others are followers.
- The leader relay connects to the data source.
- The follower relays pull the change from leader.
- When the leader fails, one of the followers is elected as the new leader and connects to the source. The other followers disconnect from the failed leader and connect to the new one.

- Helix itself is stateless.

- Certain master and slave partitions are selected to migrate to new nodes.
- Helix will calculate the smallest set of partitions to migrate to minimize the data movement and cluster expansion time.

- Company pages
- MailboxDB
- USCP (User social content platform)

- Espresso does not provide support for global distributed transactions.
- Transaction support within an entity group is richer than in most distributed data systems, such as MongoDB, HBase and PNUTS.
- Among the well-known NoSQL systems, MongoDB is the only one that offers rich secondary indexing capability like Espresso does, but Espresso has better RAM:disk utilization.
- HBase and BigTable follow a shared-storage paradigm, using a distributed replicated file system for storing data blocks. Espresso uses local shared-nothing storage and log shipping between masters and slaves with automatic failover, similar to MongoDB. This guarantees that queries are always served out of local storage and delivers better latency on write operations.
- The multi-DC operation of Espresso differs significantly from other systems.
- Voldemort and Cassandra implement quorums that span geographic regions.
- MegaStore and Spanner implement synchronous replication using Paxos.
- PNUTS implements record-level mastership and allows writes only in the geographic master.
- Espresso relaxes consistency across data centers and allows concurrent writes to the same data in multiple data centers relying on the application layers to minimize write conflicts.
- Conflict detection and resolution schemes are employed to ensure that data in different data centers eventually converges.

Lots of “log” data generated every day, including

- user activities like login, page views, clicks, likes, and other queries
- machine metrics like CPU, memory usage.

This is not only for offline analytics, but also very useful in online services. Use cases include

- search relevance
- recommendation performance
- ad targeting and reporting
- security applications.

The traditional way is to dump log files on each machine, but that is time-consuming, inefficient, and only works for offline analytics. Other distributed log aggregators exist, including Facebook's Scribe and Yahoo's Data Highway, but they are primarily designed for data warehouse and Hadoop usage. We also need online usage, with delays of no more than a few seconds.

- Distributed and scalable, offering high throughput.
- API is similar to messaging system. Application can consume it in real time.

- Mismatch in features: they focus on strong delivery guarantees, which is overkill for collecting log data.
- Cannot meet throughput requirements: sending a message is very expensive.
- Weak distributed support.
- They assume near-immediate consumption of messages, so the queue of unconsumed messages is always small; if it grows, their performance degrades.

- A stream of messages of a particular type is defined by a *topic*.
- A *producer* can publish messages to a topic.
- Published messages are stored at a set of servers called *brokers*.
- A *consumer* can subscribe to one or more topics from the brokers and consume the subscribed messages by **pulling** the data from the brokers.

- A topic is divided into multiple partitions, and each broker stores one or more of those partitions.

- Simple Storage
- Each partition of a topic corresponds to a logical log, and each log is implemented as a set of segment files of approximately the same size.
- When a new message arrives from a producer, the broker simply appends it to the last segment file.
- Segment files are only flushed after a configurable number of messages.
- A message is only exposed to consumers after it is flushed.
- No message ids: each message is addressed by its logical offset in the log.
- A consumer always consumes messages from a particular partition sequentially:
- If a particular message offset is acknowledged, it means all messages before that offset have been consumed.
- A pull request contains the offset of the message and an acceptable number of bytes to fetch.
- Each broker keeps a sorted list of offsets, including the offset of the first message in every segment file.
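The offset-based addressing in the last bullet can be sketched with a sorted map from each segment's first offset to its file (a minimal sketch with illustrative names, not Kafka's actual code):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: the broker keeps the starting offset of every
// segment file in a sorted structure, so the segment holding a requested
// offset is found with a single floor lookup.
public class SegmentIndex {
    // Maps a segment's first message offset -> segment file name.
    private final TreeMap<Long, String> segments = new TreeMap<>();

    void addSegment(long firstOffset, String fileName) {
        segments.put(firstOffset, fileName);
    }

    // The segment containing `offset` is the one with the largest
    // first offset that is <= the requested offset.
    String segmentFor(long offset) {
        Map.Entry<Long, String> e = segments.floorEntry(offset);
        return e == null ? null : e.getValue();
    }
}
```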

- Efficient transfer
- A producer can submit a set of messages in a single send request. A consumer also receives several messages per request, even when it processes them one by one.
- No caching in the broker process: Kafka relies on the file system page cache. Both producer and consumer access the segment files sequentially, with the consumer often lagging the producer by a small amount.
- The sendfile API is used to avoid 2 unnecessary copies and 1 system call when sending file data to a socket.
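On the JVM, the zero-copy path that sendfile enables is exposed as `FileChannel.transferTo`; here is a minimal sketch of sending a byte range of a segment file to a channel (the helper and its names are illustrative, not Kafka's code):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Illustrative sketch: transferTo lets the kernel move file bytes from
// the page cache toward the target channel without copying them through
// user space (on Linux it is typically backed by sendfile).
public class ZeroCopySend {
    static long sendSegment(FileChannel segment, long offset, long count,
                            WritableByteChannel target) throws IOException {
        long sent = 0;
        // transferTo may transfer fewer bytes than requested, so loop.
        while (sent < count) {
            long n = segment.transferTo(offset + sent, count - sent, target);
            if (n <= 0) break; // target not ready or end of file
            sent += n;
        }
        return sent;
    }
}
```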

- Stateless Broker
- How much each consumer has consumed is maintained not by the broker, but by the consumer itself.
- But then how do we delete messages? A time-based SLA is used as the retention policy: a message is automatically deleted if it has been retained in the broker longer than a certain period, typically 7 days.
- Side benefit: a consumer can deliberately rewind to an old offset and re-consume the data.

- Each producer can publish a message either to a randomly selected partition or to a partition determined by a partitioning key and a partitioning function.
- Consumer groups:
- Each message is delivered to only one of the consumers within the group.
- Different groups each independently consume the full set of subscribed messages, and no coordination is needed across consumer groups. (This reduces the complexity of locking and state maintenance. To keep the load truly balanced, we need many more partitions in a topic than consumers in each group.)
- There is no central “master” node; Zookeeper is used instead. Zookeeper's features:
- Create a path, set the value of a path, read the value of a path, delete a path, and list the children of a path.
- One can register a watcher on a path and get notified when the children or the value of the path change.
- A path can be created as ephemeral, meaning it is automatically removed when the creating client is gone.
- It replicates data to multiple servers.

- What is Zookeeper used for?
- Detecting the addition and removal of brokers and consumers.
- Triggering the rebalance process in each consumer.
- Maintaining the consumption relationship and keeping track of the consumed offset of each partition.

- Detail about Zookeeper:
- When broker or consumer starts up, it stores its information in a broker or consumer registry in Zookeeper. Broker registry contains host name and port, and the set of topics and partitions stored on it. The consumer registry includes the consumer group to which a consumer belongs and the set of topics that it subscribes to.
- Each consumer group is associated with an ownership registry and an offset registry. Ownership registry has one path for every subscribed partition and the path value is the id of the consumer currently consuming from this partition. The offset registry stores for each subscribed partition, the offset of the last consumed message in the partition.
- The paths created in Zookeeper are ephemeral for the broker registry, the consumer registry, and the ownership registry, and persistent for the offset registry.
- Rebalancing happens at the initial startup of a consumer, or when the consumer is notified about a broker/consumer change through a watcher.
- Rebalancing algorithm:
- Calculate the set of available partitions for each subscribed topic T.
- Calculate the set of consumers subscribing to T.
- Calculate N = (number of available partitions) / (number of available consumers).
- Each consumer is assigned N partitions: it writes itself into the ownership registry for those partitions and starts consuming from the offset stored in the offset registry.
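The assignment step of the algorithm above can be sketched as follows. This is a simplified illustration, not Kafka's actual implementation: the class and method names here are hypothetical, and the real consumer coordinates through the Zookeeper registries and watchers described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RebalanceSketch {
    // Assign roughly N = partitions / consumers partitions to each consumer,
    // handing the remainder to the first few consumers.
    static Map<String, List<Integer>> assignPartitions(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> ownership = new LinkedHashMap<>();
        int n = partitionCount / consumers.size();        // base share per consumer
        int remainder = partitionCount % consumers.size();
        int next = 0;
        for (int c = 0; c < consumers.size(); c++) {
            int share = n + (c < remainder ? 1 : 0);
            List<Integer> assigned = new ArrayList<>();
            for (int i = 0; i < share; i++) assigned.add(next++);
            // In Kafka this is the "write to the ownership registry" step.
            ownership.put(consumers.get(c), assigned);
        }
        return ownership;
    }

    public static void main(String[] args) {
        // 2 consumers, 5 partitions: the first consumer takes the extra partition.
        System.out.println(assignPartitions(List.of("c1", "c2"), 5));
    }
}
```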

- Delivery guarantees
- Only guarantee at-least-once delivery.
- Consumers can have their own de-duplication logic if they care about duplicates.
- Messages from a single partition are delivered to a consumer in order. But not guarantee on the ordering of messages coming from different partitions.
- CRC for each message is used to avoid log corruption.
- If a broker goes down, any not-yet-consumed messages stored on it become unavailable. If the storage system is permanently damaged, those messages are lost forever.
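Since only at-least-once delivery is guaranteed, a consumer that cares about duplicates keeps its own de-duplication state. A minimal sketch (the message-id scheme here is an assumption; in practice one might key on partition and offset, or on an application-level id):

```java
import java.util.HashSet;
import java.util.Set;

public class DedupConsumer {
    private final Set<String> seen = new HashSet<>();
    private final StringBuilder processed = new StringBuilder();

    // Process a message only if its id has not been seen before.
    void handle(String messageId, String payload) {
        if (!seen.add(messageId)) return; // duplicate redelivery: skip
        processed.append(payload);
    }

    public static void main(String[] args) {
        DedupConsumer c = new DedupConsumer();
        c.handle("m1", "a");
        c.handle("m2", "b");
        c.handle("m1", "a"); // redelivered after a broker retry: ignored
        System.out.println(c.processed); // prints "ab"
    }
}
```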

- Online consuming and offline jobs.
- Tracking: monitoring events are used to detect data loss.
- Avro is used as the serialization protocol.

- Producer doesn’t wait for acknowledgment from the broker.
- More efficient storage format.

- Data replication.
- Window-based counting.

Problem description:

**Find two missing numbers from 1 to N**

Given an array of size N-2 containing integer numbers from 1 to N with two numbers missing, return the two missing numbers.

The difference between this problem and Find one missing number from 1 to N is that now there are two numbers missing. How can we solve this?

Assuming the array given is A[] with length L, we easily know that $N = L + 2$, since there are two numbers missing from this array.

Of course, the straightforward way works. You can use a boolean array B[] of size N. Go through A[]; for each number $n$, set B[n - 1] to true. In the end there will be exactly two places in B[] that are still false, and their indices give the missing numbers. Now you get it!

Time and space complexity: $O(N)$
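A sketch of this boolean-array approach (the class and method names are mine):

```java
public class TwoMissingFlags {
    static int[] findTwoMissing(int[] a) {
        int n = a.length + 2;
        boolean[] present = new boolean[n];
        for (int v : a) present[v - 1] = true; // mark value v as seen
        int[] missing = new int[2];
        int k = 0;
        for (int i = 0; i < n; i++)
            if (!present[i]) missing[k++] = i + 1; // unmarked slot => missing value
        return missing;
    }

    public static void main(String[] args) {
        int[] m = findTwoMissing(new int[]{1, 3, 5}); // N = 5, missing 2 and 4
        System.out.println(m[0] + "," + m[1]); // prints "2,4"
    }
}
```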

We can calculate the sum and product of all elements in array A[], calling them $S$ and $P$. And we know the sum and product of 1 to N, named $S_N$ and $P_N$. Assuming the missing numbers are $x$ and $y$, we have:

$x + y + S = S_N$

$x * y * P = P_N$

There are only two unknowns in this equation set, so it's easy to solve: from the first equation $y = S_N - S - x$, and substituting into the second gives a quadratic in $x$.

Time complexity: $O(N)$

Space complexity: $O(1)$

Potential problem: When we calculate the product, it can overflow.
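A sketch of the sum/product approach, solving the two equations via the quadratic $t^2 - (x+y)t + xy = 0$ (the class and method names are mine). It uses `long`, but as noted above the product of 1 to N overflows even `long` very quickly, so this only works for small N:

```java
public class TwoMissingSumProduct {
    static int[] findTwoMissing(int[] a) {
        int n = a.length + 2;
        long s = 0, p = 1;                 // sum and product of the array
        for (int v : a) { s += v; p *= v; }
        long sn = (long) n * (n + 1) / 2;  // S_N = 1 + 2 + ... + N
        long pn = 1;                       // P_N = N!  (overflows for N > 20)
        for (int i = 1; i <= n; i++) pn *= i;
        long sumXY = sn - s;               // x + y
        long prodXY = pn / p;              // x * y
        // x and y are the roots of t^2 - sumXY*t + prodXY = 0.
        long d = (long) Math.sqrt((double) (sumXY * sumXY - 4 * prodXY));
        long x = (sumXY - d) / 2;
        long y = sumXY - x;
        return new int[]{(int) x, (int) y};
    }

    public static void main(String[] args) {
        int[] m = findTwoMissing(new int[]{1, 3, 5}); // N = 5, missing 2 and 4
        System.out.println(m[0] + "," + m[1]); // prints "2,4"
    }
}
```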

Sorting also solves this problem: sort the array and scan for the gaps. However, the time complexity becomes $O(N \log N)$.

This method is highly recommended.

We assume that the missing numbers are $x$ and $y$.

We use an approach similar to the one in problem Find one missing number from 1 to N: XOR together all numbers in A[] and all numbers from 1 to N. But applying the same logic here only gives us the XOR of the two missing numbers, $Z = x\ XOR\ y$.

How can we get $x$ and $y$ separately? Since $x$ is not equal to $y$, $Z$ is not equal to $0$, so there must be at least one bit set in $Z$. For example, if $x = 2 = 010_{(2)}$ and $y = 4 = 100_{(2)}$, then $Z = 110_{(2)}$; you can see there are two bits set in $Z$. The two missing numbers must differ at any bit that is set in $Z$. Why? Because $Z = x\ XOR\ y$: for a bit of $Z$ to be "1", the corresponding bits in $x$ and $y$ must be different.

We can pick any bit that is set in $Z$. A common choice is the least significant set bit, $Z\ \&\ (-Z)$ (the code below finds it with a simple loop). For the example above, let's pick the bit $m = 100_{(2)}$. It's obvious that $m\ \&\ x \neq m$ while $m\ \&\ y = m$.

Then we can use the following logic to separate the numbers from 1 to N into two groups $X$ and $Y$: for any number $i$ from 1 to N, if $m \& i \neq m$, then $i$ belongs to group $X$; otherwise, it belongs to group $Y$. Assuming the numbers are 1 to 5 and $m = 100_{(2)}$, we have:

$X: 1, 2, 3$

$Y: 4, 5$

Is this useful? Of course. Once we have $m$, we can apply this rule to all numbers in A[] and all numbers from 1 to N (here A[] = [1, 3, 5], with 2 and 4 missing). The groups will be:

$X: 1, 3, 1, 2, 3$

$Y: 5, 4, 5$

You can see that every number appears twice except the missing ones, and the two missing numbers land in different groups. Now we can just XOR all numbers in group $X$ to get the missing number $x$, and XOR all numbers in group $Y$ to get the missing number $y$.

Actually, we don’t even need to store the groups: we can XOR on the fly to save space.

Time complexity: $O(N)$.

Space complexity: $O(1)$.

```java
public int[] findTwoMissingNumber(int[] nums) {
    int N = nums.length + 2;
    // XOR of everything in nums and 1..N leaves m = x XOR y.
    int m = 0;
    for (int n : nums) {
        m = m ^ n;
    }
    for (int i = 1; i <= N; i++) {
        m = m ^ i;
    }
    // Keep only the least significant set bit of m.
    int k = 1;
    while ((m & k) == 0) {
        k = k * 2;
    }
    m = m & k;
    // Partition every number by that bit; each group XORs down to one missing number.
    int x = 0;
    int y = 0;
    for (int n : nums) {
        if ((m & n) == m) {
            x = x ^ n;
        } else {
            y = y ^ n;
        }
    }
    for (int i = 1; i <= N; i++) {
        if ((m & i) == m) {
            x = x ^ i;
        } else {
            y = y ^ i;
        }
    }
    return new int[]{x, y};
}
```


**Find one missing number from 1 to N**

Given an array of size N-1, containing integer numbers from 1 to N, but there is one number missing. Return the missing number.

Assuming the array given is A[], it’s easy to get N since we have the size of the array: N = A.length + 1.

The $O(N)$-time, $O(N)$-space solution is intuitive. We can use a boolean array flag[] of size N, with every entry initially false. Then we scan through A[]: when we see value A[i], we set flag[A[i] - 1] to true. After scanning, we go through flag[] to see which entry is false. If flag[j] is false, the missing number is j + 1.

If we can modify the given array, we can sort it to solve the problem. However, we don't need a regular $O(N \log N)$ sort. Since the numbers in this array are all in the range 1 to N, we can sort by swapping: while scanning position i, if A[i] is not equal to i + 1 (and A[i] is not N, which has no slot), we swap it with A[A[i] - 1]. Once the array is sorted this way, it's easy to find the missing position. The complexity of this method is $O(N)$; see the code below.

If we cannot modify the given array, there are still ways to reach $O(N)$. Since we already know N, the sum $1+2+\dots+N$ is just $\frac{N(N+1)}{2}$, and we can calculate the sum $S$ of the array. The missing number is then $n=\frac{N(N+1)}{2}-S$. However, if N is large enough, the sum can overflow. Using long would solve this, but it's not a great answer. We have a better approach: XOR.

We know that a number XORed with itself is 0, and any number XORed with 0 is still that number. So we go through the array and calculate x = A[0] XOR A[1] XOR A[2] XOR ... XOR A[N - 2], then XOR x with the numbers from 1 to N: n = x XOR 1 XOR 2 XOR 3 ... XOR N. Every number except the missing one eventually cancels itself out to 0; the missing number is XORed only with zeros and therefore survives as the result, which is just what we want.

```java
// Brute force
public int findMissingNumberBruteForce(int[] A) {
    int N = A.length + 1;
    boolean[] flag = new boolean[N];
    // Of course you can combine these two for-loops together.
    for (int i = 0; i < A.length; i++) {
        flag[A[i] - 1] = true;
    }
    for (int i = 0; i < N; i++) {
        if (!flag[i]) return i + 1;
    }
    return N;
}

// Sum
public int findMissingNumberWithSum(int[] A) {
    int N = A.length + 1;
    int sum = 0;
    // sum can overflow here.
    for (int i = 1; i <= N; i++) {
        sum += i;
    }
    for (int i = 0; i < A.length; i++) {
        sum -= A[i];
    }
    return sum;
}

// Sorting
public int findMissingNumberWithSorting(int[] A) {
    int N = A.length + 1;
    int i = 0;
    while (i < A.length) {
        // Correct position
        if (A[i] == i + 1) i++;
        else {
            // If A[i] is N, we are not able to swap it with other elements.
            if (A[i] == N) i++;
            else {
                // Swap the element.
                int tmp = A[i];
                A[i] = A[A[i] - 1];
                A[tmp - 1] = tmp;
            }
        }
    }
    // Try to find a mismatch. If there is no mismatch, the missing number is N.
    for (int j = 0; j < A.length; j++) {
        if (A[j] != j + 1) return j + 1;
    }
    return N;
}

// XOR
public int findMissingNumber(int[] A) {
    int N = A.length + 1;
    int n = 0;
    // Of course you can combine these two for-loops together.
    for (int i = 0; i < A.length; i++) {
        n = n ^ A[i];
    }
    for (int i = 1; i <= N; i++) {
        n = n ^ i;
    }
    return n;
}
```

This problem can be changed to “Find **two** missing numbers from 1 to N”. I am going to cover it in the next post.

**Palindrome Partitioning**

Given a string *s*, partition *s* such that every substring of the partition is a palindrome.

Return all possible palindrome partitioning of *s*.

For example, given *s* = `"aab"`, return `[["aa","b"], ["a","a","b"]]`.

**Palindrome Partitioning II**

Given a string *s*, partition *s* such that every substring of the partition is a palindrome.

Return the minimum cuts needed for a palindrome partitioning of *s*.

For example, given *s* = `"aab"`, return `1`, since the palindrome partitioning `["aa","b"]` can be produced using 1 cut.

In these two problems, we often need to check whether the substring(i, j) of the original string is a palindrome. So we save this important information in an array to avoid checking the same substring several times. This is a 2-D boolean array that is easy to generate, from shorter lengths to longer lengths: when s.charAt(i) equals s.charAt(j), and either j - i < 2 or isPalindrome[i + 1][j - 1] is true, we set isPalindrome[i][j] = true.

In the first problem, we can use a recursive function to generate all possible string lists.

In the second problem, a recursive solution will get TLE, so a DP solution is needed. We can use a 1-D array D[n] to save the minimum cuts, where D[i] is the minimum number of cuts for substring(i, n). We start from i = n - 1 and move i from right to left. To compute D[i], we check every possible palindromic prefix starting at i: using another pointer j between i and n - 1, if substring(i, j) is a palindrome, we update D[i] = min(D[i], 1 + D[j + 1]).

This method can also be used in the first problem to save more time.

```java
import java.util.LinkedList;
import java.util.List;

public class Solution {
    List<List<String>> ret;

    public List<List<String>> partition(String s) {
        int n = s.length();
        boolean[][] isPalindrome = new boolean[n][n];
        for (int i = 0; i < n; i++) isPalindrome[i][i] = true;
        for (int i = n - 1; i >= 0; i--) {
            for (int j = i + 1; j < n; j++) {
                if (s.charAt(i) == s.charAt(j)) {
                    if (j - i < 2 || isPalindrome[i + 1][j - 1])
                        isPalindrome[i][j] = true;
                }
            }
        }
        ret = new LinkedList<>();
        List<String> list = new LinkedList<>();
        partition(s, 0, isPalindrome, list);
        return ret;
    }

    private void partition(String s, int start, boolean[][] isPalindrome, List<String> list) {
        if (start == s.length()) {
            List<String> newList = new LinkedList<>();
            newList.addAll(list);
            ret.add(newList);
            return;
        }
        for (int i = start; i < s.length(); i++) {
            if (isPalindrome[start][i]) {
                list.add(s.substring(start, i + 1));
                partition(s, i + 1, isPalindrome, list);
                list.remove(list.size() - 1);
            }
        }
    }
}
```

```java
import java.lang.reflect.Array;
import java.util.LinkedList;
import java.util.List;

public class Solution {
    public List<List<String>> partition(String s) {
        int n = s.length();
        boolean[][] isPalindrome = new boolean[n][n];
        for (int i = 0; i < n; i++) isPalindrome[i][i] = true;
        for (int i = n - 1; i >= 0; i--) {
            for (int j = i + 1; j < n; j++) {
                if (s.charAt(i) == s.charAt(j)) {
                    if (j - i < 2 || isPalindrome[i + 1][j - 1])
                        isPalindrome[i][j] = true;
                }
            }
        }
        // Generic array creation needs reflection; see the link below.
        List<List<String>>[] palindromes =
                (List<List<String>>[]) Array.newInstance(List.class, n + 1);
        palindromes[n] = new LinkedList<>();
        List<String> emptyList = new LinkedList<>();
        palindromes[n].add(emptyList);
        for (int i = n - 1; i >= 0; i--) {
            palindromes[i] = new LinkedList<>();
            for (int j = i; j < n; j++) {
                if (isPalindrome[i][j]) {
                    List<List<String>> lists = palindromes[j + 1];
                    String substring = s.substring(i, j + 1);
                    for (List<String> list : lists) {
                        List<String> newList = new LinkedList<>();
                        newList.add(substring);
                        newList.addAll(list);
                        palindromes[i].add(newList);
                    }
                }
            }
        }
        return palindromes[0];
    }
}
```

The creation of generic array: http://www.quora.com/Java-programming-language/Why-does-Java-prohibit-generic-array-creation.

```java
public class Solution {
    public int minCut(String s) {
        int n = s.length();
        boolean[][] isPalindrome = new boolean[n][n];
        for (int i = 0; i < n; i++) isPalindrome[i][i] = true;
        for (int i = n - 1; i >= 0; i--) {
            for (int j = i + 1; j < n; j++) {
                if (s.charAt(i) == s.charAt(j)) {
                    if (j - i < 2 || isPalindrome[i + 1][j - 1])
                        isPalindrome[i][j] = true;
                }
            }
        }
        int[] minCut = new int[n + 1];
        // Worst case: cut between every pair of characters of substring(i, n).
        // Note minCut[n] = -1, so a full palindrome costs 1 + (-1) = 0 cuts.
        for (int i = n; i >= 0; i--) minCut[i] = n - 1 - i;
        for (int i = n - 1; i >= 0; i--) {
            for (int j = i; j < n; j++) {
                if (isPalindrome[i][j]) {
                    minCut[i] = Math.min(minCut[i], 1 + minCut[j + 1]);
                }
            }
        }
        return minCut[0];
    }
}
```

In the first problem, assuming the total number of generated strings is k, the complexity is $O(n^2 + k)$.

The complexity of the second problem is $O(n^2)$.

**Best Time to Buy and Sell Stock III**

Say you have an array for which the *i*th element is the price of a given stock on day *i*.

Design an algorithm to find the maximum profit. You may complete at most *two* transactions.

**Note:**

You may not engage in multiple transactions at the same time (ie, you must sell the stock before you buy again).

In this problem, we are only allowed to complete at most two transactions. We can reuse the idea from “Best Time to Buy and Sell Stock I”: for each position i, calculate the maximum profit from 0 to i and the maximum profit from i to the end, add them together, and take the maximum over all i. But computed naively, every i costs $O(n)$, so the total is $O(n^2)$.

We can use DP to optimize it. Two arrays are needed to save the information: one saves the maximum profit of (0, i) for every possible position i, and the other saves the maximum profit of (i, N).

To calculate the two arrays, we traverse the prices in opposite directions. For example, to calculate the first array, we start from 0 and track the maximum profit achievable up to each position i.

```java
public class Solution {
    public int maxProfit(int[] prices) {
        if (prices.length == 0) return 0;
        int[] profitUntil = new int[prices.length];
        int[] profitFrom = new int[prices.length];
        // Calculate the profit until i.
        int minValue = prices[0];
        profitUntil[0] = 0;
        for (int i = 1; i < prices.length; i++) {
            minValue = Math.min(minValue, prices[i]);
            profitUntil[i] = Math.max(profitUntil[i - 1], prices[i] - minValue);
        }
        // Calculate the profit from i.
        profitFrom[prices.length - 1] = 0;
        int maxValue = prices[prices.length - 1];
        for (int i = prices.length - 2; i >= 0; i--) {
            maxValue = Math.max(maxValue, prices[i]);
            profitFrom[i] = Math.max(profitFrom[i + 1], maxValue - prices[i]);
        }
        int maxProfit = 0;
        for (int i = 0; i < prices.length; i++)
            maxProfit = Math.max(maxProfit, profitUntil[i] + profitFrom[i]);
        return maxProfit;
    }
}
```

The complexity is $O(n)$. We could also fold the third for-loop into the second one, but that does not affect the time complexity.

Using this method, we can also calculate the profit when at most M transactions are allowed. For example, if we are allowed to complete 3 transactions, we can divide the array into two parts at each position i: in the first part, we calculate the maximum single-transaction profit from 0 to i, and we treat the second part as an array that allows at most two transactions. Doing this for every position i gives $O(n^2)$. In general, splitting the transactions this way gives roughly $O(n^{(M+1)/2})$ for at most M transactions.
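The splitting idea for M = 3 can be sketched as follows (class and helper names are mine): the best single-transaction profit for every prefix is precomputed in $O(n)$, and for each split point the two-transaction profit of the suffix is computed in $O(n)$ with the two-array method above, giving $O(n^2)$ overall.

```java
public class ThreeTransactions {
    // Best two-transaction profit on prices[from..to], using the two-array
    // method restricted to a range.
    static int maxProfitTwo(int[] p, int from, int to) {
        int len = to - from + 1;
        if (len <= 1) return 0;
        int[] until = new int[len];          // best single transaction in p[from..from+i]
        int min = p[from];
        for (int i = 1; i < len; i++) {
            min = Math.min(min, p[from + i]);
            until[i] = Math.max(until[i - 1], p[from + i] - min);
        }
        int best = until[len - 1], after = 0, max = p[to];
        for (int i = len - 2; i >= 0; i--) { // after = best single transaction in p[from+i..to]
            max = Math.max(max, p[from + i]);
            after = Math.max(after, max - p[from + i]);
            best = Math.max(best, until[i] + after);
        }
        return best;
    }

    static int maxProfitThree(int[] p) {
        if (p.length == 0) return 0;
        // Precompute the best single-transaction profit for every prefix.
        int[] until = new int[p.length];
        int min = p[0];
        for (int i = 1; i < p.length; i++) {
            min = Math.min(min, p[i]);
            until[i] = Math.max(until[i - 1], p[i] - min);
        }
        int best = 0;
        for (int i = 0; i < p.length; i++)   // O(n) splits, O(n) work each
            best = Math.max(best, until[i] + maxProfitTwo(p, i, p.length - 1));
        return best;
    }

    public static void main(String[] args) {
        // Three rallies (1->5, 2->6, 3->7): three transactions earn 4+4+4 = 12.
        System.out.println(maxProfitThree(new int[]{1, 5, 2, 6, 3, 7}));
    }
}
```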
