To branch, or not to branch

2005-12-21 19:00:00 -0500


DIT is the question… One of the most frequently debated topics in directory design involves the decision point on when it is appropriate to introduce a new branch into a DIT/namespace.

There are usually two camps. The first advocates numerous branch points based on organizational units, locations, countries, etc. The second strongly believes that less is more: you should use as few branch points as possible to get the job done. Put architects from different schools of thought in a room and it's not unusual for the debate to take on the religious tones of an emacs vs. vi flame war.

There is, however, one thing that both camps will generally agree on: the debate over DIT design is important because it is fundamentally very difficult to change the directory’s structure once clients start using it. This means that you generally have only one shot at getting it right; otherwise, a poor design may well live on in perpetuity.

So which is the best practice? In my opinion, like many things in life, the answer lies somewhere in the middle. The appropriate design is dependent on a number of factors, including the types and schema of objects residing in the directory, the types of common search operations, the system’s security model, the features and functionality of the directory server software, and data flow / feed processes.

Instead of making a blanket recommendation on one side of the debate, I’d like to propose a set of criteria that can be used as a litmus test to evaluate the validity of a branch point decision.

Data segregation
LDAP directories do not enforce the types (objectclasses) of data that may exist under each branch. It is, however, often convenient to sort data into separate containers based on its type. For example, it is generally considered good practice to place inetOrgPerson or User objects underneath an ou=People branch. A similar convention might be used to place groupOfNames entries underneath an ou=Groups branch, or application data underneath an ou=,ou=apps branch. This approach makes a lot of sense organizationally, and helps to ensure that your top-level namespace makes immediate sense to the applications and users that provision data into the tree.

Partitioning
Partitioning is a technique that splits portions of the DIT across multiple stand-alone directory infrastructures. Generally used in large-scale directory services, it gives a designer the flexibility to separate the storage, optimization, and control of data across non-replicated trees. In Active Directory, individual domains resemble classic directory partitions, as data is not replicated between them except via the Global Catalog.

Search base optimization
In many cases it's possible to optimize searches on the directory by selective use of branching. For example, consider a general-purpose application directory. In an organization with both internal and external data requirements, it may be desirable to introduce intranet and extranet branch points under the root suffix (ou=intranet,dc=yourco,dc=com and ou=extranet,dc=yourco,dc=com). This allows a client to restrict the results returned by a query by setting the search base parameter of an LDAP search. In general, if a client would want to restrict a search to a subtree, it may be appropriate to introduce a branch point.
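To make the effect of the search base concrete, here is a minimal in-memory sketch in Python. It does not talk to a real directory server: the sample entries, the `split_dn`, `in_subtree`, and `search` helpers, and the naive comma-splitting of DNs (which ignores escaped commas) are all assumptions made for illustration.

```python
def split_dn(dn):
    """Split a DN into normalized RDN strings (naive: no escaped commas)."""
    return [part.strip().lower() for part in dn.split(",") if part.strip()]


def in_subtree(dn, base):
    """Return True if dn is the base entry or lies anywhere beneath it."""
    dn_parts, base_parts = split_dn(dn), split_dn(base)
    offset = len(dn_parts) - len(base_parts)
    return offset >= 0 and dn_parts[offset:] == base_parts


# Hypothetical sample data: DN -> attributes.
ENTRIES = {
    "uid=alice,ou=intranet,dc=yourco,dc=com": {"sn": "Smith"},
    "uid=bob,ou=intranet,dc=yourco,dc=com": {"sn": "Jones"},
    "uid=carol,ou=extranet,dc=yourco,dc=com": {"sn": "Smith"},
}


def search(base, attr, value):
    """Subtree-scoped equality search over the in-memory entries."""
    return sorted(
        dn for dn, attrs in ENTRIES.items()
        if in_subtree(dn, base) and attrs.get(attr) == value
    )


# Same filter, different search base: the branch point narrows the results.
everyone = search("dc=yourco,dc=com", "sn", "Smith")
internal_only = search("ou=intranet,dc=yourco,dc=com", "sn", "Smith")
```

With the same filter, the search rooted at the suffix returns both Smiths, while the search rooted at ou=intranet returns only the internal one. The branch point is doing the work of restricting results, which is the legitimate use described above.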

Access Control
Many directory service implementations tie access control information directly to branch points in the directory. Therefore, it's often necessary to introduce branch points to segregate the entries underneath them.
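As a rough illustration of access control hanging off branch points, the Python sketch below attaches a set of reader roles to each branch and resolves an entry's readers by walking up its DN. Everything here is hypothetical: the role names, the `RULES` table, and the "nearest enclosing branch wins" policy are assumptions, and real directory servers each have their own access control models that may evaluate rules quite differently.

```python
# Hypothetical rules: branch DN -> roles allowed to read entries beneath it.
RULES = {
    "dc=yourco,dc=com": {"admins"},
    "ou=intranet,dc=yourco,dc=com": {"admins", "employees"},
    "ou=extranet,dc=yourco,dc=com": {"admins", "employees", "partners"},
}


def allowed_readers(dn):
    """Return the reader roles of the nearest enclosing branch with a rule.

    Assumes a simple "most specific branch wins" policy and naive DN
    splitting (no escaped commas); both are simplifications.
    """
    parts = [part.strip().lower() for part in dn.split(",")]
    for i in range(len(parts)):
        branch = ",".join(parts[i:])
        if branch in RULES:
            return RULES[branch]
    return set()


intranet_readers = allowed_readers("uid=alice,ou=intranet,dc=yourco,dc=com")
extranet_readers = allowed_readers("uid=carol,ou=extranet,dc=yourco,dc=com")
```

In this toy model, partners can read entries under ou=extranet but not under ou=intranet, and the only thing that distinguishes the two sets of entries is the branch they live in.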

Conversely, here are some negative-case criteria that describe situations where you should probably NOT introduce a branch point in the directory:

Lack of data
I ran into a directory architect who advocated the use of branch points to address a lack of underlying data in the user entries. Her argument was that since “location” wasn’t available from any authoritative source, it made good sense to branch on location, and allow the admin provisioning the account to place the user within the appropriate branch. Unfortunately I’ve since seen this technique used in several other organizations, usually to their detriment.

Given that there is an admitted lack of data quality for location, wouldn’t it make more sense to omit it entirely, or store it in an attribute that can be changed in the future? It’s bad enough to have potential bad data in an authoritative security system. Using this data in the described manner actually encodes erroneous information within the structure of the directory.

Summary: Don’t retaliate against poor data quality by trying to over-organize the DIT. Try to attack the problem at its source.
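The cost of encoding location in the tree can be sketched with a small in-memory model in Python. The DNs, the group name, and the helper functions are hypothetical, and the model assumes nothing rewrites group memberships when an entry is renamed (some servers or provisioning processes can do this, but it has to be set up). The `l` (locality) attribute is used for the flat design.

```python
GROUP = "cn=vpn-users,ou=Groups,dc=yourco,dc=com"


def relocate_by_rename(entries, old_dn, new_dn):
    """Location lives in the DN, so fixing it means renaming the entry."""
    entries[new_dn] = entries.pop(old_dn)


def dangling_members(entries, groups):
    """List (group, member) pairs whose member DN no longer exists."""
    return sorted(
        (group, member)
        for group, members in groups.items()
        for member in members
        if member not in entries
    )


# Design A: branch on location. Correcting it changes the user's DN.
old_dn = "uid=sjlombardo,ou=NewYork,ou=People,dc=yourco,dc=com"
new_dn = "uid=sjlombardo,ou=Boston,ou=People,dc=yourco,dc=com"
branched = {old_dn: {"uid": "sjlombardo"}}
branched_groups = {GROUP: {old_dn}}
relocate_by_rename(branched, old_dn, new_dn)
broken = dangling_members(branched, branched_groups)

# Design B: location is an attribute. Correcting it leaves the DN alone.
flat_dn = "uid=sjlombardo,ou=People,dc=yourco,dc=com"
flat = {flat_dn: {"uid": "sjlombardo", "l": "New York"}}
flat_groups = {GROUP: {flat_dn}}
flat[flat_dn]["l"] = "Boston"
intact = dangling_members(flat, flat_groups)
```

In the branched design, correcting one bad location leaves the group pointing at a DN that no longer exists. In the flat design, the same correction is an ordinary attribute change and every reference to the user stays valid.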

Performance
Consider this scenario: A directory tree with 100,000 entries is rooted at dc=yourco,dc=com, with a subtree named dc=sales,dc=yourco,dc=com. Let's further stipulate that the sales team is small, so there are only 100 entries in dc=sales,dc=yourco,dc=com. Now you execute a search to resolve a user by uid, (uid=sjlombardo), once with the search base rooted at dc=yourco,dc=com, and the next time at dc=sales,dc=yourco,dc=com. Which will return faster?

It is a common misunderstanding that it is more efficient to execute a search on a limited branch of a directory, even given the same filter criteria. In fact, with most directory servers on the market this is not true. For performance purposes a directory will evaluate indexes first, which makes performance between the two searches nearly identical. Furthermore, many backend database formats are non-hierarchical, so the database is unable to optimize for branch points. Therefore, in 90%+ of cases, introducing a branch point to optimize search performance is practically useless.

In short, unless you really know what you’re doing and have executed a fair amount of testing, you shouldn’t introduce branch points for the sake of performance.
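The index-first behavior can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not a description of any particular server's internals: entries live in a flat table keyed by a made-up entry ID, `uid_index` stands in for an equality index on uid, and the scope check is a naive string suffix comparison applied only to the candidates the index returns.

```python
ROOT = "dc=yourco,dc=com"
SALES = "dc=sales,dc=yourco,dc=com"

# Flat, non-hierarchical entry table: 100,000 entries, 100 of them in sales.
entries = {}
for entry_id in range(100_000):
    base = SALES if entry_id < 100 else ROOT
    uid = f"user{entry_id}"
    entries[entry_id] = {"dn": f"uid={uid},{base}", "uid": uid}
entries[42] = {"dn": f"uid=sjlombardo,{SALES}", "uid": "sjlombardo"}

# Toy equality index: uid value -> list of entry IDs.
uid_index = {}
for entry_id, entry in entries.items():
    uid_index.setdefault(entry["uid"], []).append(entry_id)


def indexed_search(base, uid):
    """Resolve the filter through the index, then apply the scope check.

    Returns the matching DNs and the number of candidates examined.
    """
    candidates = uid_index.get(uid, [])
    suffix = "," + base.lower()
    matches = []
    for entry_id in candidates:
        dn = entries[entry_id]["dn"].lower()
        if dn == base.lower() or dn.endswith(suffix):
            matches.append(entries[entry_id]["dn"])
    return matches, len(candidates)


from_root, examined_root = indexed_search(ROOT, "sjlombardo")
from_sales, examined_sales = indexed_search(SALES, "sjlombardo")
```

In this model both searches examine exactly one candidate, because the index has already narrowed the work before the search base is even considered. The smaller branch buys nothing for an indexed equality filter, which is the point of the scenario above.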

Earlier, this post alluded to the fact that there was no clear favorite. Despite this statement, if you judiciously apply the criteria described herein, you will often end up with a DIT employing a fairly small set of branch points. That said, there are perfectly legitimate examples of heavily branched directories. This is, after all, one of the most important features of the LDAP model: it is flexible enough to meet highly varied requirements.