Architecture of Entra Connect: The Expert & Awesome Guide

This article explains the basic structure of Microsoft Entra Connect Sync. If you’ve used older tools for syncing identities, some of this might already make sense to you. If you’re new to syncing, this article is a good place to start. You don’t need to know all the details here to successfully customize Microsoft Entra Connect Sync (also called the sync engine in this article).

Architecture Overview

Sync Engine
- The core component that aggregates, processes, and synchronizes identity data from various connected sources.
- Creates an integrated view of identity objects, ensuring consistency across systems.
Connected Data Sources
- External systems or repositories where identity information resides, such as directories (e.g., Active Directory), databases, or cloud identity providers.
- The sync engine retrieves and synchronizes data from these sources.
Identity Information
- Includes attributes such as usernames, email addresses, roles, permissions, and other identity-related data.
- The sync engine processes this information to maintain consistency and provide a unified view.
Integrated View
- A consolidated representation of identity data from multiple sources.
- Serves as the foundation for downstream systems or services that rely on synchronized and consistent identity information.
Set of Rules

Determines how identity information is processed, transformed, and synchronized. These rules may include:

Mapping rules: Align attributes between systems (e.g., map “userPrincipalName” in one source to “email” in another).
Filtering rules: Include or exclude specific objects based on criteria.
Conflict resolution: Handle discrepancies when attributes differ across sources.
Transformation rules: Modify or format data to fit target system requirements.
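
The four kinds of rules above can be illustrated with a small sketch. This is a simplified model, not the sync engine's actual rule syntax: the record layout, the function names, and the precedence list are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical source records; not a real Entra Connect data format.
hr = {"employeeId": "1001", "mail": "ABBIE@FABRIKAM.COM", "enabled": True}
ad = {"employeeId": "1001", "mail": "abbie.spencer@fabrikam.com", "enabled": True}


def passes_filter(record: dict) -> bool:
    """Filtering rule: only enabled accounts are in scope."""
    return record.get("enabled", False)


def transform_mail(value: str) -> str:
    """Transformation rule: normalize case for the target system."""
    return value.lower()


def resolve(values: dict, precedence: list) -> Optional[str]:
    """Conflict resolution: the first source in the precedence list with a value wins."""
    for source in precedence:
        if values.get(source) is not None:
            return values[source]
    return None


def build_target(sources: dict, precedence: list) -> dict:
    in_scope = {name: rec for name, rec in sources.items() if passes_filter(rec)}
    mail = resolve({name: rec.get("mail") for name, rec in in_scope.items()}, precedence)
    # Mapping rule: the source attribute "mail" feeds the target attribute "email".
    return {"email": transform_mail(mail) if mail else None}


target = build_target({"hr": hr, "ad": ad}, precedence=["ad", "hr"])
```

Swapping the precedence list to ["hr", "ad"] makes the HR value win instead, which is the kind of decision conflict resolution rules encode.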

The sync engine acts as the central processing unit for synchronizing identity information between different connected data sources.

Connected Data Sources (CD)

Definition: Data repositories, such as Active Directory, SQL Server databases, or any database-like systems, that provide standardized data-access methods.
Function: Serve as the external systems where identity information is either read from or written to by the sync engine.

Connectors

Definition: Modules within the sync engine designed to interface with specific types of connected data sources.
Function:
- Translate the operations required by the sync engine into a format that the connected data source understands.
- Act as intermediaries, enabling seamless communication between the sync engine and connected data sources.

API Calls:

Connectors use standard APIs to read and write identity information to/from the connected data sources.
This ensures secure and efficient data exchange.

Custom Connectors:

The sync engine supports the creation of custom Connectors using an extensible connectivity framework.
This feature allows integration with non-standard or proprietary data sources.
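
As a rough mental model, a Connector can be thought of as an interface with a read side and a write side, and a custom Connector as one more implementation of it. The sketch below is only an analogy in Python: the class and method names are hypothetical and do not correspond to the real extensible connectivity framework.

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical connector interface: translates engine operations for one source type."""

    @abstractmethod
    def read_objects(self) -> list:
        """Return identity objects from the connected data source."""

    @abstractmethod
    def write_object(self, obj: dict) -> None:
        """Write one identity object to the connected data source."""


class InMemoryConnector(Connector):
    """Custom connector for a made-up source that is just a Python dict keyed by 'id'."""

    def __init__(self, store: dict):
        self.store = store

    def read_objects(self) -> list:
        return [dict(obj) for obj in self.store.values()]

    def write_object(self, obj: dict) -> None:
        self.store[obj["id"]] = dict(obj)


source = InMemoryConnector({"1": {"id": "1", "name": "Abbie Spencer"}})
source.write_object({"id": "2", "name": "Lee Sperry"})
names = sorted(o["name"] for o in source.read_objects())
```

The engine-side code only ever talks to the interface, which is why adding a new kind of data source does not require changing the engine itself.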

In this model, the sync engine sits at the center and acts as the orchestrator.
Connected data sources, such as Active Directory, SQL Server, or custom systems, sit on the periphery.
Connectors act as bridges, or translators, between the sync engine and each connected data source.

Data Flow Rules

Unidirectional Flow:
- Data can flow either from the connected data source to the sync engine or from the sync engine to the connected data source.
- Data cannot flow in both directions simultaneously for the same object and attribute.
Granular Directionality:
- The direction of data flow can differ based on:
  - Object types (e.g., users, groups, devices).
  - Attributes (e.g., sAMAccountName, email).
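
One way to picture these two rules is a table with one direction per (object type, attribute) pair. The sketch below assumes such a table; the names Direction, flows, and set_flow are invented for illustration and are not part of the product.

```python
from enum import Enum


class Direction(Enum):
    IMPORT = "source -> sync engine"
    EXPORT = "sync engine -> source"


# Hypothetical flow table: exactly one direction per (object type, attribute) pair.
flows = {}


def set_flow(object_type: str, attribute: str, direction: Direction) -> None:
    key = (object_type, attribute)
    if key in flows and flows[key] is not direction:
        raise ValueError(f"{key} already flows {flows[key].name}; it cannot flow both ways")
    flows[key] = direction


set_flow("user", "sAMAccountName", Direction.IMPORT)
set_flow("user", "mail", Direction.EXPORT)   # same object type, different attribute
set_flow("group", "mail", Direction.IMPORT)  # same attribute, different object type

try:
    set_flow("user", "mail", Direction.IMPORT)
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
```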

Connector Configuration

To ensure proper synchronization, the following steps and settings must be addressed:

Object Types:
- Specify the types of objects (e.g., users, groups, computers) to include in the synchronization.
- This defines the scope of synchronization.
Attribute Inclusion List:
- Select the attributes for synchronization.
- This list determines which attributes of the chosen object types are included in the sync process.
- These settings can be updated to reflect changes in business rules or requirements.
Export Requirements:
- The attribute inclusion list must include all mandatory attributes for creating objects in the connected data source.
- Example:
  - For exporting a user object to Active Directory, the sAMAccountName attribute must be included because it is required for all user objects in Active Directory.
- The Microsoft Entra Connect installation wizard automatically configures these settings during installation.

Partition or Container Support:

If the connected data source uses structural components (e.g., partitions or containers), you can configure the Connector to limit synchronization to specific areas of the connected data source.
This ensures that only relevant objects and attributes are included in the synchronization, optimizing performance and aligning with business needs.
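
The export requirement above lends itself to a simple check. The following sketch models a Connector configuration as a plain data class and reports mandatory attributes that are missing from the inclusion list. The ConnectorConfig class and the MANDATORY table are assumptions for illustration; only the sAMAccountName requirement for Active Directory users comes from the text.

```python
from dataclasses import dataclass, field

# Assumed lookup of attributes the target needs in order to create each object type.
MANDATORY = {"user": {"sAMAccountName"}}


@dataclass
class ConnectorConfig:
    object_types: set
    attribute_inclusion: set
    containers: list = field(default_factory=list)  # e.g. OUs that limit the scope

    def missing_mandatory(self) -> dict:
        """Return, per object type, mandatory attributes absent from the inclusion list."""
        gaps = {}
        for object_type in self.object_types:
            missing = MANDATORY.get(object_type, set()) - self.attribute_inclusion
            if missing:
                gaps[object_type] = missing
        return gaps


good = ConnectorConfig({"user", "group"}, {"sAMAccountName", "mail"},
                       ["OU=Staff,DC=fabrikam,DC=com"])
bad = ConnectorConfig({"user"}, {"mail"})
```

Here bad.missing_mandatory() reports the missing sAMAccountName, which is the situation the installation wizard normally prevents.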

Internal Structure of the Sync Engine Namespace

Connector Space (CS)

Definition: A staging area specific to each Connector, representing designated objects and their attributes from a connected data source.
Purpose:
- To track and stage incoming changes from the connected data source.
- To stage outgoing changes for export back to the connected data source.

Key Features:

Separation from Connected Data Sources:
- Each Connector has its own connector space.
- This independence ensures that the sync engine can process data even if the connected data source is unavailable.
Change Detection and Optimization:
- The sync engine:
  - Detects changes made in the connected data source since the last synchronization.
  - Pushes out only unsynchronized changes to reduce network traffic.
Status Tracking:
- Tracks the synchronization status of all objects and attributes staged in the connector space.
- Evaluates new data against already synchronized data to prevent redundancy.
Attribute Scope:
- Stores only the attributes specified in the attribute inclusion list, ensuring efficiency and focus on relevant data.
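
A toy version of these features might look like the following. It is a minimal sketch, assuming a connector space that is just a dictionary keyed by anchor; the ConnectorSpace class and its stage method are hypothetical.

```python
class ConnectorSpace:
    """Toy staging area: keeps only included attributes and reports what changed."""

    def __init__(self, attribute_inclusion: set):
        self.attribute_inclusion = attribute_inclusion
        self.staged = {}  # anchor -> staged attributes

    def stage(self, anchor: str, attributes: dict) -> dict:
        """Stage incoming data and return the attributes that differ from the last import.

        Removed attributes are ignored here to keep the sketch short.
        """
        incoming = {k: v for k, v in attributes.items() if k in self.attribute_inclusion}
        previous = self.staged.get(anchor, {})
        changed = {k: v for k, v in incoming.items() if previous.get(k) != v}
        self.staged[anchor] = incoming
        return changed


cs = ConnectorSpace({"mail", "department"})
first = cs.stage("a1", {"mail": "abbie@fabrikam.com", "department": "Sales", "pager": "555"})
second = cs.stage("a1", {"mail": "abbie@fabrikam.com", "department": "Marketing", "pager": "556"})
```

The second call reports only the department change: the pager attribute is outside the inclusion list, and the mail value is already staged.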

Metaverse (MV)

Definition: A global storage area that aggregates and integrates identity information from multiple connected data sources.
Purpose:
- To provide a single, unified view of all synchronized identity objects across data sources.
- To act as the core repository for identity management in the sync engine.

Key Features:

Aggregation:
- Combines identity data from all connected data sources into a centralized repository.
Rules-Based Synchronization:
- Metaverse objects are created and maintained based on synchronization rules defined by administrators.
- Rules determine how data from connected data sources is aggregated, transformed, and managed.
Identity Management:
- Ensures that duplicate objects across data sources are resolved into a single metaverse object.
- Provides a comprehensive, integrated view of each identity.

Decoupling from Data Sources:

The staging areas (connector spaces) buffer the sync engine from direct dependencies on the connected data sources.
Synchronization tasks can occur independently of connected data source availability.

Efficient Processing:

The sync engine processes only changes, minimizing unnecessary network communication and improving performance.

Scalability:

By maintaining distinct connector spaces for each data source and using the metaverse for integration, the architecture supports scalability across diverse and numerous data sources.

Sync Engine Identity Objects

These are representations of objects either from a connected data source or from an integrated view the sync engine maintains.
GUID (Globally Unique Identifier): Every object in the sync engine must have a GUID. This ensures data integrity and enables relationships between objects to be defined.

Connector Space Objects

The connector space holds representations of the objects from the connected data sources.
GUID: Each object has a GUID, ensuring a globally unique identifier.
Distinguished Name (DN): Each object also has a distinguished name, which identifies the object in a structured, path-like form (for example, CN=Lee Sperry,CN=Users,DC=fabrikam,DC=com).
Anchor Attribute: If the data source provides a unique attribute for an object (like an employee number or user ID), it is called the anchor. The sync engine uses this to track the object’s identity in the data source, and it assumes this attribute doesn’t change during the object’s lifetime.

Types of Connector Space Objects

Staging Objects
- Represent instances of object types from the connected data source.
- They have the GUID, Distinguished Name, and a type indicator.
- These objects are created during the import or export process and can be flagged as:
  - Pending Import: New information from the data source has been received but not yet processed.
  - Pending Export: New information hasn’t yet been exported to the connected data source.
Staging Object Types:
- Import Object: Created by the sync engine when new data is imported from the connected data source. The sync engine uses data from the source (e.g., Active Directory’s objectGUID) to create a staging object in the connector space.
- Export Object: If the sync engine needs to export data to a data source, it will use a staging object to represent the data that will be sent.
Placeholder Objects
- These are placeholders used by the sync engine when an object is being created or processed. The placeholder represents an object that is either waiting for additional information or is temporarily not available.
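
The object kinds and flags described above can be summarized in one small data model. This is an illustrative sketch only; the field names are assumptions and do not reflect the sync engine's internal storage.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Kind(Enum):
    IMPORT = "import"
    EXPORT = "export"
    PLACEHOLDER = "placeholder"


@dataclass
class ConnectorSpaceObject:
    dn: str
    kind: Kind
    anchor: Optional[str] = None  # comes from the data source; None until known
    pending_import: bool = False
    pending_export: bool = False
    guid: uuid.UUID = field(default_factory=uuid.uuid4)  # assigned on the engine side


imported = ConnectorSpaceObject(
    dn="CN=Abbie Spencer,CN=Users,DC=fabrikam,DC=com",
    kind=Kind.IMPORT,
    anchor="objectGUID-of-abbie",  # stand-in text, not a real objectGUID value
    pending_import=True,
)
to_export = ConnectorSpaceObject(
    dn="CN=Lee Sperry,CN=Users,DC=fabrikam,DC=com",
    kind=Kind.EXPORT,
    pending_export=True,
)
```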

Connector Space and Anchor Attribute

Some connectors automatically generate anchors based on unique attributes of an object (e.g., Active Directory uses objectGUID).
For connectors that do not have a unique identifier, anchor generation can be manually configured to use other attributes, ensuring a unique identifier is present for each object in the connector space.

The Export Object in a sync engine represents an object in the metaverse (the logical space where the synchronized data is stored). These objects are created by the sync engine based on the identity information and business attributes stored in the metaverse, and they serve as the representations of data that will be sent out to the connected data source during the next communication cycle.

Existence in the Metaverse, Not Data Source:

From the sync engine’s perspective, export objects do not yet exist in the connected data source. They are essentially “draft” objects waiting to be created or updated in the source system.

No Anchor Attribute at Creation:

Since export objects are not yet present in the connected data source, they do not have an anchor attribute initially. The anchor, which uniquely identifies the object in the data source, is only created by the data source after it receives the object from the sync engine.

Process Flow:

The sync engine creates the export object in the connector space, using information from the metaverse.
When the export object is sent to the data source during the next export run (the next communication session), the data source creates the object and assigns the anchor attribute, giving the object a unique identity in the source system.

Purpose of the Export Object:

The export object serves as a vehicle for transferring identity and business data from the sync engine (via the metaverse) to the connected data source. After the anchor is assigned by the data source, the sync engine can track the object using that anchor in future communication cycles.
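
The life of an export object can be sketched as follows. FakeDataSource is a made-up stand-in for a connected data source, and the sketch collapses the export and the later import that reports the anchor into a single step for brevity.

```python
import itertools


class FakeDataSource:
    """Stand-in for a connected data source that assigns its own unique IDs."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.objects = {}

    def create(self, attributes: dict) -> str:
        anchor = f"id-{next(self._ids)}"
        self.objects[anchor] = dict(attributes)
        return anchor


# Export object built from metaverse data; it has no anchor yet.
export_object = {
    "anchor": None,
    "attributes": {"displayName": "Lee Sperry"},
    "pending_export": True,
}

source = FakeDataSource()
if export_object["pending_export"]:
    # The data source, not the sync engine, produces the anchor.
    export_object["anchor"] = source.create(export_object["attributes"])
    export_object["pending_export"] = False
```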

The sync engine stores objects in a flat namespace, but some connected systems, such as Active Directory, use a hierarchical namespace. To carry hierarchical information into the flat namespace without losing it, the sync engine uses placeholders. The following sections describe how placeholders are used and how they preserve the hierarchy.

Purpose of Placeholders

Representation of Hierarchical Components:
- In hierarchical namespaces, objects often have names that reflect their position within a tree structure. For instance, Active Directory uses organizational units (OUs), domains, and other nested components to define an object’s location in the directory.
- The sync engine uses placeholders to represent these hierarchical components in its own flat namespace. These placeholders stand for parts of an object’s name (such as an OU) that are not directly imported into the sync engine but are necessary to reconstruct the full, hierarchical name.
Filling Gaps:
- If a part of an object’s hierarchical name exists in the data source but is not part of the sync engine’s import process, a placeholder is used to represent this missing component. Placeholders are essentially “empty objects” that stand in for real objects that are not yet part of the connector space but whose references are needed.

Example Use Case

Manager Attribute:
- Consider an object such as Abbie Spencer with a manager attribute pointing to another object, say Lee Sperry. If Lee Sperry has not been imported into the connector space yet, the sync engine cannot fully represent the manager attribute at that time.
- In this case, the sync engine will create a placeholder in the connector space for Lee Sperry using the hierarchical reference (e.g., CN=Lee Sperry,CN=Users,DC=fabrikam,DC=com). This placeholder fills the gap in the data until the actual object (staging object) is imported and processed.
Overwriting Placeholders:
- Once the referenced object, such as Lee Sperry, is imported into the connector space, the placeholder is replaced by the actual staging object representing Lee Sperry. The placeholder is overwritten, and the sync engine can now use the full, imported information for the manager attribute.
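
The manager example can be traced in a few lines of code. This is a minimal sketch, assuming a connector space modelled as a dictionary keyed by DN; the helper names are hypothetical.

```python
connector_space = {}  # DN -> object
REFERENCE_ATTRIBUTES = ("manager",)  # attributes whose values are DNs of other objects


def ensure_reference(dn: str) -> None:
    """Create a placeholder if the referenced object has not been imported yet."""
    connector_space.setdefault(dn, {"dn": dn, "placeholder": True, "attributes": {}})


def import_object(dn: str, attributes: dict) -> None:
    """Stage a real object, overwriting any placeholder with the same DN."""
    for ref in REFERENCE_ATTRIBUTES:
        if ref in attributes:
            ensure_reference(attributes[ref])
    connector_space[dn] = {"dn": dn, "placeholder": False, "attributes": attributes}


ABBIE = "CN=Abbie Spencer,CN=Users,DC=fabrikam,DC=com"
LEE = "CN=Lee Sperry,CN=Users,DC=fabrikam,DC=com"

import_object(ABBIE, {"manager": LEE})
lee_before = connector_space[LEE]["placeholder"]  # Lee exists only as a placeholder
import_object(LEE, {"title": "Director"})
lee_after = connector_space[LEE]["placeholder"]   # the placeholder has been overwritten
```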

Hierarchical Representation: Placeholders allow the sync engine to maintain references to parts of an object’s hierarchical name that aren’t directly imported, such as organizational units or other components in a system like Active Directory.

Filling Missing Information: Placeholders act as temporary objects to ensure that references to missing components (like managers, groups, or organizational units) are maintained until the real data is available.

Replacement with Staging Objects: Once the missing objects are imported into the connector space, placeholders are replaced with the actual data, and the sync engine can resolve the full hierarchical relationships.

Placeholders in the sync engine help bridge the gap between hierarchical data in connected systems and the flat namespace used by the engine. They preserve the hierarchical structure by representing components of an object’s name or referenced objects that haven’t yet been imported, ensuring that the full information can be restored later once the referenced objects are available.

Metaverse Objects in the Sync Engine

A metaverse object is a central concept in the synchronization engine, representing an aggregated or unified view of objects from the connected data sources. These objects provide a consolidated identity view by linking multiple connector space objects (from different data sources) into a single object within the metaverse. Below are key aspects of metaverse objects:

Aggregation of Connector Space Objects:

A metaverse object is an aggregation of information from multiple connector space objects. These objects could represent the same entity (e.g., a user or a group) in different connected systems.
The sync engine combines data from these objects into a single metaverse object to maintain a comprehensive and unified identity across systems.

Linking Connector Space Objects to Metaverse Objects:

Multiple connector space objects can be linked to a single metaverse object, which allows different representations of the same entity (e.g., in Active Directory, SQL database, and another data source) to be combined.
However, a connector space object can only be linked to one metaverse object at a time. This ensures consistency and prevents conflicts between different representations of the same object.

Automatic Creation and Deletion:

Creation: Metaverse objects are automatically created by the sync engine from the information in import objects. The sync engine consolidates the object’s attributes into the metaverse object during this process.
Deletion: Metaverse objects cannot be manually created or deleted. They are automatically deleted by the sync engine when they are no longer linked to any connector space objects. This ensures that only relevant and active objects exist in the metaverse.
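
The linking and deletion rules above can be expressed as two invariants. The sketch below enforces them in a toy Metaverse class; the class, its methods, and the identifiers such as "ad:abbie" are invented for illustration.

```python
class Metaverse:
    """Toy metaverse: one CS object links to one MV object; an MV object goes with its last link."""

    def __init__(self):
        self.links = {}  # mv_id -> set of connector space object ids
        self.owner = {}  # cs_id -> mv_id

    def link(self, cs_id: str, mv_id: str) -> None:
        if cs_id in self.owner:
            raise ValueError(f"{cs_id} is already linked to {self.owner[cs_id]}")
        self.links.setdefault(mv_id, set()).add(cs_id)
        self.owner[cs_id] = mv_id

    def unlink(self, cs_id: str) -> None:
        mv_id = self.owner.pop(cs_id)
        self.links[mv_id].discard(cs_id)
        if not self.links[mv_id]:
            del self.links[mv_id]  # no links left: the metaverse object is removed


mv = Metaverse()
mv.link("ad:abbie", "mv-1")
mv.link("hr:1001", "mv-1")       # several CS objects may share one MV object
try:
    mv.link("ad:abbie", "mv-2")  # but one CS object cannot join a second MV object
    double_link_rejected = False
except ValueError:
    double_link_rejected = True
mv.unlink("ad:abbie")
still_there = "mv-1" in mv.links
mv.unlink("hr:1001")
gone = "mv-1" not in mv.links
```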

Schema for Metaverse Objects:

The sync engine provides an extensible schema to define the types and attributes of metaverse objects. It includes a predefined set of object types and attributes, such as users, groups, and devices, but this schema can be extended to accommodate custom object types and attributes.
You can define new object types and attributes for metaverse objects. These attributes can have different data types, including:
- Single-valued attributes: Attributes that hold one value (e.g., user ID, email).
- Multivalued attributes: Attributes that can hold multiple values (e.g., group memberships, phone numbers).
- Attribute types: These can include strings, references to other objects, numbers, and Boolean values.

In the context of the sync engine, the relationship between staging objects and metaverse objects is crucial for the data flow and synchronization process. These relationships enable data to be moved between different data sources (connected systems) and the metaverse, ensuring that the identity data is consistent and up-to-date.

Joined Objects (or Connector Objects):

A joined object is a staging object that has been linked to a metaverse object. The link establishes a relationship between the data in the connector space (the staging object) and the metaverse (the unified identity object).
When a staging object becomes a joined object during synchronization, it can synchronize data (i.e., attribute values) between the connector space and the metaverse. This means that the attribute values from the connected data source (via the staging object) can flow into the metaverse object, and vice versa.
Attribute Flow:
- Across a joined pair, attributes can flow in both directions: import attribute rules move values from the staging object into the metaverse object, and export attribute rules move values from the metaverse object to the staging object. Each individual attribute still flows in only one direction for a given object, as described under Data Flow Rules.

Disjoined Objects (or Disconnector Objects):

A disjoined object is a staging object that is not linked to a metaverse object. This means that the sync engine has not yet established a connection between the data from the connected data source and the unified identity in the metaverse.
Disjoined objects typically represent data that has been imported but hasn’t been successfully matched to an existing metaverse object. These objects may eventually be linked (joined) during subsequent synchronization cycles.

Placeholders:

Placeholders are not linked to metaverse objects. Placeholders represent objects or components (like organizational units or referenced objects) that are missing or yet to be imported into the connector space. Since they don’t have actual data, they cannot have a relationship with metaverse objects. They exist temporarily until the real object is imported and can be processed.

Data Flow and Synchronization:

Attribute Synchronization:
- Once a staging object becomes a joined object, the sync engine can synchronize attributes between the connector space object (staging object) and the metaverse object. This allows for consistent and up-to-date identity data across systems.
- The import attribute rules govern how data flows from the connector space object (staging object) into the metaverse object, while the export attribute rules define how data flows from the metaverse object into the connector space.
Bidirectional Flow:
- The bidirectional flow of attributes means that updates to an object’s attributes in either the connector space or the metaverse can be synchronized across both. For example, if the email address in the metaverse object is updated, it can be exported to the connector space objects that are linked, ensuring all systems reflect the same value.

Example Scenario:

Suppose the same person has a user object in both Active Directory and an HR system, each represented in its own connector space. These two objects represent the same individual but are stored in different systems.
- The sync engine imports the user data into each connector space and attempts to join each staging object to the corresponding metaverse object.
- If the object in the HR system (staging object) is successfully matched to the metaverse object, the staging object becomes a joined object. The sync engine can then synchronize attributes, like name, department, and email, between the two systems.
- If the HR system object cannot be linked to the metaverse object yet (perhaps due to missing data or no matching metaverse object), it remains a disjoined object. The sync engine will attempt to resolve this link in future synchronization cycles.

Relationship Constraints:

One Connector Space Object to One Metaverse Object:
- A connector space object can be linked to only one metaverse object at any time, so each staging object contributes to exactly one identity.
One Metaverse Object to Many Connector Space Objects:
- A metaverse object can be linked to multiple connector space objects, which means that an identity in the metaverse can be composed of information from several connected systems.

Summary of Object Relationships

Joined objects allow for bidirectional synchronization of data between the connector space (staging objects) and the metaverse.
Disjoined objects are staging objects that have not been linked to any metaverse object, often requiring resolution in future cycles.
Placeholders are used to temporarily represent missing or unimported data, and they never have links to metaverse objects.

By establishing links between staging objects and metaverse objects, the sync engine enables consistent, unified identity management across multiple systems, ensuring that data remains synchronized and up-to-date across the connected data sources.

Sync Engine Identity Management Process

The sync engine identity management process ensures that identity information is consistently updated across multiple connected data sources. This process involves three main stages: Import, Synchronization, and Export. These stages ensure that data is accurately reflected in both the metaverse and the connector space and that updates are pushed to the relevant connected systems.

Import Process:

The Import process is the first step in the identity management cycle.
What Happens:
- The sync engine reads identity information from a connected data source.
- It evaluates the incoming data to detect any changes, such as newly created objects or updated attributes.
- If changes are detected, the sync engine creates new staging objects in the connector space or updates existing staging objects.
Purpose:
- The import process essentially brings in the latest identity data from connected systems and stores it in the connector space, where it can be processed and synchronized with the metaverse.

Synchronization Process:

The Synchronization process happens after the import, where the sync engine ensures that both the connector space and the metaverse remain in sync.
What Happens:
- The sync engine updates the metaverse to reflect changes from the connector space. This means that newly imported or updated data is used to maintain a consolidated view of the identity information in the metaverse.
- The sync engine then updates the connector space to reflect any changes made in the metaverse. This ensures that the data in the connector space remains consistent with the unified identity in the metaverse.
Purpose:
- This step ensures that any new changes from connected data sources or the metaverse are consistently reflected in both spaces, providing accurate, unified identity information.

Export Process:

The Export process is the final step, where changes are pushed out to the connected data sources.
What Happens:
- The sync engine identifies staging objects that are flagged as pending export (i.e., objects that have been updated or created but haven’t yet been pushed to the connected data sources).
- It then exports the changes from the connector space back to the relevant connected data sources.
Purpose:
- This ensures that any updates or changes made to identity information in the sync engine are sent out to the connected data sources, keeping all systems up-to-date with the latest identity data.

Data Flow Through the Processes:

Import Process: The sync engine reads identity information from the connected data source and creates or updates staging objects in the connector space.
Synchronization Process: The sync engine updates the metaverse based on changes in the connector space and vice versa, keeping the two consistent under the configured rules.
Export Process: The sync engine pushes updates from the connector space to the connected data sources.

Illustration of the Process Flow:

Identity Information Flow:
- The process begins with the import process, where data from the connected data source (e.g., Active Directory, HR system) is imported into the connector space.
- The synchronization process ensures that changes made in the connector space are reflected in the metaverse and vice versa, maintaining data consistency.
- Finally, the export process pushes updates from the connector space to the connected data sources, completing the identity management cycle.
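
The three stages can be strung together in a deliberately tiny model. Everything in it is an assumption for illustration: one import-side source, one export-side source, dictionaries for the connector spaces and the metaverse, and objects matched simply by key.

```python
source_ad = {"abbie": {"mail": "abbie@fabrikam.com"}}  # connected data source (import side)
source_cloud = {}                                       # connected data source (export side)
cs_ad, cs_cloud, metaverse = {}, {}, {}


def run_import() -> None:
    for key, attrs in source_ad.items():
        if cs_ad.get(key, {}).get("attrs") != attrs:
            cs_ad[key] = {"attrs": dict(attrs), "pending_import": True}


def run_sync() -> None:
    for key, obj in cs_ad.items():
        if obj["pending_import"]:
            metaverse[key] = dict(obj["attrs"])  # inbound: connector space -> metaverse
            obj["pending_import"] = False
    for key, attrs in metaverse.items():
        if cs_cloud.get(key, {}).get("attrs") != attrs:
            # outbound: metaverse -> connector space
            cs_cloud[key] = {"attrs": dict(attrs), "pending_export": True}


def run_export() -> None:
    for key, obj in cs_cloud.items():
        if obj["pending_export"]:
            source_cloud[key] = dict(obj["attrs"])
            obj["pending_export"] = False


run_import()
run_sync()
run_export()
```

Running the three functions a second time without changing source_ad flags nothing, which mirrors the point that only changes are processed.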

Import Process

During the import process, the sync engine evaluates updates to identity information from the connected data source. It compares the incoming identity information with the staging objects in the connector space and determines whether the staging object needs updating. If a change is detected, the staging object is flagged as pending import for synchronization.

Key Functions of the Import Process:

Efficient synchronization: Only changed data is processed, minimizing the volume of data handled during synchronization.
Efficient resynchronization: Because imported data is already staged in the connector space, you can change how the sync engine processes identity information and reapply the rules without reconnecting to the data source.
Opportunity to preview synchronization: You can preview synchronization results to verify assumptions about the identity management process.

Object Matching and Special Behavior:

The sync engine tries to locate a staging object in the connector space that matches the incoming object, first by the anchor attribute (a unique identifier), and if that fails, by the distinguished name (DN).
Special Behavior for Matching by DN but Not Anchor:
- No Anchor: If the object in the connector space has no anchor, the sync engine removes it from the connector space and marks the linked metaverse object to retry provisioning on the next synchronization run. Then, a new import object is created.
- Anchor Exists: If the object in the connector space has an anchor, it is assumed to be renamed or deleted. A new temporary DN is assigned, and the old object becomes transient. This might happen, for example, in cases where an object is deleted directly in a system like Microsoft Entra ID using PowerShell.
Transient Objects:
- These are not necessarily a problem. They are expected to resolve on their own once a later import brings in the rename or deletion.
- Example: An object has been deleted directly in Microsoft Entra ID, but synchronization has not yet reflected the deletion.
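
The matching order and the two special cases can be sketched as one function. This is a simplified model with assumed names: the connector space is a dictionary keyed by DN, the "TEMP:" prefix stands in for the temporary DN, and a rename detected by anchor is not re-keyed.

```python
def stage_incoming(cs: dict, incoming: dict, retry_provisioning: set) -> str:
    """Stage one incoming object. `cs` maps DN -> staged object (a dict)."""
    dn, anchor = incoming["dn"], incoming["anchor"]

    # 1. Try the anchor first.
    for staged in cs.values():
        if anchor is not None and staged["anchor"] == anchor:
            staged["attributes"] = incoming["attributes"]
            return "matched by anchor"

    # 2. Fall back to the DN.
    staged = cs.pop(dn, None)
    outcome = "new object"
    if staged is not None and staged["anchor"] is None:
        # No anchor: drop the staged object and flag its metaverse object.
        if staged.get("mv_id"):
            retry_provisioning.add(staged["mv_id"])
        outcome = "replaced anchorless object"
    elif staged is not None:
        # Anchor exists: assume a rename or deletion; park the old object under a temporary DN.
        staged["transient"] = True
        cs[f"TEMP:{staged['anchor']}"] = staged
        outcome = "old object made transient"

    cs[dn] = {"anchor": anchor, "attributes": incoming["attributes"], "mv_id": None}
    return outcome


cs = {
    "CN=A": {"anchor": None, "attributes": {}, "mv_id": "mv-1"},
    "CN=B": {"anchor": "old-b", "attributes": {}, "mv_id": None},
}
retry = set()
r1 = stage_incoming(cs, {"dn": "CN=A", "anchor": "a1", "attributes": {}}, retry)
r2 = stage_incoming(cs, {"dn": "CN=B", "anchor": "new-b", "attributes": {}}, retry)
r3 = stage_incoming(cs, {"dn": "CN=B", "anchor": "new-b", "attributes": {"x": 1}}, retry)
```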

Pending Import Types:

None: No change to the object’s attributes. The sync engine does not flag the object for import.
Add: A new import object is created in the connector space, marked for processing in the metaverse.
Update: An existing object has changes in its attributes (e.g., renaming). It’s flagged as pending import for processing in the metaverse.
Delete: The object no longer exists in the connected data source. The staging object is flagged as pending import so that the deletion can be processed.
Delete/Add: The object type has changed, requiring a complete resynchronization. It is flagged for delete-add processing.

This method ensures that only the changed data is processed during synchronization, reducing overhead and improving efficiency.
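
The pending import types can be read as the result of a simple comparison between what is staged and what arrives. The function below is a sketch of that decision, assuming each object is a dictionary with "type" and "attributes" keys; it is not the sync engine's actual algorithm.

```python
from typing import Optional


def classify(staged: Optional[dict], incoming: Optional[dict]) -> str:
    """Return the pending import type for one object.

    `staged` is None when no staging object exists yet; `incoming` is None when
    the object is no longer present in the connected data source.
    """
    if staged is None:
        return "Add"
    if incoming is None:
        return "Delete"
    if staged["type"] != incoming["type"]:
        return "Delete/Add"
    if staged["attributes"] != incoming["attributes"]:
        return "Update"
    return "None"


user = {"type": "user", "attributes": {"mail": "abbie@fabrikam.com"}}
renamed = {"type": "user", "attributes": {"mail": "abbie.spencer@fabrikam.com"}}
contact = {"type": "contact", "attributes": {"mail": "abbie@fabrikam.com"}}

results = [
    classify(None, user),
    classify(user, user),
    classify(user, renamed),
    classify(user, None),
    classify(user, contact),
]
```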

Synchronization Process

Synchronization includes two major parts:

Inbound Synchronization (from the connector space to the metaverse)
Outbound Synchronization (from the metaverse to the connector space)

Inbound Synchronization:

The inbound synchronization process aggregates data in the metaverse from the staging objects in the connector space.

Steps:
- Provision (also called projection, to distinguish it from outbound provisioning): Creates a new metaverse object based on a staging object and links them. This is an object-level operation.
- Join: Links a staging object to an existing metaverse object, establishing a joined relationship.
- Import Attribute Flow: Updates attribute values in the metaverse based on data from the staging object. This is an attribute-level operation.

Provision and join apply only to disjoined objects, that is, staging objects not yet linked to a metaverse object. For joined objects, the sync engine only runs import attribute flow.
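
The three inbound steps can be sketched for a single staging object as follows. The join criterion (a shared employeeId) and all names are assumptions for illustration; real join behavior is defined by the configured synchronization rules.

```python
import itertools

_ids = itertools.count(1)
metaverse = {}  # mv_id -> attributes


def inbound_sync(staging: dict, join_on: str = "employeeId") -> str:
    """Apply provision, join, and import attribute flow to one staging object."""
    action = "attribute flow only"
    if staging.get("mv_id") is None:  # disjoined object
        value = staging["attributes"].get(join_on)
        match = next(
            (mv_id for mv_id, attrs in metaverse.items()
             if value is not None and attrs.get(join_on) == value),
            None,
        )
        if match is not None:
            staging["mv_id"], action = match, "join"
        else:
            staging["mv_id"] = f"mv-{next(_ids)}"
            metaverse[staging["mv_id"]] = {}
            action = "provision"
    # Import attribute flow: copy staged values into the linked metaverse object.
    metaverse[staging["mv_id"]].update(staging["attributes"])
    return action


ad_user = {"mv_id": None, "attributes": {"employeeId": "1001", "mail": "abbie@fabrikam.com"}}
hr_user = {"mv_id": None, "attributes": {"employeeId": "1001", "department": "Sales"}}
actions = [inbound_sync(ad_user), inbound_sync(hr_user), inbound_sync(hr_user)]
```

The first object is provisioned, the second joins the metaverse object it created, and the third call only flows attributes because the object is already joined.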

Outbound Synchronization:

Outbound synchronization ensures changes in the metaverse are propagated to the connector space.

Steps:
- Provisioning: Creates new export objects or updates existing export objects based on changes in the metaverse.
- Deprovisioning: Removes the link between a metaverse object and an export object. The staging object is then either deleted in the connected data source or left disjoined.
- Export Attribute Flow: Transfers updated attributes from the metaverse to the export objects.

Outbound synchronization updates export objects when a metaverse object changes but has not been deleted. If an export object requires updates, it is flagged as pending export and pushed out to the connected data source during the export process.
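
Outbound provisioning and export attribute flow can be sketched in the same style. Deprovisioning is left out to keep the example short, and the function name, the export_cs dictionary, and the list of exported attributes are all hypothetical.

```python
def outbound_sync(mv_object: dict, export_cs: dict, export_attributes: list) -> None:
    """Provision an export object if needed, then run export attribute flow."""
    mv_id = mv_object["id"]
    if mv_id not in export_cs:
        # Provisioning: create a joined export object in the target connector space.
        export_cs[mv_id] = {"attributes": {}, "pending_export": False}
    staged = export_cs[mv_id]
    desired = {
        name: mv_object["attributes"][name]
        for name in export_attributes
        if name in mv_object["attributes"]
    }
    if staged["attributes"] != desired:
        # Export attribute flow changed something, so flag the object for export.
        staged["attributes"] = desired
        staged["pending_export"] = True


export_cs = {}
user = {"id": "mv-1", "attributes": {"mail": "abbie@fabrikam.com", "department": "Sales"}}

outbound_sync(user, export_cs, ["mail"])
first_flag = export_cs["mv-1"]["pending_export"]
export_cs["mv-1"]["pending_export"] = False  # pretend the export ran
outbound_sync(user, export_cs, ["mail"])     # nothing changed in the metaverse
second_flag = export_cs["mv-1"]["pending_export"]
```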

Export Process

During the export process, the sync engine examines all export objects flagged as pending export and pushes them to the connected data source.

Key points:

Export Status: The sync engine tracks whether each exported attribute value was actually applied. It stores what it exported and treats the export as confirmed only when a later import returns the same value from the connected data source.
Example of Export Process:
- If attribute C (value 5) is exported to the connected data source, the sync engine stores the C=5 export status.
- During subsequent exports, the sync engine attempts to re-export C=5, unless a different value has been imported recently.

This stored export status (the export memory) is cleared when the exported value, C=5 in this example, is imported back from the connected data source.
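
The C=5 example can be modelled with a small tracker that remembers exported values until an import confirms them. This is a sketch under assumptions: the ExportTracker class is invented, and it does not model the case where a different value is imported.

```python
class ExportTracker:
    """Remembers exported values until an import confirms them."""

    def __init__(self):
        self.unconfirmed = {}  # attribute -> last exported value

    def record_export(self, attribute: str, value) -> None:
        self.unconfirmed[attribute] = value

    def record_import(self, attribute: str, value) -> None:
        # Confirmation: the source now reports the value that was exported.
        if attribute in self.unconfirmed and self.unconfirmed[attribute] == value:
            del self.unconfirmed[attribute]

    def needs_reexport(self, attribute: str) -> bool:
        return attribute in self.unconfirmed


tracker = ExportTracker()
tracker.record_export("C", 5)
before_confirmation = tracker.needs_reexport("C")  # still unconfirmed, so export again
tracker.record_import("C", 5)
after_confirmation = tracker.needs_reexport("C")   # confirmed, so memory is cleared
```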

Summary of the Process Flow:

Import: Sync engine compares incoming data with staging objects, flags objects for updates if necessary.
Synchronization: Data flows from the connector space to the metaverse (inbound synchronization), and from the metaverse back to the connector space (outbound synchronization).
Export: Export objects are flagged for changes and updates, and data is pushed to the connected data sources during the export phase.

This structured process ensures that identity data across various systems is kept consistent and up-to-date with minimal overhead.

Conclusion

The sync engine plays a crucial role in managing identity information across various connected data sources, ensuring data consistency, integrity, and efficient synchronization. By leveraging processes such as import, synchronization, and export, the sync engine facilitates seamless data flow, minimizing overhead and optimizing performance. Understanding the nuances of staging objects, metaverse objects, and the intricacies of synchronization operations can empower administrators to effectively manage identity information, troubleshoot issues, and enhance the overall synchronization process. With careful configuration and monitoring, organizations can ensure that their identity management solutions remain reliable, scalable, and adaptable to changing business needs.

If you’re looking to optimize your identity management process or troubleshoot synchronization challenges, diving deeper into the inner workings of the sync engine can provide valuable insights into achieving better synchronization outcomes. Happy syncing!