Identities/Identifiers

Shri Rajender Sethi
Deputy Director General, NIC
rsethi[at]nic[dot]in

Registration numbers (for example, a student registration number allotted by an education board) are issued to uniquely identify entities. They make it easier to maintain information and to track the progress or activities of an entity within manual, register-based systems. With the digitization of records, primary and foreign keys were created to maintain entity relationships within a system. These primary keys, generated by the computerized system, came to be referred to as IDs; they subsequently became an alternative to registration numbers and are also known as identities or identifiers. As such systems were integrated, a number of identifiers emerged, such as PAN/TAN, TIN/GSTIN and IFSC.

With the proliferation of digital services, integrated systems evolved into digital platforms. Going further, Public Digital Platforms are envisaged as unifying these isolated systems into a combined platform with a unique number, where all entities, through a consent-based mechanism supported by new-age technologies, would interact with the data available across entities. For example, UPI emerged as a public digital platform unifying isolated banking systems, using the mobile number and Aadhaar as key identifiers. It has therefore become pertinent to focus on the design and structure of identifiers when building an ecosystem that would enable the growth of public digital platforms.

Identifiers: Design Criteria

Criteria for the unique codes that serve as identifiers need to be determined for every entity, along with the entity's primary attributes, so that records can be accessed easily after error detection, data cleaning, validation and de-duplication. Identifiers assigned to entities are used to uniquely identify them throughout their life cycle. These identifier codes can be validated using checksum formulas.

[Figure: identifiers assigned to entities are used to uniquely identify them]

All identifiers in a domain should be designed taking care of the following aspects:

1. Availability for unlimited entities: the structure of the unique identifier should accommodate unanticipated growth in the number and type of entities. An appropriate number of digits should be used, offering an abundant number space with a low density of used numbers.

2. Longevity: the formats used for sequencing and identification should remain compatible with any future format changes.

3. Privacy: the identifier should not reveal information about the entity. A robust encoding scheme should be used to prevent vital entity information from being traced back from the identifier.

4. De-duplication: there should be a process for de-duplication. Indexing techniques should be used to identify and reject redundant entries and so avoid identity conflicts.

5. Numeric values only: only digits should be used, so that the identifier is easy to interpret in a multilingual society with diverse vernaculars.

6. Semantics-free: no digit should carry meaning, apart from the last digit, which is reserved as a check digit.

7. Mapping of multiple standards/identifiers: the system should be able to consolidate and support the multiple standards that already exist, so that it becomes the single source of truth.
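As a rough illustration of aspect 1, the number of digits can be sized from the expected number of entities and a target density of used numbers. The sketch below is only a back-of-the-envelope calculation, not part of any published scheme: the function name `payload_digits_needed`, the figure of 1.4 billion entities and the 20% density are hypothetical, and it assumes that the leading digit is non-zero and that the check digit is counted separately.

```python
def payload_digits_needed(expected_entities: int, max_density: float) -> int:
    """Smallest number of digits (excluding the check digit) such that
    expected_entities fills at most max_density of the number space.

    Assumption: the first digit is never zero, so k digits give
    9 * 10**(k - 1) usable values.
    """
    k = 1
    while expected_entities > max_density * 9 * 10 ** (k - 1):
        k += 1
    return k


# Hypothetical sizing: 1.4 billion entities, at most 20% of numbers used.
digits = payload_digits_needed(1_400_000_000, 0.20)
print(digits, "payload digits, plus one check digit")
```

A lower target density leaves more unused numbers, which makes a mistyped or guessed number less likely to match an issued identifier, at the cost of a longer identifier.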

The following points may be considered to satisfy the design principles mentioned above:

a. The identifier is assigned once, at inception, and remains the same for the lifetime of the entity; numbers are never reused.

b. It should have processes defined for activation, de-activation and recovery.

c. It should have an error-detecting (but not necessarily error-correcting) check digit that detects as many data entry errors as possible. The Verhoeff algorithm may be used to generate the check digit; it detects all single-digit errors and all transposition errors involving two adjacent digits.

d. The first digit should not be zero, which eases data transfer between integrated applications (for example, a leading zero can be lost when the identifier is handled as a number).

e. Some numbers may be kept reserved for specific purposes (e.g., taking group actions on a set of identifiers).

f. It should not contain three or more repeating digits (like 222, 5555, etc.) or a sequence of three or more consecutive digits (like 234, 34567, etc.).

g. It may be displayed or printed in groups of four digits, counted from the right. For example, the identifiers 27, 121, 2359, 67105, 5378654 and 9132787821 may be displayed or written as 27, 121, 2359, 6 7105, 537 8654 and 91 3278 7821 respectively. A hyphen or dash may be used instead of a space as the delimiter, to make it simple to use.

h. It may be generated sequentially or at random.

i. It may be of variable or fixed length, provided it satisfies all the design principles.

j. In the case of variable length:

i. It may be generated sequentially.

ii. It may have a starting value (default 1) and an incremental value (default 1), and need not have any upper limit.

iii. To make it slightly randomized, one or two random digits may be appended after the generated sequence number.

iv. Finally, a check digit may be appended to make it an identifier.
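The points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`verhoeff_check_digit`, `verhoeff_is_valid`, `has_forbidden_run`, `group4`, `identifiers`) are hypothetical, the tables are the standard Verhoeff tables, point (f) is read as forbidding only ascending sequences, and a candidate that violates point (f) is simply skipped rather than redrawn.

```python
import secrets
from itertools import islice

# Standard Verhoeff tables: multiplication in the dihedral group D5,
# position-dependent permutations, and group inverses.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def _checksum(digits: str, offset: int) -> int:
    """Fold the digits right to left through the Verhoeff tables."""
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = _D[c][_P[(i + offset) % 8][int(ch)]]
    return c


def verhoeff_check_digit(body: str) -> int:
    """Point (c): check digit to append to the digit string `body`."""
    return _INV[_checksum(body, 1)]


def verhoeff_is_valid(identifier: str) -> bool:
    """True if the last digit is the correct check digit for the rest."""
    ok = identifier.isascii() and identifier.isdigit()
    return ok and _checksum(identifier, 0) == 0


def has_forbidden_run(digits: str) -> bool:
    """Point (f): three identical or three ascending consecutive digits."""
    for a, b, c in zip(digits, digits[1:], digits[2:]):
        x, y, z = int(a), int(b), int(c)
        if x == y == z or (y == x + 1 and z == y + 1):
            return True
    return False


def group4(identifier: str, sep: str = " ") -> str:
    """Point (g): group digits in fours, counted from the right."""
    head = len(identifier) % 4
    parts = [identifier[:head]] if head else []
    parts += [identifier[i:i + 4] for i in range(head, len(identifier), 4)]
    return sep.join(parts)


def identifiers(start: int = 1, step: int = 1, random_digits: int = 2):
    """Points (j.i) to (j.iv): sequence number + random digits + check digit."""
    seq = start
    while True:
        suffix = "".join(str(secrets.randbelow(10)) for _ in range(random_digits))
        body = f"{seq}{suffix}"
        candidate = body + str(verhoeff_check_digit(body))
        if not has_forbidden_run(candidate):  # otherwise skip this number
            yield candidate
        seq += step


# Print a few sample identifiers in display form (output varies per run).
print([group4(x, "-") for x in islice(identifiers(), 5)])
```

Because the sequence starts at 1 and only grows, the first digit is never zero (point d) and no number is issued twice (point a). A real system would also need persistent storage of the last sequence number and handling of reserved numbers (point e), which this sketch leaves out.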

References

1. Core Data Registries for Public Digital Platforms, https://uxdt.nic.in/flipbooks/CDR/

2. A UID Numbering Scheme, May 2010, Hemant Kanakia, Srikanth Nadhamuni and Sanjay Sarma

Page last updated: October 4th, 2023