1. What is Data Governance?
Definition:
➡️ Data Governance is the system of rules, processes, and responsibilities used to manage data in an organization.
➡️ It ensures that data is accurate, secure, consistent, and available to the right people.
In simple words, data governance defines:
- ✔️ How data should be handled
- ✔️ Who can access the data
- ✔️ How data should be protected
- ✔️ How data should be maintained
2. Why Is Data Governance Important for Banks?
Banks handle large volumes of sensitive data such as customer details, KYC records, transactions, loans and credit scores.
Good data governance helps banks to:
- ✔️ Improve decision-making
- ✔️ Reduce fraud and risks
- ✔️ Meet regulatory requirements (RBI, SEBI, IRDAI)
- ✔️ Maintain customer trust
- ✔️ Improve operational efficiency
3. Key Components of Data Governance
| Component | Meaning | Example |
|---|---|---|
| Data Ownership | Assigning responsibility for data | Branch Manager responsible for branch customer data |
| Data Stewardship | People who maintain quality of data daily | Employee updates KYC details |
| Data Policies | Rules on how data must be used | KYC update policy |
| Data Standards | Common formats of data | Date format: DD/MM/YYYY |
| Data Security | Protecting data from unauthorized access | Encrypting customer account data |
| Compliance | Following laws & regulations | RBI Cyber Security Framework |
4. What is Data Quality? (Very Important)
Definition:
➡️ Data Quality is the degree to which data is accurate, complete, consistent, and reliable.
In simple words:
High data quality = Good data.
Low data quality = Wrong, outdated, or incomplete data.
5. Dimensions of Data Quality (Exam Favourite)
| Data Quality Dimension | Meaning | Example |
|---|---|---|
| Accuracy | Correct and error-free data | PAN number matches the customer |
| Completeness | No missing values | All KYC fields are filled |
| Consistency | Same data across systems | Customer address same in CBS and CRM |
| Timeliness | Updated regularly | NPA data updated daily |
| Uniqueness | No duplicates | One CIF per customer |
| Validity | Data follows rules/formats | Email must contain “@” |
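These dimensions can be measured, not just described. Below is a minimal Python sketch, assuming a small hypothetical list of customer records (the field names `cif`, `pan`, `email`, `occupation` are illustrative, not a real CBS schema). It computes completeness as the share of mandatory values filled, validity as the share of records passing a PAN-format rule and the "@" email rule, and uniqueness as distinct customers per CIF record. Identifying a customer by PAN is an assumption made for the example.

```python
import re

# Hypothetical sample records; field names are illustrative, not a real CBS schema.
customers = [
    {"cif": "C001", "pan": "ABCDE1234F", "email": "asha@example.com", "occupation": "Teacher"},
    {"cif": "C002", "pan": "PQRSX5678K", "email": "ravi.example.com", "occupation": ""},
    {"cif": "C003", "pan": "PQRSX5678K", "email": "ravi@example.com", "occupation": "Farmer"},
    {"cif": "C004", "pan": "ABC12", "email": "meena@example.com", "occupation": "Engineer"},
]

MANDATORY = ["cif", "pan", "email", "occupation"]
PAN_FORMAT = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")  # 5 letters, 4 digits, 1 letter


def completeness(records):
    """Share of mandatory field values that are non-empty."""
    filled = sum(1 for r in records for f in MANDATORY if r.get(f))
    return filled / (len(records) * len(MANDATORY))


def validity(records):
    """Share of records whose PAN fits the format and whose email contains '@'."""
    ok = sum(1 for r in records if PAN_FORMAT.fullmatch(r["pan"]) and "@" in r["email"])
    return ok / len(records)


def uniqueness(records):
    """Distinct customers (identified by PAN, an assumption) per CIF record; 1.0 = no duplicates."""
    return len({r["pan"] for r in records}) / len(records)


for name, metric in [("Completeness", completeness), ("Validity", validity), ("Uniqueness", uniqueness)]:
    print(f"{name}: {metric(customers):.1%}")
```

Each dimension catches a different problem: C002 has a blank occupation and an email without "@", C002 and C003 share one PAN (one customer holding two CIFs), and C004 has a malformed PAN.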
6. Difference Between Data Governance and Data Quality
| Data Governance | Data Quality |
|---|---|
| Focuses on rules, roles, and responsibilities | Focuses on correctness of data |
| Governance is the framework | Quality is the output |
| Ensures data is used properly | Ensures data is error-free |
| Managed by Data Governance Team | Managed by Data Stewards / IT / Operations |
7. How Banks Maintain Data Governance
Banks typically rely on the following practices:
- ✔️ Data Governance Committee (DGC)
- ✔️ Data Owners and Data Stewards
- ✔️ Policies (KYC, AML, Data Retention)
- ✔️ Data Encryption and Access Control
- ✔️ RBI & SEBI Compliance Guidelines
- ✔️ Periodic Data Audits
- ✔️ Master Data Management (MDM)
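Access control, one of the practices above, is often implemented as role-based access with least privilege. Here is a minimal Python sketch; the role names and permission strings are hypothetical, and a real bank would manage them in its identity and access management system rather than in code.

```python
# Hypothetical roles and permissions; a real bank would manage these in its IAM system.
ROLE_PERMISSIONS = {
    "teller": {"view_customer"},
    "kyc_steward": {"view_customer", "update_kyc"},
    "auditor": {"view_customer", "view_audit_log"},
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions get no access."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("kyc_steward", "update_kyc"))   # True
print(is_allowed("teller", "update_kyc"))        # False
print(is_allowed("intern", "view_customer"))     # False
```

Denying by default means a new or misspelt role gets nothing until a data owner explicitly grants it access.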
8. How Banks Maintain Data Quality
- ✔️ Regular KYC updates
- ✔️ Automated validation (e.g., Aadhaar/PAN check)
- ✔️ Removing duplicate data
- ✔️ Ensuring same format across systems
- ✔️ Integrated CBS, CRM and AML systems
- ✔️ Daily/Weekly data quality checks
- ✔️ Data cleansing tools
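Automated validation at data entry can be sketched as a function that rejects a KYC form before it is saved. This is a minimal Python sketch with hypothetical field names. It checks formats only (PAN as five letters, four digits, one letter; a deliberately simple email pattern) and does not verify the PAN or Aadhaar against the issuing authority's records, which is a separate step.

```python
import re

PAN_FORMAT = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
EMAIL_FORMAT = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # deliberately simple, not full RFC 5322
REQUIRED = ("name", "pan", "email", "occupation")


def validate_kyc(form: dict) -> list:
    """Return a list of error messages; an empty list means the form can be saved."""
    errors = []
    for field in REQUIRED:
        if not (form.get(field) or "").strip():
            errors.append(f"{field} is required")
    pan = (form.get("pan") or "").strip().upper()
    if pan and not PAN_FORMAT.fullmatch(pan):
        errors.append("pan must look like ABCDE1234F")
    email = (form.get("email") or "").strip()
    if email and not EMAIL_FORMAT.fullmatch(email):
        errors.append("email is not in a valid format")
    return errors


good = {"name": "Asha Rao", "pan": "abcde1234f", "email": "asha@example.com", "occupation": "Teacher"}
bad = {"name": "Ravi", "pan": "ABC12", "email": "ravi.example.com", "occupation": ""}
print(validate_kyc(good))  # []
print(validate_kyc(bad))
```

Rejecting bad values at capture time is cheaper than cleansing them later, which is why validation sits at the top of this list of practices.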
9. Risks of Poor Data Governance (Exam Important)
- ❌ Increase in fraud
- ❌ Wrong decision-making
- ❌ Regulatory penalties (from RBI/SEBI)
- ❌ Data leaks & cyber attacks
- ❌ Poor customer service
- ❌ Incorrect financial reporting
10. Risks of Poor Data Quality
- ❌ Wrong KYC leads to compliance violations
- ❌ Wrong credit decisions (NPA risk increases)
- ❌ Duplicate customer records
- ❌ Wrong risk scoring
- ❌ Slower processes
11. Example Scenario (Very Easy to Remember)
Example 1: Wrong Customer Address
- CBS has old address
- CRM has new address
- Loan notices go to the old address, which can lead to legal issues
➡️ This is a Data Consistency problem.
Example 2: PAN number typed incorrectly
- PAN mismatch in CBS and Income Tax records
➡️ This is a Data Accuracy problem.
Example 3: Missing KYC field
- Customer’s occupation is blank
➡️ This is a Data Completeness issue.
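Example 1 is the kind of problem a reconciliation job catches. Here is a minimal Python sketch, assuming each system can export addresses keyed by CIF (the data below is made up). It normalises case and spacing, then lists CIFs whose addresses disagree and CIFs present in only one system.

```python
# Hypothetical address extracts keyed by CIF; in practice these come from CBS and CRM.
cbs = {"C001": "12 MG Road, Pune", "C002": "4 Lake View, Kochi", "C003": "9 Park St, Kolkata"}
crm = {"C001": "12  mg road, Pune", "C002": "7 Hill Road, Kochi", "C004": "3 Ring Road, Delhi"}


def normalise(address: str) -> str:
    """Ignore case and repeated spaces so pure formatting differences are not flagged."""
    return " ".join(address.lower().split())


def reconcile(first: dict, second: dict) -> dict:
    common = first.keys() & second.keys()
    return {
        "mismatched": sorted(k for k in common if normalise(first[k]) != normalise(second[k])),
        "only_in_first": sorted(first.keys() - second.keys()),
        "only_in_second": sorted(second.keys() - first.keys()),
    }


print(reconcile(cbs, crm))
```

C001 is not flagged because its two addresses differ only in formatting. C002 is the Example 1 situation and would go to a data steward to confirm which address is current.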
12. RBI Guidelines Related to Data Governance
Banks must follow:
- ✔️ RBI Cyber Security Framework
- ✔️ RBI Guidelines on IT Governance (Gopalakrishna Committee)
- ✔️ RBI Data Localization Mandate
- ✔️ RBI KYC Master Direction
- ✔️ RBI Risk-Based Supervision (RBS)
- ✔️ Guidelines on Credit Risk Management
13. SEBI / NABARD / IRDAI Importance
- SEBI: Requires accurate market data, investor data, trade records
- NABARD: Accurate data for rural credit, SHG records, agriculture loans
- IRDAI: Policyholder data, claims data, risk data
14. Summary (One-line Revision Notes)
- Data Governance = Rules & Responsibility.
- Data Quality = Accuracy, Completeness & Consistency of Data.
- Both are essential for compliance, risk reduction, and banking operations.
A. Basic Definitions & Concepts (1–12)
Q1. What is Data Governance?
A. The process of storing data only.
B. The system of rules, roles and processes for managing data.
C. A software tool for backup.
D. A type of database.
Answer: B.
Explanation: Data governance defines who is responsible for data and the rules and processes for managing it.
Q2. What is Data Quality?
A. Speed of data entry.
B. How accurate, complete and reliable data is.
C. The size of a database.
D. A data visualization method.
Answer: B.
Explanation: Quality refers to correctness, completeness, consistency and similar dimensions.
Q3. Which is NOT a component of data governance?
A. Data ownership
B. Data stewardship
C. Data encryption algorithms only
D. Data policies and standards
Answer: C.
Explanation: Encryption is part of data security, not the full governance framework.
Q4. Who is a Data Owner?
A. A person who stores backups.
B. The person accountable for data and decisions about it.
C. A software developer.
D. The server admin.
Answer: B.
Explanation: Data owners have accountability for specific datasets.
Q5. Who is a Data Steward?
A. A person who ships physical records.
B. The person managing data quality and daily data tasks.
C. The CEO.
D. The external auditor.
Answer: B.
Explanation: Stewards maintain and improve data quality.
Q6. What is a Data Custodian?
A. Responsible for technical storage and protection of data.
B. Owner of the data.
C. A regulator.
D. A customer.
Answer: A.
Explanation: Custodians handle technical controls and backups.
Q7. Which best describes Data Policy?
A. A programming language.
B. Rules that govern how data is used and handled.
C. A type of data format.
D. A hardware device.
Answer: B.
Explanation: Policies set allowed actions, retention periods and access rules.
Q8. Metadata means:
A. Data about data.
B. The fastest database.
C. Encrypted data.
D. A report only.
Answer: A.
Explanation: Metadata describes fields, meaning, source, format.
Q9. Data lineage tells you:
A. Who owns the server.
B. The path and transformations data has undergone.
C. How many rows are in a table.
D. The password history.
Answer: B.
Explanation: Lineage shows source, transformations, and movement.
Q10. Master Data refers to:
A. Temporary logs.
B. Core business entities like customer, product, account.
C. Backup files.
D. UI designs.
Answer: B.
Explanation: Master data describes core business entities shared across systems.
Q11. Reference Data examples are:
A. Transaction logs.
B. Country codes, currency codes, interest rate types.
C. Draft emails.
D. Temporary cache.
Answer: B.
Explanation: Reference data standardizes values across systems.
Q12. What is a Golden Record?
A. A single best version of an entity (e.g., customer) consolidated from many sources.
B. A music record.
C. A database index.
D. A backup policy.
Answer: A.
Explanation: Golden record is the trusted single source of truth.
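Consolidating several source records into a golden record needs a survivorship rule that decides which value wins for each field. Below is a minimal Python sketch, assuming the records have already been matched to one customer and using one common rule (the most recently updated non-empty value wins). System names, dates and values are hypothetical; MDM tools also support other rules, such as ranking source systems by trust.

```python
from datetime import date

# Hypothetical records for one customer, already matched by an MDM process.
sources = [
    {"system": "CBS", "updated": date(2021, 3, 1), "name": "Ravi Kumar",
     "mobile": "9800000001", "address": "4 Lake View, Kochi"},
    {"system": "CRM", "updated": date(2024, 6, 10), "name": "Ravi Kumar",
     "mobile": "", "address": "7 Hill Road, Kochi"},
    {"system": "LOANS", "updated": date(2023, 1, 15), "name": "R. Kumar",
     "mobile": "9800000002", "address": ""},
]
FIELDS = ["name", "mobile", "address"]


def golden_record(records):
    """Survivorship rule (an assumption): the newest non-empty value wins per field."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in FIELDS:
        for record in newest_first:
            if record[field]:
                # Keep the winning value and note which system supplied it (lineage).
                golden[field] = record[field]
                golden[field + "_source"] = record["system"]
                break
    return golden


print(golden_record(sources))
```

Recording the source of each surviving value keeps the golden record explainable to auditors.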
B. Data Quality Dimensions (13–28)
Q13. Which is NOT a typical data quality dimension?
A. Accuracy
B. Completeness
C. Popularity
D. Timeliness
Answer: C.
Explanation: Popularity isn’t a standard quality metric.
Q14. Accuracy means:
A. Data is in right format only.
B. Data reflects the real-world truth.
C. Data is large.
D. Data is encrypted.
Answer: B.
Explanation: Accurate data matches reality (e.g., correct PAN).
Q15. Completeness checks:
A. If no data exists.
B. If all required fields are present.
C. Only formatting.
D. Data speed.
Answer: B.
Explanation: Completeness means no missing required values.
Q16. Consistency ensures:
A. Same data across different systems.
B. Data is always encrypted.
C. Data is kept for long time.
D. Only backup copies exist.
Answer: A.
Explanation: Consistency avoids conflicting values in different systems.
Q17. Timeliness refers to:
A. Data being old.
B. Data being updated when needed.
C. Data size.
D. Data stored offline.
Answer: B.
Explanation: Timely data is current and relevant.
Q18. Uniqueness means:
A. Data is rare.
B. No duplicate records for the same real-world entity.
C. Data is encrypted uniquely.
D. One database file.
Answer: B.
Explanation: Uniqueness prevents duplicate customers/CIFs.
Q19. Validity checks:
A. Data obeys formatting and business rules.
B. Data is popular.
C. Data is large.
D. Data is in cloud.
Answer: A.
Explanation: Validity ensures values fall within allowed rules.
Q20. Which is the correct example of timeliness?
A. NPA data updated daily.
B. Outdated address used for months.
C. Duplicate customer entries.
D. Wrong PAN stored.
Answer: A.
Explanation: Daily updates make data timely.
Q21. Which tool is often used for finding data quality issues?
A. Data profiling tools.
B. Word processor.
C. Presentation software.
D. Photo editor.
Answer: A.
Explanation: Profiling analyzes distributions, missing values, duplicates.
Q22. Data cleansing means:
A. Deleting the whole database.
B. Correcting or removing incorrect, incomplete or duplicate data.
C. Encrypting data.
D. Archiving data.
Answer: B.
Explanation: Cleansing fixes quality problems.
Q23. Deduplication is used to:
A. Make extra copies.
B. Remove duplicate records.
C. Compress files.
D. Encrypt duplicates.
Answer: B.
Explanation: Deduplication merges or removes duplicates.
Q24. Data profiling helps to:
A. Create user interfaces.
B. Discover patterns, anomalies, and distribution in data.
C. Encrypt data.
D. Create backups.
Answer: B.
Explanation: Profiling finds quality issues for remediation.
Q25. Which metric measures completeness?
A. % of mandatory fields filled.
B. % of rows with duplicate IDs.
C. % of encrypted columns.
D. % of files backed up.
Answer: A.
Explanation: Completeness often tracked as percentage filled.
Q26. Which metric measures uniqueness?
A. Number of distinct customers / total customer records.
B. % of null values.
C. Age of data.
D. Encryption strength.
Answer: A.
Explanation: Uniqueness ratio shows duplicates presence.
Q27. What is a data quality rule?
A. A firewall rule.
B. A validation that data must satisfy (e.g., PAN format).
C. A backup schedule.
D. A UI guideline.
Answer: B.
Explanation: Rules define acceptable values/formats.
Q28. Which action is part of data quality improvement?
A. Ignore issues.
B. Define rules, cleanse data, monitor KPIs.
C. Delete all old records.
D. Close the database.
Answer: B.
Explanation: Improvement is proactive and continuous.
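Profiling before cleansing usually means computing simple statistics per column. Below is a minimal Python sketch over made-up rows. For each column it counts empty values, distinct values, the most frequent value, and value "patterns" (digits shown as 9, letters as A), a common way to spot mixed formats.

```python
from collections import Counter

# Hypothetical extract; a real profile would run over a full table.
rows = [
    {"cif": "C001", "state": "Kerala", "dob": "12/05/1980"},
    {"cif": "C002", "state": "kerala", "dob": ""},
    {"cif": "C003", "state": "Kerala", "dob": "1975-08-30"},
    {"cif": "C003", "state": "Punjab", "dob": "01/01/1990"},
]


def shape(value: str) -> str:
    """Replace digits with 9 and letters with A, keeping punctuation."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def profile(records):
    report = {}
    for column in records[0]:
        values = [r[column] for r in records]
        filled = [v for v in values if v]
        report[column] = {
            "empty": len(values) - len(filled),
            "distinct": len(set(filled)),
            "most_common": Counter(filled).most_common(1)[0] if filled else None,
            "patterns": dict(Counter(shape(v) for v in filled)),
        }
    return report


for column, stats in profile(rows).items():
    print(column, stats)
```

In the output, `cif` has 3 distinct values in 4 rows (C003 is duplicated), `state` counts "Kerala" and "kerala" separately (a standards issue), and `dob` has one empty value and two date patterns.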
C. Processes & Tools (29–40)
Q29. ETL stands for:
A. Extract, Transform, Load.
B. Encrypt, Transfer, Log.
C. Edit, Test, Launch.
D. Extract, Test, Link.
Answer: A.
Explanation: ETL moves and transforms data into warehouses.
Q30. Data Warehouse is used for:
A. Live transaction processing.
B. Analytical reporting and historical data.
C. Storing images only.
D. Email storage.
Answer: B.
Explanation: Warehouses support analytics, not OLTP.
Q31. Master Data Management (MDM) aims to:
A. Create multiple versions of the same entity.
B. Create and maintain a single accurate master record for key entities.
C. Lose data.
D. Only backup data.
Answer: B.
Explanation: MDM builds the golden record.
Q32. Data catalog helps users to:
A. Order new hardware.
B. Find, understand and trust datasets and their metadata.
C. Encrypt all files.
D. Create passwords.
Answer: B.
Explanation: Catalogs document datasets for discovery.
Q33. Data dictionary provides:
A. A list of data elements, meanings and formats.
B. A spelling list.
C. A backup log.
D. A user manual for the app.
Answer: A.
Explanation: Dictionaries explain fields and accepted values.
Q34. Data validation at entry prevents:
A. Correct data only.
B. Invalid or malformed values entering systems.
C. Backups.
D. Data analysis.
Answer: B.
Explanation: Validation enforces rules at capture time.
Q35. Data profiling is usually done:
A. After data cleaning.
B. Before cleansing to find issues.
C. Never.
D. Only for images.
Answer: B.
Explanation: Profiling discovers issues to target cleansing.
Q36. Which is NOT part of MDM?
A. Matching and merging records.
B. Creating golden records.
C. Managing master workflows.
D. Creating temporary log files only.
Answer: D.
Explanation: MDM focuses on long-term master records.
Q37. Data orchestration mainly deals with:
A. Running ETL jobs, pipelines and scheduling tasks.
B. Playing music.
C. Deleting files.
D. User training.
Answer: A.
Explanation: Orchestration automates data flows.
Q38. A data quality dashboard shows:
A. Server temperatures.
B. Metrics like % completeness, duplicates, errors.
C. Only email alerts.
D. Historical stock prices.
Answer: B.
Explanation: Dashboards visualize data health KPIs.
Q39. Which technology is used for data lineage visualization?
A. Data lineage tools / metadata management tools.
B. Word processors.
C. Presentation slides only.
D. Plain text editors.
Answer: A.
Explanation: Specialized tools trace and visualize lineage.
Q40. Business glossary is:
A. A list of company terms and definitions for data.
B. A legal agreement.
C. A backup protocol.
D. A coding standard.
Answer: A.
Explanation: Glossary ensures consistent business meaning.
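ETL can be sketched end to end in a few lines. This is a minimal Python sketch, assuming a made-up CSV export and an in-memory SQLite database standing in for the warehouse. Extract parses the CSV, transform trims names and standardises dates to DD/MM/YYYY (the example standard from the components table), and load inserts the rows.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Extract source: a hypothetical CSV export, inlined for the example.
RAW = """cif,name,opened_on
C001, Asha Rao ,2023-04-01
C002,Ravi Kumar,15/08/2022
"""


def extract(text):
    return list(csv.DictReader(io.StringIO(text)))


def to_ddmmyyyy(value):
    """Accept the two input formats seen in the source and emit DD/MM/YYYY."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%d/%m/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")


def transform(rows):
    return [(r["cif"].strip(), r["name"].strip(), to_ddmmyyyy(r["opened_on"])) for r in rows]


def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS accounts (cif TEXT PRIMARY KEY, name TEXT, opened_on TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
print(conn.execute("SELECT * FROM accounts ORDER BY cif").fetchall())
```

Storing dates as DD/MM/YYYY text mirrors the example standard but does not sort chronologically; warehouses often keep a native date type and apply the display format in reports.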
D. Security, Privacy & Compliance (41–56)
Q41. Data encryption ensures:
A. Data is always accurate.
B. Confidentiality by converting data into unreadable form without keys.
C. Data is deleted.
D. Data is copied.
Answer: B.
Explanation: Encryption protects data confidentiality.
Q42. Access control means:
A. Anyone can access any data.
B. Limiting access to data based on roles and permissions.
C. Only the CEO can access data.
D. Access via USB only.
Answer: B.
Explanation: Role-based access helps enforce least privilege.
Q43. How does anonymization differ from pseudonymization?
A. They are the same.
B. Anonymization is irreversible; pseudonymization can be reversed with keys.
C. Pseudonymization deletes data.
D. Anonymization increases data accuracy.
Answer: B.
Explanation: Anonymized data cannot be traced back to individuals.
Q44. Which regulation focuses on consumer data privacy (global example)?
A. GDPR.
B. RBI circular only.
C. SQL standard.
D. ISO 9000.
Answer: A.
Explanation: GDPR is a major privacy law; local rules may vary.
Q45. Banks must follow which of the following for KYC?
A. RBI KYC master direction and guidelines.
B. Only internal notes.
C. No rules.
D. Marketing guidelines.
Answer: A.
Explanation: RBI issues KYC and AML rules for banks.
Q46. Data localization means:
A. Data must be stored and processed within a country.
B. Data stored everywhere globally.
C. Data stored in local cache only.
D. Data sent to customers.
Answer: A.
Explanation: Some regulations require onshore storage.
Q47. What is a breach notification requirement?
A. No action after breach.
B. Inform regulators/customers when sensitive data is exposed.
C. Delete logs only.
D. Only internal memo.
Answer: B.
Explanation: Many laws require timely notification on breaches.
Q48. Least privilege principle means:
A. Give all users full access.
B. Users get the minimum access needed to do their job.
C. No one has access.
D. Access granted at random.
Answer: B.
Explanation: Least privilege reduces insider risk.
Q49. Segregation of duties (SoD) helps prevent:
A. System upgrades.
B. Fraud by splitting critical tasks between people.
C. Data backup.
D. Data mapping.
Answer: B.
Explanation: Splitting critical tasks means one person cannot commit fraud alone (it would require collusion), and it helps catch mistakes.
Q50. Which is a privacy-preserving technique?
A. Encryption, anonymization, tokenization.
B. Printing data.
C. Sending data by email in plain text.
D. Leaving files on desktop.
Answer: A.
Explanation: These techniques reduce exposure of personal data.
Q51. Tokenization replaces:
A. Entire database.
B. Sensitive values with non-sensitive tokens.
C. Files with images.
D. Laptops.
Answer: B.
Explanation: Tokens map to sensitive data stored securely elsewhere.
Q52. Data retention policy defines:
A. How long data must be kept and when to delete/archive.
B. Who to hire.
C. How to encrypt.
D. How to write code.
Answer: A.
Explanation: Retention balances legal, business and storage needs.
Q53. Audit trail helps to:
A. Hide activities.
B. Record who accessed or changed data and when.
C. Delete records.
D. Produce music.
Answer: B.
Explanation: Audit trails support investigations and compliance.
Q54. RBI IT governance for banks covers:
A. IT strategy, risk management and controls.
B. Only marketing tools.
C. Only loan products.
D. HR policies only.
Answer: A.
Explanation: RBI issues guidelines on IT governance for banking safety.
Q55. Regulatory reporting data must be:
A. Random.
B. Accurate, auditable and timely.
C. Encrypted only.
D. Anonymous always.
Answer: B.
Explanation: Regulators depend on reliable data for oversight.
Q56. Privacy by design principle means:
A. Consider privacy from the start when designing systems.
B. Add privacy later.
C. Ignore privacy.
D. Design only front-end.
Answer: A.
Explanation: Privacy integrated into system architecture from day one.
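Tokenization can be sketched as a vault that hands out random tokens and keeps the only mapping back to the real value. Here is a minimal Python sketch; `TokenVault` is a hypothetical in-memory class, whereas a production vault would be a hardened, access-controlled and audited service.

```python
import secrets


class TokenVault:
    """Toy in-memory vault: random tokens stand in for sensitive values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Same value -> same token, so tokenized datasets can still be joined.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "TKN-" + secrets.token_hex(8)
        while token in self._token_to_value:  # guard against a (very unlikely) collision
            token = "TKN-" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only tightly controlled, audited services should be allowed to call this.
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("ABCDE1234F")
print(token)                    # random each run: "TKN-" plus 16 hex characters
print(vault.detokenize(token))  # ABCDE1234F
```

Because the token is random rather than computed from the PAN, it reveals nothing without the vault; unlike anonymized data, it can still be reversed by authorised users.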
E. Governance Structure & Roles (57–65)
Q57. A Data Governance Committee (DGC) typically:
A. Is only for marketing.
B. Sets policies, resolves issues and sponsors governance activities.
C. Writes only code.
D. Sells hardware.
Answer: B.
Explanation: DGC is a steering group for governance.
Q58. Which role approves access policies and exceptions?
A. Data Owner or Data Governance Committee.
B. Any intern.
C. External vendors only.
D. Customers.
Answer: A.
Explanation: Owners/committees approve rules and exceptions.
Q59. Data stewardship council focuses on:
A. Operational tasks for ensuring day-to-day data quality.
B. Just backups.
C. Only hiring.
D. Marketing.
Answer: A.
Explanation: Council coordinates stewards across domains.
Q60. Chief Data Officer (CDO) is responsible for:
A. Leading data strategy, governance and quality initiatives.
B. Only email server.
C. Office cleaning.
D. Legal disputes only.
Answer: A.
Explanation: The CDO drives the enterprise-level data program.
Q61. Data owner vs data custodian – correct pairing:
A. Owner = accountable; Custodian = implements controls.
B. Owner = implements controls; Custodian = accountable.
C. Both identical always.
D. Owner = user; Custodian = vendor.
Answer: A.
Explanation: Owner defines policy; custodian handles technical tasks.
Q62. Data steward typically reports to:
A. Business unit or data governance office.
B. External auditor only.
C. No one.
D. Only to customers.
Answer: A.
Explanation: Stewards are often within business teams with governance links.
Q63. Escalation path in governance is for:
A. Ignoring issues.
B. Raising unresolved data issues to higher authority.
C. Deleting data.
D. Changing passwords only.
Answer: B.
Explanation: Enables problem resolution at appropriate level.
Q64. SLA in context of data services means:
A. Service Level Agreement: Defines expected uptime, quality, response times.
B. Security Level Agreement only.
C. Supplier Legal Agreement.
D. Salary level arrangement.
Answer: A.
Explanation: SLAs set performance targets for data services.
Q65. KPIs for data governance might include:
A. % of data quality issues closed, % completeness, MTTR for issues.
B. Office attendance.
C. Number of meetings only.
D. Number of coffee cups.
Answer: A.
Explanation: KPIs quantify success of governance.
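The KPIs in the last question are simple arithmetic over an issue log. Below is a minimal Python sketch with made-up issues. The % closed KPI is closed issues over all issues, and MTTR (taken here as mean time to resolve) averages open-to-close time over closed issues only.

```python
from datetime import datetime

# Hypothetical data-quality issue log; "closed" is None while an issue is open.
issues = [
    {"id": 1, "opened": datetime(2024, 1, 1, 9), "closed": datetime(2024, 1, 2, 9)},
    {"id": 2, "opened": datetime(2024, 1, 3, 9), "closed": datetime(2024, 1, 6, 9)},
    {"id": 3, "opened": datetime(2024, 1, 5, 9), "closed": None},
    {"id": 4, "opened": datetime(2024, 1, 8, 9), "closed": datetime(2024, 1, 10, 9)},
]


def pct_closed(log):
    return 100 * sum(1 for i in log if i["closed"] is not None) / len(log)


def mttr_hours(log):
    """Mean hours from open to close, over closed issues only."""
    hours = [(i["closed"] - i["opened"]).total_seconds() / 3600
             for i in log if i["closed"] is not None]
    return sum(hours) / len(hours)


print(pct_closed(issues))  # 75.0
print(mttr_hours(issues))  # 48.0 (24 + 72 + 48 hours over 3 closed issues)
```

Excluding open issues keeps MTTR well defined but can flatter it, so it is usually read alongside the age of issues still open.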
F. Bank-Specific & Exam-Focused Scenarios (66–80)
Q66. If a customer's address differs between CBS and CRM, this is a problem of:
A. Security.
B. Consistency.
C. Speed.
D. Storage.
Answer: B.
Explanation: Same customer data must match across systems.
Q67. Missing PAN in loan account KYC is primarily a:
A. Completeness issue and regulatory compliance risk.
B. Network issue.
C. Encryption problem.
D. Hardware failure.
Answer: A.
Explanation: PAN (or Form 60) is required under KYC norms, so a missing value is both a completeness gap and a compliance failure.
Q68. Duplicate CIFs for same customer cause:
A. Better reporting.
B. Risk in credit scoring and reporting.
C. Faster service.
D. No impact.
Answer: B.
Explanation: Duplicates distort customer view and risk metrics.
Q69. Which action helps prevent duplicate customer records?
A. Strong matching rules and MDM.
B. Delete all old data.
C. Increase disk space.
D. Remove validation.
Answer: A.
Explanation: Matching logic and golden record creation reduce duplicates.
Q70. A bank report used by RBI must be:
A. Prepared randomly.
B. Accurate, consistent, and auditable.
C. Secret only to CEO.
D. Only in local language.
Answer: B.
Explanation: Regulator relies on precise and verifiable data.
Q71. Which is a common cause of poor data quality in banks?
A. Manual data entry errors and system mismatches.
B. Too much automation only.
C. Perfect processes only.
D. Excessive use of reports.
Answer: A.
Explanation: Manual processes and disconnected systems create errors.
Q72. An effective data governance program in a bank should:
A. Be technology-only.
B. Combine people, process and technology.
C. Only depend on external auditors.
D. Ignore business units.
Answer: B.
Explanation: Governance is a mix of organization, rules and tools.
Q73. Regulatory fines due to wrong reporting are caused by:
A. High data quality.
B. Poor data governance and data quality.
C. Too many backups.
D. Data encryption.
Answer: B.
Explanation: Inaccurate reports can trigger penalties.
Q74. KYC refresh is needed to maintain:
A. Data timeliness and compliance.
B. Only hardware maintenance.
C. Encryption keys.
D. Office supplies.
Answer: A.
Explanation: Periodic KYC updates ensure current and compliant customer info.
Q75. To improve credit decisioning, the bank should ensure:
A. Accurate, complete and timely customer and transaction data.
B. Reducing staff.
C. Increasing meetings.
D. Deleting transaction history.
Answer: A.
Explanation: Quality data leads to correct credit scoring.
Q76. Which of the following is NOT a preventive measure for data breaches?
A. Strong access controls.
B. Regular security patching.
C. Leaving default passwords.
D. Encryption of sensitive data.
Answer: C.
Explanation: Default passwords increase breach risk.
Q77. Data reconciliation between systems means:
A. Comparing and aligning data so systems agree.
B. Formatting spreadsheets only.
C. Changing ownership.
D. Removing data.
Answer: A.
Explanation: Reconciliation finds and corrects mismatches.
Q78. Which is a sign of poor data quality?
A. High rate of returned letters due to wrong address.
B. Accurate reports.
C. Unified customer view.
D. Low error rates.
Answer: A.
Explanation: Wrong addresses show poor data quality.
Q79. A data quality improvement plan includes:
A. Identify issues, prioritize, implement fixes, monitor.
B. Only buy new servers.
C. Only increase staff.
D. Delete database.
Answer: A.
Explanation: A structured plan drives continuous improvement.
Q80. Regulatory audits for data ask for:
A. Evidence of controls, lineage and data quality checks.
B. Office cleanliness evidence.
C. Music playlists.
D. Only marketing brochures.
Answer: A.
Explanation: Audits assess control effectiveness and data accuracy.
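Matching rules decide when two CIFs probably belong to the same customer. Here is a minimal Python sketch with two illustrative rules: the same non-empty PAN, or the same normalised name plus date of birth plus mobile number. The rules and data are assumptions; flagged pairs would go to a data steward for review before any merge.

```python
from itertools import combinations

# Hypothetical customer master extract.
customers = [
    {"cif": "C101", "name": "Ravi Kumar", "dob": "12/05/1980", "mobile": "9800000001", "pan": "PQRSX5678K"},
    {"cif": "C205", "name": "RAVI  KUMAR", "dob": "12/05/1980", "mobile": "9800000001", "pan": ""},
    {"cif": "C309", "name": "Ravi Kumar", "dob": "03/11/1992", "mobile": "9811111111", "pan": "LMNOP4321Q"},
    {"cif": "C412", "name": "R. Kumar", "dob": "12/05/1980", "mobile": "9822222222", "pan": "PQRSX5678K"},
]


def norm_name(name: str) -> str:
    return " ".join(name.upper().split())


def probable_duplicate(a: dict, b: dict) -> bool:
    # Rule 1 (assumed): identical, non-empty PAN.
    if a["pan"] and a["pan"] == b["pan"]:
        return True
    # Rule 2 (assumed): same normalised name, date of birth and mobile number.
    return (norm_name(a["name"]) == norm_name(b["name"])
            and a["dob"] == b["dob"] and a["mobile"] == b["mobile"])


pairs = [(a["cif"], b["cif"]) for a, b in combinations(customers, 2) if probable_duplicate(a, b)]
print(pairs)
```

C309 shares only a name with C101, so it is left alone, while C205 and C412 each match C101 under a different rule. Comparing every pair grows quadratically with the number of records; MDM tools typically limit comparisons with blocking (only comparing records that share a key such as date of birth).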
