SlideShare a Scribd company logo
© 2013 EnterpriseDB Corporation. All rights reserved. 1
WAL for DBAs – Everything you
want to know
Devrim Gündüz
Principal Systems Engineer @ EnterpriseDB
devrim.gunduz@EnterpriseDB.com
Twitter : @DevrimGunduz
PGDay.Asia 2017
WAL for DBAs
2
●
Who is this guy?
– Using Red Hat (and then Fedora) since 1996.
– Using PostgreSQL since 1998.
– Responsible for PostgreSQL YUM repository.
●
Used to break website, but recently gave up.
– Started some work on PostgreSQL Dockerfiles
recently. https://www.pgdocker.org
– Working at EnterpriseDB since 2011.
– The Guy With The PostgreSQL Tattoo! (imitations may
exist :) )
– Istanbul, Turkiye.
About me
FOSDEM PGDAY 2017
WAL for DBAs
3
●
Please tweet!
– #PostgreSQL
– #pgday
– #pgdayasia
Social media
FOSDEM PGDAY 2017
WAL for DBAs
4
(Did you tweet? Thanks!)
Social media
FOSDEM PGDAY 2017
WAL for DBAs
5
(Did you tweet? Thanks!)
Alternative hashtag:
#blamemagnus
Social media
FOSDEM PGDAY 2017
WAL for DBAs
6
June 26-28, Boston
We want to see great speakers who
can talk to the technical aspects of
using Postgres in the enterprise.
http://postgresvision.com/
Postgres Vision 2017
PGDay.Asia 2017
WAL for DBAs
7
Agenda (in random order)
•
What is WAL?
•
What does it include?
•
How to read it?
•
What about wal_level ?
•
Replication and WAL
•
Backup and WAL
•
PITR and WAL
•
Other topics
FOSDEM PGDAY 2017
WAL for DBAs
8
Please do not delete WAL files
manually.
Please.
Before we actually start:
FOSDEM PGDAY 2017
WAL for DBAs
9
Please do not delete WAL files
manually.
Please.
Please.
Before we actually start:
FOSDEM PGDAY 2017
WAL for DBAs
10
Please do not delete WAL files
manually.
Please.
Please.
PLEASE.
Before we actually start:
PGDay.Asia 2017
WAL for DBAs
11
What is WAL?
•
Write Ahead Log:
•
Logging of transactions
•
a.k.a. xlog (transaction log),
•
16 MB in most of the installations (can be configured, --with-
wal-segsize)
•
8 kB page size (can be configured, --with-wal-blocksize)
•
pg_xlog (<= 9.6)→ pg_wal (10+)
•
Because people deleted files under “log” directory.
FOSDEM PGDAY 2017
WAL for DBAs
12
MAKE WAL GREAT AGAIN!
So:
PGDay.Asia 2017
WAL for DBAs
13
What is WAL?
•
Designed to prevent data loss in most of the situations
•
OS crash, hardware failure, PostgreSQL crash.
•
Write transactions are written to WAL
•
Before transaction result is sent to the client
•
Data files are not changed on each transaction
•
Performance benefit
•
Should be kept in a separate drive.
•
Initdb, or symlink
PGDay.Asia 2017
WAL for DBAs
14
What is WAL?
•
Built-in feature
•
Life before WAL (not before B.C., though):
•
All changes go to durable storage (eventually), but:
•
Data page is loaded to shared_buffers
•
Changes are made there
•
Dirty buffers!
•
But not timely!
•
Crash → Data loss!
PGDay.Asia 2017
WAL for DBAs
15
What is WAL?
•
Life after WAL:
•
Almost all “modifications” are “logged” to WAL files (xlog
record)
•
Even if the transaction is aborted (ROLLBACK)
•
wal_buffers → WAL segments (files)
•
Ability to recover data after a crash
•
Checkpoint!
PGDay.Asia 2017
WAL for DBAs
16
Where is it used?
•
Transaction logging!
•
Replication
•
PITR
•
REDO
•
Sequentially availability is a must.
•
REDO vs UNDO
•
No REDO for temp tables and unlogged tables.
PGDay.Asia 2017
WAL for DBAs
17
WAL file naming
•
24 chars, hex.
•
1st 8 chars: timelineID
•
00000001 is the timelineID created by initdb
•
2nd
8 chars: logical WAL file
•
3rd
8 chars: WAL segment name (physical WAL file)
•
000000010000000000000001 → 000000010000000000000002
•
… 0000000100000000000000FF →
000000010000000100000000
•
Use PostgreSQL’s internal tools to manage them
•
pg_archivecleanup
•
pg_resetxlog
•
pg_xlogdump
•
....
PGDay.Asia 2017
WAL for DBAs
18
pg_xlogdump
•
We are all human.
•
Use pg_xlogdump, if you want to see contents of WAL files
•
rmgr --help to get list of all resource names, -f for follow, -n for limit. -z for stats.
•
pg_xlogdump -n 20 -f 000000010000000700000033
•
rmgr: Heap len (rec/tot): 3/ 59, tx: 389744, lsn: 7/33B66228, prev 7/33B661F0, desc: INSERT+INIT off 1,
blkref #0: rel 1663/13326/190344 blk 0
•
rmgr: Heap len (rec/tot): 3/ 59, tx: 389744, lsn: 7/33B66268, prev 7/33B66228, desc: INSERT off 2, blkref
#0: rel 1663/13326/190344 blk 0
•
rmgr: Transaction len (rec/tot): 8/ 34, tx: 389744, lsn: 7/33B662A8, prev 7/33B66268, desc: COMMIT 2017-02-
03 03:03:49.482223 +03
•
rmgr: Heap len (rec/tot): 14/ 69, tx: 389745, lsn: 7/33B662D0, prev 7/33B662A8, desc: HOT_UPDATE off
1 xmax 389745 ; new off 3 xmax 0, blkref #0: rel 1663/13326/190344 blk 0
•
rmgr: Transaction len (rec/tot): 8/ 34, tx: 389745, lsn: 7/33B66318, prev 7/33B662D0, desc: COMMIT 2017-02-
03 03:03:54.091645 +03
•
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 7/33B66340, prev 7/33B66318, desc:
CHECKPOINT_ONLINE redo 7/33B66340; tli 1; prev tli 1; fpw true; xid 0/389746; oid 198532; multi 1; offset 0; oldest
xid 1866 in DB 129795; oldest multi 1 in DB 90123; oldest/newest commit timestamp xid: 388437/389745; oldest
running xid 0; online
•
rmgr: XLOG len (rec/tot): 0/ 24, tx: 0, lsn: 7/33B663B0, prev 7/33B66340, desc: SWITCH
•
PGDay.Asia 2017
WAL for DBAs
19
Shared Buffers, Bgwriter and checkpointer
•
shared_buffers in PostgreSQL
•
Dirty buffers
•
This is where transactions are performed
•
Side effect: Causes inconsistency(?) on durable storage, due
to dirty buffers.
•
Bgwriter: Background writer
•
LRU
•
Checkpointer
•
Pushing all dirty buffers to durable storage
•
Triggered automatically or manually
•
Backends may also write data to heap
PGDay.Asia 2017
WAL for DBAs
20
WAL: LSN
•
Log Sequence Number
•
Position of the record in WAL file.
•
Provides uniqueness for each xlog record.
•
Per docs: “Pointer to a location in WAL file”
•
During recovery, LSN on the page and LSN in the WAL file are
compared.
•
The larger one wins.
PGDay.Asia 2017
WAL for DBAs
21
●
Probably not the last one in ls list!
WAL: Finding current WAL file
•
postgres=# SELECT * from pg_current_xlog_location();
pg_current_xlog_location
--------------------------
40E6/2C85AC10
•
postgres=# SELECT pg_xlogfile_name(‘40E6/2C85AC10');
pg_xlogfile_name
--------------------------
00000003000040E60000002C
So:
•
postgres=# SELECT pg_xlogfile_name(pg_current_xlog_location());
pg_xlogfile_name
--------------------------
00000003000040E60000002C
PGDay.Asia 2017
WAL for DBAs
22
Checkpoint, and pg_control
●
As soon as the checkpoint starts, REDO point is stored in shared buffers.
●
A WAL record is created referencing checkpoint start, and it is first written to WAL
buffers, and then eventually to pg_control.
– pg_control is under $PGDATA/global
●
Unlike bgwriter, checkpointer writes all of the data in the shared_buffers to
durable storage.
●
PostgreSQL knows the latest REDO point, by looking at pg_control file.
PGDay.Asia 2017
WAL for DBAs
23
Checkpoint, and pg_control
●
pg_controldata:
– Latest checkpoint location: 40E7/E43B16B8
– Prior checkpoint location: 40E7/D8689090
They are LSN.
●
When checkpoint is completed, pg_control is updated with the position of
checkpoint.
●
After checkpoint, old WAL files are either recycled, or removed.
●
An “estimation” is done while recycling (based on previous checkpoint cycles)
●
9.5+: In minimum, min_wal_size WAL files are always recycled for future usage
PGDay.Asia 2017
WAL for DBAs
24
pg_control and REDO
●
postmaster reads pg_control on startup.
/usr/pgsql-10/bin/pg_controldata -D /var/lib/pgsql/10/data | grep state
– “Database cluster state”:
●
starting up
●
shut down
●
shut down in recovery
●
shutting down
●
in crash recovery
●
in archive recovery
●
in production
●
If pg_control says “in production”, but db server is not running, then this instance
is eligible for a recovery!
PGDay.Asia 2017
WAL for DBAs
25
pg_control and REDO
●
pg_control is the critical piece
– Should not be corrupted
– Per docs: “...theoretically a weak spot”
●
REDO: All WAL files must be sequentially available for complete recovery.
PGDay.Asia 2017
WAL for DBAs
26
Moving to the new WAL
●
A WAL segment may be full
●
PostgreSQL archiver will switch to the new xlog, if PostgreSQL reaches
archive_timeout value.
●
DBA issues pg_switch_xlog() function.
PGDay.Asia 2017
WAL for DBAs
27
WAL: Archiving
•
Replication, backup, PITR
•
archive_mode
•
archive_command
•
archive_timeout
PGDay.Asia 2017
WAL for DBAs
28
WAL: Point-In-Time Recovery (PITR)
•
A base backup (pg_basebackup!) and the WAL files are needed.
•
WAL files must be sequentially complete – otherwise PITR won’t
be finished.
•
“Roll-forward recovery”
PGDay.Asia 2017
WAL for DBAs
29
WAL: Point-In-Time Recovery (PITR)
•
PITR: Replaying WAL files on base backups, until recovery
target.
•
recovery_target_{time, xid, name, lsn}
•
If not specified, all archived WAL files are replayed.
•
recovery.conf and backup_label: Enters recovery mode.
•
restore_command,
recovery_target_XXX,recovery_target_inclusive
•
backup_label: Also includes checkpoint location (starting point of
recovery)
•
Almost like regular recovery process (WAL replay)
•
Up to recovery_target_XXX is replayed.
PGDay.Asia 2017
WAL for DBAs
30
WAL: Point-In-Time Recovery (PITR)
•
After recovery process, timelineID is increased by 1 (also
physical WAL file name is also increased by 1)
•
A .history file is created.
•
$ cat 00000003.history
1 403F/58000098 no recovery target specified
2 4048/43000098 before 2017-01-28 11:13:21.124512+03
“WAL files were replayed until the given time above, and theit
replay location is 4048/43000098.
•
•
•
PGDay.Asia 2017
WAL for DBAs
31
Full page writes
●
A WAL record cannot be replayed on a page which is corrupted during
bgwriter and/or checkpointer, because of hardware failure, OS crash,
kernel failure, etc.
●
Full page writes IYF.
●
Enabled by default.
– Please turn it off, if you want to throw a lot of money to PostgreSQL
support companies. Otherwise, don’t do so ;)
●
PostgreSQL writes header data + the entire page as XLOG record,
when a page changes after every checkpoint.
– Increasing checkpoint_timeout helps.
– Full-page image, backup block.
●
PostgreSQL can even recover itself from write failures (not hw failures,
though)
PGDay.Asia 2017
WAL for DBAs
32
WAL parameters
•
wal_level: Minimal, replica or logical
•
Must be > minimal for archiver to be able to run
•
fsync : Always on, please.
•
synchronous_commit: May lose some of the latest transactions
•
Server returns success to the client
•
Server waits a bit to flush the data to durable storage.
•
Less risky than fsync
•
wal_sync_method : fdatasync is usually better. Use
pg_test_fsync for testing.
PGDay.Asia 2017
WAL for DBAs
33
WAL parameters
•
wal_log_hints: When this value is set to on , the server writes the entire content of
each disk page to WAL after a checkpoint and during the first modification of that
page, even for non-critical modifications of so-called hint bits.
•
wal_compression: off by default. Less WAL files, more CPU overhead.
•
wal_buffers: -1: Automatic tuning of wal buffers: 1/32 of shared_buffers (not less than
64kB or no more than 16 MB (1 WAL file)
•
wal_writer_delay : Rounds between WAL writer flushes WAL.
•
wal_writer_flush_after: New in 9.6
FOSDEM PGDAY 2017
WAL for DBAs
34
Questions, comments?
© 2013 EnterpriseDB Corporation. All rights reserved. 35
WAL for DBAs – Everything you
want to know
Devrim Gündüz
Principal Systems Engineer @ EnterpriseDB
devrim.gunduz@EnterpriseDB.com
Twitter : @DevrimGunduz

More Related Content

PDF
Linux tuning to improve PostgreSQL performance
PDF
Postgresql database administration volume 1
PPTX
Mendix Cloud Hosting on CloudFoundry
PDF
アーキテクチャから理解するPostgreSQLのレプリケーション
PDF
PostgreSQL Deep Internal
PDF
High Availability PostgreSQL with Zalando Patroni
PPTX
分散システムについて語らせてくれ
PDF
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Linux tuning to improve PostgreSQL performance
Postgresql database administration volume 1
Mendix Cloud Hosting on CloudFoundry
アーキテクチャから理解するPostgreSQLのレプリケーション
PostgreSQL Deep Internal
High Availability PostgreSQL with Zalando Patroni
分散システムについて語らせてくれ
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

What's hot (20)

PDF
PostgreSQL Performance Tuning
PPTX
PostgreSQLのfull_page_writesについて(第24回PostgreSQLアンカンファレンス@オンライン 発表資料)
PPTX
re:Invent 2022 DAT326 Deep dive into Amazon Aurora and its innovations
PDF
Mastering PostgreSQL Administration
 
PDF
Troubleshooting PostgreSQL Streaming Replication
PDF
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PDF
PostgreSQL and RAM usage
PDF
pg_walinspectについて調べてみた!(第37回PostgreSQLアンカンファレンス@オンライン 発表資料)
PDF
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
PDF
Ceph issue 해결 사례
PDF
Understanding PostgreSQL LW Locks
PDF
MySQL Administrator 2021 - 네오클로바
PDF
Deep dive into PostgreSQL statistics.
PDF
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
PPTX
PostGreSQL Performance Tuning
ODP
PostgreSQL Administration for System Administrators
PPTX
Introduction to MongoDB
PPTX
High Performance, High Reliability Data Loading on ClickHouse
PDF
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
PDF
[pgday.Seoul 2022] PostgreSQL with Google Cloud
PostgreSQL Performance Tuning
PostgreSQLのfull_page_writesについて(第24回PostgreSQLアンカンファレンス@オンライン 発表資料)
re:Invent 2022 DAT326 Deep dive into Amazon Aurora and its innovations
Mastering PostgreSQL Administration
 
Troubleshooting PostgreSQL Streaming Replication
PostgreSQLのリカバリ超入門(もしくはWAL、CHECKPOINT、オンラインバックアップの仕組み)
PostgreSQL and RAM usage
pg_walinspectについて調べてみた!(第37回PostgreSQLアンカンファレンス@オンライン 発表資料)
Tricks every ClickHouse designer should know, by Robert Hodges, Altinity CEO
Ceph issue 해결 사례
Understanding PostgreSQL LW Locks
MySQL Administrator 2021 - 네오클로바
Deep dive into PostgreSQL statistics.
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
PostGreSQL Performance Tuning
PostgreSQL Administration for System Administrators
Introduction to MongoDB
High Performance, High Reliability Data Loading on ClickHouse
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
[pgday.Seoul 2022] PostgreSQL with Google Cloud
Ad

Similar to PostgreSQL WAL for DBAs (20)

PDF
Postgres Vision 2018: WAL: Everything You Want to Know
 
PPTX
M|18 Battle of the Online Schema Change Methods
PDF
More Than Just The Tip Of The Iceberg.pdf
PDF
The Accidental DBA
PDF
How We Added Replication to QuestDB - JonTheBeach
PDF
Aplicações 10x a 100x mais rápida com o postgre sql
PDF
Patroni - HA PostgreSQL made easy
PDF
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
PDF
2008 Collaborate IOUG Presentation
PDF
Useful PostgreSQL Extensions
 
PDF
Oracle to Postgres Migration - part 2
PDF
High performance json- postgre sql vs. mongodb
DOCX
Oracle GoldenGate
PDF
PostgreSQL 9.5 - Major Features
PDF
GLOC 2014 NEOOUG - Oracle Database 12c New Features
PDF
Securefile LOBs
PDF
COUG_AAbate_Oracle_Database_12c_New_Features
PDF
OSMC 2008 | PostgreSQL Monitoring - Introduction, Internals And Monitoring S...
PDF
Big data should be simple
PDF
Recent MariaDB features to learn for a happy life
Postgres Vision 2018: WAL: Everything You Want to Know
 
M|18 Battle of the Online Schema Change Methods
More Than Just The Tip Of The Iceberg.pdf
The Accidental DBA
How We Added Replication to QuestDB - JonTheBeach
Aplicações 10x a 100x mais rápida com o postgre sql
Patroni - HA PostgreSQL made easy
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
2008 Collaborate IOUG Presentation
Useful PostgreSQL Extensions
 
Oracle to Postgres Migration - part 2
High performance json- postgre sql vs. mongodb
Oracle GoldenGate
PostgreSQL 9.5 - Major Features
GLOC 2014 NEOOUG - Oracle Database 12c New Features
Securefile LOBs
COUG_AAbate_Oracle_Database_12c_New_Features
OSMC 2008 | PostgreSQL Monitoring - Introduction, Internals And Monitoring S...
Big data should be simple
Recent MariaDB features to learn for a happy life
Ad

More from PGConf APAC (20)

PDF
PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...
PDF
PGConf APAC 2018: PostgreSQL 10 - Replication goes Logical
PDF
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PDF
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
PDF
Sponsored Talk @ PGConf APAC 2018 - Choosing the right partner in your Postgr...
PDF
PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018
PDF
PGConf APAC 2018 - Patroni: Kubernetes-native PostgreSQL companion
PDF
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PDF
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PDF
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL
PDF
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
PDF
PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...
PDF
PGConf APAC 2018 - PostgreSQL performance comparison in various clouds
PDF
Sponsored Talk @ PGConf APAC 2018 - Migrating Oracle to EDB Postgres Approach...
PDF
PGConf APAC 2018 - Tale from Trenches
PDF
PGConf APAC 2018 Keynote: PostgreSQL goes eleven
PDF
Amazon (AWS) Aurora
PDF
Use Case: PostGIS and Agribotics
PDF
How to teach an elephant to rock'n'roll
PDF
PostgreSQL on Amazon RDS
PGConf APAC 2018: Sponsored Talk by Fujitsu - The growing mandatory requireme...
PGConf APAC 2018: PostgreSQL 10 - Replication goes Logical
PGConf APAC 2018 - Lightening Talk #3: How To Contribute to PostgreSQL
PGConf APAC 2018 - Lightening Talk #2 - Centralizing Authorization in PostgreSQL
Sponsored Talk @ PGConf APAC 2018 - Choosing the right partner in your Postgr...
PGConf APAC 2018 - A PostgreSQL DBAs Toolbelt for 2018
PGConf APAC 2018 - Patroni: Kubernetes-native PostgreSQL companion
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Where's Waldo - Text Search and Pattern in PostgreSQL
PGConf APAC 2018 - Managing replication clusters with repmgr, Barman and PgBo...
PGConf APAC 2018 - PostgreSQL HA with Pgpool-II and whats been happening in P...
PGConf APAC 2018 - PostgreSQL performance comparison in various clouds
Sponsored Talk @ PGConf APAC 2018 - Migrating Oracle to EDB Postgres Approach...
PGConf APAC 2018 - Tale from Trenches
PGConf APAC 2018 Keynote: PostgreSQL goes eleven
Amazon (AWS) Aurora
Use Case: PostGIS and Agribotics
How to teach an elephant to rock'n'roll
PostgreSQL on Amazon RDS

Recently uploaded (20)

PPTX
assetexplorer- product-overview - presentation
PDF
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
PDF
Product Update: Alluxio AI 3.7 Now with Sub-Millisecond Latency
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PDF
top salesforce developer skills in 2025.pdf
PDF
Adobe Illustrator 28.6 Crack My Vision of Vector Design
PPTX
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
PPTX
Reimagine Home Health with the Power of Agentic AI​
PDF
iTop VPN Free 5.6.0.5262 Crack latest version 2025
PPTX
Operating system designcfffgfgggggggvggggggggg
PPTX
CHAPTER 2 - PM Management and IT Context
PDF
PTS Company Brochure 2025 (1).pdf.......
PDF
Upgrade and Innovation Strategies for SAP ERP Customers
PPTX
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
PDF
Nekopoi APK 2025 free lastest update
PDF
Navsoft: AI-Powered Business Solutions & Custom Software Development
PPTX
Log360_SIEM_Solutions Overview PPT_Feb 2020.pptx
PPTX
Transform Your Business with a Software ERP System
PDF
Designing Intelligence for the Shop Floor.pdf
PDF
Design an Analysis of Algorithms I-SECS-1021-03
assetexplorer- product-overview - presentation
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
Product Update: Alluxio AI 3.7 Now with Sub-Millisecond Latency
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
top salesforce developer skills in 2025.pdf
Adobe Illustrator 28.6 Crack My Vision of Vector Design
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
Reimagine Home Health with the Power of Agentic AI​
iTop VPN Free 5.6.0.5262 Crack latest version 2025
Operating system designcfffgfgggggggvggggggggg
CHAPTER 2 - PM Management and IT Context
PTS Company Brochure 2025 (1).pdf.......
Upgrade and Innovation Strategies for SAP ERP Customers
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
Nekopoi APK 2025 free lastest update
Navsoft: AI-Powered Business Solutions & Custom Software Development
Log360_SIEM_Solutions Overview PPT_Feb 2020.pptx
Transform Your Business with a Software ERP System
Designing Intelligence for the Shop Floor.pdf
Design an Analysis of Algorithms I-SECS-1021-03

PostgreSQL WAL for DBAs

  • 1. © 2013 EnterpriseDB Corporation. All rights reserved. 1 WAL for DBAs – Everything you want to know Devrim Gündüz Principal Systems Engineer @ EnterpriseDB [email protected] Twitter : @DevrimGunduz
  • 2. PGDay.Asia 2017 WAL for DBAs 2 ● Who is this guy? – Using Red Hat (and then Fedora) since 1996. – Using PostgreSQL since 1998. – Responsible for PostgreSQL YUM repository. ● Used to break website, but recently gave up. – Started some work on PostgreSQL Dockerfiles recently. https://www.pgdocker.org – Working at EnterpriseDB since 2011. – The Guy With The PostgreSQL Tattoo! (imitations may exist :) ) – Istanbul, Turkiye. About me
  • 3. FOSDEM PGDAY 2017 WAL for DBAs 3 ● Please tweet! – #PostgreSQL – #pgday – #pgdayasia Social media
  • 4. FOSDEM PGDAY 2017 WAL for DBAs 4 (Did you tweet? Thanks!) Social media
  • 5. FOSDEM PGDAY 2017 WAL for DBAs 5 (Did you tweet? Thanks!) Alternative hashtag: #blamemagnus Social media
  • 6. FOSDEM PGDAY 2017 WAL for DBAs 6 June 26-28, Boston We want to see great speakers who can talk to the technical aspects of using Postgres in the enterprise. http://postgresvision.com/ Postgres Vision 2017
  • 7. PGDay.Asia 2017 WAL for DBAs 7 Agenda (in random order) • What is WAL? • What does it include? • How to read it? • What about wal_level ? • Replication and WAL • Backup and WAL • PITR and WAL • Other topics
  • 8. FOSDEM PGDAY 2017 WAL for DBAs 8 Please do not delete WAL files manually. Please. Before we actually start:
  • 9. FOSDEM PGDAY 2017 WAL for DBAs 9 Please do not delete WAL files manually. Please. Please. Before we actually start:
  • 10. FOSDEM PGDAY 2017 WAL for DBAs 10 Please do not delete WAL files manually. Please. Please. PLEASE. Before we actually start:
  • 11. PGDay.Asia 2017 WAL for DBAs 11 What is WAL? • Write Ahead Log: • Logging of transactions • a.k.a. xlog (transaction log), • 16 MB in most of the installations (can be configured, --with- wal-segsize) • 8 kB page size (can be configured, --with-wal-blocksize) • pg_xlog (<= 9.6)→ pg_wal (10+) • Because people deleted files under “log” directory.
  • 12. FOSDEM PGDAY 2017 WAL for DBAs 12 MAKE WAL GREAT AGAIN! So:
  • 13. PGDay.Asia 2017 WAL for DBAs 13 What is WAL? • Designed to prevent data loss in most of the situations • OS crash, hardware failure, PostgreSQL crash. • Write transactions are written to WAL • Before transaction result is sent to the client • Data files are not changed on each transaction • Performance benefit • Should be kept in a separate drive. • Initdb, or symlink
  • 14. PGDay.Asia 2017 WAL for DBAs 14 What is WAL? • Built-in feature • Life before WAL (not before B.C., though): • All changes go to durable storage (eventually), but: • Data page is loaded to shared_buffers • Changes are made there • Dirty buffers! • But not timely! • Crash → Data loss!
  • 15. PGDay.Asia 2017 WAL for DBAs 15 What is WAL? • Life after WAL: • Almost all “modifications” are “logged” to WAL files (xlog record) • Even if the transaction is aborted (ROLLBACK) • wal_buffers → WAL segments (files) • Ability to recover data after a crash • Checkpoint!
  • 16. PGDay.Asia 2017 WAL for DBAs 16 Where is it used? • Transaction logging! • Replication • PITR • REDO • Sequentially availability is a must. • REDO vs UNDO • No REDO for temp tables and unlogged tables.
  • 17. PGDay.Asia 2017 WAL for DBAs 17 WAL file naming • 24 chars, hex. • 1st 8 chars: timelineID • 00000001 is the timelineID created by initdb • 2nd 8 chars: logical WAL file • 3rd 8 chars: WAL segment name (physical WAL file) • 000000010000000000000001 → 000000010000000000000002 • … 0000000100000000000000FF → 000000010000000100000000 • Use PostgreSQL’s internal tools to manage them • pg_archivecleanup • pg_resetxlog • pg_xlogdump • ....
  • 18. PGDay.Asia 2017 WAL for DBAs 18 pg_xlogdump • We are all human. • Use pg_xlogdump, if you want to see contents of WAL files • rmgr --help to get list of all resource names, -f for follow, -n for limit. -z for stats. • pg_xlogdump -n 20 -f 000000010000000700000033 • rmgr: Heap len (rec/tot): 3/ 59, tx: 389744, lsn: 7/33B66228, prev 7/33B661F0, desc: INSERT+INIT off 1, blkref #0: rel 1663/13326/190344 blk 0 • rmgr: Heap len (rec/tot): 3/ 59, tx: 389744, lsn: 7/33B66268, prev 7/33B66228, desc: INSERT off 2, blkref #0: rel 1663/13326/190344 blk 0 • rmgr: Transaction len (rec/tot): 8/ 34, tx: 389744, lsn: 7/33B662A8, prev 7/33B66268, desc: COMMIT 2017-02- 03 03:03:49.482223 +03 • rmgr: Heap len (rec/tot): 14/ 69, tx: 389745, lsn: 7/33B662D0, prev 7/33B662A8, desc: HOT_UPDATE off 1 xmax 389745 ; new off 3 xmax 0, blkref #0: rel 1663/13326/190344 blk 0 • rmgr: Transaction len (rec/tot): 8/ 34, tx: 389745, lsn: 7/33B66318, prev 7/33B662D0, desc: COMMIT 2017-02- 03 03:03:54.091645 +03 • rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 7/33B66340, prev 7/33B66318, desc: CHECKPOINT_ONLINE redo 7/33B66340; tli 1; prev tli 1; fpw true; xid 0/389746; oid 198532; multi 1; offset 0; oldest xid 1866 in DB 129795; oldest multi 1 in DB 90123; oldest/newest commit timestamp xid: 388437/389745; oldest running xid 0; online • rmgr: XLOG len (rec/tot): 0/ 24, tx: 0, lsn: 7/33B663B0, prev 7/33B66340, desc: SWITCH •
  • 19. PGDay.Asia 2017 WAL for DBAs 19 Shared Buffers, Bgwriter and checkpointer • shared_buffers in PostgreSQL • Dirty buffers • This is where transactions are performed • Side effect: Causes inconsistency(?) on durable storage, due to dirty buffers. • Bgwriter: Background writer • LRU • Checkpointer • Pushing all dirty buffers to durable storage • Triggered automatically or manually • Backends may also write data to heap
  • 20. PGDay.Asia 2017 WAL for DBAs 20 WAL: LSN • Log Sequence Number • Position of the record in WAL file. • Provides uniqueness for each xlog record. • Per docs: “Pointer to a location in WAL file” • During recovery, LSN on the page and LSN in the WAL file are compared. • The larger one wins.
  • 21. PGDay.Asia 2017 WAL for DBAs 21 ● Probably not the last one in ls list! WAL: Finding current WAL file • postgres=# SELECT * from pg_current_xlog_location(); pg_current_xlog_location -------------------------- 40E6/2C85AC10 • postgres=# SELECT pg_xlogfile_name(‘40E6/2C85AC10'); pg_xlogfile_name -------------------------- 00000003000040E60000002C So: • postgres=# SELECT pg_xlogfile_name(pg_current_xlog_location()); pg_xlogfile_name -------------------------- 00000003000040E60000002C
  • 22. PGDay.Asia 2017 WAL for DBAs 22 Checkpoint, and pg_control ● As soon as the checkpoint starts, REDO point is stored in shared buffers. ● A WAL record is created referencing checkpoint start, and it is first written to WAL buffers, and then eventually to pg_control. – pg_control is under $PGDATA/global ● Unlike bgwriter, checkpointer writes all of the data in the shared_buffers to durable storage. ● PostgreSQL knows the latest REDO point, by looking at pg_control file.
  • 23. PGDay.Asia 2017 WAL for DBAs 23 Checkpoint, and pg_control ● pg_controldata: – Latest checkpoint location: 40E7/E43B16B8 – Prior checkpoint location: 40E7/D8689090 They are LSN. ● When checkpoint is completed, pg_control is updated with the position of checkpoint. ● After checkpoint, old WAL files are either recycled, or removed. ● An “estimation” is done while recycling (based on previous checkpoint cycles) ● 9.5+: In minimum, min_wal_size WAL files are always recycled for future usage
  • 24. PGDay.Asia 2017 WAL for DBAs 24 pg_control and REDO ● postmaster reads pg_control on startup. /usr/pgsql-10/bin/pg_controldata -D /var/lib/pgsql/10/data | grep state – “Database cluster state”: ● starting up ● shut down ● shut down in recovery ● shutting down ● in crash recovery ● in archive recovery ● in production ● If pg_control says “in production”, but db server is not running, then this instance is eligible for a recovery!
  • 25. PGDay.Asia 2017 WAL for DBAs 25 pg_control and REDO ● pg_control is the critical piece – Should not be corrupted – Per docs: “...theoretically a weak spot” ● REDO: All WAL files must be sequentially available for complete recovery.
  • 26. PGDay.Asia 2017 WAL for DBAs 26 Moving to the new WAL ● A WAL segment may be full ● PostgreSQL archiver will switch to the new xlog, if PostgreSQL reaches archive_timeout value. ● DBA issues pg_switch_xlog() function.
  • 27. PGDay.Asia 2017 WAL for DBAs 27 WAL: Archiving • Replication, backup, PITR • archive_mode • archive_command • archive_timeout
  • 28. PGDay.Asia 2017 WAL for DBAs 28 WAL: Point-In-Time Recovery (PITR) • A base backup (pg_basebackup!) and the WAL files are needed. • WAL files must be sequentially complete – otherwise PITR won’t be finished. • “Roll-forward recovery”
  • 29. PGDay.Asia 2017 WAL for DBAs 29 WAL: Point-In-Time Recovery (PITR) • PITR: Replaying WAL files on base backups, until recovery target. • recovery_target_{time, xid, name, lsn} • If not specified, all archived WAL files are replayed. • recovery.conf and backup_label: Enters recovery mode. • restore_command, recovery_target_XXX,recovery_target_inclusive • backup_label: Also includes checkpoint location (starting point of recovery) • Almost like regular recovery process (WAL replay) • Up to recovery_target_XXX is replayed.
  • 30. PGDay.Asia 2017 WAL for DBAs 30 WAL: Point-In-Time Recovery (PITR) • After recovery process, timelineID is increased by 1 (also physical WAL file name is also increased by 1) • A .history file is created. • $ cat 00000003.history 1 403F/58000098 no recovery target specified 2 4048/43000098 before 2017-01-28 11:13:21.124512+03 “WAL files were replayed until the given time above, and theit replay location is 4048/43000098. • • •
  • 31. PGDay.Asia 2017 WAL for DBAs 31 Full page writes ● A WAL record cannot be replayed on a page which is corrupted during bgwriter and/or checkpointer, because of hardware failure, OS crash, kernel failure, etc. ● Full page writes IYF. ● Enabled by default. – Please turn it off, if you want to throw a lot of money to PostgreSQL support companies. Otherwise, don’t do so ;) ● PostgreSQL writes header data + the entire page as XLOG record, when a page changes after every checkpoint. – Increasing checkpoint_timeout helps. – Full-page image, backup block. ● PostgreSQL can even recover itself from write failures (not hw failures, though)
  • 32. PGDay.Asia 2017 WAL for DBAs 32 WAL parameters • wal_level: Minimal, replica or logical • Must be > minimal for archiver to be able to run • fsync : Always on, please. • synchronous_commit: May lose some of the latest transactions • Server returns success to the client • Server waits a bit to flush the data to durable storage. • Less risky than fsync • wal_sync_method : fdatasync is usually better. Use pg_test_fsync for testing.
  • 33. PGDay.Asia 2017 WAL for DBAs 33 WAL parameters • wal_log_hints: When this value is set to on , the server writes the entire content of each disk page to WAL after a checkpoint and during the first modification of that page, even for non-critical modifications of so-called hint bits. • wal_compression: off by default. Less WAL files, more CPU overhead. • wal_buffers: -1: Automatic tuning of wal buffers: 1/32 of shared_buffers (not less than 64kB or no more than 16 MB (1 WAL file) • wal_writer_delay : Rounds between WAL writer flushes WAL. • wal_writer_flush_after: New in 9.6
  • 34. FOSDEM PGDAY 2017 WAL for DBAs 34 Questions, comments?
  • 35. © 2013 EnterpriseDB Corporation. All rights reserved. 35 WAL for DBAs – Everything you want to know Devrim Gündüz Principal Systems Engineer @ EnterpriseDB [email protected] Twitter : @DevrimGunduz