Yann Neuhaus


SQL Server: Manage large data ranges using partitioning

Mon, 2024-02-26 10:59
Introduction: 

When it comes to moving ranges of data with many rows across different tables in SQL-Server, the partitioning functionality of SQL-Server can provide a good solution for manageability and performance optimization. In this blog we will look at the concept of partitioning in SQL-Server and its main advantages.

Concept overview: 

In SQL-Server, partitioning can divide the data of an index or table into smaller units. These units are called partitions. For that purpose, every row is assigned to a range and every range in turn is assigned to a specific partition. Practically, there are two main components: the partition function and the partition scheme.

The partition function defines the range boundaries through boundary values and thus, together with the actual data values, the number of partitions. You can define a partition function either as “range right” or “range left”. The main difference is how the boundary value is treated. In a range right partition function, the boundary value is the first value of the next partition, while in a range left partition function the boundary value is the last value of the previous partition. For example:

We want to partition a table by year, and the column the partition function will be applied to has the datatype “date”. In total we have entries for the years 2023 and 2024, which means we want two partitions. In a range right function, the boundary value must be the first day of the year 2024, whereas in a range left function the boundary value must be the last day of the year 2023.

See example below: 

[Screenshots: range right and range left partition functions]
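
To make the difference concrete, here is a minimal T-SQL sketch of the two variants for the 2023/2024 example above (the function names are illustrative and not part of the demo that follows):

-- Range right: the boundary value 2024-01-01 is the first value of the 2024 partition
CREATE PARTITION FUNCTION pf_year_right (date)
AS RANGE RIGHT FOR VALUES ('2024-01-01');

-- Range left: the boundary value 2023-12-31 is the last value of the 2023 partition
CREATE PARTITION FUNCTION pf_year_left (date)
AS RANGE LEFT FOR VALUES ('2023-12-31');

Both functions produce two partitions for the 2023 and 2024 data; they only differ in which partition the boundary value itself belongs to.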

The partition scheme is used to map the different partitions, which are defined through the partition function, to one or more filegroups.

Main benefits of partitioning:

There are multiple scenarios where the performance or manageability of a data model can be improved through partitioning. The main advantage of partitioning is that it reduces contention on the whole table as a database object and restricts it to the partition level when performing operations on the corresponding data range. Partitioning also facilitates data transfer with the “switch partition” statement, which performs a switch-in or switch-out of a whole partition. Through that, a large amount of data can be transferred very quickly.

Demo Lab:

For demo purposes I created the following script, which creates three tables containing the same 5 million rows of historical data, spanning from 11 years in the past until today:

USE [master] 
GO 
 
--Create Test Database 
CREATE DATABASE [TestPartition] 
GO 
 
--Change Recovery Model 
ALTER DATABASE [TestPartition] SET RECOVERY SIMPLE WITH NO_WAIT 
GO 
 
--Create Tables 
Use [TestPartition] 
GO 
 
CREATE TABLE [dbo].[Table01_HEAP]( 
[Entry_Datetime] [datetime] NOT NULL, 
[Entry_Text] [nvarchar](50) NULL 
) 
GO 
 
CREATE TABLE [dbo].[Table01_CLUSTEREDINDEX]( 
[Entry_Datetime] [datetime] NOT NULL, 
[Entry_Text] [nvarchar](50) NULL 
) 
GO 
 
CREATE TABLE [dbo].[Table01_PARTITIONED]( 
[Entry_Datetime] [datetime] NOT NULL, 
[Entry_Text] [nvarchar](50) NULL 
) 
GO 
 
--GENERATE DATA 
 
declare @date as datetime 
 
declare @YearSubtract int 
declare @DaySubtract int 
declare @HourSubtract int 
declare @MinuteSubtract int  
declare @SecondSubtract int  
declare @MilliSubtract int 
 
--Specify how many years backwards data should be generated 
declare @YearsBackward int 
set @YearsBackward = 11 
 
--Specify how many rows of data should be generated 
declare @rows2generate int 
set @rows2generate = 5000000 
 
 
declare @counter int 
set @counter = 1 
 
--generate data entries 
while @counter <= @rows2generate  
begin 
 
--Year 
Set @YearSubtract = floor(rand() * (@YearsBackward - 0 + 1)) + 0 
--Day 
Set @DaySubtract = floor(rand() * (365 - 0 + 1)) + 0 
--Hour 
Set @HourSubtract = floor(rand() * (24 - 0 + 1)) + 0 
--Minute 
Set @MinuteSubtract = floor(rand() * (60 - 0 + 1)) + 0 
--Second 
Set @SecondSubtract = floor(rand() * (60 - 0 + 1)) + 0 
--Millisecond 
Set @MilliSubtract = floor(rand() * (1000 - 0 + 1)) + 0 
 
 
set @date = Dateadd(YEAR, -@YearSubtract , Getdate()) 
set @date = Dateadd(DAY, -@DaySubtract , @date) 
set @date = Dateadd(HOUR, -@HourSubtract , @date) 
set @date = Dateadd(MINUTE, -@MinuteSubtract , @date) 
set @date = Dateadd(SECOND, -@SecondSubtract , @date) 
set @date = Dateadd(MILLISECOND, @MilliSubtract , @date) 
 
insert into Table01_HEAP (Entry_Datetime, Entry_Text) 
Values (@date, 'This is an entry from ' + convert(nvarchar, @date, 29)) 
 
set @counter = @counter + 1 
 
end 
 
--COPY DATA TO OTHER TABLES 
 
INSERT INTO dbo.Table01_CLUSTEREDINDEX 
  (Entry_Datetime, Entry_Text) 
SELECT Entry_Datetime, Entry_Text 
  FROM Table01_HEAP 
 
INSERT INTO dbo.Table01_PARTITIONED 
  (Entry_Datetime, Entry_Text) 
SELECT Entry_Datetime, Entry_Text 
  FROM Table01_HEAP 
 
--Create Clustered Indexes for dbo.Table01_CLUSTEREDINDEX and dbo.Table01_PARTITIONED 
 
CREATE CLUSTERED INDEX [ClusteredIndex_Table01_CLUSTEREDINDEX] ON [dbo].[Table01_CLUSTEREDINDEX] 
( 
[Entry_Datetime] ASC 
 
) on [PRIMARY] 
GO 
 
CREATE CLUSTERED INDEX [ClusteredIndex_Table01_PARTITIONED] ON [dbo].[Table01_PARTITIONED] 
( 
[Entry_Datetime] ASC 
 
) on [PRIMARY] 
GO 

The tables contain the same data. The difference between them is that one is a heap, one has a clustered index, and one has a clustered index which will be partitioned in the next step:

[Screenshot: generated data]

After the tables are created with the corresponding data and indexes, the partition function and scheme must be created. This was done by the following script:

-- Create Partition Function as range right for every Year -10 Years 
Create Partition Function [myPF01_datetime] (datetime) 
AS Range Right for Values ( 
DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) + 0, 0), DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 1, 0), DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 2, 0), 
DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 3, 0), DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 4, 0), DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 5, 0),  
DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 6, 0), DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 7, 0), DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 8, 0),  
DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 9, 0), DATEADD(yy, DATEDIFF(yy, 0, GETDATE()) - 10, 0) 
); 
GO 
 
-- Create Partition Scheme for Partition Function myPF01_datetime 
CREATE PARTITION SCHEME [myPS01_datetime] 
AS PARTITION myPF01_datetime ALL TO ([PRIMARY]) 
GO 

I used the DATEADD() function in combination with the DATEDIFF() function to retrieve the first millisecond of each of the last 10 years as a datetime value and used these as range right boundary values. Of course, it is also possible to hard-code the boundary values, like ‘2014-01-01 00:00:00.000’, but I prefer to keep it as dynamic as possible. In the end, the result is the same:

[Screenshot: SELECT of the DATEADD() expressions]

After creating the partition function, I have created the partition scheme. The partition scheme is mapped to the partition function. In my case I assign every partition to the primary filegroup. It is also possible to split the partitions across multiple filegroups.

Once the partition function and scheme are created successfully, they can be applied to the existing table Table01_PARTITIONED. To achieve that, the clustered index of the table must be recreated on the partition scheme instead of on the primary filegroup:

-- Apply partitiononing on Table: Table01_PARTITIONED through recreating the Tables Clustered Index ClusteredIndex_Table01_PARTITIONED on Partition Scheme myPS01_datetime 
CREATE CLUSTERED INDEX [ClusteredIndex_Table01_PARTITIONED] ON [dbo].[Table01_PARTITIONED] 
( 
[Entry_Datetime] ASC 
 
) with (DROP_EXISTING = ON) on myPS01_datetime(Entry_Datetime);  
GO 

After doing that, the table Table01_PARTITIONED has multiple partitions while the other tables still have only one partition:

[Screenshots: partitions of the partitioned table and of the clustered index table]
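
The partition layout shown in the screenshots can also be retrieved with a catalog query; here is a sketch (not part of the original demo) that lists each partition of the partitioned table together with its row count and its range right boundary:

SELECT p.partition_number,
       p.rows,
       prv.value AS lower_boundary -- NULL for the first partition
  FROM sys.partitions AS p
  JOIN sys.indexes AS i
    ON i.object_id = p.object_id AND i.index_id = p.index_id
  JOIN sys.partition_schemes AS ps
    ON ps.data_space_id = i.data_space_id
  LEFT JOIN sys.partition_range_values AS prv
    ON prv.function_id = ps.function_id
   AND prv.boundary_id = p.partition_number - 1
 WHERE p.object_id = OBJECT_ID('dbo.Table01_PARTITIONED')
   AND i.index_id = 1
 ORDER BY p.partition_number;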

In total there are 12 partitions: one for each year from 2014 to 2024, plus one for every entry with a datetime earlier than 2014-01-01 00:00:00.000; entries with a datetime later than 2024-01-01 00:00:00.000 also land in the last partition. Partition no. 1 holds the earliest data and partition no. 12 the latest. See below:

[Screenshots: content of partition 1 and of partition 12]

DEMO Tests:

First, I want to compare the performance when moving outdated data, which is older than 2014-01-01 00:00:00.000, from the table itself to a history table. For that purpose, I created a history table with the same data structure as the table Table01_CLUSTEREDINDEX:

Use [TestPartition] 
GO 
 
--Create History Table 
CREATE TABLE [dbo].[Table01_HISTORY01]( 
[Entry_Datetime] [datetime] NOT NULL, 
[Entry_Text] [nvarchar](50) NULL 
) 
GO 
 
--Create Clustered Indexes for dbo.Table01_HISTORY01 
CREATE CLUSTERED INDEX [ClusteredIndex_Table01_HISTORY01] ON [dbo].[Table01_HISTORY01] 
( 
[Entry_Datetime] ASC 
 
) on [PRIMARY] 
GO 

I start with the table with the clustered index, using a classic “insert into select” statement:

[Screenshot: insert data into the history table]
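
The statement itself is only visible in the screenshot; it is essentially an “insert into select” with a date filter, along these lines (a sketch based on the tables created above):

INSERT INTO dbo.Table01_HISTORY01 (Entry_Datetime, Entry_Text)
SELECT Entry_Datetime, Entry_Text
  FROM dbo.Table01_CLUSTEREDINDEX
 WHERE Entry_Datetime < '2014-01-01T00:00:00.000';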

We can see that we have 10932 reads in total and a total query run time of 761 milliseconds.

[Screenshot: execution plan of the insert]

In the execution plan, we can see that a classic index seek operation occurred, which means the database engine sought every row with a datetime value prior to 2014-01-01 00:00:00.000 and wrote it into the history table.

For the delete operation we can see similar results:

[Screenshots: delete statement and result]
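
Again, the exact statement is only shown in the screenshots; the delete is essentially the counterpart of the insert above:

DELETE FROM dbo.Table01_CLUSTEREDINDEX
 WHERE Entry_Datetime < '2014-01-01T00:00:00.000';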

In total, 785099 rows were moved, and the table Table01_CLUSTEREDINDEX no longer contains any entries older than 2014-01-01 00:00:00.000:

[Screenshot: verify table content]

Next let us compare the data movement when using a “switch partition” statement. For switching a partition from a partitioned source table to a nonpartitioned destination table, we need to use the partition number of the source table. For that I run the following query:

[Screenshot: switch partition statement]
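
The query in the screenshot is a partition switch. As a sketch, switching partition 1 out to an empty history table with an identical structure (the table name Table01_HISTORY02 is illustrative) looks like this:

-- The target table must be empty, have the same structure and compatible indexes,
-- and reside on the same filegroup as the switched partition
ALTER TABLE dbo.Table01_PARTITIONED
SWITCH PARTITION 1 TO dbo.Table01_HISTORY02;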

We can see that partition number 1 was moved within 2 milliseconds. Compared to the previous approach, where it took 761 milliseconds to insert the data and an additional 596 milliseconds to delete it, the switch partition operation is obviously much faster. But why is this the case? Because switching partitions is a metadata operation. It does not seek through an index (or, even worse, scan a table) and write every row one by one; instead, it changes the metadata of the partition and remaps the partition to the target table.

And as we can see, we have the same result:

[Screenshot: verify table content]

Another big advantage shows up when it comes to deleting a whole data range. For example, let us delete the entries of the year 2017, which we do not need anymore.

For the table with the clustered Index, we must use a statement like this:

[Screenshot: delete statement for the year 2017]

We can see that we have here a query runtime of 355 milliseconds and 68351 page reads in total for the delete operation with the clustered index. 

For the partitioned table, we can instead use a truncate operation on the specific partition. That is because each partition is treated as its own physical unit and can therefore be truncated.

And as we know, truncating is much faster, because this operation deallocates the pages and only logs the page deallocations in the transaction log, while a delete operation works row by row and logs every single row deletion.

So, let us try it. The year 2017 is 7 years back, so let us first verify that the right data range will be deleted:

[Screenshot: verify partition content]
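
The verification is only shown as a screenshot; one way to do it is the $PARTITION function together with sys.partitions (a sketch, not the exact query from the post):

-- Which partition number does a 2017 date fall into?
SELECT $PARTITION.myPF01_datetime('2017-06-15') AS partition_number;

-- Row count per partition of the partitioned table
SELECT p.partition_number, p.rows
  FROM sys.partitions AS p
 WHERE p.object_id = OBJECT_ID('dbo.Table01_PARTITIONED')
   AND p.index_id = 1
 ORDER BY p.partition_number;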

With the short query above we can see that 7 years back corresponds to partition no. 5, and the data range looks right. So, let us truncate:

[Screenshot: truncate partition]
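
The statement in the screenshot uses the partition clause of TRUNCATE TABLE (available since SQL Server 2016); as a sketch:

TRUNCATE TABLE dbo.Table01_PARTITIONED WITH (PARTITIONS (5));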

And we can see that, to truncate all the entries from the year 2017, the database engine took 1 millisecond, compared to the 355 milliseconds for the delete operation: again much faster.

Next: let’s see, how we can change the lock behavior of SQL-Server through partitioning. For that I ran the following update query for updating every entry text for dates which are younger than May 2018:

[Screenshot: update statement]

While the update operation above was running, I queried the DMV sys.dm_tran_locks in another session to check the locks the update operation was holding:

[Screenshot: locks held during the update on the heap table]
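
The DMV query itself is not reproduced in the text; here is a sketch of such a query, filtered on the demo database:

SELECT request_session_id, resource_type, resource_associated_entity_id,
       request_mode, request_status
  FROM sys.dm_tran_locks
 WHERE resource_database_id = DB_ID('TestPartition');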

And we can see that we have a lot of page locks and also an exclusive lock on the object itself (in this case Table01_HEAP). That is because of SQL-Server’s lock escalation behavior.

I ran the same update operation on the partitioned table, but beforehand I changed the lock escalation setting of the table from the default value “TABLE” to “AUTO”. This is necessary to enable locking at the partition level:

[Screenshot: change lock escalation and run the update]
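
The screenshot combines two steps: changing the table’s lock escalation setting and re-running the update. As a sketch (the new text value is illustrative):

ALTER TABLE dbo.Table01_PARTITIONED SET (LOCK_ESCALATION = AUTO);

UPDATE dbo.Table01_PARTITIONED
   SET Entry_Text = 'updated entry'
 WHERE Entry_Datetime >= '2018-05-01T00:00:00.000';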

And when I’m querying the dmv again while the update operation above is running, I get the following result:

[Screenshot: locks held during the update on the partitioned table]

We can see that there is no exclusive lock at the object level anymore; instead there is an intent exclusive lock, which does not prevent other transactions from accessing the data (as long as there is no conflicting lock at a more granular level). In addition, we have multiple exclusive locks on resources of type HOBT. And when we take the “resource_associated_entity_id” values and use them to query the sys.partitions table, we see the following information:

[Screenshot: locked partitions]
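
To resolve those HOBT ids, the resource_associated_entity_id values can be joined against sys.partitions; a sketch:

SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.partition_number,
       p.hobt_id
  FROM sys.partitions AS p
  JOIN sys.dm_tran_locks AS l
    ON l.resource_associated_entity_id = p.hobt_id
 WHERE l.resource_type = 'HOBT';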

The resources locked by the update operation on the partitioned table are the partitions associated with that table. So SQL-Server locked the partitions instead of locking the whole table. The advantage is that locking happens at a more granular level, which prevents lock contention on the table itself.

Conclusion:

Partitioning can be a very powerful and useful feature of SQL-Server when used in the appropriate situation. Especially when it comes to regular operations on whole data ranges, partitioning can be used to enhance performance and manageability. With partitioning, it is also possible to distribute the data of a table over multiple filegroups. Additionally, by splitting and merging partitions, it is possible to maintain partitions for growing or shrinking data sets.
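
As a closing sketch of that kind of maintenance with the objects from this demo (the 2025 boundary is an assumption, not part of the demo):

-- Make room for the coming year: name the filegroup for the new partition, then split
ALTER PARTITION SCHEME myPS01_datetime NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION myPF01_datetime() SPLIT RANGE ('2025-01-01T00:00:00.000');

-- Remove the oldest boundary: the two adjacent partitions are merged into one
ALTER PARTITION FUNCTION myPF01_datetime() MERGE RANGE ('2014-01-01T00:00:00.000');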

L’article SQL Server: Manage large data ranges using partitioning est apparu en premier sur dbi Blog.

Kubernetes Networking by Using Cilium – Intermediate Level – Traditional Linux Routing

Mon, 2024-02-26 02:34

Welcome back to this blog post series about Kubernetes networking with Cilium. In the previous post about network interfaces, we looked at how to identify all the interfaces involved in routing between pods. I’ve also explained routing in a Kubernetes cluster with Cilium in non-technical language in this blog post. Let’s now see it in action for the techies!

Below is the drawing of where we left off:

We will continue to use the same method: you are the packet that travels from your apartment (pod) 10.10.2.117 on the top left to other pods in this Kubernetes cluster. But first, let’s take this opportunity to talk about namespaces and enrich our drawing with a new analogy.

Routing between namespaces

A namespace is a logical group of objects that provides isolation in the cluster. However, by default, all pods can communicate with each other in a “vanilla” Kubernetes cluster, whether they belong to the same namespace or not. So the isolation provided by a namespace doesn’t mean the pods can’t communicate. To allow or deny such communication, you will need to create network policies. That could be the topic of another blog post!

We can use the analogy of a namespace being the same floor number across all buildings of our cluster. All apartments on the same floor in each building are logically grouped into the same namespace. This is what we can see below in our namespace called networking101:

$ kubectl get po -n networking101 -owide
NAME                        READY   STATUS    RESTARTS       AGE    IP            NODE                NOMINATED NODE   READINESS GATES
busybox-c8bbbbb84-fmhwc     1/1     Running   1 (125m ago)   4d1h   10.10.1.164   mycluster-worker2   <none>           <none>
busybox-c8bbbbb84-t6ggh     1/1     Running   1 (125m ago)   4d1h   10.10.2.117   mycluster-worker    <none>           <none>
netshoot-7d996d7884-fwt8z   1/1     Running   0              103m   10.10.2.121   mycluster-worker    <none>           <none>
netshoot-7d996d7884-gcxrm   1/1     Running   0              103m   10.10.1.155   mycluster-worker2   <none>           <none>

That’s our 4 apartments / pods on the same floor, grouped together in one namespace:

The routing process doesn’t care about the pod’s namespace, only its destination IP Address will be used. Let’s now see how we can go from the apartment 10.10.2.117 to the apartment 10.10.2.121 in the same building (node).

Pod to pod routing on the same node

From the pod 10.10.2.117, you’ve then decided to go to pay a visit to 10.10.2.121. You first look at the routing table in order to know how to reach this destination. But you can’t go out if you don’t also have the MAC Address of your destination. You need both destination information (IP Address and MAC Address) before you can start to travel. You then look at the ARP table to find out this information. The ARP table contains the known mapping of a MAC Address to an IP Address in your IP subnet. If it is not there, you send first a scout to knock at the door of each apartment in your community until you find the MAC Address of your destination. This is called the ARP request. When the scout comes back with that information, you write it into the ARP table. You thank the scout for his help and are now ready to start your travel by exiting the pod.

Let’s see how we can trace this in our source pod 10.10.2.117

$ kubectl exec -it -n networking101 busybox-c8bbbbb84-t6ggh -- ip route
default via 10.10.2.205 dev eth0
10.10.2.205 dev eth0 scope link

Very simple routing instruction! For every destination, you go through 10.10.2.205 by using your only network interface eth0 in the pod. You can see from the drawing above that 10.10.2.205 is the IP Address of the cilium_host. You then check your ARP table:

$ kubectl exec -it -n networking101 busybox-c8bbbbb84-t6ggh -- arp -a

The arp -a command lists the content of the ARP table, and we can see there is nothing in there.

A way to send a scout out is by using the arping tool toward our destination. You may have noticed that for my pods I’m using busybox and netshoot images. Both provide networking tools that are useful for troubleshooting:

$ kubectl exec -it -n networking101 busybox-c8bbbbb84-t6ggh -- arping 10.10.2.121
ARPING 10.10.2.121 from 10.10.2.117 eth0
Unicast reply from 10.10.2.121 [d6:21:74:eb:67:6b] 0.028ms
Unicast reply from 10.10.2.121 [d6:21:74:eb:67:6b] 0.092ms
Unicast reply from 10.10.2.121 [d6:21:74:eb:67:6b] 0.123ms
^CSent 3 probe(s) (1 broadcast(s))
Received 3 response(s) (0 request(s), 0 broadcast(s))

We now have the piece of information that was missing: the MAC address of our destination. We can then check that it has been written into the ARP table of our source pod:

$ kubectl exec -it -n networking101 busybox-c8bbbbb84-t6ggh -- arp -a
? (10.10.2.205) at d6:21:74:eb:67:6b [ether]  on eth0

Here it is! However, you may wonder why we don’t see the IP address of our destination 10.10.2.121 here, right? In traditional networking that is what you would see, but here we are in a Kubernetes cluster and Cilium is taking care of its networking. We have also seen above, from the routing table of the source pod, that for every destination we go to this cilium_host interface.

So the cilium_host on that node is attracting all the traffic even for communication between pods in the same IP subnet.

As a side note, below is a command where you can quickly display all the IP Addresses of the cilium_host and the nodes in your cluster in one shot:

$ kubectl get ciliumnodes
NAME                      CILIUMINTERNALIP   INTERNALIP   AGE
mycluster-control-plane   10.10.0.54         172.18.0.3   122d
mycluster-worker          10.10.2.205        172.18.0.2   122d
mycluster-worker2         10.10.1.55        172.18.0.4   122d

In traditional networking with L2 switching, the destination MAC address is the one associated with the destination IP address. That is not the case here in Kubernetes networking. So which interface has the MAC address d6:21:74:eb:67:6b? Let’s answer that question immediately:

$ sudo docker exec -it mycluster-worker ip a | grep -iB1 d6:21:74:eb:67:6b
9: lxc4a891387ff1a@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d6:21:74:eb:67:6b brd ff:ff:ff:ff:ff:ff link-netns cni-67a5da05-a221-ade5-08dc-64808339ad05

That is the LXC interface of the node, which is indeed our next step from the source pod toward our destination. You’ve learned from the first blog post of this networking series that there is a servant waiting here at the LXC interface to direct us toward our destination.

From there, we don’t see much of the travel to the destination from the traditional Linux routing point of view. This is because the routing is done by the Cilium agent using eBPF. As the destination is in the same IP subnet as the source, the Cilium agent just switch it directly to the destination LXC interface and then reach the destination pod.

When the destination pod responds to the source, the same process occurs and for the sake of completeness let’s look at the routing table and ARP table in the destination pod:

$ kubectl exec -it -n networking101 netshoot-7d996d7884-fwt8z -- ip route
default via 10.10.2.205 dev eth0 mtu 1450
10.10.2.205 dev eth0 scope link

$ kubectl exec -it -n networking101 netshoot-7d996d7884-fwt8z -- arp -a
? (10.10.2.205) at 92:65:df:09:dd:28 [ether]  on eth0

$ sudo docker exec -it mycluster-worker ip a | grep -iB1 92:65:df:09:dd:28
13: lxce84a702bb02c@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 92:65:df:09:dd:28 brd ff:ff:ff:ff:ff:ff link-netns cni-d259ef79-a81c-eba6-1255-6e46b8d1c779

So, from the traditional Linux routing point of view, everything goes to the cilium_host, and the destination MAC address is that of the LXC interface on the node that is linked to our pod. This is exactly the same as what we have seen with our source pod.

Pod to pod routing on a different node

Let’s now have a look at how we could reach the pod 10.10.1.155 from the source pod 10.10.2.117 which is hosted in another node. The routing is the same at the beginning but when talking to the servant at the LXC interface, he sees that the destination IP Address doesn’t belong to the same IP subnet and so directs us to the cilium_host in the Lobby. From there we are routed to the cilium_vxlan interface to reach the node that host our destination pod.

Let’s now have a look at the routing table of the host:

$ sudo docker exec -it mycluster-worker ip route
default via 172.18.0.1 dev eth0
10.10.0.0/24 via 10.10.2.205 dev cilium_host proto kernel src 10.10.2.205 mtu 1450
10.10.1.0/24 via 10.10.2.205 dev cilium_host proto kernel src 10.10.2.205 mtu 1450
10.10.2.0/24 via 10.10.2.205 dev cilium_host proto kernel src 10.10.2.205
10.10.2.205 dev cilium_host proto kernel scope link
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2

We don’t see much here as the routing is using eBPF and is managed by the Cilium agent as we’ve seen before.

As a side note, and to share everything with you, the output of the network interfaces as well as of ip route in the Cilium agent pod is identical to that of the node. This is because at startup the Cilium agent provides this information to the node. You can check it in the Cilium agent with the following commands:

$ kubectl exec -it -n kube-system cilium-dprvh -- ip a
$ kubectl exec -it -n kube-system cilium-dprvh -- ip route

So you go through the VXLAN tunnel and you reach the node mycluster-worker2. Here is the routing table of this node:

$ sudo docker exec -it mycluster-worker2 ip route
default via 172.18.0.1 dev eth0
10.10.0.0/24 via 10.10.1.55 dev cilium_host proto kernel src 10.10.1.55 mtu 1450
10.10.1.0/24 via 10.10.1.55 dev cilium_host proto kernel src 10.10.1.55
10.10.1.55 dev cilium_host proto kernel scope link
10.10.2.0/24 via 10.10.1.55 dev cilium_host proto kernel src 10.10.1.55 mtu 1450
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.4

Again, from the traditional Linux routing point of view there isn’t much to see, except that all the traffic for the pod subnets goes to the cilium_host, which is managed by the Cilium agent. This is identical to what we’ve learned on the other node. When we reach the cilium_vxlan interface, a servant is waiting for us with his magical eBPF map and directs us through a secret passage to the LXC corridor interface of the top left pod, where we can reach our destination.

Wrap up

We’ve explored all that can be seen in routing from the traditional Linux point of view by using the common networking tools.

Maybe you feel frustrated not to understand it completely because there are some gaps in this step-by-step packet routing? Cilium uses eBPF for routing the packets, which adds some complexity to understanding the routing. However, it is much faster than traditional Linux routing thanks to the secret passages opened by the eBPF servants.

If you want to know more about this, don’t miss my next blog post where I’ll dive deep into the meanders of eBPF routing. See you there!

L’article Kubernetes Networking by Using Cilium – Intermediate Level – Traditional Linux Routing est apparu en premier sur dbi Blog.

Physical Online Migration to ExaCC with Oracle Zero Downtime Migration (ZDM)

Sun, 2024-02-25 14:06

A while ago I tested and blogged about ZDM; see my previous articles. I finally had the chance to implement it at one of our customers to migrate on-premises databases to Exadata Cloud@Customer. After implementing a Logical Offline migration with ZDM (see my previous article, https://www.dbi-services.com/blog/logical-offline-migration-to-exacc-with-oracle-zero-downtime-migration-zdm/), I had the opportunity to implement a Physical Online migration and to test it with one database taken as a pilot. In this article I would like to share the experience I gained from migrating an on-premises database to ExaCC using a ZDM Physical Online migration. This method uses Data Guard, so it can only be used to migrate Oracle Enterprise Edition databases. It is called Physical Online because ZDM creates a standby database, either from a backup or with direct data transfer (active duplication with RMAN or restore from service), and synchronises it with the primary. During the whole preparation phase the database is still available for the application, and the maintenance window is shorter, with less downtime: it is only needed for the switchover operation. Of course, ZDM can include a non-CDB to PDB conversion, which makes it a little bit longer. Physical Online is the only ZDM method that includes a fallback. We intended to use this method at the customer as the mandatory one for large Oracle EE databases and the preferred one for small Oracle EE databases.

Introduction

The on-premise databases are single-tenant (non-CDB) databases, all running version 19.10.

The target databases are Oracle RAC databases running on ExaCC with Oracle version 19.21.

The Oracle Net port used on the on-premise site is 13000 and the Oracle Net port used on the ExaCC is 1521.

We will use ZDM to migrate the on-premise single-tenant database to a PDB within a CDB. ZDM will be in charge of migrating the database to the ExaCC using Data Guard, running datapatch, converting the non-CDB database to a PDB within a target CDB and upgrading the time zone. The standby database will be created through a direct connection, without any backup.

Of course, I have anonymised all outputs to remove customer infrastructure names, so let’s use the following convention:

ExaCC Cluster 01 node 01 : ExaCC-cl01n1
ExaCC Cluster 01 node 02 : ExaCC-cl01n2
On premises Source Host : vmonpr
Target db_unique_name on the ExaCC : ONPR_RZ2
Database Name to migrate : ONPR
ZDM Host : zdmhost
ZDM user : zdmuser
Domain : domain.com
ExaCC PDB to migrate to : ONPRZ_APP_001T

We will then migrate the on-premise single-tenant database, named ONPR, to a PDB on the ExaCC. The PDB will be named ONPRZ_APP_001T.

Ports

It is important to mention that the following ports are needed:

• ZDM Host → On-premise Host: 22
• ZDM Host → ExaCC VM (both nodes): 22
• On-premise Host → ExaCC VM (scan listener and VIP): Oracle Net (1521)
• ExaCC → On-premise Host: Oracle Net (13000)

If, for example, the Oracle Net ports are not opened from the ExaCC to the on-premise host, the migration evaluation will immediately stop at one of the first steps, named ZDM_PRECHECKS_TGT, and the following errors will be found in the log file:

PRGZ-1132 : -eval failed for the phase ZDM_PRECHECKS_TGT with exception
PRGZ-3176 : a database connection cannot be established from target node ExaCC-cl01n1 to source node vmonpr
PRCC-1021 : One or more of the submitted commands did not execute successfully.
PRCZ-2103 : Failed to execute command "/u02/app/oracle/product/19.0.0.0/dbhome_2/bin/tnsping" on node "ExaCC-cl01n1" as user "root". Detailed error:
TNS Ping Utility for Linux: Version 19.0.0.0.0 - Production on 06-FEB-2024 10:06:37

Copyright (c) 1997, 2023, Oracle.  All rights reserved.

Used parameter files:

Used HOSTNAME adapter to resolve the alias
Attempting to contact (DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=))(ADDRESS=(PROTOCOL=tcp)(HOST=X.X.X.15)(PORT=13000)))
TNS-12535: TNS:operation timed out
PRCC-1025 : Command submitted on node ExaCC-cl01n2 timed out after 60 seconds..

If, for example, the Oracle Net ports are not opened from the on-premise host to the ExaCC, the migration evaluation will immediately stop at another of the first steps, named ZDM_PRECHECKS_SRC, and the following errors will be found in the log file:

PRGZ-1132 : -eval failed for the phase ZDM_PRECHECKS_SRC with exception
PRGZ-3130 : failed to establish connection to target listener from nodes [vmonpr]
PRCC-1021 : One or more of the submitted commands did not execute successfully.
PRCC-1025 : Command submitted on node vmonpr timed out after 15 seconds...

Requirements… To be known before starting

There are a few requirements that need to be met.

ZDM Host
  • SSH connection allowed between Source and Target host
  • SSH authentication key pairs without passphrase should be established and tested for the user between ZDM host and both source and target database
Source Database
  • Both source and target need to be on the same release version.
  • Transparent Data Encryption (TDE) wallet must be configured (even if source database is not encrypted)
  • WALLET_TYPE should be set to either AUTOLOGIN or PASSWORD
  • Wallet STATUS should be OPEN
  • Wallet should be opened on all PDB in case the source is a container Database
  • The master key must be set for all the PDB and the container database
  • In case the source database is a RAC database, SNAPSHOT CONTROLFILE must be configured on a shared location on all cluster nodes
  • SCAN listener/listener connections allowed on both source and target DB
  • DB_UNIQUE_NAME parameter must be different than target database
  • SYSPASSWORD must be the same on the source and target database
Target Database
  • Database must be created prior the migration
  • Database release version should match source version.
  • The target database patch level should also be the same or higher than the source database. In case the target database patch level is higher, ZDM can be configured to run datapatch on the target database. Target database patch level can not be lower than source database.
  • For Oracle RAC databases, SSH connectivity between nodes for the oracle user should be setup
  • Storage size should be sufficient (same as source database)
  • DB_NAME parameter must be the same than the source database
  • DB_UNIQUE_NAME parameter must be different than the one on the source database
  • Automatic backups should be disabled (on the ExaC@C, in the “configure backups” section, the backup destination option “none” should be selected)
  • TDE should be activated
  • Wallet should be open and WALLET_TYPE should be set to either AUTOLOGIN or PASSWORD
  • SYSPASSWORD must be the same on the source and target database
Others
  • Ensure that all ports have been opened.
  • Oracle NET Services should be configured and tested on both source and target database for Data Guard synchronisation and deployment of the standby database with active duplication
  • We will need to go through a temporary multitenant database on the ExaCC, which will have the same DB_NAME as the source and a different DB_UNIQUE_NAME. This CDB will host the final PDB.
  • The final PDB can then be relocated to the appropriate final CDB on the ExaCC later on.
  • ZDM will create its own temporary database, with the same DB_NAME as the source database and the final PDB name as DB_UNIQUE_NAME, to build the Data Guard configuration, and will remove it during the cleanup phase
Prepare ZDM Physical Online Response file

We will prepare the ZDM response file that will be used by copying the template provided by ZDM:

[zdmuser@zdmhost migration]$ cp -p /u01/app/oracle/product/zdm/rhp/zdm/template/zdm_template.rsp ./zdm_ONPR_physical_online.rsp

The main parameters to take care of are:

• TGT_DB_UNIQUE_NAME: Target database DB_UNIQUE_NAME. For cloud type Exadata Cloud at Customer (EXACC) Gen2 or Exadata Cloud Service (EXACS):
  • db_name: the target database db_name should be the same as the source database db_name
  • db_unique_name: the target database db_unique_name parameter value must be unique to ensure that Oracle Data Guard can identify the target as a different database from the source database
• MIGRATION_METHOD: Specifies if the migration uses Oracle Data Guard (online) or backup and restore (offline). We are using an online migration here, so the parameter needs to be set to ONLINE_PHYSICAL.
• DATA_TRANSFER_MEDIUM: Specifies the medium used to create the standby database: either a backup (using NFS or ZDLRA for example) or a direct connection where the standby is instantiated directly from the source using SQL*Net connectivity (duplicate from active or restore from service). Choose DIRECT as we are doing physical direct data transfer.
• PLATFORM_TYPE: Target platform type. To be EXACC in our case.
• SRC_PDB_NAME: Source database PDB name. Not needed here as all our on-premises databases are single-tenant.
• SRC_DB_LISTENER_PORT: To be used when there is a standalone database (no Grid Infrastructure) configured with a non-default listener port other than 1521. To be 13000 in our case.
• NONCDBTOPDB_CONVERSION: Specifies whether to convert a non-CDB source to a PDB. To be TRUE as we want to convert our on-premises database to a PDB during the ZDM migration.
• NONCDBTOPDB_SWITCHOVER: For a physical migration using Data Guard switchover, indicates whether the switchover operations will be executed during a migration job with non-CDB to PDB conversion enabled. Default is TRUE, to be kept as TRUE in our case.
• SKIP_FALLBACK: If set to FALSE, the redo logs will be shipped from the new primary (ExaCC) to the standby (on-premise database) once the switchover is completed, in case a fallback is needed. To be FALSE as we want a fallback.
• TGT_RETAIN_DB_UNIQUE_NAME: Adds a new phase ZDM_RETAIN_DBUNIQUENAME_TGT (can also be ZDM_MODIFY_DBUNIQUENAME_TGT). We need to pause before this phase to keep the ZDM temporary database after the PDB conversion, in case a fallback is needed, and resume the job once all is ok and the fallback is no longer needed.
• TGT_SKIP_DATAPATCH: If set to FALSE, ZDM will run datapatch on the target database as part of the post-migration tasks. Useful in case the target patch level is higher than the source one. To be FALSE as our target version is higher than the source and we want ZDM to run datapatch.
• SHUTDOWN_SRC: Specifies whether to shut down the source database after the migration completes. To be FALSE.
• SRC_RMAN_CHANNELS: Number of RMAN channels on the source.
• TGT_RMAN_CHANNELS: Number of RMAN channels on the destination.
• ZDM_SKIP_DG_CONFIG_CLEANUP: If FALSE, ZDM will deconfigure the Data Guard parameters configured for the migration on the source and target database at the end of the migration.
• ZDM_RMAN_DIRECT_METHOD: RMAN method to use for ONLINE_PHYSICAL direct data transfer, either RMAN active duplicate or restore from service. We kept the default RESTORE FROM SERVICE.
• ZDM_USE_DG_BROKER: If TRUE, ZDM will use Data Guard Broker to manage the Data Guard configuration. To be TRUE.
• ZDM_NONCDBTOPDB_PDB_NAME: When migrating a non-CDB source to a CDB target as a PDB, the PDB name to be used.
• ZDM_TGT_UPGRADE_TIMEZONE: Upgrade the target database time zone. Requires downtime for the database. To be TRUE.
• ZDM_APPLY_LAG_MONITORING_INTERVAL: Apply lag monitoring interval used to verify that both source and target are ready for switchover. Keep NONE.

Note that there is no parameter for the listener port on the target (ExaCC), so we assume it is hard-coded to the default port 1521.

Also note that, as we configured SHUTDOWN_SRC as FALSE, additional steps will be required to ensure that the application does not use the source (on-premise) database any more.

Here is the diff of the updated ZDM response file against the ZDM template for the migration we are going to run:

[zdmuser@zdmhost migration]$ diff zdm_ONPR_physical_online.rsp /u01/app/oracle/product/zdm/rhp/zdm/template/zdm_template.rsp
24c24
< TGT_DB_UNIQUE_NAME=ONPR_RZ2
---
> TGT_DB_UNIQUE_NAME=
32c32
< MIGRATION_METHOD=ONLINE_PHYSICAL
---
> MIGRATION_METHOD=
63c63
< DATA_TRANSFER_MEDIUM=DIRECT
---
> DATA_TRANSFER_MEDIUM=
75c75
< PLATFORM_TYPE=EXACC
---
> PLATFORM_TYPE=
119c119
< SRC_DB_LISTENER_PORT=13000
---
> SRC_DB_LISTENER_PORT=
230c230
< NONCDBTOPDB_CONVERSION=TRUE
---
> NONCDBTOPDB_CONVERSION=FALSE
252c252
< SKIP_FALLBACK=FALSE
---
> SKIP_FALLBACK=
268c268
< TGT_RETAIN_DB_UNIQUE_NAME=TRUE
---
> TGT_RETAIN_DB_UNIQUE_NAME=
312c312
< SHUTDOWN_SRC=FALSE
---
> SHUTDOWN_SRC=
333c333
< SRC_RMAN_CHANNELS=3
---
> SRC_RMAN_CHANNELS=
340c340
< TGT_RMAN_CHANNELS=6
---
> TGT_RMAN_CHANNELS=
526c526
< ZDM_USE_DG_BROKER=TRUE
---
> ZDM_USE_DG_BROKER=
574c574
< ZDM_NONCDBTOPDB_PDB_NAME=ONPRZ_APP_001T
---
> ZDM_NONCDBTOPDB_PDB_NAME=
595c595
< ZDM_TGT_UPGRADE_TIMEZONE=TRUE
---
> ZDM_TGT_UPGRADE_TIMEZONE=FALSE

ZDM Build Version

We are using ZDM build version 21.4:

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli -build
version: 21.0.0.0.0
full version: 21.4.0.0.0
patch version: 21.4.1.0.0
label date: 221207.25
ZDM kit build date: Jul 31 2023 14:24:25 UTC
CPAT build version: 23.7.0

Passwordless Login

Passwordless login needs to be configured between the ZDM host, the source host and the target host. See my previous blog: https://www.dbi-services.com/blog/oracle-zdm-migration-java-security-invalidkeyexception-invalid-key-format/

If passwordless login is not configured for one of the nodes, you will see an error like the following in the log file during the migration evaluation:

PRCZ-2006 : Unable to establish SSH connection to node "ExaCC-cl01n2" to execute command "/u02/app/oracle/product/19.0.0.0/dbhome_2/bin/tnsping vmonpr:13000"
Creation of the target database

As explained in the requirements, we must create a target CDB on the ExaCC with the same DB_NAME as the source database to be migrated but a different DB_UNIQUE_NAME. In our case it will be ONPR for the DB_NAME and ONPR_RZ2 for the DB_UNIQUE_NAME. This database must exist before the migration is started with ZDM. ZDM will create another temporary database taking the final PDB name and will use this target CDB as a template.

TDE (Transparent Data Encryption) configuration

The source database doesn’t need to be encrypted. The target database will be encrypted in any case. ZDM supports the migration of an encrypted and non-encrypted source database. The target database encryption will be taken in account during migration process. Even if the source database is not encrypted a TDE wallet still needs to be configured prior the migration as ZDM will use it to encrypt data to the target.

Note that downtime is needed to restart the database when the WALLET_ROOT parameter has to be configured.

Also, until the migration is completed, it is strongly recommended to make the wallet part of the backup strategy.

Configure instance parameter

Check that the wallet directory exists, otherwise create it:

SQL> !ls /u00/app/oracle/admin/ONPR/wallet
ls: cannot access /u00/app/oracle/admin/ONPR/wallet: No such file or directory

SQL> !mkdir /u00/app/oracle/admin/ONPR/wallet

Configure instance parameter for the database wallet and restart the database:

SQL> alter system set WALLET_ROOT='/u00/app/oracle/admin/ONPR/wallet' scope=spfile;

SQL> shutdown immediate

SQL> startup

Check the wallet. No WRL_PARAMETER should be displayed. WALLET_TYPE should be unknown and STATUS not_available.

SQL> select WRL_PARAMETER, WRL_TYPE,WALLET_TYPE, status from V$ENCRYPTION_WALLET;

WRL_PARAMETER                  WRL_TYPE             WALLET_TYPE          STATUS
------------------------------ -------------------- -------------------- ---------------------------
                               FILE                 UNKNOWN              NOT_AVAILABLE

Configure TDE:

SQL> alter system set tde_configuration='keystore_configuration=FILE' scope=both;

System altered.

Check the Wallet. WRL_PARAMETER should be displayed with the wallet location. WALLET_TYPE should still be unknown and STATUS not_available.

SQL> select WRL_PARAMETER, WRL_TYPE,WALLET_TYPE, status from V$ENCRYPTION_WALLET;

WRL_PARAMETER                            WRL_TYPE             WALLET_TYPE          STATUS
---------------------------------------- -------------------- -------------------- -----------------
/u00/app/oracle/admin/ONPR/wallet/tde/   FILE                 UNKNOWN              NOT_AVAILABLE

Create keystore

Create the keystore using the appropriate ExaCC password. We recommend using the same one for source and target, although they can be different.

SQL> ADMINISTER KEY MANAGEMENT CREATE KEYSTORE IDENTIFIED BY "*********************"

keystore altered.

We now can see that we have a wallet file in the TDE wallet directory:

SQL> !ls -ltrh /u00/app/oracle/admin/ONPR/wallet/tde/
total 4.0K
-rw-------. 1 oracle dba 2.5K Feb  9 16:11 ewallet.p12

And if we check the wallet status, we can see it is still UNKNOWN for the WALLET_TYPE, but now STATUS is set to CLOSED.

SQL> select WRL_PARAMETER, WRL_TYPE,WALLET_TYPE, status from V$ENCRYPTION_WALLET;

WRL_PARAMETER                            WRL_TYPE             WALLET_TYPE          STATUS
---------------------------------------- -------------------- -------------------- ------------------------------
/u00/app/oracle/admin/ONPR/wallet/tde/   FILE                 UNKNOWN              CLOSED

Open the keystore

The keystore can now be opened.

SQL> ADMINISTER KEY MANAGEMENT SET KEYSTORE OPEN IDENTIFIED BY "***************";

keystore altered.

And the wallet type is now set to PASSWORD and status is OPEN_NO_MASTER_KEY.

SQL> select WRL_PARAMETER, WRL_TYPE,WALLET_TYPE, status from V$ENCRYPTION_WALLET;

WRL_PARAMETER                            WRL_TYPE             WALLET_TYPE          STATUS
---------------------------------------- -------------------- -------------------- ------------------------------
/u00/app/oracle/admin/ONPR/wallet/tde/   FILE                 PASSWORD             OPEN_NO_MASTER_KEY

Create and activate the master encryption key

Using the same password, we can now create and activate the master encryption key with the backup option. This sets the database master encryption key in the wallet.

SQL> ADMINISTER KEY MANAGEMENT SET KEY IDENTIFIED BY "*************" with backup;

keystore altered.

If you are running version 19.10, as we are here, you will face the following error:

ORA-28374: typed master key not found in wallet

This is related to the following bug:

Bug 31500699 – ORA-28374: typed master key not found in wallet after tablespace TDE Enabled ( Doc ID 31500699.8 )

This is not an issue and we can move forward, as the master encryption key has been created and added to the wallet anyway. The only consequence is that it will be impossible to encrypt any data. We do not care, as we are not encrypting the source on-premise database, and we should not, because we are not licensed for Oracle Advanced Security.

We now have a new wallet and a backup one:

SQL> !ls -ltrh /u00/app/oracle/admin/ONPR/wallet/tde/
total 8.0K
-rw-------. 1 oracle dba 2.5K Feb  9 16:16 ewallet_2024020915161059.p12
-rw-------. 1 oracle dba 4.0K Feb  9 16:16 ewallet.p12

Set autologin Wallet

We will change the wallet type from password to autologin using the same password, in order for the wallet to be opened automatically.

SQL> ADMINISTER KEY MANAGEMENT CREATE AUTO_LOGIN KEYSTORE FROM KEYSTORE '/u00/app/oracle/admin/ONPR/wallet/tde/' IDENTIFIED BY "************";

keystore altered.

SQL> ADMINISTER KEY MANAGEMENT SET KEYSTORE CLOSE IDENTIFIED BY "******************";

keystore altered.

And we can check that all has been configured appropriately for the wallet:

SQL> select WRL_PARAMETER, WRL_TYPE,WALLET_TYPE, status from V$ENCRYPTION_WALLET;

WRL_PARAMETER                            WRL_TYPE             WALLET_TYPE          STATUS
---------------------------------------- -------------------- -------------------- ------------------------------
/u00/app/oracle/admin/ONPR/wallet/tde/   FILE                 AUTOLOGIN            OPEN

And we can see that we have now an autologin cwallet.sso file:

SQL> !ls -ltrh /u00/app/oracle/admin/ONPR/wallet/tde/
total 14K
-rw-------. 1 oracle dba 4.0K Feb  9 17:07 ewallet.p12
-rw-------. 1 oracle dba 5.7K Feb  9 17:07 cwallet.sso
-rw-------. 1 oracle dba 2.5K Feb  9 16:16 ewallet_2024020915161059.p12

Change tablespace_encryption instance parameter to decrypt_only

This is mandatory so that no new tablespace created on the source database is encrypted once TDE is configured; otherwise the Oracle Advanced Security license would be activated. Also, after the ZDM switchover steps are completed, the primary database will be running on the ExaCC and its tablespaces will be encrypted there. The generated redo that needs to be applied on the standby database running on-premise will then be encrypted as well. As we are not licensed for Oracle Advanced Security on-premise, we also need this parameter set to decrypt_only to be able to decrypt the redo before applying it on the source database.

Unfortunately, this parameter was only introduced in Oracle 19.16. So if you are running an older version on the source on-premise database, as we are, you cannot use this parameter.

This means that we/you will need to:

  • Deactivate the fallback possibility. In our situation, we will only be able to use ZDM to migrate the database, without any fallback possibility. Bad situation…
  • Ensure the parameter ENCRYPT_NEW_TABLESPACES is set to DDL and that no ENCRYPTION clause is specified in the statement for any newly created tablespace
SQL> show parameter encry

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
encrypt_new_tablespaces              string      CLOUD_ONLY
tablespace_encryption                string      MANUAL_ENABLE

SQL> alter system set tablespace_encryption='decrypt_only' scope=spfile;

System altered.

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup
ORACLE instance started.

Total System Global Area 3.7581E+10 bytes
Fixed Size                 23061704 bytes
Variable Size            5100273664 bytes
Database Buffers         3.2346E+10 bytes
Redo Buffers              111153152 bytes
Database mounted.
Database opened.

SQL> show parameter encry

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
encrypt_new_tablespaces              string      DDL
tablespace_encryption                string      DECRYPT_ONLY

Update SYS user password on the on-premise database

The SYS passwords of the source and target databases must match.

Update the source one to the same password you are using on the ExaCC:

SQL> alter user sys identified by "********";

Update source listener.ora with static entry

If the on-premise source database is not running on Oracle Restart (Grid Infrastructure), you will have to add a static entry for the DGMGRL service to the appropriate listener. Unfortunately, ZDM does not do this for you.

SID_LIST_<listener_name> =
  (SID_LIST =
    (SID_DESC = 
      (GLOBAL_DBNAME = <dbname>_DGMGRL.<domain>) 
      (ORACLE_HOME = <ORACLE_HOME>) 
      (SID_NAME = <SID>)
    )
  )

If you do not do so, then when you resume the migration, the ZDM switchover step ZDM_SWITCHOVER_SRC will fail with the following error:

PRGZ-3605 : Oracle Data Guard Broker switchover to database "ONPRZ_APP_001T" on database "ONPR" failed.
ONPRZ_APP_001T
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Thu Feb 22 09:46:57 2024
Version 19.10.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected to "ONPR"
Connected as SYSDG.
DGMGRL> Performing switchover NOW, please wait...
Operation requires a connection to database "onprz_app_001t"
Connecting ...
Connected to "ONPRZ_APP_001T"
Connected as SYSDBA.
New primary database "onprz_app_001t" is opening...
Operation requires start up of instance "ONPR" on database "onpr"
Starting instance "ONPR"...
Unable to connect to database using (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(Host=vmONPR.domain.com)(Port=13000))(CONNECT_DATA=(SERVICE_NAME=ONPR_DGMGRL.domain.com)(INSTANCE_NAME=ONPR)(SERVER=DEDICATED)))
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Archived log backup on the source

You will have to ensure that the source database archived log deletion policy is set appropriately and that no archived log is removed before it has been applied on the standby. This is all to ensure that no source archived log is missing for Data Guard.
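
As an illustration (this query is not part of the ZDM workflow itself), a quick check on the primary that archived logs have already been applied on the standby destination, before any backup job deletes them, could look like this:

SQL> select dest_id, thread#, max(sequence#) as last_applied from v$archived_log where applied='YES' group by dest_id, thread#;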

Convert target database to single instance

I converted the target database on the ExaCC used during the ZDM migration (the one taken as a template by ZDM and where the final PDB will be hosted) from RAC to single instance, for two reasons:

  • First, as we will see later, ZDM creates the standby database as a new instance, using the final PDB name as ORACLE_SID, and this temporary database is in any case a single instance.
  • Second, if the target database is RAC, ZDM will create a second UNDO tablespace in the single instance source database. I do not want to make any changes to the source database. Also, as I’m running version 19.10 on the source, the new UNDO tablespace would be encrypted and, moreover, I would face bug 31500699 and the ZDM migration would fail.
Update cluster_database instance parameter
oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)] sqh

SQL> show parameter cluster_database

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2

SQL> set line 300
SQL> col name for a30
SQL> col value for a30
SQL> select inst_id, name, value from gv$parameter where lower(name)='cluster_database';

   INST_ID NAME                           VALUE
---------- ------------------------------ ------------------------------
         2 cluster_database               TRUE
         1 cluster_database               TRUE

SQL> alter system set cluster_database=FALSE scope=spfile sid='*';

System altered.

Stop cluster database
oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)] srvctl status database -d ONPR_RZ2
Instance ONPR1 is running on node ExaCC-cl01n1
Instance ONPR2 is running on node ExaCC-cl01n2

oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)] srvctl stop database -d ONPR_RZ2

oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)] srvctl status database -d ONPR_RZ2
Instance ONPR1 is not running on node ExaCC-cl01n1
Instance ONPR2 is not running on node ExaCC-cl01n2

Change grid infrastructure configuration

We will remove the second instance.

oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)] srvctl config database -d ONPR_RZ2
Database unique name: ONPR_RZ2
Database name: ONPR
Oracle home: /u02/app/oracle/product/19.0.0.0/dbhome_2
Oracle user: oracle
Spfile: +DATAC1/ONPR_RZ2/PARAMETERFILE/spfile.634.1160214211
Password file: +DATAC1/ONPR_RZ2/PASSWORD/pwdonpr_rz2.562.1160213439
Domain: domain.com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATAC1
Mount point paths: /acfs01
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: racoper
Database instances: ONPR1,ONPR2
Configured nodes: ExaCC-cl01n1,ExaCC-cl01n2
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services:
Database is administrator managed
oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)]

oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)] srvctl remove instance -d ONPR_RZ2 -i ONPR2
Remove instance from the database ONPR_RZ2? (y/[n]) y
oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)]

oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)] srvctl config database -d ONPR_RZ2
Database unique name: ONPR_RZ2
Database name: ONPR
Oracle home: /u02/app/oracle/product/19.0.0.0/dbhome_2
Oracle user: oracle
Spfile: +DATAC1/ONPR_RZ2/PARAMETERFILE/spfile.634.1160214211
Password file: +DATAC1/ONPR_RZ2/PASSWORD/pwdonpr_rz2.562.1160213439
Domain: domain.com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATAC1
Mount point paths: /acfs01
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: racoper
Database instances: ONPR1
Configured nodes: ExaCC-cl01n1
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services:
Database is administrator managed

Start target database ONPR on ExaCC
oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)] srvctl start database -d ONPR_RZ2

oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)] srvctl status database -d ONPR_RZ2
Instance ONPR1 is running on node ExaCC-cl01n1

As we can see, only one instance is running. This can also be double checked with the instance parameter.

oracle@ExaCC-cl01n1:~/ [ONPR1 (CDB$ROOT)] sqh

SQL> set lines 300 pages 500
SQL> col name for a30
SQL> col value for a30
SQL> select inst_id, name, value from gv$parameter where lower(name)='cluster_database';

   INST_ID NAME                           VALUE
---------- ------------------------------ ------------------------------
         1 cluster_database               FALSE

Evaluating ZDM Migration

We are now ready to evaluate the ZDM migration. We will first run zdmcli with the -eval option to evaluate the migration and check that everything is ok.

We need to provide some arguments:

• -sourcesid: database name of the source database, in case the source database is a single instance deployed in a non Grid Infrastructure environment
• -rsp: ZDM response file
• -sourcenode: source host
• -srcauth: name of the source authentication plug-in, with 3 sub-arguments:
  • -srcarg1: user, should be oracle
  • -srcarg2: ZDM private RSA key
  • -srcarg3: sudo location
• -targetnode: target host
• -tgtauth: name of the target authentication plug-in, with 3 sub-arguments:
  • -tgtarg1: user, should be opc
  • -tgtarg2: ZDM private RSA key
  • -tgtarg3: sudo location
• -tdekeystorepasswd: source database TDE keystore password
• -tgttdekeystorepasswd: target container database TDE keystore password

All steps done for evaluation have been completed successfully:

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_physical_online.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -tdekeystorepasswd -tgttdekeystorepasswd -eval
zdmhost.domain.com: Audit ID: 428
Enter source database ONPR SYS password:
Enter source database ONPR TDE keystore password:
Enter target container database TDE keystore password:
zdmhost: 2024-02-14T13:18:19.773Z : Processing response file ...
Operation "zdmcli migrate database" scheduled with the job ID "39".

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli query job -jobid 39
zdmhost.domain.com: Audit ID: 434
Job ID: 39
User: zdmuser
Client: zdmhost
Job Type: "EVAL"
Scheduled job command: "zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_physical_online.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -tdekeystorepasswd -tgttdekeystorepasswd -eval"
Scheduled job execution start time: 2024-02-14T14:18:19+01. Equivalent local time: 2024-02-14 14:18:19
Current status: SUCCEEDED
Result file path: "/u01/app/oracle/chkbase/scheduled/job-39-2024-02-14-14:18:29.log"
Metrics file path: "/u01/app/oracle/chkbase/scheduled/job-39-2024-02-14-14:18:29.json"
Job execution start time: 2024-02-14 14:18:29
Job execution end time: 2024-02-14 14:21:18
Job execution elapsed time: 2 minutes 48 seconds
ZDM_GET_SRC_INFO ........... PRECHECK_PASSED
ZDM_GET_TGT_INFO ........... PRECHECK_PASSED
ZDM_PRECHECKS_SRC .......... PRECHECK_PASSED
ZDM_PRECHECKS_TGT .......... PRECHECK_PASSED
ZDM_SETUP_SRC .............. PRECHECK_PASSED
ZDM_SETUP_TGT .............. PRECHECK_PASSED
ZDM_PREUSERACTIONS ......... PRECHECK_PASSED
ZDM_PREUSERACTIONS_TGT ..... PRECHECK_PASSED
ZDM_VALIDATE_SRC ........... PRECHECK_PASSED
ZDM_VALIDATE_TGT ........... PRECHECK_PASSED
ZDM_POSTUSERACTIONS ........ PRECHECK_PASSED
ZDM_POSTUSERACTIONS_TGT .... PRECHECK_PASSED
ZDM_CLEANUP_SRC ............ PRECHECK_PASSED
ZDM_CLEANUP_TGT ............ PRECHECK_PASSED

Run the Migration with ZDM

We will run the migration adding a pause after the ZDM step ZDM_CONFIGURE_DG_SRC. ZDM will thus prepare the whole environment (setting up the environment, creating the standby and configuring Data Guard). All these steps can be done without any downtime.

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_physical_online.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -tdekeystorepasswd -tgttdekeystorepasswd -pauseafter ZDM_CONFIGURE_DG_SRC
zdmhost.domain.com: Audit ID: 543
Enter source database ONPR SYS password:
Enter source database ONPR TDE keystore password:
Enter target container database TDE keystore password:
zdmhost: 2024-02-22T09:27:17.864Z : Processing response file ...
Operation "zdmcli migrate database" scheduled with the job ID "44".
[zdmuser@zdmhost migration]$

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli query job -jobid 44
zdmhost.domain.com: Audit ID: 551
Job ID: 44
User: zdmuser
Client: zdmhost
Job Type: "MIGRATE"
Scheduled job command: "zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_physical_online.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -tdekeystorepasswd -tgttdekeystorepasswd -pauseafter ZDM_CONFIGURE_DG_SRC"
Scheduled job execution start time: 2024-02-22T10:27:17+01. Equivalent local time: 2024-02-22 10:27:17
Current status: PAUSED
Current Phase: "ZDM_CONFIGURE_DG_SRC"
Result file path: "/u01/app/oracle/chkbase/scheduled/job-44-2024-02-22-10:27:27.log"
Metrics file path: "/u01/app/oracle/chkbase/scheduled/job-44-2024-02-22-10:27:27.json"
Job execution start time: 2024-02-22 10:27:27
Job execution end time: 2024-02-22 10:39:38
Job execution elapsed time: 12 minutes 11 seconds
ZDM_GET_SRC_INFO ................ COMPLETED
ZDM_GET_TGT_INFO ................ COMPLETED
ZDM_PRECHECKS_SRC ............... COMPLETED
ZDM_PRECHECKS_TGT ............... COMPLETED
ZDM_SETUP_SRC ................... COMPLETED
ZDM_SETUP_TGT ................... COMPLETED
ZDM_PREUSERACTIONS .............. COMPLETED
ZDM_PREUSERACTIONS_TGT .......... COMPLETED
ZDM_VALIDATE_SRC ................ COMPLETED
ZDM_VALIDATE_TGT ................ COMPLETED
ZDM_DISCOVER_SRC ................ COMPLETED
ZDM_COPYFILES ................... COMPLETED
ZDM_PREPARE_TGT ................. COMPLETED
ZDM_SETUP_TDE_TGT ............... COMPLETED
ZDM_RESTORE_TGT ................. COMPLETED
ZDM_RECOVER_TGT ................. COMPLETED
ZDM_FINALIZE_TGT ................ COMPLETED
ZDM_CONFIGURE_DG_SRC ............ COMPLETED
ZDM_SWITCHOVER_SRC .............. PENDING
ZDM_SWITCHOVER_TGT .............. PENDING
ZDM_POST_DATABASE_OPEN_TGT ...... PENDING
ZDM_DATAPATCH_TGT ............... PENDING
ZDM_MODIFY_DBUNIQUENAME_TGT ..... PENDING
ZDM_NONCDBTOPDB_PRECHECK ........ PENDING
ZDM_NONCDBTOPDB_CONVERSION ...... PENDING
ZDM_POST_MIGRATE_TGT ............ PENDING
TIMEZONE_UPGRADE_PREPARE_TGT .... PENDING
TIMEZONE_UPGRADE_TGT ............ PENDING
ZDM_POSTUSERACTIONS ............. PENDING
ZDM_POSTUSERACTIONS_TGT ......... PENDING
ZDM_CLEANUP_SRC ................. PENDING
ZDM_CLEANUP_TGT ................. PENDING

Pause After Phase: "ZDM_CONFIGURE_DG_SRC"

We can see that all steps have been completed successfully, and that ZDM has paused the migration after Data Guard has been configured.

I can review the full ZDM log:

[zdmuser@zdmhost ~]$ cat /u01/app/oracle/chkbase/scheduled/job-44-2024-02-22-10:27:27.log
zdmhost: 2024-02-22T09:27:27.341Z : Processing response file ...
zdmhost: 2024-02-22T09:27:27.345Z : Processing response file ...
zdmhost: 2024-02-22T09:27:32.418Z : Starting zero downtime migrate operation ...
zdmhost: 2024-02-22T09:27:34.498Z : Executing phase ZDM_GET_SRC_INFO
zdmhost: 2024-02-22T09:27:34.498Z : Retrieving information from source node "vmonpr" ...
zdmhost: 2024-02-22T09:27:34.498Z : retrieving information about database "ONPR" ...
zdmhost: 2024-02-22T09:27:38.793Z : Execution of phase ZDM_GET_SRC_INFO completed
zdmhost: 2024-02-22T09:27:38.819Z : Executing phase ZDM_GET_TGT_INFO
zdmhost: 2024-02-22T09:27:38.819Z : Retrieving information from target node "ExaCC-cl01n1" ...
zdmhost: 2024-02-22T09:27:46.173Z : Determined value for parameter TGT_DATADG is '+DATAC1'
zdmhost: 2024-02-22T09:27:46.173Z : Determined value for parameter TGT_REDODG is '+DATAC1'
zdmhost: 2024-02-22T09:27:46.173Z : Determined value for parameter TGT_RECODG is '+RECOC1'
zdmhost: 2024-02-22T09:27:46.284Z : Execution of phase ZDM_GET_TGT_INFO completed
zdmhost: 2024-02-22T09:27:46.821Z : Executing phase ZDM_PRECHECKS_SRC
zdmhost: 2024-02-22T09:27:46.821Z : Execution of phase ZDM_PRECHECKS_SRC completed
zdmhost: 2024-02-22T09:27:47.080Z : Executing phase ZDM_PRECHECKS_TGT
zdmhost: 2024-02-22T09:27:47.080Z : Execution of phase ZDM_PRECHECKS_TGT completed
zdmhost: 2024-02-22T09:27:47.118Z : Executing phase ZDM_SETUP_SRC
zdmhost: 2024-02-22T09:27:47.118Z : Setting up ZDM on the source node vmonpr ...
vmonpr: 2024-02-22T09:28:49.592Z : TNS aliases successfully setup on the source node vmonpr...
zdmhost: 2024-02-22T09:28:49.694Z : Execution of phase ZDM_SETUP_SRC completed
####################################################################
zdmhost: 2024-02-22T09:28:49.730Z : Executing phase ZDM_SETUP_TGT
zdmhost: 2024-02-22T09:28:49.730Z : Setting up ZDM on the target node ExaCC-cl01n1 ...
ExaCC-cl01n1: 2024-02-22T09:29:12.976Z : TNS aliases successfully setup on the target node ExaCC-cl01n1...
zdmhost: 2024-02-22T09:29:12.979Z : Execution of phase ZDM_SETUP_TGT completed
####################################################################
zdmhost: 2024-02-22T09:29:13.023Z : Executing phase ZDM_VALIDATE_SRC
zdmhost: 2024-02-22T09:29:13.024Z : Validating source environment on node vmonpr ...
vmonpr: 2024-02-22T09:29:23.649Z : Validating SYS account password specified..
vmonpr: 2024-02-22T09:29:34.470Z : Validating source environment...
vmonpr: 2024-02-22T09:29:34.470Z : Ensuring source database is running in ARCHIVELOG mode...
vmonpr: 2024-02-22T09:29:34.871Z : Validating Oracle TDE setup
vmonpr: 2024-02-22T09:29:37.474Z : Validating Oracle Password file
vmonpr: 2024-02-22T09:29:38.476Z : Validating database ONPR role is PRIMARY...
vmonpr: 2024-02-22T09:29:38.478Z : Source environment validated successfully
zdmhost: 2024-02-22T09:29:38.487Z : Execution of phase ZDM_VALIDATE_SRC completed
####################################################################
zdmhost: 2024-02-22T09:29:38.521Z : Executing phase ZDM_VALIDATE_TGT
zdmhost: 2024-02-22T09:29:38.521Z : Validating target environment on node ExaCC-cl01n1 ...
zdmhost: 2024-02-22T09:29:38.573Z : Source database timezone file version 32 is less than target database timezone file version 42. Timezone upgrade operation will be performed on target database after completion of database migration.
ExaCC-cl01n1: 2024-02-22T09:29:50.315Z : Validating specified Oracle ASM storage locations...
ExaCC-cl01n1: 2024-02-22T09:29:54.219Z : validating target database size allocation...
ExaCC-cl01n1: 2024-02-22T09:29:56.922Z : Verifying SQL*Net connectivity to source database ...
ExaCC-cl01n1: 2024-02-22T09:29:57.223Z : verifying passwordless connectivity between target nodes
ExaCC-cl01n1: 2024-02-22T09:29:58.425Z : Target environment validated successfully
zdmhost: 2024-02-22T09:29:58.433Z : Execution of phase ZDM_VALIDATE_TGT completed
####################################################################
zdmhost: 2024-02-22T09:29:58.455Z : Executing phase ZDM_DISCOVER_SRC
zdmhost: 2024-02-22T09:29:58.455Z : Setting up the source node vmonpr for creating standby on the target node ExaCC-cl01n1 ...
vmonpr: 2024-02-22T09:30:09.186Z : Enabling force logging on database ONPR...
vmonpr: 2024-02-22T09:30:09.287Z : Creating standby logs on database ONPR...
vmonpr: 2024-02-22T09:30:13.591Z : Source environment set up successfully
zdmhost: 2024-02-22T09:30:13.700Z : Execution of phase ZDM_DISCOVER_SRC completed
####################################################################
zdmhost: 2024-02-22T09:30:13.729Z : Executing phase ZDM_COPYFILES
zdmhost: 2024-02-22T09:30:13.729Z : Copying files from source node vmonpr to target node ExaCC-cl01n1 ...
vmonpr: 2024-02-22T09:30:24.849Z : Source database "ONPR" credentials exported successfully on node "vmonpr"
zdmhost: 2024-02-22T09:30:29.112Z : Execution of phase ZDM_COPYFILES completed
####################################################################
zdmhost: 2024-02-22T09:30:29.148Z : Executing phase ZDM_PREPARE_TGT
zdmhost: 2024-02-22T09:30:29.148Z : Setting up standby on the target node ExaCC-cl01n1 ...
ExaCC-cl01n1: 2024-02-22T09:31:03.106Z : Target environment set up successfully
zdmhost: 2024-02-22T09:31:03.115Z : Execution of phase ZDM_PREPARE_TGT completed
####################################################################
zdmhost: 2024-02-22T09:31:03.137Z : Executing phase ZDM_SETUP_TDE_TGT
zdmhost: 2024-02-22T09:31:03.137Z : Setting up Oracle Transparent Data Encryption (TDE) keystore on the target node ExaCC-cl01n1 ...
ExaCC-cl01n1: 2024-02-22T09:31:13.880Z : target environment Oracle Transparent Data Encryption (TDE) set up successfully
zdmhost: 2024-02-22T09:31:13.889Z : Execution of phase ZDM_SETUP_TDE_TGT completed
####################################################################
zdmhost: 2024-02-22T09:31:13.913Z : Executing phase ZDM_RESTORE_TGT
zdmhost: 2024-02-22T09:31:13.913Z : Restoring database on the target node ExaCC-cl01n1 ...
ExaCC-cl01n1: 2024-02-22T09:31:36.483Z : database ONPRZ_APP_001T dropped successfully
ExaCC-cl01n1: 2024-02-22T09:31:54.048Z : Target database "ONPRZ_APP_001T" credentials staged successfully on node "ExaCC-cl01n1"
ExaCC-cl01n1: 2024-02-22T09:32:06.392Z : Restoring SPFILE ...
ExaCC-cl01n1: 2024-02-22T09:32:45.923Z : SPFILE restored to /u02/app/oracle/product/19.0.0.0/dbhome_2/dbs/spfileONPRZ_APP_001T1.ora successfully
ExaCC-cl01n1: 2024-02-22T09:32:59.196Z : Restoring control files ...
ExaCC-cl01n1: 2024-02-22T09:33:20.313Z : Control files restored successfully
ExaCC-cl01n1: 2024-02-22T09:33:31.775Z : Restoring and encrypting data files ...
ExaCC-cl01n1: 2024-02-22T09:34:38.628Z : Data files restored and encrypted successfully
ExaCC-cl01n1: 2024-02-22T09:34:38.629Z : Cleaning up any orphaned data ...
ExaCC-cl01n1: 2024-02-22T09:34:38.730Z : Orphaned files clean up successful
ExaCC-cl01n1: 2024-02-22T09:34:39.034Z : Data files restored successfully
ExaCC-cl01n1: 2024-02-22T09:34:51.398Z : Renaming TEMP files and online redo log files ...
ExaCC-cl01n1: 2024-02-22T09:35:03.909Z : TEMP files and online redo log files renamed successfully
ExaCC-cl01n1: 2024-02-22T09:35:16.374Z : Recovering data files ...
ExaCC-cl01n1: 2024-02-22T09:35:20.080Z : Data files recovered successfully
zdmhost: 2024-02-22T09:35:20.094Z : Execution of phase ZDM_RESTORE_TGT completed
####################################################################
zdmhost: 2024-02-22T09:35:20.115Z : Executing phase ZDM_RECOVER_TGT
zdmhost: 2024-02-22T09:35:20.115Z : Recovering database on the target node ExaCC-cl01n1 ...
ExaCC-cl01n1: 2024-02-22T09:35:37.674Z : Target database "ONPRZ_APP_001T" credentials staged successfully on node "ExaCC-cl01n1"
ExaCC-cl01n1: 2024-02-22T09:35:50.018Z : Restoring control files ...
ExaCC-cl01n1: 2024-02-22T09:36:31.849Z : Running RMAN crosscheck on database "ONPRZ_APP_001T" ...
ExaCC-cl01n1: 2024-02-22T09:36:36.354Z : RMAN crosscheck on database "ONPRZ_APP_001T" ran successfully
ExaCC-cl01n1: 2024-02-22T09:36:36.354Z : Running RMAN catalog ...
ExaCC-cl01n1: 2024-02-22T09:36:38.357Z : RMAN catalog ran successfully
ExaCC-cl01n1: 2024-02-22T09:36:38.357Z : Control files restored successfully
ExaCC-cl01n1: 2024-02-22T09:36:50.818Z : Starting incremental restore of data files ...
ExaCC-cl01n1: 2024-02-22T09:36:55.122Z : Incremental restore of data files executed successfully
ExaCC-cl01n1: 2024-02-22T09:36:55.123Z : Cleaning up any orphaned data ...
ExaCC-cl01n1: 2024-02-22T09:36:55.224Z : Orphaned files clean up successful
ExaCC-cl01n1: 2024-02-22T09:36:55.327Z : Data files restored successfully
ExaCC-cl01n1: 2024-02-22T09:37:07.696Z : Renaming TEMP files and online redo log files ...
ExaCC-cl01n1: 2024-02-22T09:37:20.207Z : TEMP files and online redo log files renamed successfully
ExaCC-cl01n1: 2024-02-22T09:37:32.669Z : Recovering data files ...
ExaCC-cl01n1: 2024-02-22T09:37:36.575Z : Data files recovered successfully
zdmhost: 2024-02-22T09:37:36.590Z : Execution of phase ZDM_RECOVER_TGT completed
####################################################################
zdmhost: 2024-02-22T09:37:36.614Z : Executing phase ZDM_FINALIZE_TGT
zdmhost: 2024-02-22T09:37:36.614Z : Finalizing creation of standby database on the target node ExaCC-cl01n1 ...
ExaCC-cl01n1: 2024-02-22T09:37:47.554Z : Updating database cluster resource dependency ...
ExaCC-cl01n1: 2024-02-22T09:38:30.389Z : Creating standby redo logs on target database ONPRZ_APP_001T
ExaCC-cl01n1: 2024-02-22T09:38:30.490Z : Enabling Oracle Data Guard Broker on "ONPRZ_APP_001T" ...
ExaCC-cl01n1: 2024-02-22T09:38:33.493Z : Oracle Data Guard Broker enabled successfully on "ONPRZ_APP_001T"
ExaCC-cl01n1: 2024-02-22T09:38:33.694Z : Target database updated successfully
zdmhost: 2024-02-22T09:38:33.704Z : Execution of phase ZDM_FINALIZE_TGT completed
####################################################################
zdmhost: 2024-02-22T09:38:33.725Z : Executing phase ZDM_CONFIGURE_DG_SRC
zdmhost: 2024-02-22T09:38:33.726Z : Finalize steps done on the source node vmonpr for creating standby on the target node ExaCC-cl01n1 ...
vmonpr: 2024-02-22T09:38:44.648Z : Configuring Oracle Data Guard Broker on "ONPR" ...
vmonpr: 2024-02-22T09:39:38.289Z : Oracle Data Guard Broker configured successfully on "ONPR"
vmonpr: 2024-02-22T09:39:38.390Z : Source database updated successfully
zdmhost: 2024-02-22T09:39:38.398Z : Execution of phase ZDM_CONFIGURE_DG_SRC completed
####################################################################
zdmhost: 2024-02-22T09:39:38.403Z : Job execution paused after phase "ZDM_CONFIGURE_DG_SRC".

We can see that ZDM validates the source and target, checks the SYS password, checks and sets up TDE, validates the Oracle Net connections, validates the ASM storage, creates standby redo logs, creates the standby database using the direct data transfer method (the application tablespaces will be encrypted), and configures Data Guard.

We can check additional, more detailed logs on the source database server:

oracle@vmonpr:/home/oracle/ [ONPR] cd  /u00/app/oracle/zdm/zdm_ONPR_44/zdm/log/

oracle@vmonpr:/u00/app/oracle/zdm/zdm_ONPR_44/zdm/log/ [ONPR] ls -ltrh
total 228K
-rw-rw-rw-. 1 oracle dba  14K Feb 22 10:28 zdm_setup_tns_alias_src_12728.log
-rwxrwxrwx. 1 oracle dba    0 Feb 22 10:29 default.log
-rw-rw-rw-. 1 oracle dba  17K Feb 22 10:29 zdm_validate_sys_pass_src_13181.log
-rw-rw-rw-. 1 root   root 57K Feb 22 10:29 zdm_validate_src_13240.log
-rw-rw-rw-. 1 root   root 62K Feb 22 10:30 zdm_discover_src_13722.log
-rw-rw-rw-. 1 root   root 22K Feb 22 10:30 zdm_export_db_cred_src_15024.log
-rw-rw-rw-. 1 oracle dba  43K Feb 22 10:39 zdm_configure_dg_src_18159.log

And on the target (ExaCC):

[opc@ExaCC-cl01n1 ~]$ cd /u02/app/oracle/zdm/zdm_ONPR_RZ2_44/zdm/log/

[opc@ExaCC-cl01n1 log]$ ls -ltrh
total 516K
-rw-rw-rw- 1 oracle root  33K Feb 22 10:29 zdm_setup_tns_alias_tgt_224683.log
-rwxrwxrwx 1 oracle root    0 Feb 22 10:29 default.log
-rw-rw-rw- 1 oracle root  29K Feb 22 10:29 zdm_validate_tgt_231351.log
-rw-rw-rw- 1 root   root  59K Feb 22 10:31 zdm_prepare_tgt_239052.log
-rw-rw-rw- 1 root   root 7.9K Feb 22 10:31 zdm_setup_tde_tgt_242739.log
-rw-rw-rw- 1 root   root  16K Feb 22 10:31 zdm_oss_restore_tgt_dropdatabase_244302.log
-rw-rw-rw- 1 root   root  21K Feb 22 10:31 zdm_import_db_cred_tgt_250414.log
-rw-rw-rw- 1 root   root  24K Feb 22 10:32 zdm_oss_restore_tgt_restoreinit_252706.log
-rw-rw-rw- 1 root   root  19K Feb 22 10:33 zdm_oss_restore_tgt_restorecntrl_260310.log
-rw-rw-rw- 1 root   root  37K Feb 22 10:34 zdm_oss_restore_tgt_restoredb_263322.log
-rw-rw-rw- 1 root   root  36K Feb 22 10:35 zdm_oss_restore_tgt_renametemp_279405.log
-rw-rw-rw- 1 root   root  22K Feb 22 10:35 zdm_oss_restore_tgt_recoverdb_281523.log
-rw-rw-rw- 1 root   root  22K Feb 22 10:35 zdm_import_db_cred_tgt_283595.log
-rw-rw-rw- 1 root   root  28K Feb 22 10:36 zdm_oss_recover_tgt_restorecntrl_288735.log
-rw-rw-rw- 1 root   root  37K Feb 22 10:36 zdm_oss_recover_tgt_restoredb_297044.log
-rw-rw-rw- 1 root   root  36K Feb 22 10:37 zdm_oss_recover_tgt_renametemp_298270.log
-rw-rw-rw- 1 root   root  22K Feb 22 10:37 zdm_oss_recover_tgt_recoverdb_300955.log
-rw-rw-rw- 1 root   root  41K Feb 22 10:38 zdm_finalize_tgt_305026.log

Finally I can check the Data Guard configuration and see that my standby is synchronized (no gap).

oracle@vmonpr:/home/oracle/ [ONPR] dgmgrl
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Thu Feb 22 10:42:39 2024
Version 19.10.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.

DGMGRL> connect /
Connected to "ONPR"
Connected as SYSDG.

DGMGRL> show configuration lag

Configuration - ZDM_onpr

  Protection Mode: MaxPerformance
  Members:
  onpr           - Primary database
    onprz_app_001t - Physical standby database
                     Transport Lag:      0 seconds (computed 1 second ago)
                     Apply Lag:          0 seconds (computed 1 second ago)

Fast-Start Failover:  Disabled

Configuration Status:
SUCCESS   (status updated 36 seconds ago)

I can see that ZDM created a new instance on the ExaCC, named after the final PDB name:

oracle@ExaCC-cl01n1:/u02/app/oracle/zdm/zdm_ONPR_RZ2_38/zdm/log/ [ONPR1 (CDB$ROOT)] ps -ef | grep [p]mon | grep -i ONPRZ_APP_001T1
oracle   236556      1  0 12:22 ?        00:00:00 ora_pmon_ONPRZ_APP_001T1

If we connect to it, we can see that the instance name matches the final PDB name and that the DB name matches the source database being migrated. The db_unique_name is the PDB name.

oracle@ExaCC-cl01n1:/u02/app/oracle/product/19.0.0.0/dbhome_2/ [ONPR1 (CDB$ROOT)] export ORACLE_SID=ONPRZ_APP_001T1

oracle@ExaCC-cl01n1:/u02/app/oracle/product/19.0.0.0/dbhome_2/ [ONPRZ_APP_001T1 (CDB$ROOT)] sqh

SQL*Plus: Release 19.0.0.0.0 - Production on Wed Feb 14 13:22:42 2024
Version 19.21.0.0.0

Copyright (c) 1982, 2022, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Version 19.21.0.0.0

SQL> set lines 300 pages 500
SQL> select instance_name from v$instance;

INSTANCE_NAME
------------------------------------------------
ONPRZ_APP_001T1

SQL> select name, db_unique_name, open_mode, database_role from v$database;

NAME                        DB_UNIQUE_NAME                                                                             OPEN_MODE                                                    DATABASE_ROLE
--------------------------- ------------------------------------------------------------------------------------------ ------------------------------------------------------------ ------------------------------------------------
ONPR                        ONPRZ_APP_001T                                                                             MOUNTED                                                      PHYSICAL STANDBY

And I could see that this temporary database was a single instance database:

SQL> show parameter cluster_database;

NAME                                 TYPE                              VALUE
------------------------------------ --------------------------------- ------------------------------
cluster_database                     boolean                           FALSE
cluster_database_instances           integer                           1

Migration – Maintenance Windows

Now we will have the maintenance window and switch the database over to the ExaCC. After the switchover, ZDM will run datapatch (to patch the database to the new 19.21 version), convert the non-CDB to a PDB, upgrade the timezone file and run any other post-migration tasks.

For this we just need to resume the job. We could even resume it with a new pause if we want to do each remaining step separately, as sketched after the resume output below.

To resume the job:

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli resume job -jobid 44
zdmhost.domain.com: Audit ID: 552
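
If we wanted to handle the remaining phases one by one, the resume itself could be combined with a new pause. This is only a sketch (the phase name is taken from the job output above, and it assumes our ZDM build accepts -pauseafter on resume job):

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli resume job -jobid 44 -pauseafter ZDM_SWITCHOVER_TGT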

Wrap up

ZDM is really a nice tool, and it will configure most things for you. Although I sometimes faced problems that I had to troubleshoot, I could always find a solution. The Oracle ZDM team is also very flexible and available for discussion. Since all the on-premises databases are running 19.10 and the customer does not want to go without a fallback, I could unfortunately not test the switchover and the subsequent conversion steps. I will blog about them later, once the customer databases have been patched to the latest 19c version and I am able to move forward with the test.

L’article Physical Online Migration to ExaCC with Oracle Zero Downtime Migration (ZDM) est apparu en premier sur dbi Blog.

PostgreSQL 17: transaction_timeout

Thu, 2024-02-22 02:21

PostgreSQL already comes with various timeout parameters for sessions and statements: idle_in_transaction_session_timeout, idle_session_timeout, and statement_timeout. All of them are disabled by default but can be turned on to prevent long running sessions or statements. Starting with PostgreSQL 17 there will be another timeout related parameter: transaction_timeout. As the name implies, this one applies at the transaction level.

An easy test to see how this works is this:

postgres=# set transaction_timeout = '5s';
SET
postgres=# begin;
BEGIN
postgres=*# select pg_sleep(6);
FATAL:  terminating connection due to transaction timeout
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# 

If idle_in_transaction_session_timeout and statement_timeout are set as well but transaction_timeout is set to a shorter time, then transaction_timeout will be the one which counts:

postgres=# set idle_in_transaction_session_timeout = '10s';
SET
postgres=# set statement_timeout = '10s';
SET
postgres=# set transaction_timeout = '5s';
SET
postgres=# begin;
BEGIN
postgres=*# select pg_sleep(6);
FATAL:  terminating connection due to transaction timeout
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# 

Be aware that setting this at the instance level (postgresql.conf or postgresql.auto.conf) will make it active for all transactions, which is probably not what you want. Use it with care and only where you really need it.
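
A less intrusive option is to scope the parameter to a specific role or database instead of the whole instance. A minimal sketch, using a hypothetical role batch_user and a hypothetical database appdb:

postgres=# alter role batch_user set transaction_timeout = '5s';
ALTER ROLE
postgres=# alter database appdb set transaction_timeout = '10s';
ALTER DATABASE

New sessions of that role (or new connections to that database) pick up the setting, while everything else keeps the default of 0, which means disabled.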

L’article PostgreSQL 17: transaction_timeout est apparu en premier sur dbi Blog.

Logical Offline Migration to ExaCC with Oracle Zero Downtime Migration (ZDM)

Tue, 2024-02-20 17:47

A while ago I had been testing and blogging about ZDM, see my previous articles. I finally had the chance to implement it at one of our customers to migrate on-premises databases to Exadata Cloud@Customer. In this article I would like to share with you my experience migrating an on-premises database to ExaCC using ZDM Logical Offline Migration with a backup location. We intended to use this method as the mandatory one for small Oracle SE2 databases, and as the preferred one for huge Oracle SE2 databases.

Naming convention

Of course I have anonymised all outputs to remove customer infrastructure names, so let's use the following convention.

ExaCC Cluster 01 node 01 : ExaCC-cl01n1
ExaCC Cluster 01 node 02 : ExaCC-cl01n2
On premises Source Host : vmonpr
Target db_unique_name on the ExaCC : ONPR_RZ2
Database Name to migrate : ONPR
ZDM Host : zdmhost
ZDM user : zdmuser
Domain : domain.com
ExaCC PDB to migrate to : ONPRZ_APP_001T

We will then migrate the on-premise single-tenant database, named ONPR, to a PDB on the ExaCC. The PDB will be named ONPRZ_APP_001T.

We will migrate 3 schemas: USER1, USER2 and USER3.

Ports

It is important to mention that the following ports are needed (a quick reachability check is sketched after the table):

Source            Destination             Port
ZDM Host          On-premise Host         22
ZDM Host          On-premise Host         Oracle Net
ZDM Host          ExaCC VM (both nodes)   22
ZDM Host          ExaCC (scan + VIP)      Oracle Net
On-premise Host   NFS Server              111, 2049
ExaCC             NFS Server              111, 2049
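
A simple way to verify from the ZDM host that a port is actually reachable before starting (a sketch, assuming nc is installed there):

[zdmuser@zdmhost ~]$ nc -zv vmonpr 22
[zdmuser@zdmhost ~]$ nc -zv ExaCC-cl01-scan.domain.com 1521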

If the Oracle Net ports are not opened between the ZDM host and the ExaCC, for example, the migration evaluation will stop immediately at the first step, named ZDM_VALIDATE_TGT, and the following errors will be found in the log file:

PRGZ-3181 : Internal error: ValidateTargetDbLogicalZdm-5-PRGD-1059 : query to retrieve NLS database parameters failed
PRGD-1002 : SELECT statement "SELECT * FROM GLOBAL_NAME" execution as user "system" failed for database with Java Database Connectivity (JDBC) URL "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(port=1521)(host=ExaCC-cl01-scan.domain.com))(connect_data=(service_name=ONPRZ_APP_001T_PRI.domain.com)))"
IO Error: The Network Adapter could not establish the connection (CONNECTION_ID=9/tZ9Bt5Q5q5VfqU7JC/xA==)
Requirements

There are a few requirements:

streams_pool_size instance parameter on the source database

To have an initial pool allocated and optimal Data Pump performance, the streams_pool_size instance parameter on the source DB needs to be set to at least 256-300 MB for a Logical Offline Migration.
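
A minimal sketch of checking and adjusting it on the source, assuming an spfile is in use so the change can be applied dynamically:

SQL> show parameter streams_pool_size

SQL> alter system set streams_pool_size=256M scope=both;

System altered.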


Passwordless Login

Passwordless login needs to be configured between the ZDM host, the source host and the target host. See my previous blog: https://www.dbi-services.com/blog/oracle-zdm-migration-java-security-invalidkeyexception-invalid-key-format/
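
As a reference, a minimal sketch of what the setup typically looks like from the ZDM host, assuming the zdmuser key pair /home/zdmuser/.ssh/id_rsa already exists:

[zdmuser@zdmhost ~]$ ssh-copy-id -i /home/zdmuser/.ssh/id_rsa.pub oracle@vmonpr
[zdmuser@zdmhost ~]$ ssh-copy-id -i /home/zdmuser/.ssh/id_rsa.pub opc@ExaCC-cl01n1
[zdmuser@zdmhost ~]$ ssh-copy-id -i /home/zdmuser/.ssh/id_rsa.pub opc@ExaCC-cl01n2
[zdmuser@zdmhost ~]$ ssh -i /home/zdmuser/.ssh/id_rsa oracle@vmonpr hostname

The last command should return the hostname without prompting for a password; the same test can be repeated against both ExaCC nodes with the opc user.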

If passwordless login is not configured with one of the nodes, you will see an error such as the following in the log file during the migration evaluation:

PRCZ-2006 : Unable to establish SSH connection to node "ExaCC-cl01n2" to execute command "<command_to_be_executed>"
No more authentication methods available.

Database Character Set

The ExaCC target CDB should use the same character set as the on-premise source DB. If the final CDB where you would like to host your new PDB has, for example, the AL32UTF8 character set (so that this CDB can host various PDB character sets) and your source DB is not in AL32UTF8, you will need to go through a temporary CDB on the ExaCC before relocating the PDB to the final one.

To check the character set, run the following statement on the on-premise source DB:

SQL> select parameter, value from v$nls_parameters where parameter='NLS_CHARACTERSET';

If your ExaCC target CDB character set (AL32UTF8 in this example) does not match the on-premise source DB character set (WE8ISO8859P1 in this example), you will get the following ZDM error during the migration evaluation:

PRGZ-3549 : Source NLS character set WE8ISO8859P1 is different from target NLS character set AL32UTF8.

Create PDB on the ExaCC

The final PDB will have to be created in one of the ExaCC container databases according to the character set of the source database.
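
Before creating it, the same character set query as above can be run in the candidate target CDB (CDB$ROOT) to confirm it matches the source:

SQL> select parameter, value from v$nls_parameters where parameter='NLS_CHARACTERSET';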


Create NFS directory

An NFS directory and Oracle directories need to be set up to store the Oracle dump files created automatically by ZDM. We will create the file system directory on the NFS mount point and a new Oracle directory named MIG_SOURCE_DEST in both databases (source and target). The NFS directory should be accessible and shared between both environments.

If you do not have any shared NFS between the source and the target, you will get the following kind of errors when evaluating the migration:

zdmhost: 2024-02-06T14:14:17.001Z : Executing phase ZDM_VALIDATE_DATAPUMP_SETTINGS_TGT
zdmhost: 2024-02-06T14:14:19.583Z : validating Oracle Data Pump dump directory /u02/app/oracle/product/19.0.0.0/dbhome_2/rdbms/log/10B7A59DF2E82A9AE063021FA10ABD38 ...
zdmhost: 2024-02-06T14:14:19.587Z : listing directory path /u02/app/oracle/product/19.0.0.0/dbhome_2/rdbms/log/10B7A59DF2E82A9AE063021FA10ABD38 on node ExaCC-cl01n1.domain.com ...
PRGZ-1211 : failed to validate specified database directory object path "/u02/app/oracle/product/19.0.0.0/dbhome_2/rdbms/log/10B7A59DF2E82A9AE063021FA10ABD38"
PRGZ-1420 : specified database import directory object path "/u02/app/oracle/product/19.0.0.0/dbhome_2/rdbms/log/10B7A59DF2E82A9AE063021FA10ABD38" is not shared between source and target database server

After creating the directory on the shared NFS, which is visible from both the source and the target, you will need to create an Oracle directory (or use an existing one). I decided to create a new one, named MIG_SOURCE_DEST. The following has to be run on both the source and the target databases.

SQL> create directory MIG_SOURCE_DEST as '/mnt/nfs_share/ONPR/';

Directory created.

SQL> select directory_name, directory_path from dba_directories where upper(directory_name) like '%MIG%';

DIRECTORY_NAME                 DIRECTORY_PATH
------------------------------ ------------------------------------------------------------
MIG_SOURCE_DEST                /mnt/nfs_share/ONPR/

You will also need to set the correct permissions on the folder, knowing that the ExaCC OS user might not have the same UID as the source host OS user.
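
A minimal sketch of preparing the shared directory from the source host, assuming the NFS share is already mounted under /mnt/nfs_share on both sides (wide-open permissions are a pragmatic shortcut for the duration of the migration only; a shared group with a matching GID on both sides is the cleaner option):

oracle@vmonpr:/home/oracle/ [ONPR] mkdir -p /mnt/nfs_share/ONPR
oracle@vmonpr:/home/oracle/ [ONPR] chmod 777 /mnt/nfs_share/ONPR
oracle@vmonpr:/home/oracle/ [ONPR] ls -ld /mnt/nfs_share/ONPR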


Source user password version

It is mandatory that the password of all user schemas being migrated is at least of the 12C version. For old password versions like 10G or 11G, the user password needs to be changed to avoid additional troubleshooting and actions during the ZDM migration.

To check the user password versions on the source, run the following SQL statement (a sketch of the password fix follows it):

SQL> select username, account_status, lock_date, password_versions from dba_users where ORACLE_MAINTAINED='N';
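
For any user that only shows the 10G version, simply re-setting the password regenerates the newer verifiers (a sketch; the sqlnet configuration must allow case-sensitive 12C verifiers):

SQL> alter user USER1 identified by "************";

User altered.

With default sqlnet settings, the PASSWORD_VERSIONS column of dba_users should then typically show 11G 12C for that user on the next run of the query above.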

Prepare ZDM response file

We will use the ZDM response file template named zdm_logical_template.rsp and adapt it.

[zdmuser@zdmhost migration]$ cp /u01/app/oracle/product/zdm/rhp/zdm/template/zdm_logical_template.rsp ./zdm_ONPR_logical_offline.rsp

The main parameters to take care of are:

DATA_TRANSFER_MEDIUM : Specifies how data will be transferred from the source database system to the target database system. To be NFS
TARGETDATABASE_ADMINUSERNAME : User to be used on the target for the migration. To be SYSTEM
SOURCEDATABASE_ADMINUSERNAME : User to be used on the source for the migration. To be SYSTEM
SOURCEDATABASE_CONNECTIONDETAILS_HOST : Source listener host
SOURCEDATABASE_CONNECTIONDETAILS_PORT : Source listener port
SOURCEDATABASE_CONNECTIONDETAILS_SERVICENAME : Source database service name
TARGETDATABASE_CONNECTIONDETAILS_HOST : Target listener host (the ExaCC scan listener)
TARGETDATABASE_CONNECTIONDETAILS_PORT : Target listener port. To be 1521
TARGETDATABASE_CONNECTIONDETAILS_SERVICENAME : Target database service name
TARGETDATABASE_DBTYPE : Target environment. To be EXADATA
DATAPUMPSETTINGS_SCHEMABATCH-1 : Comma separated list of database schemas to be migrated
DATAPUMPSETTINGS_SCHEMABATCHCOUNT : Exclusive with the schemaBatch option. If specified, the user schemas to migrate are identified automatically
DATAPUMPSETTINGS_DATAPUMPPARAMETERS_IMPORTPARALLELISMDEGREE : Maximum number of worker processes that can be used for a Data Pump Import job. For SE2, needs to be configured to 1
DATAPUMPSETTINGS_DATAPUMPPARAMETERS_EXPORTPARALLELISMDEGREE : Maximum number of worker processes that can be used for a Data Pump Export job. For SE2, needs to be configured to 1
DATAPUMPSETTINGS_DATAPUMPPARAMETERS_EXCLUDETYPELIST : Specifies a comma separated list of object types to exclude
DATAPUMPSETTINGS_EXPORTDIRECTORYOBJECT_NAME : Oracle DBA directory created on the source for the export
DATAPUMPSETTINGS_EXPORTDIRECTORYOBJECT_PATH : NFS directory used for the export dump
DATAPUMPSETTINGS_IMPORTDIRECTORYOBJECT_NAME : Oracle DBA directory used on the target for the import
DATAPUMPSETTINGS_IMPORTDIRECTORYOBJECT_PATH : NFS directory used for the import dump
TABLESPACEDETAILS_AUTOCREATE : If set to TRUE, ZDM will automatically create the tablespaces. To be TRUE

Here is the updated ZDM response file compared with the ZDM template for the migration we are going to run:

[zdmuser@zdmhost migration]$ diff zdm_ONPR_logical_offline.rsp /u01/app/oracle/product/zdm/rhp/zdm/template/zdm_logical_template.rsp
30c30
< DATA_TRANSFER_MEDIUM=NFS
---
> DATA_TRANSFER_MEDIUM=OSS
47c47
< TARGETDATABASE_ADMINUSERNAME=system
---
> TARGETDATABASE_ADMINUSERNAME=
63c63
< SOURCEDATABASE_ADMINUSERNAME=system
---
> SOURCEDATABASE_ADMINUSERNAME=
80c80
< SOURCEDATABASE_CONNECTIONDETAILS_HOST=vmonpr
---
> SOURCEDATABASE_CONNECTIONDETAILS_HOST=
90c90
< SOURCEDATABASE_CONNECTIONDETAILS_PORT=13000
---
> SOURCEDATABASE_CONNECTIONDETAILS_PORT=
102c102
< SOURCEDATABASE_CONNECTIONDETAILS_SERVICENAME=ONPR.domain.com
---
> SOURCEDATABASE_CONNECTIONDETAILS_SERVICENAME=
153c153
< TARGETDATABASE_CONNECTIONDETAILS_HOST=ExaCC-cl01-scan.domain.com
---
> TARGETDATABASE_CONNECTIONDETAILS_HOST=
163c163
< TARGETDATABASE_CONNECTIONDETAILS_PORT=1521
---
> TARGETDATABASE_CONNECTIONDETAILS_PORT=
175c175
< TARGETDATABASE_CONNECTIONDETAILS_SERVICENAME=ONPRZ_APP_001T_PRI.domain.com
---
> TARGETDATABASE_CONNECTIONDETAILS_SERVICENAME=
307c307
< TARGETDATABASE_DBTYPE=EXADATA
---
> TARGETDATABASE_DBTYPE=
726c726
< DATAPUMPSETTINGS_SCHEMABATCH-1=USER1,USER2,USER3
---
> DATAPUMPSETTINGS_SCHEMABATCH-1=
947c947
< DATAPUMPSETTINGS_DATAPUMPPARAMETERS_IMPORTPARALLELISMDEGREE=1
---
> DATAPUMPSETTINGS_DATAPUMPPARAMETERS_IMPORTPARALLELISMDEGREE=
957c957
< DATAPUMPSETTINGS_DATAPUMPPARAMETERS_EXPORTPARALLELISMDEGREE=1
---
> DATAPUMPSETTINGS_DATAPUMPPARAMETERS_EXPORTPARALLELISMDEGREE=
969c969
< DATAPUMPSETTINGS_DATAPUMPPARAMETERS_EXCLUDETYPELIST=STATISTICS
---
> DATAPUMPSETTINGS_DATAPUMPPARAMETERS_EXCLUDETYPELIST=
1137c1137
< DATAPUMPSETTINGS_EXPORTDIRECTORYOBJECT_NAME=MIG_SOURCE_DEST
---
> DATAPUMPSETTINGS_EXPORTDIRECTORYOBJECT_NAME=
1146c1146
< DATAPUMPSETTINGS_EXPORTDIRECTORYOBJECT_PATH=/mnt/nfs_share/ONPR
---
> DATAPUMPSETTINGS_EXPORTDIRECTORYOBJECT_PATH=
1166c1166
< DATAPUMPSETTINGS_IMPORTDIRECTORYOBJECT_NAME=MIG_SOURCE_DEST
---
> DATAPUMPSETTINGS_IMPORTDIRECTORYOBJECT_NAME=
1175c1175
< DATAPUMPSETTINGS_IMPORTDIRECTORYOBJECT_PATH=/mnt/nfs_nfs_share/ONPR
---
> DATAPUMPSETTINGS_IMPORTDIRECTORYOBJECT_PATH=
2146c2146
< TABLESPACEDETAILS_AUTOCREATE=TRUE
---
> TABLESPACEDETAILS_AUTOCREATE=

ZDM Build Version

I’m using ZDM build 21.4.

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli -build
version: 21.0.0.0.0
full version: 21.4.0.0.0
patch version: 21.4.1.0.0
label date: 221207.25
ZDM kit build date: Jul 31 2023 14:24:25 UTC
CPAT build version: 23.7.0

The migration will be done using the ZDM CLI (zdmcli), which runs migrations through jobs. We can abort, query, modify, suspend or resume a running job, as sketched below.
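
As a quick reference, the job control commands all follow the same pattern (a sketch using a job ID as example; exact options depend on the ZDM build):

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli query job -jobid 29
[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli suspend job -jobid 29
[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli resume job -jobid 29
[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli abort job -jobid 29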

Evaluate the migration

We will first run zdmcli with the -eval option to evaluate the migration and check that everything is OK.

We need to provide some arguments:

-sourcesid : Database name of the source database, in case the source database is a single instance deployed on a non Grid Infrastructure environment
-rsp : ZDM response file
-sourcenode : Source host
-srcauth : Name of the source authentication plug-in, with 3 sub-arguments:
    -srcarg1 : user. Should be oracle
    -srcarg2 : ZDM private RSA key
    -srcarg3 : sudo location
-targetnode : Target host
-tgtauth : Name of the target authentication plug-in, with 3 sub-arguments:
    -tgtarg1 : user. Should be opc
    -tgtarg2 : ZDM private RSA key
    -tgtarg3 : sudo location
[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_logical_offline.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -eval
zdmhost.domain.com: Audit ID: 194
Enter source database administrative user "system" password:
Enter target database administrative user "system" password:
Operation "zdmcli migrate database" scheduled with the job ID "27".
[zdmuser@zdmhost migration]$

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli query job -jobid 27
zdmhost.domain.com: Audit ID: 197
Job ID: 27
User: zdmuser
Client: zdmhost
Job Type: "EVAL"
Scheduled job command: "zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_logical_offline.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -eval"
Scheduled job execution start time: 2024-02-06T16:03:49+01. Equivalent local time: 2024-02-06 16:03:49
Current status: SUCCEEDED
Result file path: "/u01/app/oracle/chkbase/scheduled/job-27-2024-02-06-16:04:01.log"
Metrics file path: "/u01/app/oracle/chkbase/scheduled/job-27-2024-02-06-16:04:01.json"
Excluded objects file path: "/u01/app/oracle/chkbase/scheduled/job-27-filtered-objects-2024-02-06T16:04:13.522.json"
Job execution start time: 2024-02-06 16:04:01
Job execution end time: 2024-02-06 16:05:55
Job execution elapsed time: 1 minutes 54 seconds
ZDM_VALIDATE_TGT ...................... COMPLETED
ZDM_VALIDATE_SRC ...................... COMPLETED
ZDM_SETUP_SRC ......................... COMPLETED
ZDM_PRE_MIGRATION_ADVISOR ............. COMPLETED
ZDM_VALIDATE_DATAPUMP_SETTINGS_SRC .... COMPLETED
ZDM_VALIDATE_DATAPUMP_SETTINGS_TGT .... COMPLETED
ZDM_PREPARE_DATAPUMP_SRC .............. COMPLETED
ZDM_DATAPUMP_ESTIMATE_SRC ............. COMPLETED
ZDM_CLEANUP_SRC ....................... COMPLETED

We can see that the Job Type is EVAL, and that the Current Status is SUCCEEDED, with all precheck steps having a COMPLETED status.

We can also review the log file, which provides more information. We will see all the checks that the tool is doing. We can also review the output of the advisor, which is already warning us about old password versions for some users. Reviewing all the advisor outputs might help. We can also see that ZDM will ignore a few ORA errors as non-critical. This makes sense because the migration should still happen even if, for example, the user is already created with empty objects.

[zdmuser@zdmhost ~]$ cat /u01/app/oracle/chkbase/scheduled/job-27-2024-02-06-16:04:01.log
zdmhost: 2024-02-06T15:04:01.505Z : Starting zero downtime migrate operation ...
zdmhost: 2024-02-06T15:04:01.511Z : Executing phase ZDM_VALIDATE_TGT
zdmhost: 2024-02-06T15:04:04.952Z : Fetching details of on premises Exadata Database "ONPRZ_APP_001T_PRI.domain.com"
zdmhost: 2024-02-06T15:04:04.953Z : Type of database : "Exadata at Customer"
zdmhost: 2024-02-06T15:04:05.014Z : Verifying configuration and status of target database "ONPRZ_APP_001T_PRI.domain.com"
zdmhost: 2024-02-06T15:04:09.067Z : Global database name: ONPRZ_APP_001T.DOMAIN.COM
zdmhost: 2024-02-06T15:04:09.067Z : Target PDB name : ONPRZ_APP_001T
zdmhost: 2024-02-06T15:04:09.068Z : Database major version : 19
zdmhost: 2024-02-06T15:04:09.069Z : obtaining database ONPRZ_APP_001T.DOMAIN.COM tablespace configuration details...
zdmhost: 2024-02-06T15:04:09.585Z : Execution of phase ZDM_VALIDATE_TGT completed
zdmhost: 2024-02-06T15:04:09.670Z : Executing phase ZDM_VALIDATE_SRC
zdmhost: 2024-02-06T15:04:09.736Z : Verifying configuration and status of source database "ONPR.domain.com"
zdmhost: 2024-02-06T15:04:09.737Z : source database host vmonpr service ONPR.domain.com
zdmhost: 2024-02-06T15:04:13.464Z : Global database name: ONPR.DOMAIN.COM
zdmhost: 2024-02-06T15:04:13.465Z : Database major version : 19
zdmhost: 2024-02-06T15:04:13.466Z : Validating database time zone compatibility...
zdmhost: 2024-02-06T15:04:13.521Z : Database objects which will be migrated : [USER2, USER3]
zdmhost: 2024-02-06T15:04:13.530Z : Execution of phase ZDM_VALIDATE_SRC completed
zdmhost: 2024-02-06T15:04:13.554Z : Executing phase ZDM_SETUP_SRC
zdmhost: 2024-02-06T15:05:04.925Z : Execution of phase ZDM_SETUP_SRC completed
zdmhost: 2024-02-06T15:05:04.944Z : Executing phase ZDM_PRE_MIGRATION_ADVISOR
zdmhost: 2024-02-06T15:05:05.371Z : Running CPAT (Cloud Premigration Advisor Tool) on the source node vmonpr ...
zdmhost: 2024-02-06T15:05:07.894Z : Premigration advisor output:
Cloud Premigration Advisor Tool Version 23.7.0
CPAT-4007: Warning: the build date for this version of the Cloud Premigration Advisor Tool is over 216 days.  Please run "premigration.sh --updatecheck" to see if a more recent version of this tool is available.
Please download the latest available version of the CPAT application.

Cloud Premigration Advisor Tool completed with overall result: Review Required
Cloud Premigration Advisor Tool generated report location: /u00/app/oracle/zdm/zdm_ONPR_27/out/premigration_advisor_report.json
Cloud Premigration Advisor Tool generated report location: /u00/app/oracle/zdm/zdm_ONPR_27/out/premigration_advisor_report.txt

 CPAT exit code: 2
 RESULT: Review Required

Schemas Analyzed (2): USER3,USER2
A total of 17 checks were performed
There were 0 checks with Failed results
There were 0 checks with Action Required results
There were 2 checks with Review Required results: has_noexport_object_grants (8 relevant objects), has_users_with_10g_password_version (1 relevant objects)
There were 0 checks with Review Suggested results has_noexport_object_grants
         RESULT: Review Required
         DESCRIPTION: Not all object grants are exported by Data Pump.
         ACTION: Recreate any required grants on the target instance.  See Oracle Support Document ID 1911151.1 for more information. Note that any SELECT grants on system objects will need to be replaced with READ grants; SELECT is no longer allowed on system objects.
has_users_with_10g_password_version
         RESULT: Review Required
         DESCRIPTION: Case-sensitive passwords are required on ADB.
         ACTION: To avoid Data Pump migration warnings change the passwords for the listed users before migration. Alternatively, modify these users passwords after migration to avoid login failures. See Oracle Support Document ID 2289453.1 for more information.

zdmhost: 2024-02-06T15:05:07.894Z : Execution of phase ZDM_PRE_MIGRATION_ADVISOR completed
zdmhost: 2024-02-06T15:05:07.948Z : Executing phase ZDM_VALIDATE_DATAPUMP_SETTINGS_SRC
zdmhost: 2024-02-06T15:05:08.545Z : validating Oracle Data Pump dump directory /mnt/nfs_share/ONPR/ ...
zdmhost: 2024-02-06T15:05:08.545Z : validating Data Pump dump directory path /mnt/nfs_share/ONPR/ on node vmonpr.domain.com ...
zdmhost: 2024-02-06T15:05:08.975Z : validating if target database user can read files shared on medium NFS
zdmhost: 2024-02-06T15:05:08.976Z : setting Data Pump dump file permission at source node...
zdmhost: 2024-02-06T15:05:08.977Z : changing group of Data Pump dump files in directory path /mnt/nfs_share/ONPR/ on node vmonpr.domain.com ...
zdmhost: 2024-02-06T15:05:09.958Z : Execution of phase ZDM_VALIDATE_DATAPUMP_SETTINGS_SRC completed
zdmhost: 2024-02-06T15:05:10.005Z : Executing phase ZDM_VALIDATE_DATAPUMP_SETTINGS_TGT
zdmhost: 2024-02-06T15:05:13.307Z : validating Oracle Data Pump dump directory /mnt/nfs_nfs_share/ONPR ...
zdmhost: 2024-02-06T15:05:13.308Z : listing directory path /mnt/nfs_nfs_share/ONPR on node ExaCC-cl01n1.domain.com ...
zdmhost: 2024-02-06T15:05:14.008Z : Execution of phase ZDM_VALIDATE_DATAPUMP_SETTINGS_TGT completed
zdmhost: 2024-02-06T15:05:14.029Z : Executing phase ZDM_PREPARE_DATAPUMP_SRC
zdmhost: 2024-02-06T15:05:14.033Z : Execution of phase ZDM_PREPARE_DATAPUMP_SRC completed
zdmhost: 2024-02-06T15:05:14.058Z : Executing phase ZDM_DATAPUMP_ESTIMATE_SRC
zdmhost: 2024-02-06T15:05:14.059Z : starting Data Pump Dump estimate for database "ONPR.DOMAIN.COM"
zdmhost: 2024-02-06T15:05:14.060Z : running Oracle Data Pump job "ZDM_27_DP_ESTIMATE_6279" for database "ONPR.DOMAIN.COM"
zdmhost: 2024-02-06T15:05:14.071Z : applying Data Pump dump compression ALL algorithm MEDIUM
zdmhost: 2024-02-06T15:05:14.135Z : applying Data Pump dump encryption ALL algorithm AES128
zdmhost: 2024-02-06T15:05:14.135Z : Oracle Data Pump Export parallelism set to 1 ...
zdmhost: 2024-02-06T15:05:14.286Z : Oracle Data Pump errors to be ignored are ORA-31684,ORA-39111,ORA-39082...
zdmhost: 2024-02-06T15:05:23.515Z : Oracle Data Pump log located at /mnt/nfs_share/ONPR//ZDM_27_DP_ESTIMATE_6279.log in the Database Server node
zdmhost: 2024-02-06T15:05:53.643Z : Total estimation using BLOCKS method: 3.112 GB
zdmhost: 2024-02-06T15:05:53.644Z : Execution of phase ZDM_DATAPUMP_ESTIMATE_SRC completed
zdmhost: 2024-02-06T15:05:53.721Z : Executing phase ZDM_CLEANUP_SRC
zdmhost: 2024-02-06T15:05:54.261Z : Cleaning up ZDM on the source node vmonpr ...
zdmhost: 2024-02-06T15:05:55.506Z : Execution of phase ZDM_CLEANUP_SRC completed

Migrate Source database to ExaCC

Once the evaluation is all good, we can move forward with running the migration. It is exactly the same zdmcli command, without the -eval option.

Let’s have a try and run it. We will have to provide both source and target system password:

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_logical_offline.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo
zdmhost.domain.com: Audit ID: 205
Enter source database administrative user "system" password:
Enter target database administrative user "system" password:
Operation "zdmcli migrate database" scheduled with the job ID "29".

We will query the job:

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli query job -jobid 29
zdmhost.domain.com: Audit ID: 211
Job ID: 29
User: zdmuser
Client: zdmhost
Job Type: "MIGRATE"
Scheduled job command: "zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_logical_offline.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo"
Scheduled job execution start time: 2024-02-07T08:21:38+01. Equivalent local time: 2024-02-07 08:21:38
Current status: FAILED
Result file path: "/u01/app/oracle/chkbase/scheduled/job-29-2024-02-07-08:22:03.log"
Metrics file path: "/u01/app/oracle/chkbase/scheduled/job-29-2024-02-07-08:22:03.json"
Excluded objects file path: "/u01/app/oracle/chkbase/scheduled/job-29-filtered-objects-2024-02-07T08:22:16.074.json"
Job execution start time: 2024-02-07 08:22:03
Job execution end time: 2024-02-07 08:30:29
Job execution elapsed time: 8 minutes 25 seconds
ZDM_VALIDATE_TGT ...................... COMPLETED
ZDM_VALIDATE_SRC ...................... COMPLETED
ZDM_SETUP_SRC ......................... COMPLETED
ZDM_PRE_MIGRATION_ADVISOR ............. COMPLETED
ZDM_VALIDATE_DATAPUMP_SETTINGS_SRC .... COMPLETED
ZDM_VALIDATE_DATAPUMP_SETTINGS_TGT .... COMPLETED
ZDM_PREPARE_DATAPUMP_SRC .............. COMPLETED
ZDM_DATAPUMP_ESTIMATE_SRC ............. COMPLETED
ZDM_PREPARE_DATAPUMP_TGT .............. COMPLETED
ZDM_PARALLEL_EXPORT_IMPORT ............ FAILED
ZDM_POST_DATAPUMP_SRC ................. PENDING
ZDM_POST_DATAPUMP_TGT ................. PENDING
ZDM_POST_ACTIONS ...................... PENDING
ZDM_CLEANUP_SRC ....................... PENDING

As we can see, the job failed during the import of the data.

Checking the ZDM log file, I could see the following errors:

ORA-39384: Warning: User USER2 has been locked and the password expired.
ORA-39384: Warning: User USER1 has been locked and the password expired.

Checking the users on the source, I could see that USER1 and USER2 only have a password in the old 10G version, which will definitely cause problems:

SQL> select username, account_status, lock_date, password_versions from dba_users where ORACLE_MAINTAINED='N';

USERNAME                       ACCOUNT_STATUS                   LOCK_DATE            PASSWORD_VERSIONS
------------------------------ -------------------------------- -------------------- -----------------
USER1                          OPEN                                                  10G
USER2                          OPEN                                                  10G
USER3                          OPEN                                                  10G 11G 12C

3 rows selected.

Checking the target PDB on the ExaCC, I could see that, as these 2 users only had 10G passwords, ZDM locked the related users after importing the data:

SQL> select username, account_status, lock_date from dba_users where ORACLE_MAINTAINED='N';

USERNAME                       ACCOUNT_STATUS                   LOCK_DATE
------------------------------ -------------------------------- --------------------
USER1                          EXPIRED & LOCKED                 07-FEB-2024 08:26:10
ADMIN                          LOCKED                           06-FEB-2024 14:36:18
USER2                          EXPIRED & LOCKED                 07-FEB-2024 08:26:10
USER3                          OPEN

4 rows selected.

On the ExaCC target PDB, I unlocked the users and changed their passwords.

SQL> alter user USER2 account unlock;

User altered.

SQL> alter user user1 account unlock;

User altered.

SQL> alter user USER2 identified by ************;

User altered.

SQL> alter user user1 identified by ************;

User altered.

SQL> select username, account_status, lock_date from dba_users where ORACLE_MAINTAINED='N';

USERNAME                       ACCOUNT_STATUS                   LOCK_DATE
------------------------------ -------------------------------- --------------------
USER1                          OPEN
ADMIN                          LOCKED                           06-FEB-2024 14:36:18
USER2                          OPEN
USER3                          OPEN

6 rows selected.

And I resumed the zdmcli job so it would start again where it failed:

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli resume job -jobid 29
zdmhost.domain.com: Audit ID: 213

The job was still failing at the same step, and in the log file I could find several errors like:

BATCH1 : Non-ignorable errors found in Oracle Data Pump job ZDM_29_DP_IMPORT_5005_BATCH1 log are
ORA-39151: Table "USER3"."OPB_MAP_OPTIONS" exists. All dependent metadata and data will be skipped due to table_exists_action of skip
ORA-39151: Table "USER3"."OPB_USER_GROUPS" exists. All dependent metadata and data will be skipped due to table_exists_action of skip

In fact, as ZDM had previously failed on the import step, it tried to import the data again, but the tables were still there.

So I had to clean up the target PDB on the ExaCC for USER3 and USER2. USER1 had no objects.

As I did not want to change the on-premise source database by changing the user passwords, I checked how the users were created on the ExaCC and dropped them, so that I could create them again before resuming the job.

SQL> set long 99999999
SQL> select dbms_metadata.get_ddl('USER','USER2') from dual;

DBMS_METADATA.GET_DDL('USER','USER2')
--------------------------------------------------------------------------------

   CREATE USER "USER2" IDENTIFIED BY VALUES 'S:C5EF**********3F79'
      DEFAULT TABLESPACE "TSP******"
      TEMPORARY TABLESPACE "TEMP"

SQL> select dbms_metadata.get_ddl('USER','USER3') from dual;

DBMS_METADATA.GET_DDL('USER','USER3')
--------------------------------------------------------------------------------

   CREATE USER "USER3" IDENTIFIED BY VALUES 'S:EDD8**********FD44'
      DEFAULT TABLESPACE "TSP******"
      TEMPORARY TABLESPACE "TEMP"

SQL> drop user USER2 cascade;

User dropped.

SQL> drop user USER3 cascade;

User dropped.

SQL> CREATE USER "USER3" IDENTIFIED BY VALUES 'S:EDD86**********8FD44'
  2  DEFAULT TABLESPACE "TSP******"
  3  TEMPORARY TABLESPACE "TEMP";

User created.

SQL> CREATE USER "USER2" IDENTIFIED BY VALUES 'S:C5EF**********3F79'
  2  DEFAULT TABLESPACE "TSP******"
  3  TEMPORARY TABLESPACE "TEMP";

User created.

And I resumed the job once again:

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli resume job -jobid 29
zdmhost.domain.com: Audit ID: 219

And now the migration has completed successfully. The Job Type is MIGRATE and the Current Status is SUCCEEDED:

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli query job -jobid 29
zdmhost.domain.com: Audit ID: 223
Job ID: 29
User: zdmuser
Client: zdmhost
Job Type: "MIGRATE"
Scheduled job command: "zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_logical_offline.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo"
Scheduled job execution start time: 2024-02-07T08:21:38+01. Equivalent local time: 2024-02-07 08:21:38
Current status: SUCCEEDED
Result file path: "/u01/app/oracle/chkbase/scheduled/job-29-2024-02-07-08:22:03.log"
Metrics file path: "/u01/app/oracle/chkbase/scheduled/job-29-2024-02-07-08:22:03.json"
Excluded objects file path: "/u01/app/oracle/chkbase/scheduled/job-29-filtered-objects-2024-02-07T08:22:16.074.json"
Job execution start time: 2024-02-07 08:22:03
Job execution end time: 2024-02-07 09:01:21
Job execution elapsed time: 14 minutes 43 seconds
ZDM_VALIDATE_TGT ...................... COMPLETED
ZDM_VALIDATE_SRC ...................... COMPLETED
ZDM_SETUP_SRC ......................... COMPLETED
ZDM_PRE_MIGRATION_ADVISOR ............. COMPLETED
ZDM_VALIDATE_DATAPUMP_SETTINGS_SRC .... COMPLETED
ZDM_VALIDATE_DATAPUMP_SETTINGS_TGT .... COMPLETED
ZDM_PREPARE_DATAPUMP_SRC .............. COMPLETED
ZDM_DATAPUMP_ESTIMATE_SRC ............. COMPLETED
ZDM_PREPARE_DATAPUMP_TGT .............. COMPLETED
ZDM_PARALLEL_EXPORT_IMPORT ............ COMPLETED
ZDM_POST_DATAPUMP_SRC ................. COMPLETED
ZDM_POST_DATAPUMP_TGT ................. COMPLETED
ZDM_POST_ACTIONS ...................... COMPLETED
ZDM_CLEANUP_SRC ....................... COMPLETED

ZDM log file:

[zdmuser@zdmhost ~]$ tail -37 /u01/app/oracle/chkbase/scheduled/job-29-2024-02-07-08:22:03.log
####################################################################
zdmhost: 2024-02-07T07:56:33.580Z : Resuming zero downtime migrate operation ...
zdmhost: 2024-02-07T07:56:33.587Z : Starting zero downtime migrate operation ...
zdmhost: 2024-02-07T07:56:37.205Z : Fetching details of on premises Exadata Database "ONPRZ_APP_001T_PRI.domain.com"
zdmhost: 2024-02-07T07:56:37.205Z : Type of database : "Exadata at Customer"
zdmhost: 2024-02-07T07:56:37.283Z : Skipping phase ZDM_VALIDATE_SRC on resume
zdmhost: 2024-02-07T07:56:37.365Z : Skipping phase ZDM_SETUP_SRC on resume
zdmhost: 2024-02-07T07:56:37.377Z : Skipping phase ZDM_PRE_MIGRATION_ADVISOR on resume
zdmhost: 2024-02-07T07:56:37.391Z : Skipping phase ZDM_VALIDATE_DATAPUMP_SETTINGS_SRC on resume
zdmhost: 2024-02-07T07:56:37.406Z : Skipping phase ZDM_VALIDATE_DATAPUMP_SETTINGS_TGT on resume
zdmhost: 2024-02-07T07:56:37.422Z : Skipping phase ZDM_PREPARE_DATAPUMP_SRC on resume
zdmhost: 2024-02-07T07:56:37.437Z : Skipping phase ZDM_DATAPUMP_ESTIMATE_SRC on resume
zdmhost: 2024-02-07T07:56:37.455Z : Skipping phase ZDM_PREPARE_DATAPUMP_TGT on resume
zdmhost: 2024-02-07T07:56:37.471Z : Executing phase ZDM_PARALLEL_EXPORT_IMPORT
zdmhost: 2024-02-07T07:56:37.482Z : Skipping phase ZDM_DATAPUMP_EXPORT_SRC_BATCH1 on resume
zdmhost: 2024-02-07T07:56:37.485Z : Skipping phase ZDM_TRANSFER_DUMPS_SRC_BATCH1 on resume
zdmhost: 2024-02-07T07:56:37.487Z : Executing phase ZDM_DATAPUMP_IMPORT_TGT_BATCH1
zdmhost: 2024-02-07T07:56:38.368Z : listing directory path /mnt/nfs_nfs_share/ONPR on node ExaCC-cl01n1.domain.com ...
zdmhost: 2024-02-07T07:56:39.474Z : Oracle Data Pump Import parallelism set to 1 ...
zdmhost: 2024-02-07T07:56:39.481Z : Oracle Data Pump errors to be ignored are ORA-31684,ORA-39111,ORA-39082...
zdmhost: 2024-02-07T07:56:39.481Z : starting Data Pump Import for database "ONPRZ_APP_001T.DOMAIN.COM"
zdmhost: 2024-02-07T07:56:39.482Z : running Oracle Data Pump job "ZDM_29_DP_IMPORT_5005_BATCH1" for database "ONPRZ_APP_001T.DOMAIN.COM"
zdmhost: 2024-02-07T08:00:46.569Z : Oracle Data Pump job "ZDM_29_DP_IMPORT_5005_BATCH1" for database "ONPRZ_APP_001T.DOMAIN.COM" completed.
zdmhost: 2024-02-07T08:00:46.569Z : Oracle Data Pump log located at /mnt/nfs_nfs_share/ONPR/ZDM_29_DP_IMPORT_5005_BATCH1.log in the Database Server node
zdmhost: 2024-02-07T08:01:17.239Z : Execution of phase ZDM_DATAPUMP_IMPORT_TGT_BATCH1 completed
zdmhost: 2024-02-07T08:01:17.248Z : Execution of phase ZDM_PARALLEL_EXPORT_IMPORT completed
zdmhost: 2024-02-07T08:01:17.268Z : Executing phase ZDM_POST_DATAPUMP_SRC
zdmhost: 2024-02-07T08:01:17.272Z : listing directory path /mnt/nfs_share/ONPR/ on node vmonpr.domain.com ...
zdmhost: 2024-02-07T08:01:17.811Z : deleting Data Pump dump in directory path /mnt/nfs_share/ONPR/ on node vmonpr.domain.com ...
zdmhost: 2024-02-07T08:01:19.052Z : Execution of phase ZDM_POST_DATAPUMP_SRC completed
zdmhost: 2024-02-07T08:01:19.070Z : Executing phase ZDM_POST_DATAPUMP_TGT
zdmhost: 2024-02-07T08:01:19.665Z : Execution of phase ZDM_POST_DATAPUMP_TGT completed
zdmhost: 2024-02-07T08:01:19.689Z : Executing phase ZDM_POST_ACTIONS
zdmhost: 2024-02-07T08:01:19.693Z : Execution of phase ZDM_POST_ACTIONS completed
zdmhost: 2024-02-07T08:01:19.716Z : Executing phase ZDM_CLEANUP_SRC
zdmhost: 2024-02-07T08:01:20.213Z : Cleaning up ZDM on the source node vmonpr ...
zdmhost: 2024-02-07T08:01:21.458Z : Execution of phase ZDM_CLEANUP_SRC completed
[zdmuser@zdmhost ~]$

If we check the ZDM import log saved on the NFS shared folder, here named ZDM_32_DP_IMPORT_1847_BATCH1.log, we can see that the import completed successfully with 3 errors. The 3 errors are displayed in the same log file and are:

09-FEB-24 10:00:22.534: W-1 Processing object type SCHEMA_EXPORT/USER
09-FEB-24 10:00:22.943: ORA-31684: Object type USER:"USER1" already exists
09-FEB-24 10:00:22.943: ORA-31684: Object type USER:"USER2" already exists
09-FEB-24 10:00:22.943: ORA-31684: Object type USER:"USER3" already exists

These errors appear because we recreated the users on the ExaCC target DB prior to resuming the zdmcli job, thus before the import was performed again. Fortunately, these errors are part of the list that ZDM ignores, which makes sense.

Checks

We can then of course run some checks, such as comparing the number of objects for the migrated users on the source and the target, checking PDB violations, checking invalid objects, ensuring that tablespaces are encrypted on the ExaCC target DB, and so on.

To compare number of objects:

SQL> select owner, count(*) from dba_objects where owner in ('USER1','USER2','USER3') group by owner order by 1;

OWNER             COUNT(*)
--------------- ----------
USER3                758
USER2                760

To check that tablespaces are encrypted:

SQL> select a.con_id, a.tablespace_name, nvl(b.ENCRYPTIONALG,'NOT ENCRYPTED') from  cdb_tablespaces a, (select x.con_id, y.ENCRYPTIONALG, x.name from V$TABLESPACE x,  V$ENCRYPTED_TABLESPACES y where x.ts#=y.ts# and x.con_id=y.con_id) b where a.con_id=b.con_id(+) and a.tablespace_name=b.name(+) order by 1,2;

To check pdb violations:

SQL> select status, message from pdb_plug_in_violations;

To check invalid objects:

SQL> select count(*) from dba_invalid_objects;
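
To run the object-count comparison on both sides without copy-pasting, a small shell sketch like the one below can help. It is only a sketch: the connect strings (an ONPR alias for the source and an ONPRZ_APP_001T alias for the target PDB service) are hypothetical and must be adapted, and the password is a placeholder.

#!/bin/bash
# Run the same object count on source and target, then diff the results
for tns in ONPR ONPRZ_APP_001T; do
  sqlplus -s system/MyPassword@${tns} > /tmp/objcount_${tns}.txt <<'EOF'
set pagesize 0 feedback off
select owner||':'||count(*) from dba_objects
 where owner in ('USER1','USER2','USER3') group by owner order by 1;
EOF
done
diff /tmp/objcount_ONPR.txt /tmp/objcount_ONPRZ_APP_001T.txt && echo "Object counts match"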

And we could, of course, if needed, relocate the PDB to another ExaCC CDB.

Conclusion

That's it. We could easily migrate a single-tenant on-premises database to an ExaCC PDB using ZDM Logical Offline. The tool really has advantages: we do not need to deal with any Oracle command ourselves, such as running Data Pump manually.

In the next blog I will show you how we migrated an on-premises database to ExaCC on our customer's system using the ZDM Physical Online Migration method.

The article Logical Offline Migration to ExaCC with Oracle Zero Downtime Migration (ZDM) appeared first on dbi Blog.

Keep your data sorted with AI

Tue, 2024-02-20 09:08

As mentioned in my previous posts about Enterprise Content Management (like this one), a key point to store your content efficiently is the use of metadata. It helps you sort and retrieve your company data easily.

But to be honest, this part is often boring for technical people like me. Imagine it for people for whom IT is just a tool to get the job done.

On the one hand, we need to be precise to have a fine-grained organization, but on the other hand, the more metadata we ask for, the less likely it is to be filled in correctly.

Based on my experience, if a lot of properties are required before uploading or creating a document, it inevitably leads to partial adoption of the solution, or even worse, to wrong information being associated with the document, which can cause other issues and prevent the system from serving the business as designed.

The whole challenge is to find the right balance. But no worries, M-Files is there to help you!

Let’s introduce the M-Files Intelligent Metadata Layer module.

IML includes 2 main aspects.

The first one is a repository-neutral approach to unify your enterprise content irrespective of the source (as long as a connector exists or you develop one), as seen below:

M-Files IML (Source: M-Files)

The second, the one we are interested in today, is Intelligent Services. These are artificial intelligence components that provide metadata suggestions by analyzing content with natural language processing techniques and text analytics.

Intelligent Services is itself sub-divided into several services, and we will focus on two of them:

M-Files Matcher

This module analyzes the documents and searches for matching values.

For example, this module can catch the name of a supplier present in a document and suggest setting it as the “Supplier” property of the document.

Basic M-Files Matcher

But it can also be smarter: if the document contains an e-mail address, and this e-mail address is the one used to contact a supplier, then M-Files can make the link and offer to relate that supplier to the document.

Advanced M-Files Matcher

M-Files Text Analytics

This module is a bit different: with some configuration made by your favorite M-Files administrator (us), it can suggest property values detected in the document.

Of course, the capabilities shown in the example below are not exhaustive.

M-Files Text Analytics

First, based on the document title, it can select the type of document; in this case “SOP” or “Procedure” suggests the class “SOP Working copy”.

Then it detects:

  • the title
  • the ID of the document
  • the short description
  • The department concerned by the document

As mentioned before, these are only suggestions. If you do not agree with a value, feel free to enter your own.

This is what I configured for this example, but as long as you can write a regular expression to catch the data, you can automate it. Awesome!
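
To give a concrete idea of the kind of pattern involved, here is a tiny shell sketch. The title format (SOP-1234 followed by a description and a department) is purely hypothetical; the point is simply that anything a regular expression can catch can be turned into a property suggestion:

# Hypothetical document title
title="SOP-1234 - Cleaning procedure - Quality Department"

# Catch the document ID
echo "$title" | grep -oE 'SOP-[0-9]{4}'

# Catch the department mentioned in the title
echo "$title" | grep -oE '[A-Za-z]+ Department'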

To find more information about M-Files AI, click here.

The article Keep your data sorted with AI appeared first on dbi Blog.

Kubernetes Networking by Using Cilium – Intermediate Level – Part-1

Tue, 2024-02-20 01:36

If you are new to networking in Kubernetes or uneasy with it, you may benefit from my previous blog post at the beginner level. In this blog post I will show you what a building and its networking components look like in a Kubernetes cluster. As a reminder, below is the picture I drew in my previous blog post to illustrate the networking in a Kubernetes cluster with Cilium:

If you want to understand this networking in Kubernetes in more detail, read on, this blog post is for you! I'll assume you know the basics about Kubernetes and how to interact with it; otherwise you may find our training course on it very interesting (in English or in French)!

Diving into the IP Addresses configuration

Let’s start by checking our environment and update our picture with real information from our Kubernetes cluster:

$ kubectl get no -owide
NAME                      STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION      CONTAINER-RUNTIME
mycluster-control-plane   Ready    control-plane   113d   v1.27.3   172.18.0.3    <none>        Debian GNU/Linux 11 (bullseye)   5.15.0-94-generic   containerd://1.7.1
mycluster-worker          Ready    <none>          113d   v1.27.3   172.18.0.2    <none>        Debian GNU/Linux 11 (bullseye)   5.15.0-94-generic   containerd://1.7.1
mycluster-worker2         Ready    <none>          113d   v1.27.3   172.18.0.4    <none>        Debian GNU/Linux 11 (bullseye)   5.15.0-94-generic   containerd://1.7.1

$ kubectl get po -n networking101 -owide
NAME                        READY   STATUS    RESTARTS      AGE     IP            NODE                NOMINATED NODE   READINESS GATES
busybox-c8bbbbb84-fmhwc     1/1     Running   1 (24m ago)   3d23h   10.10.1.164   mycluster-worker2   <none>           <none>
busybox-c8bbbbb84-t6ggh     1/1     Running   1 (24m ago)   3d23h   10.10.2.117   mycluster-worker    <none>           <none>
netshoot-7d996d7884-fwt8z   1/1     Running   0             79s     10.10.2.121   mycluster-worker    <none>           <none>
netshoot-7d996d7884-gcxrm   1/1     Running   0             80s     10.10.1.155   mycluster-worker2   <none>           <none>

You can now see for real that the IP subnets of the pods are different from the subnet of the nodes. Also, the pod IP subnet is different on each node. If you are not sure why, that is perfectly fine because it is not so clear at this stage. So let's clarify it by checking our Cilium configuration.

I've told you in my previous blog post that there is one Cilium Agent per building. This Agent is a pod itself and it takes care of the networking in its node. This is what they look like in our cluster:

$ kubectl get po -n kube-system -owide|grep cilium
cilium-9zh9s                                      1/1     Running   5 (65m ago)   113d   172.18.0.3    mycluster-control-plane   <none>           <none>
cilium-czffc                                      1/1     Running   5 (65m ago)   113d   172.18.0.4    mycluster-worker2         <none>           <none>
cilium-dprvh                                      1/1     Running   5 (65m ago)   113d   172.18.0.2    mycluster-worker          <none>           <none>
cilium-operator-6b865946df-24ljf                  1/1     Running   5 (65m ago)   113d   172.18.0.2    mycluster-worker          <none>           <none>

There are two things to notice here:

  • The Cilium Agent is a DaemonSet, which is how you make sure there is always one on each node of our cluster. As it is a pod, it also gets an IP Address… but wait a minute… this is the same IP Address as the node! Exactly! This is a special case of pod IP Address assignment, usually for system pods that need direct access to the node (host) network. If you look at the pods in the kube-system namespace, you'll see most of them use the node IP Address (a quick check is shown right after this list).
  • The Cilium Operator pod is responsible for IP address management in the cluster, and so it gives each Cilium Agent the IP range it can use.
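
A quick way to confirm that host-network special case, reusing the agent pod name found above (just a sketch, any of the agent pods would do):

# hostNetwork set to true is what makes the agent pod use the node IP address
$ kubectl get pod -n kube-system cilium-dprvh -o jsonpath='{.spec.hostNetwork}{"\n"}'
true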

Now you want to see which IP range is used by each node, right? Let's check the Cilium Agent on each node, as we have found their names above:

$ kubectl exec -it -n kube-system cilium-dprvh -- cilium debuginfo | grep IPAM
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
IPAM:                   IPv4: 5/254 allocated from 10.10.2.0/24,

$ kubectl exec -it -n kube-system cilium-czffc -- cilium debuginfo | grep IPAM
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
IPAM:                   IPv4: 5/254 allocated from 10.10.1.0/24,

You can now see the different IP subnet used on each node. In my previous blog post I told you that an IP Address belongs to a group and that this group is defined by the subnet mask. Here the subnet mask is /24, which means for the first node that any address starting with 10.10.2 belongs to the same group. For the second node it is 10.10.1, so they each form a separate group or IP subnet.
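
As a small worked example of that grouping, here is a sketch with plain shell arithmetic, applying the /24 mask (255.255.255.0) to the pod IP 10.10.2.117 seen above:

$ IP=10.10.2.117; MASK=255.255.255.0
$ IFS=. read -r i1 i2 i3 i4 <<< "$IP"
$ IFS=. read -r m1 m2 m3 m4 <<< "$MASK"
$ echo "network: $((i1 & m1)).$((i2 & m2)).$((i3 & m3)).$((i4 & m4))"
network: 10.10.2.0

Every pod IP of the first node produces the same 10.10.2.0 result, so they all belong to the same group, while the pod IPs of the second node all end up in 10.10.1.0.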

What about now checking the interfaces, which are the doors in our drawing?

Diving into the interfaces configuration

Let’s explore our buildings and see what we could find out! We are going to start with our four pods:

$ kubectl get po -n networking101 -owide
NAME                        READY   STATUS    RESTARTS       AGE    IP            NODE                NOMINATED NODE   READINESS GATES
busybox-c8bbbbb84-fmhwc     1/1     Running   1 (125m ago)   4d1h   10.10.1.164   mycluster-worker2   <none>           <none>
busybox-c8bbbbb84-t6ggh     1/1     Running   1 (125m ago)   4d1h   10.10.2.117   mycluster-worker    <none>           <none>
netshoot-7d996d7884-fwt8z   1/1     Running   0              103m   10.10.2.121   mycluster-worker    <none>           <none>
netshoot-7d996d7884-gcxrm   1/1     Running   0              103m   10.10.1.155   mycluster-worker2   <none>           <none>

$ kubectl exec -it -n networking101 busybox-c8bbbbb84-t6ggh -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
8: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue qlen 1000
    link/ether 9e:80:70:0d:d9:37 brd ff:ff:ff:ff:ff:ff
    inet 10.10.2.117/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::9c80:70ff:fe0d:d937/64 scope link
       valid_lft forever preferred_lft forever

$ kubectl exec -it -n networking101 netshoot-7d996d7884-fwt8z -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
12: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ea:c4:71:d6:4f:a0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.10.2.121/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::e8c4:71ff:fed6:4fa0/64 scope link
       valid_lft forever preferred_lft forever

$ kubectl exec -it -n networking101 netshoot-7d996d7884-gcxrm -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
12: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether be:57:3d:54:40:f1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.10.1.155/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::bc57:3dff:fe54:40f1/64 scope link
       valid_lft forever preferred_lft forever

$ kubectl exec -it -n networking101 busybox-c8bbbbb84-fmhwc -- ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
10: eth0@if11: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue qlen 1000
    link/ether 2a:7f:05:a0:69:db brd ff:ff:ff:ff:ff:ff
    inet 10.10.1.164/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::287f:5ff:fea0:69db/64 scope link
       valid_lft forever preferred_lft forever

You can see that each container has only one network interface in addition to its local loopback. The format is for example 8: eth0@if9, which means the interface in the container has the index 8 and is linked to its peer interface with index 9 on the node it is hosted on. These are the 2 doors connected by a corridor in my drawing.
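
If you want to verify this pairing yourself, here is a small sketch reusing the busybox pod hosted on mycluster-worker: the pod exposes the index of its peer interface in /sys, and that index can then be looked up on the node.

# From inside the pod: eth0 has index 8 and this file gives the index of its peer (9 here)
$ kubectl exec -it -n networking101 busybox-c8bbbbb84-t6ggh -- cat /sys/class/net/eth0/iflink
9

# On the node: the interface with index 9 is the lxc interface paired with that pod
$ sudo docker exec -it mycluster-worker ip -o link show | grep '^9:'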

Then check the nodes network interfaces:

$ sudo docker exec -it mycluster-worker ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:84:64:22:90:7f brd ff:ff:ff:ff:ff:ff
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ca:7e:e1:cc:4e:74 brd ff:ff:ff:ff:ff:ff
    inet 10.10.2.205/32 scope global cilium_host
       valid_lft forever preferred_lft forever
4: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether f6:bf:81:9b:e2:c5 brd ff:ff:ff:ff:ff:ff
5: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::2/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:2/64 scope link
       valid_lft forever preferred_lft forever
7: lxc_health@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8a:de:c1:2c:f5:83 brd ff:ff:ff:ff:ff:ff link-netnsid 1
9: lxc4a891387ff1a@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d6:21:74:eb:67:6b brd ff:ff:ff:ff:ff:ff link-netns cni-67a5da05-a221-ade5-08dc-64808339ad05
11: lxc5b7b34955e61@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f2:80:da:a5:17:74 brd ff:ff:ff:ff:ff:ff link-netns cni-0b438679-e5d3-d429-85c0-b6e3c8914250
13: lxc73d2e1d7cf4f@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f6:87:b6:c3:a6:45 brd ff:ff:ff:ff:ff:ff link-netns cni-f608f13c-1869-6134-3d6b-a0f76fd6d483

$ sudo docker exec -it mycluster-worker2 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f2:91:2b:31:1f:47 brd ff:ff:ff:ff:ff:ff
3: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether be:7f:e0:2b:d6:b1 brd ff:ff:ff:ff:ff:ff
    inet 10.10.1.55/32 scope global cilium_host
       valid_lft forever preferred_lft forever
4: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether e6:c8:8d:5d:1e:2d brd ff:ff:ff:ff:ff:ff
6: lxc_health@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d2:cf:ec:4c:51:b6 brd ff:ff:ff:ff:ff:ff link-netnsid 1
8: lxcdc5fb9751595@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fe:b4:3a:e0:67:a3 brd ff:ff:ff:ff:ff:ff link-netns cni-c0d4bea2-92fd-03fb-ba61-3656864d8bd7
9: eth0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:42:ac:12:00:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.4/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::4/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:4/64 scope link
       valid_lft forever preferred_lft forever
11: lxc174c023046ff@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ae:7a:b9:6b:b3:c1 brd ff:ff:ff:ff:ff:ff link-netns cni-4172177b-df75-61a8-884c-f9d556165df2
13: lxce84a702bb02c@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 92:65:df:09:dd:28 brd ff:ff:ff:ff:ff:ff link-netns cni-d259ef79-a81c-eba6-1255-6e46b8d1c779

On each node there are several interfaces to notice. I’ll take the first node for example:

  • eth0@if6: As our Kubernetes cluster is created with Kind, a node is actually a container (and this interface opens a corridor to its peer interface on my laptop). If it feels like the movie Inception, well, it is a perfectly correct comparison! This interface is the main door of the building.
  • lxc4a891387ff1a@if8: This is interface number 9 on the node, the peer of interface number 8 inside the left container above.
  • lxc73d2e1d7cf4f@if12: This is interface number 13 on the node, the peer of interface number 12 inside the right container above.
  • cilium_host@cilium_net: This is the circle interface in my drawing that allows the routing to/from other nodes in our cluster.
  • cilium_vxlan: This is the rectangle in my drawing and is the tunnel interface that will transport you to/from the other nodes in our cluster.
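
If you are curious about what that tunnel interface knows, the Cilium Agent can list its tunnel map. This is just a quick sketch (the exact command and output depend on the Cilium version), reusing the agent pod name found earlier; each entry maps a remote pod IP subnet to the IP address of the node hosting it:

$ kubectl exec -it -n kube-system cilium-dprvh -- cilium bpf tunnel list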

Let's now get the complete picture by updating our drawing with this information:

Wrap up

With this foundational knowledge, you now have all the key elements to understand the communication between pods on the same node or on different nodes. This is what we will look at in my next blog post. Stay tuned!

The article Kubernetes Networking by Using Cilium – Intermediate Level – Part-1 appeared first on dbi Blog.

Oracle ZDM Migration – java.security.InvalidKeyException: invalid key format

Sun, 2024-02-18 11:04

The ZDM migration tool requires SSH passwordless login, without passphrase, between the ZDM host, the source and the target. Configuring the appropriate keys might still result in a Java security exception. In this blog I will tell you how to deal with such a problem, which I faced while implementing ZDM to migrate an on-premises database to a new ExaCC at one of our customers.

Setting up SSH Passwordless Login

First of all we need to create the private and public key on the ZDM Host.

From the ZDM host, with zdmuser, go in the ~/.ssh folder and run ssh-keygen.

[zdmuser@zdmhost .ssh]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/zdmuser/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/zdmuser/.ssh/id_rsa.
Your public key has been saved in /home/zdmuser/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:8uTp************************ziw zdmuser@zdmhost
The key's randomart image is:
+---[RSA 3072]----+
|   oo+==.        |
...
...
...
|    o.+..        |
+----[SHA256]-----+

This will create 2 keys, one private (id_rsa) and one public (id_rsa.pub).

Get the content of the public key.

[zdmuser@zdmhost .ssh]$ cat id_rsa.pub
ssh-rsa AAAA************************vaU= zdmuser@zdmhost

Then add the content of the public RSA key to the authorized_keys file of the opc user on both ExaCC cluster VMs (target ExaCC-cl01n1 and ExaCC-cl01n2) and of the oracle user on the on-premises VM (source vmonpr).

[opc@ExaCC-cl01n1 .ssh]$ echo "ssh-rsa AAAA************************vaU= zdmuser@zdmhost" >> authorized_keys

[opc@ExaCC-cl01n2 .ssh]$ echo "ssh-rsa AAAA************************vaU= zdmuser@zdmhost" >> authorized_keys

oracle@vmonpr:/home/oracle/.ssh/ [ONPR] echo "ssh-rsa AAAA************************vaU= zdmuser@zdmhost" >> authorized_keys

We then test the SSH connection to the 3 VMs and ensure no password is requested. Example:

[zdmuser@zdmhost migration]$ ssh opc@ExaCC-cl01n1
Last login: Fri Feb  2 16:58:04 2024 from 10.160.52.122
[opc@ExaCC-cl01n1 ~]$

Check ZDM migration

Checking the ZDM migration with zdmcli and the -eval option might fail:

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_physical_online.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -tdekeystorepasswd -tgttdekeystorepasswd -eval
zdmhost.domain.com: Audit ID: 50
Enter source database ONPR SYS password:
zdmhost: 2024-02-02T16:30:19.487Z : Processing response file ...
Operation "zdmcli migrate database" scheduled with the job ID "11".

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli query job -jobid 11
zdmhost.domain.com: Audit ID: 52
Job ID: 11
User: zdmuser
Client: zdmhost
Job Type: "EVAL"
Scheduled job command: "zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_physical_online.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -tdekeystorepasswd -tgttdekeystorepasswd -eval"
Scheduled job execution start time: 2024-02-02T17:30:19+01. Equivalent local time: 2024-02-02 17:30:19
Current status: FAILED
Result file path: "/u01/app/oracle/chkbase/scheduled/job-11-2024-02-02-17:30:48.log"
Metrics file path: "/u01/app/oracle/chkbase/scheduled/job-11-2024-02-02-17:30:48.json"
Job execution start time: 2024-02-02 17:30:48
Job execution end time: 2024-02-02 17:30:48
Job execution elapsed time: 0 seconds

Result file "/u01/app/oracle/chkbase/scheduled/job-11-2024-02-02-17:30:48.log" contents:
zdmhost: 2024-02-02T16:30:48.591Z : Processing response file ...
zdmhost: 2024-02-02T16:30:48.595Z : Processing response file ...
PRCZ-4002 : failed to execute command "/bin/cp" using the privileged execution plugin "zdmauth" on nodes "ExaCC-cl01n1"
java.security.InvalidKeyException: invalid key format

The error of the failed execution is:

java.security.InvalidKeyException: invalid key format
Solution

The problem is due to the fact that ZDM only supports RSA keys in the PEM format, and the key we generated was in the newer OPENSSH format.

Checking the current key, we can see that it is in the OPENSSH format:

[zdmuser@zdmhost .ssh]$ head -n1 id_rsa
-----BEGIN OPENSSH PRIVATE KEY-----

[zdmuser@zdmhost .ssh]$ tail -n 1 id_rsa
-----END OPENSSH PRIVATE KEY-----

We need to convert the private key to PEM format.

[zdmuser@zdmhost .ssh]$ ssh-keygen -p -m PEM -f ~/.ssh/id_rsa
Key has comment 'zdmuser@zdmhost'
Enter new passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved with the new passphrase.

The new key now looks like this:

[zdmuser@zdmhost .ssh]$ head -n1 id_rsa
-----BEGIN RSA PRIVATE KEY-----

[zdmuser@zdmhost .ssh]$ tail -n 1 id_rsa
-----END RSA PRIVATE KEY-----
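
Note that for future setups the conversion step can be avoided by generating the key directly in PEM format, for example (just a sketch, adapt the key size and file path to your standards):

[zdmuser@zdmhost .ssh]$ ssh-keygen -t rsa -b 3072 -m PEM -f ~/.ssh/id_rsa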

And now the zdmcli eval command succeeds:

[zdmuser@zdmhost migration]$ /u01/app/oracle/product/zdm/bin/zdmcli query job -jobid 39
zdmhost.domain.com: Audit ID: 434
Job ID: 39
User: zdmuser
Client: zdmhost
Job Type: "EVAL"
Scheduled job command: "zdmcli migrate database -sourcesid ONPR -rsp /home/zdmuser/migration/zdm_ONPR_physical_online.rsp -sourcenode vmonpr -srcauth zdmauth -srcarg1 user:oracle -srcarg2 identity_file:/home/zdmuser/.ssh/id_rsa -srcarg3 sudo_location:/usr/bin/sudo -targetnode ExaCC-cl01n1 -tgtauth zdmauth -tgtarg1 user:opc -tgtarg2 identity_file:/home/zdmuser/.ssh/id_rsa -tgtarg3 sudo_location:/usr/bin/sudo -tdekeystorepasswd -tgttdekeystorepasswd -eval"
Scheduled job execution start time: 2024-02-14T14:18:19+01. Equivalent local time: 2024-02-14 14:18:19
Current status: SUCCEEDED
Result file path: "/u01/app/oracle/chkbase/scheduled/job-39-2024-02-14-14:18:29.log"
Metrics file path: "/u01/app/oracle/chkbase/scheduled/job-39-2024-02-14-14:18:29.json"
Job execution start time: 2024-02-14 14:18:29
Job execution end time: 2024-02-14 14:21:18
Job execution elapsed time: 2 minutes 48 seconds
ZDM_GET_SRC_INFO ........... PRECHECK_PASSED
ZDM_GET_TGT_INFO ........... PRECHECK_PASSED
ZDM_PRECHECKS_SRC .......... PRECHECK_PASSED
ZDM_PRECHECKS_TGT .......... PRECHECK_PASSED
ZDM_SETUP_SRC .............. PRECHECK_PASSED
ZDM_SETUP_TGT .............. PRECHECK_PASSED
ZDM_PREUSERACTIONS ......... PRECHECK_PASSED
ZDM_PREUSERACTIONS_TGT ..... PRECHECK_PASSED
ZDM_VALIDATE_SRC ........... PRECHECK_PASSED
ZDM_VALIDATE_TGT ........... PRECHECK_PASSED
ZDM_POSTUSERACTIONS ........ PRECHECK_PASSED
ZDM_POSTUSERACTIONS_TGT .... PRECHECK_PASSED
ZDM_CLEANUP_SRC ............ PRECHECK_PASSED
ZDM_CLEANUP_TGT ............ PRECHECK_PASSED

The article Oracle ZDM Migration – java.security.InvalidKeyException: invalid key format appeared first on dbi Blog.

Alfresco – A never ending transformation

Fri, 2024-02-16 10:08

At the beginning of the week, as I was working for our ServiceDesk (SLA support for our customers), I saw a few dozen mails generated by our monitoring over the weekend on a production Alfresco 7.x cluster doing the yo-yo in terms of RAM and disk space. Nothing was down, just some strange behavior where 20GB of free space would be gone and then re-appear after a few minutes, and the same thing for the RAM/SWAP.

The first thing I checked was the disk space mentioned on the alert. We received alerts from all members of the cluster one by one, almost in a perfect round-robin manner. On the second node, I saw the issue occurring in real-time, so I looked into what exactly was generating all the noise:

alfresco@alf-p2:~# date; df -h /tmp
Mon Feb 12 07:27:41 UTC 2024
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb2        19G    7G   12G  35% /tmp
alfresco@alf-p2:~#
alfresco@alf-p2:~# date; df -h /tmp
Mon Feb 12 07:28:20 UTC 2024
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb2        19G    9G    9G  49% /tmp
alfresco@alf-p2:~#
alfresco@alf-p2:~# du -sm /tmp/
9427    /tmp/
alfresco@alf-p2:~#
alfresco@alf-p2:~# du -sm /tmp/
9484    /tmp/
alfresco@alf-p2:~#
alfresco@alf-p2:~# du -sm /tmp/
9541    /tmp/
alfresco@alf-p2:~#

In less than a minute, around 2 to 3 GB of temporary files were generated, which doesn't seem very healthy:

alfresco@alf-p2:~# cd /tmp
alfresco@alf-p2:/tmp#
alfresco@alf-p2:/tmp# ls -ltr
total 480
...
-rw-r-----   1 alfresco  alfresco    115 Feb 11 21:26 scheduler.json
drwxr-x---   2 alfresco  alfresco   4096 Feb 12 07:28 Alfresco/
drwxrwxrwt 117 root      root      12288 Feb 12 07:28 ./
alfresco@alf-p2:/tmp#
alfresco@alf-p2:/tmp# cd Alfresco/
alfresco@alf-p2:/tmp/Alfresco# ls -l
total 10553428
drwxr-x---   2 alfresco alfresco        4096 Feb 12 07:29 ./
drwxrwxrwt 117 root     root           12288 Feb 12 07:29 ../
-rw-r-----   1 alfresco alfresco     1897650 Feb 12 07:23 source_11877384286747332767_tmp.pdf
-rw-r-----   1 alfresco alfresco 10804789248 Feb 12 07:29 target_18121744399232974935_tmp.txt
alfresco@alf-p2:/tmp/Alfresco#
alfresco@alf-p2:/tmp/Alfresco#
alfresco@alf-p2:/tmp/Alfresco# ls -l
total 10686460
drwxr-x---   2 alfresco alfresco        4096 Feb 12 07:29 ./
drwxrwxrwt 117 root     root           12288 Feb 12 07:29 ../
-rw-r-----   1 alfresco alfresco     1897650 Feb 12 07:23 source_11877384286747332767_tmp.pdf
-rw-r-----   1 alfresco alfresco 10941014016 Feb 12 07:29 target_18121744399232974935_tmp.txt
alfresco@alf-p2:/tmp/Alfresco#

At that point in time, it looked like Alfresco was doing something that was causing the disk space issue, at least. Here, we can see a PDF file as the “source” and a TXT file, apparently still being generated, as the “target”. So of course, my first thought was that this is probably the Alfresco Transformation Service causing the issue, trying to transform a PDF into TXT, most probably to index the content of this file.

Looking at the RAM/SWAP usage on this server showed the same thing, with the Java process of the ATS using 100% CPU (fortunately, the host has multiple CPUs) and going overboard with its RAM, forcing the host to swap.

Therefore, I looked at the ATS logs and saw 2 types of errors. First, there were a few PDFBox IOExceptions (“Error: End-Of-File: expected line”), but there weren't a lot of those… Then there was another error, much more frequent, that was the consequence of the filesystem being full:

alfresco@alf-p2:~# cat $ATS_HOME/logs/transform-core-aio.log
...
2024-02-12 07:18:37.380 ERROR 23713 --- [o-8090-exec-141] o.a.transformer.TransformController      : Error writing: Seite 1

org.alfresco.transform.exceptions.TransformException: Error writing: Seite 1
        at org.alfresco.transformer.executors.Transformer.transform(Transformer.java:83) ~[alfresco-transformer-base-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.AIOController.transformImpl(AIOController.java:118) ~[classes!/:2.5.3]
        at org.alfresco.transformer.AbstractTransformerController.transform(AbstractTransformerController.java:173) ~[alfresco-transformer-base-2.5.3.jar!/:2.5.3]
        at jdk.internal.reflect.GeneratedMethodAccessor75.invoke(Unknown Source) ~[na:na]
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
        at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
        ...
Caused by: java.lang.IllegalStateException: Error writing: Seite 1
        at org.alfresco.transformer.executors.Tika.transform(Tika.java:697) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.executors.Tika.transform(Tika.java:673) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.executors.Tika.transform(Tika.java:617) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.executors.TikaJavaExecutor.call(TikaJavaExecutor.java:141) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.executors.TikaJavaExecutor.transform(TikaJavaExecutor.java:131) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        at org.alfresco.transformer.executors.Transformer.transform(Transformer.java:70) ~[alfresco-transformer-base-2.5.3.jar!/:2.5.3]
        ... 55 common frames omitted
Caused by: org.xml.sax.SAXException: Error writing: Seite 1
        at org.apache.tika.sax.ToTextContentHandler.characters(ToTextContentHandler.java:110) ~[tika-core-1.26.jar!/:1.26]
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) ~[tika-core-1.26.jar!/:1.26]
        at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:136) ~[tika-core-1.26.jar!/:1.26]
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) ~[tika-core-1.26.jar!/:1.26]
        ...
        at org.alfresco.transformer.executors.Tika.transform(Tika.java:693) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
        ... 60 common frames omitted
        Suppressed: java.io.IOException: No space left on device
                at java.base/java.io.FileOutputStream.writeBytes(Native Method) ~[na:na]
                at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354) ~[na:na]
                at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233) ~[na:na]
                at java.base/sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:337) ~[na:na]
                at java.base/sun.nio.cs.StreamEncoder.close(StreamEncoder.java:161) ~[na:na]
                at java.base/java.io.OutputStreamWriter.close(OutputStreamWriter.java:255) ~[na:na]
                at java.base/java.io.BufferedWriter.close(BufferedWriter.java:269) ~[na:na]
                at org.alfresco.transformer.executors.Tika.transform(Tika.java:684) ~[alfresco-transform-tika-2.5.3.jar!/:2.5.3]
                ... 60 common frames omitted
Caused by: java.io.IOException: No space left on device
        at java.base/java.io.FileOutputStream.writeBytes(Native Method) ~[na:na]
        at java.base/java.io.FileOutputStream.write(FileOutputStream.java:354) ~[na:na]
        at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233) ~[na:na]
        at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:303) ~[na:na]
...
alfresco@alf-p2:~#

As you can see above, at 07:18 the /tmp filesystem was 100% full, and when I checked 5 minutes later, at 07:23, a new transformation was already producing a 10GB text file that was still growing. So it was clear that this happens repeatedly, most probably for the same document. According to the monitoring, the issue started just before the weekend. Looking at the first occurrences of the full filesystem in the ATS logs gave the following:

alfresco@alf-p2:~# grep '2024.*Error writing' $ATS_HOME/logs/transform-core-aio.log
2024-02-09 19:20:51.628 ERROR 23713 --- [o-8090-exec-166] o.a.transformer.TransformController      : Error writing:
2024-02-09 19:41:29.954 ERROR 23713 --- [o-8090-exec-156] o.a.transformer.TransformController      : Error writing: Seite 1
2024-02-09 20:02:11.764 ERROR 23713 --- [o-8090-exec-160] o.a.transformer.TransformController      : Error writing: Seite 1
2024-02-09 20:23:08.828 ERROR 23713 --- [o-8090-exec-163] o.a.transformer.TransformController      : Error writing:
2024-02-09 20:44:05.313 ERROR 23713 --- [o-8090-exec-141] o.a.transformer.TransformController      : Error writing: Seite 1
2024-02-09 21:04:52.642 ERROR 23713 --- [o-8090-exec-162] o.a.transformer.TransformController      : Error writing: Seite 1
...
2024-02-12 07:18:37.380 ERROR 23713 --- [o-8090-exec-152] o.a.transformer.TransformController      : Error writing: Seite 1
alfresco@alf-p2:~#

With the above, it pretty much confirms that it’s the same document that is always failing, since it’s blocking on “Seite 1“, which means “Page 1” in English.

To find which document is causing the issue in Alfresco, there aren't a lot of details available, since the ATS doesn't really tell you much about what it is doing. All I had was a temporary name (which obviously doesn't trace back to anything in the repository) and a size. Therefore, I searched the Alfresco data (“alf_data“) for files with a size equal to the document “/tmp/Alfresco/source_11877384286747332767_tmp.pdf” (i.e. 1897650 bytes), created in the last few days. I expected it to have been created on 9-Feb, a little before 19:20, and I indeed found one:

alfresco@alf-p2:~# find /alf_data/contentstore/2024/2/ -type f -ls | grep 1897650
 34508512  1856 -rw-r----- 1 alfresco alfresco 1897650 Feb 9 19:02 /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
alfresco@alf-p2:~#
alfresco@alf-p2:~# md5sum /tmp/Alfresco/source_11877384286747332767_tmp.pdf /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
45ed40bd5f84b7c68e246885f2b6a55f  /tmp/Alfresco/source_11877384286747332767_tmp.pdf
45ed40bd5f84b7c68e246885f2b6a55f  /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
alfresco@alf-p2:~#
alfresco@alf-p2:~# diff /tmp/Alfresco/source_11877384286747332767_tmp.pdf /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
alfresco@alf-p2:~#

Therefore, this is the same content file. There is of course the possibility that a duplicate node was using the same content before February (as I searched only inside /2024/2, that means February), but since the issue appeared only over the weekend, it’s pretty safe to assume it’s this document/node.

alfresco@alf-p2:~# stat /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
  File: /alf_data/contentstore/2024/2/9/19/02/174f569e-93a3-4829-8ad5-bd3d6e78447b.bin
  Size: 1897650         Blocks: 3712       IO Block: 262144 regular file
Device: 34h/52d Inode: 34508512    Links: 1
Access: (0640/-rw-r-----)  Uid: (  113/alfresco)   Gid: (  116/alfresco)
Access: 2024-02-09 19:02:12.153002964 +0000
Modify: 2024-02-09 19:02:12.157983495 +0000
Change: 2024-02-09 19:02:12.157983635 +0000
 Birth: -
alfresco@alf-p2:~#

From that point, I had the “content_url” of a node. Therefore, I could have used the database (see useful database queries) to find the NodeRef of this Alfresco node, but at this customer I don't have easy access to the DB, so I went through Share instead.

I knew the node was created (or modified) at 19:02:12 (+/- 1s) on 9-Feb, and even if the content isn't indexed, its metadata should still be searchable. Therefore, I simply performed a search in Alfresco Share to find documents created (or modified) at that exact time, i.e. cm:created:'2024-02-09T19:02:12'.

That gave me 4 results, out of which only 1 had a size around 2MB. To validate if this was indeed the document causing the issue, I simply used the JavaScript Console to dump this file and it gave me the exact same “content_url“. I could also validate on Share that this specific file wasn’t content-indexed yet (despite being in the repository for 2.5 days).
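
For reference, the same date search can also be run outside Share through the public Search REST API. This is only a sketch, assuming a repository reachable on localhost:8080 and admin credentials, to be adapted to your environment:

curl -s -u admin:admin \
  -H 'Content-Type: application/json' \
  -d "{\"query\":{\"language\":\"afts\",\"query\":\"cm:created:'2024-02-09T19:02:12'\"}}" \
  http://localhost:8080/alfresco/api/-default-/public/search/versions/1/search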

As a temporary workaround, to stop the OS from going crazy, I set this document as metadata-indexed only (no content), using the “Index Control” aspect. If you don’t know how this works, it’s pretty simple for a node:

  • Click on “Manage Aspect”
  • From the list of “Available to Add”, find “Index Control (cm:indexControl)”
  • Click on “+” to add it to the list of “Currently Selected”
  • Click on “Apply changes”
  • Click on “Edit Properties”
  • Uncheck the “Is Content Indexed” option

After doing that, you should be able to see something like that on the node’s properties:

Alfresco Index Control

In case a transformation for this document is already in progress, you will need to wait for the FileSystem to be full for the ATS (java) to remove its temporary file and realize that this document doesn’t need to be transformed anymore. You can probably also restart the process, if you prefer.

That’s only a workaround of course, not a real solution. Therefore, even if I knew that the issue was most probably around “Seite 1“, I replicated the issue on TEST by uploading this same file into the TEST environment and then looked inside the TXT content, to validate that assumption:

alfresco@alf-t1:/tmp/Alfresco# ls -l
total 23960
drwxr-x---  2 alfresco alfresco      4096 Feb 12 09:10 ./
drwxrwxrwt 25 root     root         36864 Feb 12 09:10 ../
-rw-r-----  1 alfresco alfresco   1897650 Feb 12 09:10 source_2995534351432950419_tmp.pdf
-rw-r-----  1 alfresco alfresco  22593536 Feb 12 09:10 target_7429882841367188802_tmp.txt
alfresco@alf-t1:/tmp/Alfresco#
alfresco@alf-t1:/tmp/Alfresco# wc -l target_7429882841367188802_tmp.txt
2509490 target_7429882841367188802_tmp.txt
alfresco@alf-t1:/tmp/Alfresco#
alfresco@alf-t1:/tmp/Alfresco# grep -v "^[[:space:]]*Seite 1$" target_7429882841367188802_tmp.txt | wc -l
1913
alfresco@alf-t1:/tmp/Alfresco#
alfresco@alf-t1:/tmp/Alfresco# sleep 30
alfresco@alf-t1:/tmp/Alfresco#
alfresco@alf-t1:/tmp/Alfresco# wc -l target_7429882841367188802_tmp.txt
83418233 target_7429882841367188802_tmp.txt
alfresco@alf-t1:/tmp/Alfresco#
alfresco@alf-t1:/tmp/Alfresco# grep -v "^[[:space:]]*Seite 1$" target_7429882841367188802_tmp.txt | wc -l
1913
alfresco@alf-t1:/tmp/Alfresco#

As shown above, there are 1913 lines of actual text and then the rest of the millions of lines are exactly “ Seite 1“. This text actually comes from page 34 of the PDF (it seems to be a merge of multiple PDFs). By removing page 34 from the document, it can be indexed properly. In the end, the “quick” solution for this customer is to fix the PDF (e.g. transform page 34 into an image, then back into a PDF and OCRize it so it is indexed and searchable).
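
If you need to look at the offending page or strip it from the document before fixing it properly, command line tools can help. This is only a sketch, assuming poppler-utils (pdftotext) and qpdf are available, and using hypothetical file names:

# Extract the text of page 34 only, to see what text extraction produces for it
pdftotext -f 34 -l 34 problem.pdf page34.txt

# Rebuild the PDF without page 34 ("z" stands for the last page)
qpdf problem.pdf --pages . 1-33,35-z -- problem_without_page34.pdf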

The article Alfresco – A never ending transformation appeared first on dbi Blog.

An AWS journey in Geneva

Fri, 2024-02-16 09:18

This week I could attend the AWS re:Invent re:Cap in Geneva. This event, organized by AWS, gives a summary of the biggest announcements made at the famous re:Invent in Las Vegas.

The topics addressed during this event covered various cloud technologies in which AWS is investing resources. We had a recap of the news regarding Generative AI, Data and Analytics, App Modernization and Developer Experience, and even Management and Governance.

Generative AI

In the generative AI part, we were mainly introduced to Amazon Q, a chat bot made by AWS. Among the features, we saw that Q can integrate with more than 40 data sources. It also uses multiple sources for user management, like AWS SSO or Entra ID. The main purpose is to answer user queries based on their rights to access documents, so the answers are filtered and users only see what they are allowed to.

Database and Analytics

For the Database and Analytics topic we had an introduction to zero-ETL. This shows AWS's ambition to reduce the need for pipelines: they want to create trust relationships between databases and Redshift to get rid of the pipelines.

Another point presented in this topic was a translator that generates SQL queries from human language. This feature is used by Redshift and will enable people to create SQL queries from natural-language phrases. It also uses Amazon Q as a base.

Applications and DevExp

During this session, we mainly learned about CodeCatalyst, which is a unified software development service. It allows us to do full project management in one tool, with issues as in GitLab/GitHub. A big point here was the role of Amazon Q: we can directly assign issues to this AI tool and it will do the work to resolve them. Then we just have to commit the code, merge it and that's it.

We also got an introduction to CodeWhisperer, a tool that helps developers write their code. This service is now compatible with IaC languages like Terraform, and it can be integrated directly into some IDEs like Visual Studio.

Management and Governance

Last but not least, we had an introduction to the newest features in terms of management and governance. On the security side, we learned that Amazon Inspector can now be used with CI/CD plugins and that GuardDuty can now look at runtime events in containers.

We also had a talk about the Frugal Architect way in the cloud, a set of pillars created by Dr Vogels to help people build more sustainable and more performant cloud services by optimizing costs. We were also introduced to the newest monitoring features, for example a new application panel that shows us all the costs for a specific application deployed in AWS.

Most of the features we saw were only in preview or available in some specific regions (mostly US regions). No doubt all those features will soon be available globally.

The article An AWS journey in Geneva appeared first on dbi Blog.

Attach database error (SMB Fileshare)

Fri, 2024-02-16 05:48

Starting with SQL Server 2012, SQL Server supports databases stored on a Server Message Block (SMB) file server as a storage option. For performance and reliability reasons, SMB 3.0 or later is the preferred version.
For architectures like Always On Failover Cluster Instances (FCI), it can be a relevant alternative to cluster disks. To avoid a single point of failure (SPOF) scenario, the file shares should be highly available.

Issue description

In a production context, we set up an FCI infrastructure in Multi-AZ on AWS, using Amazon FSx as the file share target, also in a Multi-AZ deployment to avoid a SPOF scenario.
Everything ran smoothly until a Detach / Attach approach was used at some point. SQL Server raised the following error message 5120 during the Attach phase:

Msg 5120, Level 16, State 101, Line 12
Unable to open the physical file "\\fsx-server\SharePath\Database.mdf". Operating system error 5: "5(Access is denied.)".
Msg 1802, Level 16, State 7, Line 12
CREATE DATABASE failed. Some file names listed could not be created. Check related errors.

Even if the SQL Server service account has Full Control on the files, it cannot attach them back after the detach process.
In some documentation, you can find the recommendation to enable Trace Flag 1802, which disables ACL changes and impersonated access verification during attach & detach operations. But another error (5123) appears anyway:

Msg 5123, Level 16, State 20, Line 11
CREATE FILE encountered operating system error 5(Access is denied.) while attempting to open or create the physical file '\\fsx-server\SharePath\Database.mdf'.
Detach Process

To understand the root cause of this error, let's dig into the Microsoft documentation regarding the detach process.
When detaching the database files, the following permissions apply to the files:

  • Without impersonation: the SQL Server service account and members of local Windows Administrator group (i.e. FSx related) are granted FULL control on the database files
  • With impersonation: only the account performing the operation is granted FULL control on the database files

Initially without Trace Flag 1802, we were in the first situation “without impersonation” after detaching the database:

  • ACLs inheritance is disabled on database files
  • SQL Server service account has FULL Control on the database files
  • Local Windows Administrator group of the File Server has FULL control on database files
Attach Process

Now let's dig into the Attach process, still from the Microsoft documentation.
When attaching database files to create the database in SQL Server, the following permissions apply to the files:

  • Without impersonation: only the SQL Server service account is added with FULL Control permissions
  • With impersonation: the SQL Server service account and members of the local Windows Administrator group are granted FULL Control permissions

The specificity of the Attach process, compared to the creation of a database from scratch, is that it tries to apply NTFS changes to existing files. We will see that this slight difference is not insignificant.

SMB Security Requirements

Here are the requirements for the SQL Server service (domain) account when the database files are located on an SMB file share:

  • FULL Control share permissions
  • FULL Control NTFS permissions on the SMB share folders
  • SeSecurityPrivilege (Local Security Policy on the file server) is required to get/set ACLs. This is true even if the account has FULL Control NTFS permissions on the database files
Issue root-cause

The Attach process changes the current permissions of the database files before creating the database in SQL Server. To do that, it needs to get/set ACLs on the files, so SeSecurityPrivilege is required on the file server side in order to avoid errors 5120 and 5123.
If you manage the file server yourself, this is a change you can implement to fix the issue. But in a context where the SMB share is a managed service, we were not able to fix it.

The limitation on the Attach process is not blocking, because other approaches exist, such as Backup & Restore, but it can be disturbing when you face this issue without understanding the root cause.
Similar to PaaS environments, Attach & Detach is not supported as soon as the Operating System hosting the database files is behind a managed service.

The article Attach database error (SMB Fileshare) appeared first on dbi Blog.

Kubernetes Networking by Using Cilium – Beginner Level

Wed, 2024-02-14 03:37

There seems to be a strong interest these days in better understanding networking in Kubernetes. This blog post is my contribution to this topic. I'll do my best to explain it in a visual way and translate the techie parts into understandable language so that anybody can actually enjoy it.

The best way to learn networking is by doing what is called “Follow the packet” or “The life of a packet”. Basically, you follow the packet from the sender to the receiver by stopping at each step of this journey. I did that a while ago with a communication from a Pod to another Pod by using Calico. This time I’ll use another Container Network Interface (CNI) called Cilium which is based on eBPF (understand fast and flexible routing) and comes with a lot of powerful features and tools. Let’s hit that packet road!

Traditional Networking without Kubernetes

We are going to start from the beginning. I’ll consider you know nothing about networking. Maybe you already know what an IP Address is? The IP Address is the numerical Address of a network interface in a computer. This is how your computer can connect for example to your Wi-Fi network and give you access to the Internet. If you are using a laptop, your Wi-Fi network interface has one IP Address. This network interface also has another address that is unique and burnt on it by the hardware provider. This address is called the Medium Access Control (MAC) Address.

The IP Address belongs to a group (the IP Subnet). To know which group it belongs to, it uses something called the Subnet Mask. This mask, when applied to the IP Address, gives a result, and this result is identical for every IP Address that belongs to the same group. This is like your community.

Let’s use the drawing below to make analogies:

The house is a computer, a server or a virtual machine. It could be of different sizes according to its CPU and memory, but let's use the same one for simplicity's sake. A house has a door that is your network interface. The serial number of that door is your MAC Address and the number on your house (which is usually nailed on the door) is your IP Address. It is only if you change your door that your serial number will change. However, your house number has been assigned to you by the architect of your community and could change if there is a reassignment or a change in the design.

The community 10th (using numbers from 10 to 19) in blue belongs to the same group (same IP subnet) and the community 20th in green is another group. In each group there are five houses, so there is space for the community to grow. In each community, every door has a direct path to a fountain, which represents a Switch. At the fountain, there is a sign for each path to indicate which door you can reach. Yes, the fountain doesn’t know the house numbers but only the serial numbers of the doors. For humans this is not very practical, so we use a map (called the ARP table) that gives the translation between a house number and the serial number of its door.

If you live in house 14 and want to visit house 15, you’ll use the path (there is only one and it is yours, so no traffic jams!) to the fountain first and then look at the signs. You know from your map which serial number corresponds to which house, and so you take the path to house 15. In this star topology you always go to the fountain first and not directly to the house you want to visit, because there is no direct path. A path inside your community represents a Layer 2 link. You can’t reach another community by using these paths.

Traveling between communities

What now if, from your house 14, you want to pay a visit to house 24? This is another community, which means the IP Address / Subnet Mask pair for 14 doesn’t produce the same result as for 24. Indeed the 10th and the 20th communities are different. So you know the destination is another community, and in that case you have to go first to your gatekeeper (but always through the fountain, as we have seen). He is the default gateway of your community and he lives in house 11. The rule is to go to him for any destination that is outside of your community.

Only he has a map (the Routing Table) to reach community 20th and knows which road to take (this is called Layer 3 routing because you are traveling outside of your community). This map shows that to reach 24 in the 20th community you have to use another door. Wait a minute, if a door is a network interface, does it mean that the gatekeeper house has another door? Absolutely correct! House 11 has another door with another number on it (101) and of course this door has another serial number (MAC Address).

By exiting this door you can now take the path to reach community 20th, which has its own gatekeeper in house 21. The map (Routing Table) of this gatekeeper directs you to the proper door to reach your destination. This door gives you access to community 20th, as your destination 24 belongs to it. The gatekeeper also gives you the map (ARP table) so you can orientate yourself at the fountain. You can now walk on the path to the green fountain. From there, you just have to follow the sign and the path to house 24. When you want to get back home, you travel back using the same path in the opposite direction.
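
Everything in this analogy maps to objects you can list on any Linux machine with the standard iproute2 tools. Here is a quick illustration (the commands are generic; their output will of course depend on your system):

# Show the network interfaces (the "doors"), with their MAC and IP Addresses
ip addr show

# Show the ARP table (the map from house numbers to door serial numbers)
ip neigh show

# Show the routing table (the gatekeeper's map, including the default gateway)
ip route show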

Networking in Kubernetes

Now you know the basics about networking, let’s see how it works in comparison in Kubernetes. Yes it is slightly more complicated but let’s go step by step and use the picture below to understand it better:

Instead of houses, we now have buildings. The networking between the buildings is still the same as the traditional one, with a Switch/fountain in the middle. The entrance of the building has a door with the building number on it (its IP Address) that is part of the 1000th community. One building will represent one node of our Kubernetes cluster.

You know Kubernetes is a container orchestrator. A container is wrapped into a pod. For simplicity’s sake, let’s consider a pod has only one container, so both terms are equivalent here. This pod will be one apartment in our building and so is like a private part of it. The apartments can be of different sizes in the building, as they can have 2, 3 or 4 bedrooms for example. This will be the CPU and memory capacities your container will need on the node. Some apartments are empty, so the building still has some capacity. However, in Kubernetes, pods are created and deleted as needed. So in our building that would mean sometimes a 2-bedroom apartment is created and, when not used anymore, it could be removed from the building. Then a 5-bedroom apartment could be created next if the building has enough space for it. Imagine then that it is a LEGO building and inside it you can build and demolish apartments of various sizes according to your needs! Isn’t it awesome?

In each building, the containers/pods have their own community (IP subnet). The function of a CNI in Kubernetes is basically to assign numbers (IP Addresses) to pods so they can communicate together. By default Cilium uses a different community for each building. When an apartment is created, Cilium assigns a number to it. When an apartment is deleted and recreated, it will get another number, so it is ephemeral. Here the blue community uses the 10th range and the green one uses the 20th. You can notice that the number ranges of the blue and green communities are different from the range of the buildings. Just so you know, this design is called an Overlay network; there are other possible designs but this is a common one. There is then a network of pods on top of the network of the nodes.

Traveling between apartments in the same building

Now, you live in apartment 12; how are you going to pay a visit to apartment 14? As we did with the traditional networking example, you are the packet we are going to follow! Of course you leave the apartment (container) through its door (its network interface). The difference with our previous example is that you are now not out of the house, you are just out of your apartment but still inside the building. You then follow a private corridor and reach another door (this is the LXC interface).

This door gives you access to a common space of the building where the routing and dispatching occurs. We call it the Cilium Lobby (the blue rectangle). This Lobby has been installed by the Cilium Agent of this building when Cilium was selected to provide communications in this Kubernetes cluster. There is one Cilium Agent per building and he is taking care of everything related to networking.

So you reach the fountain in that Lobby (it is a virtual Switch here) with all its signs. You have in your pocket the map (ARP table) to decipher them and find a match with number 14. It is the door on the top right. You then open that door, follow the corridor and reach apartment number 14. You would go back to apartment number 12 following the same path but in the opposite direction.

As the destination is in the same community, it is still Layer 2 routing and you can see the similarities with traditional networking.

Traveling between apartments in different buildings

Now from apartment 12 you want to pay a visit to apartment number 22, which is in another building. The beginning of your travel is the same as previously: you exit your apartment, follow the corridor, reach the Cilium Lobby and the fountain. As with traditional networking, the destination number belongs to another community, so you need the help of the gatekeeper in the Lobby. The difference is that this gatekeeper doesn’t live in an apartment, he is just waiting at a desk in the Lobby. The gatekeeper looks at his map (the Routing table) for directions to number 22 and shows you the door number 11 to use (the cilium_host).

When you open that door, you see another door behind it: it is the blue triangle and it is called the VXLAN interface. This door opens onto a nice transparent tunnel that goes through the main door of the building. You are protected from the rain and can enjoy the landscape while walking to the other building. You even spot the outdoor fountain! When you reach the green building you exit the tunnel, greet the cilium_host number 21 of this building, reach the Lobby and go to the gatekeeper desk. You give him your destination (apartment 22) and he shows you the path to the fountain of this building. He also gives you the map (ARP Table) so you can translate the signs at the fountain. From the fountain you now head to the door on the top left, follow the corridor and reach your destination. As before, your way back will follow the same journey in the opposite direction.

This is Layer 3 routing, as the destination community is different from yours. Although the routing is a bit more complex in Kubernetes, the principle stays the same as with traditional routing: finding the right door to reach your destination number. Note that there are other routing modes, but this one is a common one.
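
If you already have a cluster running Cilium and are curious, you can peek at what the agent (our “Lobby”) knows. As a sketch, assuming the default installation where the agent runs as a DaemonSet named cilium in the kube-system namespace:

# Overall status of one Cilium agent
kubectl -n kube-system exec ds/cilium -- cilium status

# List the endpoints (the "apartments") managed by this agent, with their IP Addresses
kubectl -n kube-system exec ds/cilium -- cilium endpoint list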

Wrap up

I hope this helped you understand the difference between traditional networking and Kubernetes networking and that the latter is clearer for you now. If that is all you needed, then I’m glad you’ve read this blog post and I hope you’ve enjoyed it. If you now want to learn more about Kubernetes Networking, stay tuned because I’ll write an intermediate level where you’ll see what a building really looks like on a real cluster!

L’article Kubernetes Networking by Using Cilium – Beginner Level est apparu en premier sur dbi Blog.

M-Files January Release 2024

Tue, 2024-02-13 12:15

A bunch of new features and a more seamless user experience across all M-Files products have been introduced with the M-Files January 2024 Release. In addition, the release brings a fresh look and additional Business Benefits.
In this blog, I will give a summary of the new Features and Business Benefits including pictures of the new look and feel of the different M-Files products.
Furthermore, I will also share the instructions to follow in order to keep the former colour scheme.

Business Benefits of the M-Files January Release 2024

Outlined below you will find a list of the most important Business Benefits.

  • Consistent User Experience: Streamlined and cohesive design across products for a consistent and intuitive user experience.
  • Enhanced Accessibility: Improved color contrast and accessibility features will cater to a broader audience, ensuring a positive experience for users with diverse needs.
  • Efficient Learning Curve: Quicker adaptation to the unified design, as common elements and patterns create a more efficient learning curve.
  • Increased User Satisfaction: A cohesive and accessible design contributes to higher user satisfaction.

Summary of the main new Features of the M-Files January Release 2024

Features below will give you a list of the major improvements.

  • M-Files Desktop Client and M-Files Web Client have the same visual identity.
  • In the M-Files comments feature, you can mention someone by using @ plus their name, which will also trigger a notification.
  • M-Files Desktop now supports copying multiple links at once: you can select several objects and create links to all of them in a single action.
  • It is now possible to set the session timeout individually. This can be easily done under the Advanced Vault Settings section.
  • To streamline the process, improved sharing options were implemented.
    Description from the M-Files Release Notes: “Sharing options in M-Files Desktop have been clarified, and rarely used features have been removed if M-Files Web is enabled. PDF options have been moved to a “Save as PDF” section of the context menu. These are user interface changes only and have no effect on the API.”

Joonas Linkola (Senior Partner sales Engineer at M-Files) shared a nice summary of the release note on LinkedIn.

For further details, please take a look at the official M-Files Release Notes.

How it looks now… Reverting to the previous M-Files colour scheme

Open the M-Files Admin application and navigate to the Vault you would like to modify.
Click on Configuration => Client => Desktop => Appearance.
In particular, the parameters below have to be adapted.

  • Enabled > Yes
  • Logo Image > M-Files Logo below
  • Top Area Color > #318ccc
  • Accent Color > #318ccc
  • Top Area Breadcrumb and Icon Color > #ffffff

Please feel free to look at the below video that shows the same, step by step.

L’article M-Files January Release 2024 est apparu en premier sur dbi Blog.

Cloud Native Storage: Overview

Tue, 2024-02-13 09:21

When thinking of containers, the Cloud Native Storage topic quickly becomes obvious. You have to handle it with care as it will carry your workload data and configuration. This blog is the first of a series about CNS and its usage.
I will try to cover as many aspects as possible and clear up the CNCF situation you face when you start looking at the landscape. The purpose of this series is not a direct one-to-one comparison of the products, especially because of the number of products and the time-consuming testing involved; you may also find many comparisons on the internet done for different purposes and different products. It will rather be a higher-level method to choose wisely what will best fit your needs and workload.

Before I go further, let me just raise a little warning about what you’ll read below. To understand the terminology and the ins and outs of the concepts, you had better be familiar with Kubernetes. For example, holding a CKA (Certified Kubernetes Administrator) may help you better appreciate this series of blogs.

Classic storage, I mean at the OS level, is what we call ephemeral storage from a container point of view. You may have LVM, NFS shares or any storage managed at the OS level. Many SRE engineers will stay at this level for data protection, to secure their precious data and avoid the risk of losing it. For example, we may imagine a database administrator having his database installed in a VM and planning to back up the dedicated file system where the data is stored.

Now let’s get back to cloud native storage: as the name depicts, it has been built specifically for cloud environments. Stricto sensu, cloud differs from on-premises hosting, but here we’re describing the general term “cloud native”. It means a software approach to develop, build and manage modern applications, which require scalable, flexible and resilient capabilities. These applications can leverage public, private or hybrid cloud infrastructures.

What are these cloud native applications? They are (or should be) most of the time small and independent services allowing a very fast release cycle with more confidence. This can also be part of an efficient time-to-market definition. Again, small doesn’t mean they can’t handle very big loads: scalability and flexibility are part of cloud native’s main purpose. In fact, it is designed for efficiency and reliability, handling production environments while generally reducing cost when the load varies.

You may also find big monolithic workloads in containers, and I can say that, since I have been working with containers, I have seen a lot of production environments running applications that are not microservices. It is sometimes not possible due to vendor products, or it can be a vendor’s first move before going from monolithic to microservices.

Now, let’s talk about the CNCF landscape regarding Cloud Native Storage, it is a sub-topic in the Runtime category.
You can find it by following the link.

If you are not familiar with this view, let me give you some hints

  • Some products have a grey background because their code is proprietary
  • Some products have a white background because their code is open-source
  • Projects are graduated by a vote from the community, which indicates the maturity level of a product
  • Graduation can be one of the following:
    • Sandbox: Innovators
    • Incubating: Early adopters
    • Graduated: Majority
    • Archived: grey logo, project no longer supported
  • Some are in this picture only because they support the community as members:
    • Platinum
    • Gold
    • Silver
    • Academic
    • Nonprofit
    • End User supporter

I can now imagine what your next question is: which product should I use for my needs?
Well, this will be the topic of my next blog series regarding Cloud Native Storage.

In the meantime, I encourage you to read my colleague’s blog about not being afraid to install a database in OpenShift here.

L’article Cloud Native Storage: Overview est apparu en premier sur dbi Blog.

Configure Zabbix with Ansible

Tue, 2024-02-13 07:14

Besides the ability to set up your Zabbix server, proxy and agents, the Ansible collection for Zabbix (i.e. community.zabbix) also allows you to configure every aspect of your Zabbix system, from hosts to users, discovery rules or even templates. In this blog post, we will see how this collection helped me to automatically add hosts I provision with YaK to my Zabbix server configuration.

Setup community.zabbix

The first step is easy and fast. In your YaK environment, run the following command:

ansible-galaxy collection install community.zabbix

This collection is actually an interface to the Zabbix API, which makes our work easier when we already know Ansible and don’t want to go through complex HTTP API calls (see my other blog post about it).

Setup Ansible Variables

If you know Ansible, you know that all tasks go through an SSH connection to the target host. Once the agent is installed on the target host, we want to add it to the Zabbix server configuration. This is achieved by modifying a few variables before calling the zabbix modules:

- name: Set api configuration facts
  ansible.builtin.set_fact:
    ansible_network_os: community.zabbix.zabbix
    ansible_connection: httpapi
    ansible_httpapi_port: 80
    ansible_zabbix_url_path: ''
    ansible_host: "{{ zabbix_server_ip }}" # Zabbix server
    ansible_user: "{{ zabbix_api_login_user }}"
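    ansible_httpapi_pass: "{{ zabbix_api_login_password }}" # ansible_httpapi_pass is the httpapi password variable; the variable name on the right-hand side is an assumption for this sketch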

For requests to be redirected, all tasks concerning our host must go to the Zabbix server (line 7). The Zabbix server does not understand SSH connections, but httpapi connections (line 4), with an ansible_network_os specific to the API (line 3). Authentication against the HTTP API is made with ansible_user as the login and ansible_httpapi_pass as the password. For better security, we could create an API token and use HTTPS.

Usage

Host Group

Before creating our host, we can start with a simpler task like creating a host group:

- name: Create host groups
  community.zabbix.zabbix_group:
    host_group: "{{ zabbix_agent_groups }}"
  register: zabbix_api_hostgroup_created
  until: zabbix_api_hostgroup_created is succeeded

This task calls the zabbix_group module with a list of groups in the host_group argument. Could it be any easier than that?

Note that lines 4 and 5 are there to ensure the host group gets created, as concurrent access (while running against multiple hosts) might cause a creation failure. The default value for retries is 3, with a delay of 5 seconds between each retry.

Host

To create a host in the Zabbix configuration, there are a few more variables to provide to the module:

- name: Create host in zabbix server configuration
  community.zabbix.zabbix_host:
    host_name: "{{ hostname }}"
    host_groups: "{{ zabbix_agent_groups }}"
    link_templates: "{{ zabbix_agent_link_templates }}"
    interfaces: "{{ zabbix_agent_interfaces }}"
    tags: "{{ zabbix_agent_tags }}"
    tls_connect: 2 # PSK
    tls_psk_identity: "{{ zabbix_agent2_tlspskidentity }}"
    tls_psk: "{{ zabbix_agent2_tlspsk_secret }}"
  register: zabbix_api_host_created
  until: zabbix_api_host_created is succeeded

host_name is how the host will be named in the Zabbix configuration (coming from the inventory – line 3). host_groups is the list of groups we created previously. link_templates is a list of Zabbix templates.

interfaces (line 6) is the list of interfaces defined to reach the Zabbix agent from the Zabbix server. The content of this variable looks like:

    zabbix_agent_interfaces:
      - type: 1
        main: 1
        useip: "{{ zabbix_useuip }}"
        ip: "{{ ansible_host }}"
        dns: ""
        port: "{{ zabbix_agent_listenport }}"

The next tls_* variables are set to encrypt traffic between the agent and the server.

The tags variable allows the host to be added with tags in the Zabbix configuration, as illustrated below.
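
For illustration, the group, template and tag variables referenced above could look like this (hypothetical values; adjust group names, templates and tags to your own setup):

zabbix_agent_groups:
  - Linux servers

zabbix_agent_link_templates:
  - Linux by Zabbix agent

zabbix_agent_tags:
  - tag: environment
    value: production
  - tag: provisioned_by
    value: yak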

Et voilà!

L’article Configure Zabbix with Ansible est apparu en premier sur dbi Blog.

Reestablish administrator role access to Rancher users

Mon, 2024-02-12 07:49
Introduction

I came across a case with no more Administrator users in Rancher. Therefore, there was no access to the local (Rancher) cluster and to the administration of Rancher, and no user could configure or reassign the Administrator role. I will show you how to get the Administrator role back for your Rancher users.

Solution

The current default admin user has the role Restricted-Administrator. He is no longer an Administrator and cannot reassign himself to the Administrator role due to insufficient rights.

error message when assigning the Administrator role

Let’s create a new user called temporary-admin with User-Base permission.

Get the ID of the user and access your Rancher Kubernetes cluster. Use a kubeconfig that allows you to modify ClusterRoles in the cluster.

Search for the ClusterRole associated with the user ID, here u-7s8dx.

$ kubectl get clusterrole | grep 7s8
u-7s8dx-view                                                           2024-02-07T15:38:53Z
$ kubectl get clusterrole u-7s8dx-view -o yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2024-02-07T16:16:05Z"
  finalizers:
  - wrangler.cattle.io/auth-prov-v2-crole
  labels:
    cattle.io/creator: norman
  name: u-7s8dx-view
  ownerReferences:
  - apiVersion: management.cattle.io/v3
    kind: User
    name: u-7s8dx
    uid: f409c575-e413-429d-8860-ca2e8103e736
  resourceVersion: "5803466"
  uid: 23d7347c-838c-4ed4-a1c2-290c782384ad
rules:
- apiGroups:
  - management.cattle.io
  resourceNames:
  - u-7s8dx
  resources:
  - users
  verbs:
  - get

Edit the ClusterRole with the following rules:

rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - '*'
  verbs:
  - '*'
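
If you prefer a one-liner over editing the YAML interactively, a merge patch should achieve the same result (shown here as a sketch, assuming the same ClusterRole name as above); a merge patch replaces the whole rules list, which is exactly what we want here:

$ kubectl patch clusterrole u-7s8dx-view --type=merge \
  -p '{"rules":[{"apiGroups":["*"],"resources":["*"],"verbs":["*"]},{"nonResourceURLs":["*"],"verbs":["*"]}]}'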

Now the temporary-admin user should have the permissions to reassign the Administrator role to your user. Edit your user so it is an Administrator again, log in with it, then delete the temporary-admin user.

Now, your user has the Administrator role back in Rancher, and all your local modifications with the temporary-admin user have been deleted.

Check our other blog articles

https://www.dbi-services.com/blog/category/devops/
https://www.dbi-services.com/blog/category/kubernetes/

L’article Reestablish administrator role access to Rancher users est apparu en premier sur dbi Blog.

Why businesses – regardless of their size – have an interest in using an ECM/IIM solution?

Thu, 2024-02-08 09:36

In today’s business landscape, everyone is facing a crucial dilemma: how to store my information? How to secure my data? How to manage my content processes? How to make my data more intelligent? Which solution to use?

In this blog, I will show you that, whatever the size of the business, a company should use an ECM/IIM solution to digitally manage the information to support its processes and business goals.

What is it?

An ECMEnterprise Content Management – is a combination of methods and strategies used to collect, manage, and deliver information in various forms. Information is also stored and archived for future use.

But ECM is not able to manage and analyze unstructured content like emails, rich media, invoices and other records, analytics and more. That’s where IIM comes in, as a rebirth of ECM.

IIMIntelligent Information Management – is the solution to address the growing demands for increased capacity, seamless information integration, time efficiency, and cost savings. In order for you to access all the data you were missing before.

What does it do?

An ECM/IIM solution is not only about scanning, storing and retrieving information; it is a strategic system for collecting data and organizing information. It digitalizes, controls and automates the flow of unstructured information within an organization. It also captures, reads, analyzes and indexes the information, and you can annotate, edit or improve it. It offers workflow and automation tools to ensure that information goes through the correct approval processes within your organization and is assigned to the right people at the right time.

Why is it important to have such solution?

Managing a lot of information is not a simple task for a business: storing it, classifying it, retrieving it… An ECM/IIM solution can help you do just that and facilitate the routines for processing and analyzing your company’s information. For example, when your information is unorganized, such a solution can help you structure it; when you need to establish different levels of access to the information based on criteria such as the departments within your business; or when you have a lot of information in paper format and want to digitalize it all.

Regardless of your business’s size or industry, an ECM/IIM solution brings you a lot of advantages!

Who is it for?

At the beginning, ECM/IIM solutions were for large companies because of the cost (infrastructure, licenses, maintenance) and the need for IT teams to implement and manage them. This is not the case anymore! Many ECM/IIM solutions are now affordable and accessible for small and medium-sized businesses, thanks to technological advances which have virtually eliminated the need for heavy IT resources. The systems have become more user-friendly and easier to implement and manage for non-IT people.

Recently, low-code/no-code ECM/IIM solutions have become a simple and cost-effective way of developing applications without specialized people. Using a straightforward visual interface and drag-and-drop tools, you can reduce the workload on your IT teams and achieve a faster time-to-market for your ECM/IIM solution.

ECM/IIM solutions are now equally suited to all industries or businesses.

Here are some examples:

The healthcare industry has embraced ECM/IIM solutions to reduce costs and improve patient care by managing all its information. Patient information, medications, electronic medical records (EMR), physician notes and lab results are some examples. Losing time searching for patient information — or worse, not knowing it exists — could generate errors and have devastating consequences. In addition, to automate all quality and document management related processes, healthcare organizations are using QMS – Quality Management System – solutions.

Banks and financial services are good candidates for ECM/IIM solutions. They manage high-volume transactions which involve manual document processing. Such solutions digitize reports and documents to streamline business processes, improve the speed of business operations and decisions, and enhance customer service with a complete customer view.

Administrative departments (HR, Finance, Marketing…) of any company also need ECM/IIM solutions to manage all the personal information of their employees (like CVs, motivation letters, contracts, certifications, payslips…) but also client or provider information (like contact info, invoices, contracts…), template documents and internal documentation.

Retail establishments can use ECM/IIM solutions to run merchandising and supply chain functions as cost-effectively and efficiently as possible. They use them to manage their supplier contracts, orders and shipments and to better understand customers and market trends. Providing current and accurate product information improves employee productivity. The use of ECM/IIM solutions enables them to react more quickly to changing market demands and considerably reduces document storage and handling costs.

Manufacturing industries are also using ECM/IIM solutions to provide a unified platform for all content and data across the enterprise. They are perfect for organizing product data, processing invoices, and managing employee information. They help managers automate complex business processes to reduce processing times within the enterprise, including project management, financial and procurement services, human resources, compliance and many other areas.

Colleges, universities and other educational institutions accumulate a lot of records which need to be secured, organized, and easily retrieved. That’s why they use ECM/IIM solutions to digitize their documents and formally manage their electronic content. They can store student and employee records digitally, handle applications with electronic workflows, preserve media and documents digitally and automate routine processes.

Governments and municipalities also handle high volumes of documents, including a lot of paper documents. Their document management needs can be unique, and errors can have serious consequences. They have complicated processes and requirements for document flow. Using ECM/IIM solutions can help them reduce their footprint, automate their paper workflows, ensure document security and regulatory compliance, and reduce many administrative tasks and unnecessary bureaucracy.

As you can see, a lot of businesses are using ECM/IIM solutions in order to control and manage their critical information. They have realized that they can save time and money. Visibility and control are the key words.

Conclusion

ECM/IIM solutions can be used by industries and businesses of any size. It is a fact! Any kind of business looking for a solution to cut costs, improve productivity, and streamline business processes will benefit from ECM/IIM solutions. The amount of data generated and manipulated by businesses is increasing every year, so using such solutions is an excellent way to gain control of your content and save money in your organization, regardless of its size.

Our offer at dbi services

dbi services specializes in the integration of content management solutions. Our expertise in database and middleware technologies and operating systems guarantees a quality implementation of ECM/IIM solutions in your IT infrastructure.

We offer:

  • Enhanced expertise: With a deep understanding of content management solutions, our team ensures superior quality ECM/IIM implementations.
  • Increased security and compliance: We adhere to the highest standards to protect your data.
  • Tailored and effective solution: We finely tune ECM/IIM solutions to your specific needs, boosting the efficiency of your organization.

Our solutions:

For more information: https://www.dbi-services.com/expertises/enterprise-content-management-ecm

L’article Why businesses – regardless of their size – have an interest in using an ECM/IIM solution? est apparu en premier sur dbi Blog.

VMware Tanzu Kubernetes: Configure your managed cluster

Wed, 2024-02-07 11:12

This third blog describes the steps to follow once your cluster is initialized and in a ready state.

In case you missed the first 2 blogs, you can find them here and there. OK, now from our jump server, we need to configure our Tanzu managed cluster with the base tools, then we’ll configure the tools included in our licence (Tanzu Kubernetes Standard Edition).

Tanzu Kubernetes CLI

You need to install the Tanzu CLI so that you’ll be able to manage your cluster(s) and:

  • Create and manage management clusters
  • Create and manage workload clusters
  • Manage Kubernetes releases
  • Install and manage packages
  • Create and manage application workloads
  • Configure the Tanzu CLI itself

As a reminder, we are on a jump server (we can also call it the bootstrap machine, as named in the official documentation), so we have to install the CLI as described in the documentation.

Once installed, let’s try to initialize the Tanzu CLI.

$ tanzu init
Checking for required plugins...
Installing plugin 'login:v0.25.0'
Installing plugin 'management-cluster:v0.25.0'
Installing plugin 'package:v0.25.0'
Installing plugin 'pinniped-auth:v0.25.0'
Installing plugin 'secret:v0.25.0'
Installing plugin 'telemetry:v0.25.0'
Successfully installed all required plugins
✔  successfully initialized CLI

$ tanzu version
version: v0.25.0
buildDate: 2022-08-25
sha: 6288c751-dirty

If the installation is correct, you should see the following.

$ tanzu plugin list
  NAME                DESCRIPTION                                                        SCOPE       DISCOVERY  VERSION  STATUS
  login               Login to the platform                                              Standalone  default    v0.25.0  installed
  management-cluster  Kubernetes management-cluster operations                           Standalone  default    v0.25.0  installed
  package             Tanzu package management                                           Standalone  default    v0.25.0  installed
  pinniped-auth       Pinniped authentication operations (usually not directly invoked)  Standalone  default    v0.25.0  installed
  secret              Tanzu secret management                                            Standalone  default    v0.25.0  installed
  telemetry           Configure cluster-wide telemetry settings                          Standalone  default    v0.25.0  installed

Before we continue, let’s create the kube config for our managed cluster.

$ tanzu login
? Select login type Local kubeconfig
? Enter path to kubeconfig (if any)
? Enter kube context to use tkgs-cluster-1
? Give the server a name DEV
✔  successfully logged in to management cluster using the kubeconfig DEV
Checking for required plugins...
All required plugins are already installed and up-to-date

Tanzu Kubernetes packages

At this point, we are ready to install packages in our cluster; these packages are part of what’s available with the licence we have chosen. If you remember, I said we have the Tanzu Standard licence, so we start with the standard repository:

$ tanzu package repository get tanzu-standard -n tanzu-package-repo-global

NAME:          tanzu-standard
VERSION:       11066922
REPOSITORY:    projects.registry.vmware.com/tkg/packages/standard/repo
TAG:           v1.6.0
STATUS:        Reconcile succeeded
REASON:

$ tanzu package repository update tanzu-standard --url projects.registry.vmware.com/tkg/packages/standard/repo -n tanzu-package-repo-global
 Updating package repository 'tanzu-standard'
 Getting package repository 'tanzu-standard'
 Validating provided settings for the package repository
 Updating package repository resource
 Waiting for 'PackageRepository' reconciliation for 'tanzu-standard'
 'PackageRepository' resource install status: Reconciling
 'PackageRepository' resource install status: ReconcileSucceeded
Updated package repository 'tanzu-standard' in namespace 'tanzu-package-repo-global'

In case you get this error, check your firewall configuration.

NAME:          tanzu-standard
VERSION:       11039009
REPOSITORY:    projects.registry.vmware.com/tkg/packages/standard/repo
TAG:           v1.6.0
STATUS:        Reconcile failed: Fetching resources: Error (see .status.usefulErrorMessage for details)
REASON:        vendir: Error: Syncing directory '0':
  Syncing directory '.' with imgpkgBundle contents:
    Imgpkg: exit status 1 (stderr: imgpkg: Error: Checking if image is bundle:
  Fetching image:
    Error while preparing a transport to talk with the registry:
      Unable to create round tripper:
        Get "https://projects.registry.vmware.com/v2/": dial tcp x.x.x.x:443: i/o timeout)

Let’s list all components that are available.

$ tanzu package available list

  NAME                                          DISPLAY-NAME               SHORT-DESCRIPTION                                                                 LATEST-VERSION
  cert-manager.tanzu.vmware.com                 cert-manager               Certificate management                                                            1.7.2+vmware.1-tkg.1
  contour.tanzu.vmware.com                      contour                    An ingress controller                                                             1.20.2+vmware.1-tkg.1
  external-dns.tanzu.vmware.com                 external-dns               This package provides DNS synchronization functionality.                          0.11.0+vmware.1-tkg.2
  fluent-bit.tanzu.vmware.com                   fluent-bit                 Fluent Bit is a fast Log Processor and Forwarder                                  1.8.15+vmware.1-tkg.1
  fluxcd-helm-controller.tanzu.vmware.com       Flux Helm Controller       Helm controller is one of the components in FluxCD GitOps toolkit.                0.21.0+vmware.1-tkg.1
  fluxcd-kustomize-controller.tanzu.vmware.com  Flux Kustomize Controller  Kustomize controller is one of the components in Fluxcd GitOps toolkit.           0.24.4+vmware.1-tkg.1
  fluxcd-source-controller.tanzu.vmware.com     Flux Source Controller     The source-controller is a Kubernetes operator, specialised in artifacts          0.24.4+vmware.1-tkg.4
                                                                           acquisition from external sources such as Git, Helm repositories and S3 buckets.
  grafana.tanzu.vmware.com                      grafana                    Visualization and analytics software                                              7.5.16+vmware.1-tkg.1
  harbor.tanzu.vmware.com                       harbor                     OCI Registry                                                                      2.5.3+vmware.1-tkg.1
  multus-cni.tanzu.vmware.com                   multus-cni                 This package provides the ability for enabling attaching multiple network         3.8.0+vmware.1-tkg.1
                                                                           interfaces to pods in Kubernetes
  prometheus.tanzu.vmware.com                   prometheus                 A time series database for your metrics                                           2.36.2+vmware.1-tkg.1
  whereabouts.tanzu.vmware.com                  whereabouts                A CNI IPAM plugin that assigns IP addresses cluster-wide                          0.5.1+vmware.2-tkg.1

The Tanzu Standard Licence comes with

  • Cert-manager: A certificate management tool
  • Contour: An ingress controller
  • External-dns: A complement to coreDNS but for external name resolution
  • Fluent-bit: For logging
  • Flux: For the CI/CD
  • Grafana: A dashboard tool to visualize metrics
  • Harbor: An OCI registry
  • Multus-cni: A tool to enable attaching multiple network interfaces to pods
  • Prometheus: A TSDB for metrics
  • Whereabouts: A CNI IPAM

The first package to install is cert-manager, as it simplifies the process of obtaining, renewing and using certificates. Remember that it must be installed first.

$ tanzu package install cert-manager \
> --package-name cert-manager.tanzu.vmware.com \
> --version 1.7.2+vmware.1-tkg.1 \
> --namespace tanzu-packages \
> --create-namespace
 Installing package 'cert-manager.tanzu.vmware.com'
 Creating namespace 'tanzu-packages'
 Getting package metadata for 'cert-manager.tanzu.vmware.com'
 Creating service account 'cert-manager-tanzu-packages-sa'
 Creating cluster admin role 'cert-manager-tanzu-packages-cluster-role'
 Creating cluster role binding 'cert-manager-tanzu-packages-cluster-rolebinding'
 Creating package resource
 Waiting for 'PackageInstall' reconciliation for 'cert-manager'
 'PackageInstall' resource install status: Reconciling
 'PackageInstall' resource install status: ReconcileSucceeded

 Added installed package 'cert-manager'

Then, you are free to install the components that come with your Tanzu licence or install your own. To keep it simple, I suggest you install contour so that your ingress configuration will be smoother. An out-of-the-box installation goes like this.

$ tanzu package install contour \
> --package-name contour.tanzu.vmware.com \
> --version 1.20.2+vmware.1-tkg.1 \
> --namespace tanzu-packages \
> --create-namespace
 Installing package 'contour.tanzu.vmware.com'
 Updating package 'contour'
 Getting package install for 'contour'
Updated installed package 'contour'

$ tanzu package installed list -A

  NAME          PACKAGE-NAME                   PACKAGE-VERSION        STATUS               NAMESPACE
  cert-manager  cert-manager.tanzu.vmware.com  1.7.2+vmware.1-tkg.1   Reconcile succeeded  tanzu-packages
  contour       contour.tanzu.vmware.com       1.20.2+vmware.1-tkg.1  Reconcile succeeded  tanzu-packages

Once installed, you can check the status in the dedicated namespace

$ kubectl get all -n tanzu-system-ingress
NAME                          READY   STATUS    RESTARTS   AGE
pod/contour-f446f5f57-9kdsh   1/1     Running   0          2m38s
pod/contour-f446f5f57-th9jw   1/1     Running   0          2m38s
pod/envoy-8lg9k               2/2     Running   0          2m39s
pod/envoy-rm2s9               2/2     Running   0          2m39s
pod/envoy-xt9mb               2/2     Running   0          2m39s

NAME              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/contour   ClusterIP   10.108.109.150   <none>        8001/TCP                     2m39s
service/envoy     NodePort    10.99.237.57     <none>        80:31532/TCP,443:32046/TCP   2m38s

NAME                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/envoy   3         3         3       3            3           <none>          2m39s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/contour   2/2     2            2           2m38s

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/contour-f446f5f57   2         2         2       2m38s

Hmm, something doesn’t look right: the envoy service EXTERNAL-IP shows <none> instead of an IP.

Well, I will have to reinstall it with custom properties. It will also be a good exercise in removing a package in Tanzu.

Deleting a package is an easy task in case you require it. In our case, we want to customize the configuration to set the envoy service to the LoadBalancer type, to be able to expose our applications. Let’s delete the contour package.

$ tanzu package installed delete contour -n tanzu-packages
Deleting installed package 'contour' in namespace 'tanzu-packages'. Are you sure? [y/N]: y
 Uninstalling package 'contour' from namespace 'tanzu-packages'
 Getting package install for 'contour'
 Deleting package install 'contour' from namespace 'tanzu-packages'
 'PackageInstall' resource deletion status: Deleting
 Deleting admin role 'contour-tanzu-packages-cluster-role'
 Deleting role binding 'contour-tanzu-packages-cluster-rolebinding'
 Deleting service account 'contour-tanzu-packages-sa'
Uninstalled package 'contour' from namespace 'tanzu-packages'

Then, to generate the default values file and adapt it to our needs, we can do the following.

$ tanzu package available get contour.tanzu.vmware.com/1.20.2+vmware.1-tkg.1 --generate-default-values-file

NAME:                             contour.tanzu.vmware.com
VERSION:                          1.20.2+vmware.1-tkg.1
RELEASED-AT:                      2022-06-14 02:00:00 +0200 CEST
DISPLAY-NAME:                     contour
SHORT-DESCRIPTION:                An ingress controller
PACKAGE-PROVIDER:                 VMware
MINIMUM-CAPACITY-REQUIREMENTS:    Varies significantly based on number of Services, Ingresses/HTTPProxies, etc. A starting point is 128MB RAM and 0.5 CPU for each Contour and Envoy pod, but this can and should be tuned based on observed usage.
LONG-DESCRIPTION:                 An Envoy-based ingress controller that supports dynamic configuration updates and multi-team ingress delegation. See https://projectcontour.io for more information.
MAINTAINERS:                      [{Steve Kriss} {Steve Sloka} {Nick Young} {Sunjay Bhatia} {Nicholas Seemiller}]
RELEASE-NOTES:                    contour 1.20.2 https://github.com/projectcontour/contour/releases/tag/v1.20.2
LICENSE:                          [VMware’s End User License Agreement (Underlying OSS license: Apache License 2.0)]
SUPPORT:                          Support provided by VMware for deployment on TKG 1.4+ clusters. Best-effort support for deployment on any conformant Kubernetes cluster. Contact support by opening a support request via VMware Cloud Services or my.vmware.com.
CATEGORY:                         [ingress]

Created default values file at /home/tanzu/contour-default-values.yaml

I set the following in the values file:

envoy:
  service:
    type: LoadBalancer

Then I rerun the installation of contour with my custom values file.

$ tanzu package install contour --package-name contour.tanzu.vmware.com --version 1.20.2+vmware.1-tkg.1 --namespace tanzu-packages --create-namespace --values-file contour-data-values.yaml
 Installing package 'contour.tanzu.vmware.com'
 Creating namespace 'tanzu-packages'
 Getting package metadata for 'contour.tanzu.vmware.com'
 Creating service account 'contour-tanzu-packages-sa'
 Creating cluster admin role 'contour-tanzu-packages-cluster-role'
 Creating cluster role binding 'contour-tanzu-packages-cluster-rolebinding'
 Creating secret 'contour-tanzu-packages-values'
 Creating package resource
 Waiting for 'PackageInstall' reconciliation for 'contour'
 'PackageInstall' resource install status: Reconciling
 'PackageInstall' resource install status: ReconcileSucceeded
 'PackageInstall' resource successfully reconciled

 Added installed package 'contour'

Now let’s check our envoy configuration:

$ kubectl get all -n tanzu-system-ingress
NAME                           READY   STATUS    RESTARTS   AGE
pod/contour-6c5977c549-46hw4   1/1     Running   0          8m55s
pod/contour-6c5977c549-vdswd   1/1     Running   0          8m55s
pod/envoy-d6kmv                2/2     Running   0          8m55s
pod/envoy-fvpn6                2/2     Running   0          8m55s
pod/envoy-wjk4m                2/2     Running   0          8m55s

NAME              TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                      AGE
service/contour   ClusterIP      10.110.74.158   <none>           8001/TCP                     8m55s
service/envoy     LoadBalancer   10.107.24.71    172.15.160.111   80:31930/TCP,443:31169/TCP   8m55s

NAME                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/envoy   3         3         3       3            3           <none>          8m55s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/contour   2/2     2            2           8m55s

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/contour-6c5977c549   2         2         2       8m55s
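
With envoy now exposed through a LoadBalancer, you could publish an application via Contour. As a minimal sketch (assuming a Service named my-app listening on port 80 in the default namespace, and a DNS record for my-app.example.com pointing to the external IP):

$ kubectl apply -f - <<EOF
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: my-app
  namespace: default
spec:
  virtualhost:
    fqdn: my-app.example.com
  routes:
    - conditions:
        - prefix: /
      services:
        - name: my-app
          port: 80
EOF
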
Conclusion

You are now ready to have some fun and install some applications in your cluster.

I hope this tutorial helped you a bit to start your journey with Tanzu Kubernetes.

L’article VMware Tanzu Kubernetes: Configure your managed cluster est apparu en premier sur dbi Blog.

Introduction to PostgreSQL in OpenShift

Wed, 2024-02-07 01:49

I’m on a mission to tackle scary topics around DevOps these days! In my 2 previous blog posts (Part 1 and Part 2) I’ve tried to demystify the DevOps word (and world!) by describing what it actually is with a simple example.

But there is an even scarier topic (true story!), which is… deploying a database in Kubernetes! Databases are to Kubernetes what garlic is to vampires! Yes, you get the picture below, but it doesn’t have to be that way!

It is a common belief that it is not recommended or even possible for production databases, and some people just move away from it. However, other people have thought twice about it and have actually been running production databases on Kubernetes with no loss of data for years. If you want to give it a try, I’ll show you an example of how you can test it very quickly.

The choice of technology

First you have to select a database technology and I’m going with the widely used PostgreSQL. It is one of the many database choices that are part of the CNCF landscape, as shown below. CNCF is a vendor-neutral hub for all applications that can run as containers. They are then called cloud native:

Then, all the magic for installing a database in a Kubernetes cluster is related to its Operator. The choice of a mature operator is important as it will do all the hard work for us. For PostgreSQL there are several choices and a popular one used by many companies is CloudNativePG. In addition, it is also part of the database CNCF landscape through EDB so it deserves serious consideration:

CloudNativePG can be installed on any flavour of Kubernetes and I’ve chosen the one from Red Hat called OpenShift (if you need a QuickStart on it, you can check my blog posts here then here). With its OperatorHub it is very easy and quick to deploy an Operator and have one PostgreSQL cluster ready to play with in your Kubernetes cluster. If you are like me, you want to play with your new toy right now!

For those of you that are already familiar with this topic, you may wonder why not use the Operator crunchydata that is also part of the CNCF and suits OpenShift very well. That is a fantastic question and it would also be an excellent choice. That may be the topic for another blog post!

Operator installation in OpenShift with the Web UI

Let’s install CloudNativePG (CNPG) and look how easy it is in OpenShift. First I like to create a separate project (or namespace) to keep everything tidy but it is not mandatory. I cheat a bit by using the command line interface here as I have a shell open to my OpenShift cluster:

$ oc new-project cloudnativepg

Then just use the OpenShift Web console UI to install and deploy everything. If you like click-click-click-click, you are in for a treat!

Click on “Create instance” in the Cluster panel:

Click the button “Create” to install a PostgreSQL cluster with 2 instances and that’s it!
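
Behind the scenes, this form roughly boils down to a Cluster custom resource. A minimal sketch of an equivalent manifest (the defaults the wizard applies, such as the storage size, may differ) would be:

$ oc apply -n cloudnativepg -f - <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-pg
spec:
  instances: 2        # matches the two pods we will see below
  storage:
    size: 1Gi
EOF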

Check the installation process and result in CLI

If you are familiar with Kubernetes in general, you are also used to the Command Line Interface (CLI). After you’ve clicked on the button “Create”, you can follow along all this installation process as shown below:

$ oc get po -n cloudnativepg -w
NAME                                      READY   STATUS     RESTARTS   AGE
cluster-pg-1-initdb-zdthn                 0/1     Init:0/1   0          20s
cnpg-controller-manager-bd7c97dc7-kw4pk   1/1     Running    0          5m14s
cluster-pg-1-initdb-zdthn                 0/1     Init:0/1   0          20s
cluster-pg-1-initdb-zdthn                 0/1     PodInitializing   0          23s
cluster-pg-1-initdb-zdthn                 1/1     Running           0          102s
cluster-pg-1-initdb-zdthn                 0/1     Completed         0          2m11s
cluster-pg-1-initdb-zdthn                 0/1     Completed         0          2m14s
cluster-pg-1-initdb-zdthn                 0/1     Completed         0          2m15s
cluster-pg-1-initdb-zdthn                 0/1     Completed         0          2m15s
cluster-pg-1                              0/1     Pending           0          0s
cluster-pg-1                              0/1     Pending           0          0s
cluster-pg-1                              0/1     Init:0/1          0          0s
cluster-pg-1                              0/1     Init:0/1          0          3s
cluster-pg-1                              0/1     Init:0/1          0          6s
cluster-pg-1                              0/1     PodInitializing   0          7s
cluster-pg-1                              0/1     Running           0          9s
cluster-pg-1                              0/1     Running           0          15s
cluster-pg-1                              0/1     Running           0          21s
cluster-pg-1                              1/1     Running           0          21s
cluster-pg-2-join-x5cdh                   0/1     Pending           0          0s
cluster-pg-2-join-x5cdh                   0/1     Pending           0          0s
cluster-pg-2-join-x5cdh                   0/1     Pending           0          3s
cluster-pg-2-join-x5cdh                   0/1     Init:0/1          0          3s
cluster-pg-2-join-x5cdh                   0/1     Init:0/1          0          5s
cluster-pg-2-join-x5cdh                   0/1     Init:0/1          0          7s
cluster-pg-2-join-x5cdh                   0/1     PodInitializing   0          9s
cluster-pg-2-join-x5cdh                   1/1     Running           0          11s
cluster-pg-2-join-x5cdh                   0/1     Completed         0          21s
cluster-pg-2-join-x5cdh                   0/1     Completed         0          24s
cluster-pg-2-join-x5cdh                   0/1     Completed         0          24s
cluster-pg-2-join-x5cdh                   0/1     Completed         0          25s
cluster-pg-2                              0/1     Pending           0          0s
cluster-pg-2                              0/1     Pending           0          0s
cluster-pg-2                              0/1     Init:0/1          0          0s
cluster-pg-2                              0/1     Init:0/1          0          4s
cluster-pg-2                              0/1     Init:0/1          0          7s
cluster-pg-2                              0/1     PodInitializing   0          7s
cluster-pg-2                              0/1     Running           0          9s
cluster-pg-2                              0/1     Running           0          12s
cluster-pg-2                              0/1     Running           0          13s
cluster-pg-2                              1/1     Running           0          21s
cluster-pg-1-initdb-zdthn                 0/1     Terminating       0          3m29s
cluster-pg-2-join-x5cdh                   0/1     Terminating       0          49s
cluster-pg-1-initdb-zdthn                 0/1     Terminating       0          3m29s
cluster-pg-2-join-x5cdh                   0/1     Terminating       0          49s

In a few minutes we can see each of our PostgreSQL instances in 2 separate Pods: cluster-pg-1 and cluster-pg-2

You may have heard that databases in Kubernetes are based on the StatefulSet object, but CNPG doesn’t use it, as you can see below:

$ oc get sts -n cloudnativepg
No resources found in cloudnativepg namespace.

The Operator installs all the Kubernetes objects required to run PostgreSQL. I’ll keep this blog post at a high level, so I will not dive into them. However, it would be interesting to see a simple example of data being created in our instance 1 and then see that it has been replicated to instance 2. Let’s just do that.

The username and password to connect to the PostgreSQL database are stored in a secret. You can get these credentials as follows:

$ oc get secrets -n cloudnativepg cluster-pg-app -oyaml

The username and password are encoded in base64, so you can decode them as shown below (here for the username):

$ echo -n "YXBw" | base64 --decode
app
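
You could also extract and decode a field in one go with jsonpath, for example for the password (assuming the same secret name and namespace as above):

$ oc get secret cluster-pg-app -n cloudnativepg -o jsonpath='{.data.password}' | base64 --decode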

With these credentials you can now jump into the pod and connect to the PostgreSQL database:

$ oc exec -it -n cloudnativepg cluster-pg-1 -- bash

1000660000@cluster-pg-1:/$ psql -h localhost -U app
Password for user app: <enter the decoded password here>
psql (16.1 (Debian 16.1-1.pgdg110+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

app=> CREATE TABLE customers (firstname text,lastname text, id serial);
CREATE TABLE

app=> INSERT INTO customers (firstname, lastname) VALUES ( 'John', 'Doe');
INSERT 0 1

app=> \dt
         List of relations
 Schema |   Name    | Type  | Owner
--------+-----------+-------+-------
 public | customers | table | app
(1 row)

app=> SELECT * FROM customers;
 firstname | lastname | id
-----------+----------+----
 John      | Doe      |  1
(1 row)

We have created a table with one entry in it. Let’s check if we can find it into the other instance:

$ oc exec -it -n cloudnativepg cluster-pg-2 -- bash

1000660000@cluster-pg-2:/$ psql -h localhost -U app
Password for user app:
psql (16.1 (Debian 16.1-1.pgdg110+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

app=> \dt
         List of relations
 Schema |   Name    | Type  | Owner
--------+-----------+-------+-------
 public | customers | table | app
(1 row)

app=> SELECT * FROM customers;
 firstname | lastname | id
-----------+----------+----
 John      | Doe      |  1
(1 row)

Our data have been replicated from our first instance (pod cluster-pg-1) to our second instance (pod cluster-pg-2)! Our PostgreSQL cluster is working!

Wrap up

With this simple example you can now play with your PostgreSQL cluster like you would if it were installed on bare metal or in a Virtual Machine. Of course this PostgreSQL cluster is not optimized at all for performance, but you can see that it is possible to install a database in Kubernetes. If you want to learn more about PostgreSQL, you can check out our training courses on this topic.

So, don’t be afraid of running databases in Kubernetes, try it and you’ll love it!

L’article Introduction to PostgreSQL in OpenShift est apparu en premier sur dbi Blog.

Disabling “alter system” in PostgreSQL?

Thu, 2024-02-01 06:58

Last year an interesting discussion started on the PostgreSQL development mailing list: would it make sense to implement something in PostgreSQL which can be used to disable the “alter system” command even for superusers? The request comes out of the container / K8s world, where parameters are usually set in a declarative way, and it might make sense to disallow changing them even for superusers. If you read the thread you’ll learn about the pros and cons of such a feature, as always, but there is something you can do even today which I was not aware of, even if it is obvious once you’ve seen it.

Let’s quickly have a look at what “alter system” does. Once you change a parameter using “alter system”, the change either gets added to “postgresql.auto.conf” if it is not there, or overwritten if it is there already:

postgres=# alter system set work_mem='11MB';
ALTER SYSTEM
postgres=# \! tail -1 $PGDATA/postgresql.auto.conf
work_mem = '11MB'
postgres=# alter system set work_mem='12MB';
ALTER SYSTEM
postgres=# \! tail -1 $PGDATA/postgresql.auto.conf
work_mem = '12MB'

Of course you either need to reload or even restart PostgreSQL to make this active, but this is how it works for “alter system” in general.

Disallowing “alter system” even for the superuser is surprisingly simple. You can, for example, do it like this:

postgres=# \! sudo chattr +i $PGDATA/postgresql.auto.conf
postgres=# alter system set work_mem='12MB';
ERROR:  could not open file "postgresql.auto.conf": Operation not permitted

chattr +i disables any write access to the file and will even prevent deletion of the file. Another option is to give the file to someone other than the operating system user running the instance:

postgres=# \! sudo chattr -i $PGDATA/postgresql.auto.conf
postgres=# alter system set work_mem='12MB';
ALTER SYSTEM
postgres=# \! sudo chown root:root $PGDATA/postgresql.auto.conf
postgres=# \! sudo chmod 600 $PGDATA/postgresql.auto.conf
postgres=# alter system set work_mem='12MB';
ERROR: could not open file "postgresql.auto.conf": Permission denied

The same will of course work for all other configuration files in the data directory.

You may also check here for more options about how users can be granted permissions to set specific parameters.
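
For example, starting with PostgreSQL 15 you can grant the right to change a single parameter to a dedicated role instead of handing out superuser. A small sketch, assuming a role named app_admin already exists:

postgres=# grant alter system on parameter work_mem to app_admin;
GRANT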

L’article Disabling “alter system” in PostgreSQL? est apparu en premier sur dbi Blog.
