Wednesday, November 24, 2010

vol copy

This achieves a similar result to a vol clone, except that the destination is a full physical copy, and the entire operation must be completed before the destination volume is online and available. You need to create the destination volume first and then restrict it so that it is ready to receive the copy. Then you start the copy process.

# vol copy start -s snap_name source_vol dest_vol

“-s snap_name” specifies the snapshot you want to base the copy on, and “source_vol” and “dest_vol” are the source and destination volumes. Alternatively, “-S” copies across all the snapshots contained in the volume, which is very useful if you need to copy all the backups within a volume as well as the volume data itself.
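Putting the whole workflow together, a minimal sketch might look like this (the aggregate name aggr1 and the 100g size are placeholders; the destination must be at least as large as the source):

# vol create dest_vol aggr1 100g
# vol restrict dest_vol
# vol copy start -S source_vol dest_vol
# vol online dest_vol

The vol online at the end brings the destination into service once the copy has completed.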


# vol copy start [ -S | -s snapshot ] source destination

Copies all data, including snapshots, from one volume to another. If the -S flag is used, the command copies all snapshots in the source volume to the destination volume. To specify a particular snapshot to copy, use the -s flag followed by the name of the snapshot. If neither the -S nor -s flag is used in the command, the filer automatically creates a distinctively-named snapshot at the time the vol copy start command is executed and copies only that snapshot to the destination volume.

The source and destination volumes must either both be traditional volumes or both be flexible volumes. The vol copy command will abort if an attempt is made to copy between different volume types.

The source and destination volumes can be on the same filer or on different filers. If the source or destination volume is on a filer other than the one on which the vol copy start command was entered, specify the volume name in the filer_name:volume_name format.
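For example, to copy vol1 (with all of its snapshots) from the local filer to a restricted volume on a second filer (filer2 is a placeholder name):

# vol copy start -S vol1 filer2:vol1_copy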

The filers involved in a volume copy must meet the following requirements for the vol copy start command to be completed successfully:

The source volume must be online and the destination volume must be offline.

If data is copied between two filers, each filer must be defined as a trusted host of the other filer. That is, each filer’s name must be in the /etc/hosts.equiv file of the other filer. If it is not, a "Permission denied" error message is displayed.

If data is copied on the same filer, localhost must be included in the filer’s /etc/hosts.equiv file. Also, the loopback address must be in the filer’s /etc/hosts file. Otherwise, the filer cannot send packets to itself through the loopback address when trying to copy data.

The usable disk space of the destination volume must be greater than or equal to the usable disk space of the source volume. Use the df pathname command to see the amount of usable disk space of a particular volume.
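For example, to check the destination volume from the sketch above:

# df /vol/dest_vol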

Each vol copy start command generates two volume copy operations: one for reading data from the source volume and one for writing data to the destination volume. Each filer supports up to four simultaneous volume copy operations, so a filer copying to itself (which uses both a read and a write operation) can run at most two copies at once.

# vol copy status [ operation_number ]

Displays the progress of one or all active volume copy operations, if any. The operations are numbered from 0 through 3. If no operation_number is specified, then status for all active vol copy operations is provided.

# vol copy throttle [ operation_number ] value

This command controls the speed of a volume copy operation. The value ranges from 10 (full speed) down to 1 (one-tenth of full speed). The default value is maintained in the filer’s vol.copy.throttle option and is set to 10 (full speed) at the factory. The value can be applied to a specific operation with the operation_number parameter; if no operation number is specified, the command applies to all active volume copy operations.

Use this command to limit the speed of volume copy operations if they are suspected to be causing performance problems on a filer. In particular, the throttle is designed to help limit the volume copy’s CPU usage. It cannot be used to fine-tune network bandwidth consumption patterns.

The vol copy throttle command only adjusts the speed of a volume copy operation that is already in progress. To set the default speed used by future volume copy operations, use the options command to set the vol.copy.throttle option.
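For example, to slow running operation number 1 to half speed and then change the default used by future copies (the operation number and values here are just illustrations):

# vol copy throttle 1 5
# options vol.copy.throttle 8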

How to clear the cache from memory

Linux has a supposedly clever memory management feature that uses up any "extra" RAM you have to cache things. This cached memory is SUPPOSED to be freely handed over to any other process that actually needs it, but in my experience Linux hangs on to its cache rather than giving it up to processes that actually need the memory.

I noticed that whenever the server is booted, everything runs great. But as soon as memory fills up with cache, performance degrades. It's terrible.

Up until just now, I have been forced to restart every time this happens because I simply cannot get any work done while the machine is this sluggish. I can close every single program I'm running, and even then simply right-clicking requires some extended thinking before the context menu loads. Ridiculous.

What consumes System Memory?

The kernel - The kernel itself consumes a couple of MB of memory. The memory that the kernel consumes cannot be swapped out to disk, and it is not reported by commands such as "free" or "ps".

Running programs - Programs that have been executed will consume memory while they run.

Memory Buffers - Memory used by the kernel for block device I/O (raw block and metadata buffers). The amount is managed by the kernel; you can see it with "free".

Memory Cached - Memory used by the kernel for the page cache (cached file contents). The amount is managed by the kernel; you can see it with "free".

Cached memory is freed on demand by the kernel.... it's done this way to make your system more responsive. Trust Linus Torvalds, he knows what he's doing.

There are a few ways to check memory usage in Linux. Using commands like free, vmstat, and ps, you will be able to check your Linux memory usage:

# free

The free command displays the total amount of free and used physical and swap memory in the system, as well as the buffers and cache used by the kernel. Add -m to report the figures in megabytes.

Output:
             total       used       free     shared    buffers     cached
Mem:       1033612     354636     678976          0        240     201508
-/+ buffers/cache:      152888     880724
Swap:      1967952          0    1967952
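The "-/+ buffers/cache" line is the one to watch, because it shows what applications are actually using: used minus buffers and cache (354636 - 240 - 201508 = 152888) and free plus buffers and cache (678976 + 240 + 201508 = 880724), both in KB.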

You can also ask free to refresh its output every 5 seconds, in order to track increases and decreases in memory usage.

# free -s 5

# vmstat

The vmstat command reports information about processes, memory, paging, block IO, traps, and CPU activity. The si and so columns are pages swapped in and out per second; sustained non-zero values there are a sign of real memory pressure.

Output:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 131620 35432 341496 0 0 42 82 737 1364 15 3 81 1


# ps

The command "ps" is a c program that reads the "/proc" filesystem.

There are two fields that are useful when determining per-process memory usage:

a. RSS (resident set size) - the physical memory the process currently holds
b. VSIZE (virtual size) - the total virtual address space the process has mapped

Per Process Memory Usage

The inputs to this section were obtained with the command:

# ps -eo pid,ppid,rss,vsize,pcpu,pmem,cmd -ww --sort=pid
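If you only want the biggest memory consumers, a quick variation (assuming GNU ps) is to sort the same output by RSS, descending, and keep the top of the list:

# ps -eo pid,rss,vsize,cmd --sort=-rss | head -5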


Luckily, I found a way to clear out the cache being used. Simply run the following command as root and the cache will be cleared out.

Linux Command :

# sync; echo 3 > /proc/sys/vm/drop_caches
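To see the effect, here is a minimal sketch (run as root) that reports memory before and after the drop; writing 1 to drop_caches drops only the page cache, 2 drops dentries and inodes, and 3 drops both:

#!/bin/bash
# Minimal sketch: report memory before and after dropping caches.
free -m
sync                                # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches   # 1=page cache, 2=dentries/inodes, 3=both
free -m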

The main point, "Don't Panic! Your RAM is fine!", is absolutely correct.

The Linux disk cache is very unobtrusive. It uses spare memory to greatly increase disk access speeds, without taking any memory away from applications. A fully used store of RAM on Linux is efficient hardware use, not a warning sign.

Tuesday, November 16, 2010

Creating a FlexClone volume in OnTap 7.3

Flexible volumes let users do some really cool things. One of my favorites involves cloning. We can take an existing FlexVol volume and create a clone from it, either from the volume as it is right now or from an existing Snapshot copy taken at some point in the past (like before your server blew up). A FlexClone volume looks exactly like the volume it was created from (its parent FlexVol volume or another FlexClone volume), but it uses no additional physical storage until its contents start to diverge from the parent!

This code shows you how easy it is to create a FlexClone volume by:

1) Cloning a flexible volume and
2) Splitting a FlexClone volume from the parent volume.


1) Clone a flexible volume

Display list of snapshots in vol1
FAS1> snap list vol1


Create a clone volume named newvol using the nightly.1 snapshot in vol1
FAS1> vol clone create newvol -b vol1 nightly.1

Verify newvol was created
FAS1> vol status -v newvol

Look for snapshots listed as busy,vclone. These are shared with FlexClones of vol1 and should not be deleted, or the clone will grow to full size
FAS1> snap list vol1

Display space consumed by new and changed data in the FlexClone volume.
FAS1> df -m newvol

2) Split a FlexClone volume from the parent volume.

Determine the amount of space required to split newvol from its parent FlexVol volume.
FAS1> vol clone split estimate newvol

Display space available in the aggregate containing the parent volume (vol1)
FAS1> df -A

Begin splitting newvol from its parent volume (vol1)
FAS1> vol clone split start newvol

Check the status of the splitting operation
FAS1> vol clone split status newvol

Halt split process. NOTE: All data copied to this point remains duplicated and snapshots of the FlexClone volume are deleted.
FAS1> vol clone split stop newvol

Verify newvol has been split from its parent volume (it should no longer be reported as a clone)
FAS1> vol status -v newvol

Wednesday, November 10, 2010

How to clear NFS locks during network crash or outage for Oracle datafiles.

Symptoms :

Database cannot be opened because stale locks still exist on the data files stored on the filer.

Error : ORA-27086

Cause of this problem :

Data files that were open during the network outage were left in an NFS-locked state. Oracle cannot open the locked files.

The lock recovery manager (NLM) in the Linux kernel uses uname -n to determine the host name, while the rpc.statd process (NSM) uses gethostbyname() to determine the client's name. If these do not match, the lock recovery process will not work.

Solution :

The recovery steps detailed here should be taken in case Oracle is hung, but proper Oracle troubleshooting and support methods should still be followed for any database-hang issue, independently of NetApp.

Oracle's database product does not typically hang after a network crash.

Summary of Corrective steps :

1) Shut down Oracle databases

2) Unmount database volumes

3) Kill lockd/statd processes on UNIX host

4) Clear locks on filer

5) Remove the NFS lock files on the host.

6) Restart lockd/statd processes on UNIX host

7) Remount the database volumes on the UNIX host

8) Restart databases

Detailed Procedure:

1) Shut down all Oracle databases being run by the affected server.

Issue the Oracle shutdown immediate command and verify that no database processes are still running by issuing the UNIX command ps -ef | grep -i ora on the UNIX database host.

If database processes are still running issue the Oracle shutdown abort command and use the UNIX command ps -ef | grep -i ora to verify that no database processes are still running.

If database processes are still running, do the following from the UNIX command line:
ps -ef | grep ora to get the process IDs (PIDs) of the remaining Oracle processes

kill -9 pid for each remaining Oracle process.
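If you prefer to do that in one pass, a hedged one-liner (GNU userland assumed) is shown below; the [o] in the grep pattern keeps grep from matching its own process:

# ps -ef | grep '[o]ra' | awk '{print $2}' | xargs -r kill -9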

2) Unmount all database volumes using the UNIX umount command.

3) Kill statd and lockd processes on the UNIX host in the order specified below:
Determine the process IDs (PIDs) of statd and lockd from the UNIX command line:

ps -ef | grep lockd

ps -ef | grep statd

kill [lockd_process_id]

kill [statd_process_id]

4) Remove locks from filer

Execute the following from the filer command line:

filer> priv set advanced

filer> sm_mon -l (In many cases specifying a host name does not clear all of the offending locks, so the recommendation is NOT to specify a hostname.)

Delete all files in the filer's "/etc/sm" directory. (Remove the files only. Do NOT remove the "/etc/sm" directory itself.)

If the filer is running Data ONTAP 7.1 or higher run 'lock break -h [hostname]' to release any locks that still exist.

Note:
If 'lock break -h [hostname]' doesn't work, the host name you are entering may not match the name the filer has recorded for that client (for example, a short name versus a fully qualified one).

If the locks are still not cleared, run 'lock break -p nlm' (this also requires Data ONTAP 7.1 or higher). This will clear all the NFS locks on the filer. It will not sever any NFS connections; it simply forces the processes to re-request the locks for the files they are writing to.

5) Remove the NFS lock files on the host.

From TR-3183 - Using the Linux NFS Client with Network Appliance Storage,
rpc.statd uses gethostbyname() to determine the client's name, but lockd (in the Linux kernel) uses uname -n.
By setting HOSTNAME= to the fully qualified domain name in /etc/sysconfig/network, lockd will use an FQDN when contacting the storage. If both lnx_node1.iop.eng.netapp.com and lnx_node5.ppe.iop.eng.netapp.com contact the same NetApp storage, the storage will then be able to correctly distinguish the locks owned by each client. Therefore, we recommend using the fully qualified name in /etc/sysconfig/network. In addition, sm_mon -l or lock break on the storage will also clear the locks on the storage, which fixes the lock recovery problem.

Additionally, if the client's nodename is fully qualified (that is, it contains the hostname and the domain name spelled out), then rpc.statd must also use a fully qualified name. Likewise, if the nodename is unqualified, then rpc.statd must use an unqualified name. If the two values do not match, lock recovery will not work. Be sure the result of gethostbyname(3) matches the output of uname -n by adjusting your client's nodename in /etc/hosts, DNS, or your NIS databases.
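A quick way to check for this mismatch on the client (a sketch; it assumes getent is available) is to compare the kernel's nodename with what the resolver returns for it:

# uname -n
# getent hosts $(uname -n)

If one value is bare and the other fully qualified, adjust /etc/hosts, DNS, or the NIS databases until they agree.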

6) Start the UNIX statd and lockd processes from the UNIX host command line in the order specified below:

/usr/lib/nfs/statd

/usr/lib/nfs/lockd

7) Mount the database volumes on the UNIX host.

8) Start the database(s) and test for availability.

Monday, November 08, 2010

How to send email from the Linux command line using mail and mutt.

The Linux command line can be very powerful once you know how to use it. You can parse data, monitor processes, and do a lot of other useful and cool things using it. We will begin with the “mail” command.

MAIL

First run a quick test to make sure the “sendmail” application is installed and working correctly.

# rpm -qa | grep sendmail

The output should show the version of sendmail installed, for example "sendmail-8.13.8-2.el5".

Then execute the following command, replacing “abc@xyzemail.com” with your e-mail address.

# mail -s "Test email" abc@xyzemail.com

Hit the return key and you will come to a new line. Enter the text “This is a test mail”.
Follow the text by hitting the return key again, then hit the key combination Control+D to continue.
The command prompt will ask if you want to send a copy of the mail to any other address (the Cc: prompt); hit Control+D again.
Check your mailbox. This command will send a mail to the address given, with the subject “Test email”.

To add content to the body of the mail while running the command you can use the following options. If you want to add text on your own:

# echo "This will go into the body of the mail." | mail -s "Test email" abc@xyzemail.com

And if you want mail to read the content from a file:

# mail -s "Test email" abc@xyzemail.com < /home/oracle/test.log

Some other useful options in the mail command are:

-s subject (The subject of the mail)
-c email-address (Mark a copy to this “email-address”, or CC)
-b email-address (Mark a blind carbon copy to this “email-address”, or BCC)

Here’s how you might use these options:

# echo "This will go into the body of the mail" | mail -s "Test email" abc@xyzemail.com -c lmn@xyzemail.com -b pqr@xyzemail.com

MUTT

One of the major drawbacks of using the mail command is that it does not support sending attachments. mutt, on the other hand, does. I’ve found this feature particularly useful in scripts that generate non-textual reports or relatively small backups that I’d like to store elsewhere. Of course, mutt allows you to do a lot more than just send attachments; it is a much more complete command-line mail client than “mail”. Right now we’ll just cover the basics we might need often. Here’s how you would attach a file to a mail:

# echo "Sending an attachment." | mutt -a backup.zip -s "attachment" abc@xyzemail.com

This command will send a mail to abc@xyzemail.com with the subject (-s) “attachment”, the body text “Sending an attachment.”, and the attachment (-a) backup.zip. As with the mail command, you can use the “-c” option to send a copy to another address. (Note that newer versions of mutt require a "--" separator between the attachment list and the recipients: mutt -a backup.zip -s "attachment" -- abc@xyzemail.com.)
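mutt also accepts several -a flags in a single command, so a sketch that mails two files (the file names here are placeholders) and copies a second recipient looks like this:

# echo "Monthly reports attached." | mutt -s "reports" -a report1.pdf -a report2.pdf -c lmn@xyzemail.com abc@xyzemail.com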

SENDING MAIL FROM A SHELL SCRIPT

Now, with the basics covered you can send mails from your shell scripts. Here’s a simple shell script that gives you a reading of the usage of space on your partitions and mails the data to you.

#!/bin/bash
df -h | mail -s "disk space report" abc@xyzemail.com

Save these lines in a file on your Linux server and run it. You should receive a mail containing the results of the command. If you need to send more data than this, you will need to write the data to a text file and read it into the mail body while composing the mail. Here’s an example of a shell script that gets the disk usage as well as the memory usage, writes the data into a temporary file, and then puts it all into the body of the mail being sent out:

#!/bin/bash
df -h > /tmp/mail_report.log
free -m >> /tmp/mail_report.log
mail -s "disk and RAM report" abc@xyzemail.com < /tmp/mail_report.log

Now here’s a more complicated task. You have to take a backup of a few files and mail them out. First the directory to be backed up is archived; then the archive is sent as an email attachment using mutt. Here’s a script to do just that:

#!/bin/bash
tar -zcf /tmp/backup.tar.gz /home/oracle/files
echo | mutt -a /tmp/backup.tar.gz -s "daily backup of data" abc@xyzemail.com

The echo at the start of the last line adds a blank body to the mail being sent out.
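To run reports like these automatically, you can schedule the script with cron. A sketch of a crontab entry (the script path is hypothetical) that mails the disk report every morning at 7:00:

0 7 * * * /home/oracle/disk_report.sh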