/* A PACKAGE FOR GLOBALLY ALIGNING TWO SEQUENCES WITHIN AN ARBITRARY REGION: To invoke, call ALIGN(,B,M,N,L,R,V,G,E,S). The parameters are explained as follows: A, B : two sequences to be aligned M : the length of sequence A N : the length of sequence B L : left boundary R : right boundary V : scoring table for matches and mismatches G : gap-opening penalty E : gap-extension penalty S : script for DISPLAY routine */ #include #define MININT -999999 #define max(x,y) ((x) >= (y) ? (x) : (y)) #define min(x,y) ((x) <= (y) ? (x) : (y)) #define gap(k) ((k) <= 0 ? 0 : (g+e*(k))) /* k-symbol indel cost */ /* Append "Delete k" op */ #define DEL(k) \ { \ if (last < 0) \ last = sapp[-1] -= (k); \ else \ last = *sapp++ = -(k); \ } /* Append "Insert k" op */ #define INS(k) \ { \ if (last > 0) \ last = sapp[-1] += (k); \ else \ last = *sapp++ = (k); \ } /* Append "Replace" op */ #define REP \ { last = *sapp++ = 0; \ } static long *SS, *DD, I; static int *SP, *DP, IP; static char *ST, *DT, IT; static int *PP[3]; static char *PT[3]; static int *RI; /* map back diagonal into its row index */ static long (*v)[128]; /* v = V */ static long g, e, m; /* g = G, e = E, m = g+e */ static int *sapp; /* Current script append ptr */ static int last; /* Last script op appended */ char *A, *B; int *L, *R; static long align(int, int, char, int, int, char); static long CHECK_SCORE(); static long align(int I1, int J1, char Type1, int I2, int J2, char Type2) { int i, j, l, r, b, b1, b2; int i1, j1, i2, j2; char t1, t2; int cmid, nmid, pmid, mr, pmr; long s, sp, st, c, cp, ct, d, dp, dt, ti, *va; char flag, rflag, dflag, sflag; int l_save, r_save; /* printf("I1= %d, J1= %d, Type1 = %d, I2= %d, J2= %d, Type2= %d\n",I1,J1,Type1,I2,J2,Type2); for (i=I1; i<= I2; ++i) printf("%d : %d %d\n",i,L[i],R[i]); */ fflush(stdout); /* Initializations */ if (Type2 == 0) SS[J2] = DD[J2] = I = 0; else if (Type2 == 1) { DD[J2] = 0; I = SS[J2] = MININT; } else { I = 0; DD[J2] = SS[J2] = MININT; } RI[I2+J2] = I2; cmid = J2; if (I1 < I2) nmid = (L[I2-1] + R[I2-1] + 1)/2; else nmid = J1; l = L[I2]; for (j = J2-1; j >= l; --j) { I = I - e; DD[j] = SS[j] = I - g; if (j+1 >= nmid) { /* (I2,j+1) is a partition point */ IP = SP[j] = I2+j+1; IT = ST[j] = 2; } else { IP = SP[j] = IP; IT = ST[j] = IT; } if (j >= nmid) { /* (I2,j) is a partition point */ b = I2+j; RI[b] = I2; PP[2][b] = PP[0][b] = IP; PT[2][b] = PT[0][b] = IT; PP[1][b] = DP[j] = b; PT[1][b] = DT[j] = 0; } else { DP[j] = SP[j]; DT[j] = ST[j]; } } c = SS[l]; SS[J2+1] = MININT; for (j = l-1; j >= J1; --j) SS[j] = DD[j] = MININT; for (i = I2-1; i >= I1; --i) { pmr = mr; mr = min(L[i+1], R[i]); pmid = cmid; cmid = nmid; if (i > I1) nmid = (L[i-1]+R[i-1] + 1)/2; else nmid = J1; l = L[i]; r = R[i]; I = MININT; s = SS[r+1]; sp = SP[r+1]; st = ST[r+1]; va = v[A[i+1]]; flag = 0; if ((r+1>=cmid && r+1<=pmid) || (i+1pmid && r+1<=pmr)) dflag = 1; else dflag = 0; for (j = r; j >= l; --j) { rflag = flag; sflag = dflag; if ((j >= nmid && j <= cmid) || (i < I2 && j > cmid && j <= mr)) { flag = 1; b = i + j; RI[b] = i; } else flag = 0; if ((j >= cmid && j <= pmid) || (i+1pmid && j<=pmr)) dflag = 1; else dflag = 0; if (j < J2) c = s + va[B[j+1]]; else c = MININT; d = DD[j] - m; ti = I - m; if (c >= max(d,ti)) { if (sflag) { cp = i+j+2; ct = 0; } else { cp = sp; ct = st; } } else if (d >= ti) { c = d; if (dflag) { cp = i+j+1; ct = 1; } else { cp = DP[j]; ct = DT[j]; } } else { c = ti; if (rflag) { cp = i+j+1; ct = 2; } else { cp = IP; ct = IT; } } if (c >= (d = DD[j]-e)) { d = c; if (flag) { dp = i+j; dt = 0; } else { dp = cp; dt = ct; } } else { if (dflag) { dp = i+j+1; dt = 1; } else { dp = DP[j]; dt = DT[j]; } } if (c >= (I = I-e)) { I = c; if (flag) { IP = i+j; IT = 0; } else { IP = cp; IT = ct; } } else { if (rflag) { IP = i+j+1; IT = 2; } } s = SS[j]; sp = SP[j]; st = ST[j]; SS[j] = c; SP[j] = cp; ST[j] = ct; DD[j] = d; DP[j] = dp; DT[j] = dt; if (flag) { /* save information in partition nodes */ PP[0][b] = cp; PT[0][b] = ct; PP[1][b] = dp; PT[1][b] = dt; PP[2][b] = IP; PT[2][b] = IT; } } SS[r+1] = MININT; } b = I2 + J2; i1 = I1; j1 = J1; t1 = Type1; b1 = I1 + J1; while (b1 != b) { b2 = PP[t1][b1]; i2 = RI[b2]; j2 = b2 - i2; t2 = PT[t1][b1]; if (i1 == i2) { if (j2 > j1) INS(1) } else if (j1 == j2) DEL(1) else if (i1+1 == i2 && j1+1 == j2) { if (t2 == 1) { INS(1) DEL(1)} else if (t2 == 2) { DEL(1) INS(1) } else REP } else { l_save = L[i2]; r_save = R[i2]; if (j1 < (L[i1]+R[i1] + 1)/2) { for (i = i2; i > i1; --i) { nmid = (L[i-1]+R[i-1] + 1)/2; L[i] = max(j1, L[i]); R[i] = nmid-1; } L[i1] = j1; R[i1] = j1; R[i2] = j2; } else { for (i = i1; i < i2; ++i) { cmid = (L[i]+R[i] + 1)/2; L[i] = max(cmid,L[i+1])+1; R[i] = min(j2, R[i]); } L[i1] = j1; L[i2] = j2; R[i2] = j2; } align(i1,j1,t1,i2,j2,t2); L[i2] = l_save; R[i2] = r_save; } i1 = i2; j1 = j2; t1 = t2; b1 = b2; } return c; } long ALIGN(char Seq1[], char Seq2[], int bi, int bj, int ei, int ej, int lb[], int rb[], long V[][128], long G, long E, int S[]) { long c; int i, j, mb, *ptb; long check_score; char *ckalloc(); int M, N; A = Seq1; B = Seq2; L = lb; R = rb; v = V; /* Setup global parameters */ g = G; e = E; m = g+e; sapp = S; last = 0; M = ei - bi; N = ej - bj; j = (N+2) * sizeof(long); SS = ((long *) ckalloc(j)) - bj; DD = ((long *) ckalloc(j)) - bj; j = (N+2) * sizeof(int); SP = ((int *) ckalloc(j)) - bj; DP = ((int *) ckalloc(j)) - bj; j = (N+2) * sizeof(char); ST = ((char *) ckalloc(j)) - bj; DT = ((char *) ckalloc(j)) - bj; j = (M+N+1) * sizeof(int); PP[0] = (int *) ckalloc(j) - bi - bj; PP[1] = (int *) ckalloc(j) - bi - bj; PP[2] = (int *) ckalloc(j) - bi - bj; RI = (int *) ckalloc(j) - bi - bj; j = (M+N+1) * sizeof(char); PT[0] = (char *) ckalloc(j) - bi - bj; PT[1] = (char *) ckalloc(j) - bi - bj; PT[2] = (char *) ckalloc(j) - bi - bj; c = align(bi,bj,0,ei,ej,0); check_score = CHECK_SCORE(Seq1+bi,Seq2+bj,M,N,S); #ifdef STATS printf("\nAlign_score=%.1f\n",((float)c)/10.0); printf("\nCheck_score=%.1f\n",((float)check_score)/10.0); if (check_score != c) printf("\nCheck_score=%.1f\n",((float)check_score)/10.0); #endif free(SS+bj); free(DD+bj); free(SP+bj); free(DP+bj); free(ST+bj); free(DT+bj); free(PP[0]+bi+bj); free(PP[1]+bi+bj); free(PP[2]+bi+bj); free(RI+bi+bj); free(PT[0]+bi+bj); free(PT[1]+bi+bj); free(PT[2]+bi+bj); return c; } /* Alignment display routine */ static char ALINE[51], BLINE[51], CLINE[51]; long DISPLAY(char A[], char B[], int M, int N, int S[], int AP, int BP) { register char *a, *b, *c; register int i, j, op; int lines, ap, bp; i = j = op = lines = 0; ap = AP; bp = BP; a = ALINE; b = BLINE; c = CLINE; while (i < M || j < N) { if (op == 0 && *S == 0) { op = *S++; *a = A[++i]; *b = B[++j]; *c++ = (*a++ == *b++) ? '|' : ' '; } else { if (op == 0) op = *S++; if (op > 0) { *a++ = ' '; *b++ = B[++j]; op--; } else { *a++ = A[++i]; *b++ = ' '; op++; } *c++ = '-'; } if (a >= ALINE+50 || i >= M && j >= N) { *a = *b = *c = '\0'; printf("\n%6d ",50*lines++); for (b = ALINE+10; b <= a; b += 10) printf(" . :"); if (b <= a+5) printf(" ."); printf("\n%6d %s\n %s\n%6d %s\n",ap,ALINE,CLINE,bp,BLINE); ap = AP + i; bp = BP + j; a = ALINE; b = BLINE; c = CLINE; } } } /* CHECK_SCORE - return the score of the alignment stored in S */ static long CHECK_SCORE(char A[], char B[], int M, int N, int S[]) { register long i, j, op; long score; score = i = j = op = 0; while (i < M || j < N) { op = *S++; if (op == 0) score = v[A[++i]][B[++j]] + score; else if (op > 0) { score = score - (g+op*e); j = j+op; } else { score = score - (g-op*e); i = i-op; } } return(score); }