
I am trying to split a 16-length char array of 1's and 0's into 2 equal length integers for 8-bit binary conversion to decimal.

Example: char* str = "0001011011110000" Result Expected:

int s = 00010110; int t = 11110000; 

Full code

What it does: The user inputs a string of DNA (e.g. ATTCGG). If the string's length is not divisible by 4, the code pads it with extra characters using strcat(). It then converts each char to a two-bit string in the new char array xtr[64]. That array must then be split into two equal-length 8-bit binary strings, and each of those converted to a decimal number, so that the two numbers represent the DNA string. Basically, the assignment is DNA binary compression.

int main() {
    char str[64];
    scanf("%s", str);
    int obe = strlen(str);
    int mod = obe % 4;
    if (mod != 0) {
        for (int i = mod; i > 0; i--) {
            strcat(str, "0");
        }
    }
    int j;
    char xtr[64] = "";
    for (j = 0; j < strlen(str); j++) {
        if (str[j] == 'A') {
            strcat(xtr, "0");
            strcat(xtr, "0");
        } else if (str[j] == 'T') {
            strcat(xtr, "0");
            strcat(xtr, "1");
        } else if (str[j] == 'C') {
            strcat(xtr, "1");
            strcat(xtr, "0");
        } else if (str[j] == 'G') {
            strcat(xtr, "1");
            strcat(xtr, "1");
        } else if (str[j] == '0') {
            strcat(xtr, "0");
            strcat(xtr, "0");
        }
    }
    int k = strlen(xtr) / 2;
    char ret[64];
    for (int i = 0; i < k; i++) {
        ret[i] = xtr[i];
    }
    char ter[64];
    for (int i = k + 1; i < strlen(xtr); i++) {
        ter[i] = xtr[i];
    }
    int s = atoi(ret);
    int t = atoi(ter);
    printf("%s", str);
    printf("\n");
    printf("%s", xtr);
    printf("\n");
    printf("%d", s);
    printf("\n");
    printf("%d", t);
}

Result:

ATTCGG00
0001011011110000
10110
0

Problem: The second integer is not being converted correctly, and this code is very primitive. May need bitwise operators.

  • ret[i] writes to an uninitialized pointer which is obviously wrong. Apart from that, what "pointer casts"? There are no casts in the code posted. Please post complete code and exact compiler messages. Commented Feb 22, 2023 at 14:27
  • Two points about using atoi(): a) you must terminate the string you pass, b) it doesn't convert binary, for that you need strtol(). Commented Feb 22, 2023 at 14:29
  • How can I comment the full code on here? Commented Feb 22, 2023 at 14:37
  • You can click the edit link under the question. Commented Feb 22, 2023 at 14:38
  • I edited the code to include the prints. This is a code section in a more complex code that converts letters to 1's and 0's. Then converts that char array of digits into equal-length integers for binary conversions Commented Feb 22, 2023 at 14:46

2 Answers

#include <stdio.h>

int parseBitChars(char* str, int bitCount) {
    int ret = 0;
    for (int i = 0; i != bitCount; i++)
        ret = (ret << 1) | (str[i] == '1' ? 1 : 0);
    return ret;
}

int main() {
    char* str = "0001011011110000";

    // Parse whole string in one go
    printf("Value: %d\n", parseBitChars(str, 16)); // Value: 5872

    // Or split into bytes
    int a = parseBitChars(str, 8);
    int b = parseBitChars(str + 8, 8);
    printf("Bytes: %d %d\n", a, b); // Bytes: 22 240
}

5 Comments

Thanks EDD. I assume the reverse of that method is to input the two integers, inverse the bitwise operators, and return the char* array?
Haris, my code is very primitive, but it still took hours to write and perfect the top portion of converting letters to bit-strings and combining those strings. EDD is just helping with the final part of compressing the char array into two decimal numbers
@MatthewMcCoy First you need to allocate enough space for the string, then you can start from the lowest bit and place it at the end of the string (x & 1 ? '1' : '0' converts the lowest bit to a char). Then, to move to the next bit, you can do x = x >> 1 and repeat, writing to the next-to-last character, and so on. But all this is pretty tedious to write yourself, so I would recommend finding some code online.
Nice suggestion. What would be some good search terms to use when looking for code online? Something like "convert bytes to strings in c"?

Here's a revamped version of your code with comments:

#include <stdio.h>
#include <string.h>

int main( void ) {
    char str[ 32 + 1 ]; // Up to 32 bases (plus terminator)
    char xtr[ 64 + 1 ] = ""; // Expands to 64

    int obe;
    scanf( "%32s%n", str, &obe ); // Limit user entry

    // Pad (with 'A') up to the next multiple of 4
    for( int i = (4 - obe % 4) % 4; i > 0; i-- )
        strcat( str, "A" );

    // Convert bases to binary values in a string
    for( int j = 0; str[ j ]; j++ )
        if ( str[j] == 'A' ) strcat( xtr, "00" );
        else if ( str[j] == 'T' ) strcat( xtr, "01" );
        else if ( str[j] == 'C' ) strcat( xtr, "10" );
        else if ( str[j] == 'G' ) strcat( xtr, "11" );

    // Output in blocks of 8 digits.
    for( int k = 0, len = strlen( xtr ); k < len; k += 8 )
        printf( "%d - %.8s\n", k, xtr + k );

    return 0;
}

Output:

ATTCGG
0 - 00010110
8 - 11110000

Converting a DNA sequence to an intermediary string is unnecessary.

Fortuitously, the ASCII codes for the letters 'A', 'C', 'G' and 'T' can be told apart by bit 1 and bit 2 alone. Note: This encoding differs from yours, assigning different bit patterns to represent each base.

'A' = 0bxxxxx00x ==> 0    // 'x' == "don't care"
'C' = 0bxxxxx01x ==> 2
'G' = 0bxxxxx11x ==> 6
'T' = 0bxxxxx10x ==> 4

The downside is that these values sort the bases as A, C, T, G, so the last two are swapped relative to the conventional "ACGT" order.

This 'swap' can be 'unswapped' with a translation using a crafted 8 bit hexadecimal value.

Explore the following code and study the demonstration strings below:

#include <stdio.h>

void demo( char *p ) { // chunks of bases into registers
    puts( p );
    while( *p ) {
//      unsigned char asBits = 0; // 4 bases/chunk
//      unsigned short asBits = 0; // 8 bases/chunk
        unsigned int asBits = 0; // 16 bases/chunk
//      unsigned long asBits = 0; // 32 bases/chunk
        const int pack = sizeof(asBits) * 4;

        // The ASCII for each of ACGT is pretty fortunate; can be hashed to two bits 0-3.
        // 0xB4: (0b10110100) 4 pairs of bits crafted to correspond to "GTCA" (reversed for shifting.)
        // Note that T&G are swapped by that 'magic byte' to conform to conventional "ACGT"
        // "AND"ing with 6 masks for the two fortunate bits,
        // "0xB4" is right shifted 0, 2, 6 or 4 bits,
        // that is then masked (3&) for its lowest two bits.
        // 'A'->0b00, 'C'->0b01, 'G'->0b10 and 'T'->0b11
        // The accumulator is shifted and this pair OR'd where they belong.
        int i;
        for( i = pack; *p && i; p++, i-- )
            asBits = asBits<<2 | (3 & (0xB4>>(*p&6))); // using one of several mapping functions

        // Sequence may not be modulo 16, so tack on extra 0b00 to pad as needed
        asBits <<= i+i; // padding for stragglers

        // Playback for verification
        printf( "%0*X - ", pack/2, asBits );
        for( int j = pack+pack-2; j >= 0; j -= 2 )
            putchar( "ACGT"[(asBits>>j)&3] );
        putchar( '\n' );
    }
}

int main( void ) {
    /* Some bonus alternative translation functions
    char *cp;
#   define M1 "\0\1\3\2"[*cp>>1&3]
#   define M2 "\0\0\0\1\3\0\0\2"[*cp&7]
#   define M3 3&0x8340>>(*cp<<1&0xF)
#   define M4 3&0xB4>>(*cp&6)
    char *n = "0123";
    for( cp = "ACGT"; *cp; cp++ )
        printf( "%c %c%c%c%c\n", *cp, n[M1], n[M2], n[M3], n[M4] );
    */

    demo( "TGCTTGCCTGCATGCA" ); // 16 bases
    demo( "TTGCTTGCCTGCATGCT" ); // 17 bases

    demo( "T" ); // 1-4 bases
    demo( "AT" );
    demo( "AAT" );
    demo( "AAAT" );

    // lots of bases
    demo( "CATCATCATCATCATCATCATCATCATCATCATCATCATCATCAT" );

    return 0;
}

Output demonstration:

TGCTTGCCTGCATGCA
E7E5E4E4 - TGCTTGCCTGCATGCA
TTGCTTGCCTGCATGCT
F9F97939 - TTGCTTGCCTGCATGC
C0000000 - TAAAAAAAAAAAAAAA
T
C0000000 - TAAAAAAAAAAAAAAA
AT
30000000 - ATAAAAAAAAAAAAAA
AAT
0C000000 - AATAAAAAAAAAAAAA
AAAT
03000000 - AAATAAAAAAAAAAAA
CATCATCATCATCATCATCATCATCATCATCATCATCATCATCAT
4D34D34D - CATCATCATCATCATC
34D34D34 - ATCATCATCATCATCA
D34D34C0 - TCATCATCATCATAAA

Play around with this for a while.


EDIT:
Here's another version of the core processing that simultaneously converts batches of 4 bases to both a string of 1's and 0's, and shows a decimal equivalent.

    unsigned char four = 0;

    // Convert bases to binary values in a string
    int j = 0;
    while( str[ j ] ) {
        if ( str[j] == 'A' ) strcat( xtr, "00" ), four = (four << 2) | 0;
        else if ( str[j] == 'T' ) strcat( xtr, "01" ), four = (four << 2) | 1;
        else if ( str[j] == 'C' ) strcat( xtr, "10" ), four = (four << 2) | 2;
        else if ( str[j] == 'G' ) strcat( xtr, "11" ), four = (four << 2) | 3;

        if( ++j % 4 == 0 ) {
            printf( "%s - %3d\n", xtr, four );
            xtr[0] = '\0';
            four = 0;
        }
    }

Output:

ATTCGG
00010110 -  22
11110000 - 240

2 Comments

Fe203, your code is amazingly simple. The only issue is that my code doesn't require the code to be printed in blocks of 8. Those blocks of 8 in binary are converted to decimal. So the output of the code should be: 22 240. I'm still going to use your first half for conversions, but the issue now is converting those byte blocks to decimals.
@MatthewMcCoy Thanks. Buried in the 2nd part of this answer is "shifting and OR'ing" to set bits (or not) in an 'accumulator'... It's unclear if conversion to a 'string' of 1's and 0's is necessary. It's possible to go direct from ACTG to setting bits in a byte, then printing that as a decimal value... Cheers!.
