
Source: Rosalind ("Consensus and Profile")

Brief summary

    DNA strings    A T C C A G C T
                   G G G C A A C T
                   A T G G A T C T
                   A A G C A A C C
                   T T G G A A C T
                   A T G C C A T T
                   A T G G C A C T

    Profile      A 5 1 0 0 5 5 0 0
                 C 0 0 1 4 2 0 6 1
                 G 1 1 6 3 0 1 0 0
                 T 1 5 0 0 0 1 1 6

    Consensus      A T G C A A C T

Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.

Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
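Written as formulas (the notation here is ours, not Rosalind's), for $n$ strings $s_1, \dots, s_n$ of length $m$: the profile entry for base $b$ at position $j$ counts how many strings carry $b$ there, and the consensus picks a most frequent base in each column.

$$P_{b,j} = \bigl|\{\, i : s_i[j] = b \,\}\bigr|, \qquad b \in \{\mathrm{A}, \mathrm{C}, \mathrm{G}, \mathrm{T}\},\; 1 \le j \le m$$

$$c_j \in \arg\max_{b} P_{b,j}, \qquad \sum_{b} P_{b,j} = n$$

In the sample, column 4 reads C, C, G, C, G, C, G, so $(P_{\mathrm{A},4}, P_{\mathrm{C},4}, P_{\mathrm{G},4}, P_{\mathrm{T},4}) = (0, 4, 3, 0)$ and $c_4 = \mathrm{C}$. When the arg max holds more than one base, any of them may be chosen, which is why several consensus strings can be valid.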

Model (cons.rb):

```ruby
#!/usr/bin/env ruby
require_relative '../ie_module'

class DnaConsensus
  include ImportExport

  DNA_BASES = %w(A C G T)

  attr_reader :dna_strings, :consensus, :profile

  def initialize(source = "rosalind_#{current_dir_name}.txt")
    @dna_strings = (source =~ /txt$/ ? import_lines(source) : source).values
    @profile = build_profile
    @consensus = build_consensus
  end

  def to_s
    "#{consensus.join}\n#{stringify(profile)}"
  end

  private

  def build_profile
    prof = DNA_BASES.map{|b| [b, []]}.to_h
    dna_strings.map(&:chars).transpose.each.with_object(prof) do |arr, hsh|
      hsh.merge!(hashed(arr)){ |_, oldval, newval| oldval << newval }
    end
  end

  def hashed(arr)
    hsh = arr.group_by(&:chr).map{ |k,v| [k, v.size] }.to_h
    (DNA_BASES - hsh.keys).each { |b| hsh[b] = 0 }
    hsh
  end

  def build_consensus
    dna_strings.first.length.times.with_object([]) do |index, arr|
      arr << profile.max_by{|_, list| list[index]}.first
    end
  end
end

a = DnaConsensus.new
a.export_to_file([a.to_s])
```

File read/write logic (ie_module.rb):

```ruby
module ImportExport
  def export_to_file(result, file = "result_#{current_dir_name}.txt")
    File.open(file, 'w') do |f|
      result.each{ |val| f << "%s" % val }
    end
  end

  private

  def current_dir_name
    File.basename(Dir.getwd)
  end

  def stringify(obj)
    if obj.is_a?(Hash)
      obj.map{|k,v| "#{k}: #{v.join(' ')}"}
    else
      obj.map{|e| e.join(' ')}
    end.join("\n")
  end

  def import_lines(file)
    File.foreach(file).with_object({}) do |line, hsh|
      line = line.strip.sub(/^>/, '')
      $' ? hsh[line] = '' : hsh[hsh.keys.last] << line
    end
  end
end
```
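`import_lines` relies on the `$'` post-match global left behind by `sub`, which is compact but easy to misread. A minimal sketch of the same `{ header => sequence }` parse written out explicitly; `parse_fasta` is a hypothetical name, and it takes the FASTA text rather than a file name so it can be tried without a file (for real input it could be called as `parse_fasta(File.read(path))`). Unlike the original, it also skips blank lines and ignores anything before the first header.

```ruby
# Hypothetical helper: parse FASTA text into { header => sequence }.
def parse_fasta(text)
  records = {}
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty?                  # ignore blank lines
    if line.start_with?('>')
      current = line.delete_prefix('>')  # header text without the '>'
      records[current] = []              # start collecting sequence chunks
    elsif current
      records[current] << line           # a sequence may wrap over several lines
    end                                  # lines before the first header are dropped
  end
  records.transform_values(&:join)       # glue the wrapped chunks together
end

fasta = <<~FASTA
  >Rosalind_1
  ATCCAG
  CT
  >Rosalind_2
  GGGCAACT
FASTA

parse_fasta(fasta).each { |name, seq| puts "#{name}: #{seq}" }
# Rosalind_1: ATCCAGCT
# Rosalind_2: GGGCAACT
```

Collecting chunks in an array and joining once at the end avoids mutating string literals, which matters if `# frozen_string_literal: true` is enabled.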

There is a lot of code here, but `#build_profile` is the most "complicated" part. I know that an alternative way to do this exists. All suggestions are welcome.
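One possible alternative for `#build_profile`, sketched under the assumption of Ruby 2.7+ (for `Enumerable#tally`): count each column directly and read the counts out per base, which removes both `hashed` and the `merge!` block. It returns the same `{ "A" => [...], "C" => [...], ... }` shape that `build_consensus` and `stringify` expect; `profile_for` is a hypothetical standalone method, not a drop-in patch to the class.

```ruby
# Assumes Ruby 2.7+ for Enumerable#tally.
DNA_BASES = %w(A C G T).freeze

# Hypothetical standalone version of #build_profile.
# Returns { base => [count at position 0, count at position 1, ...] },
# the same shape DnaConsensus#profile has.
def profile_for(dna_strings)
  columns = dna_strings.map(&:chars).transpose    # one array per position
  tallies = columns.map(&:tally)                  # per-column counts, only for bases present
  DNA_BASES.to_h do |base|
    [base, tallies.map { |t| t.fetch(base, 0) }]  # absent base counts as 0
  end
end

strings = %w(ATCCAGCT GGGCAACT ATGGATCT AAGCAACC TTGGAACT ATGCCATT ATGGCACT)
profile_for(strings).each { |base, counts| puts "#{base}: #{counts.join(' ')}" }
# A: 5 1 0 0 5 5 0 0
# C: 0 0 1 4 2 0 6 1
# G: 1 1 6 3 0 1 0 0
# T: 1 5 0 0 0 1 1 6
```

Iterating over `DNA_BASES` rather than over the tallies keeps the rows in A, C, G, T order and guarantees a zero for any base missing from a column.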


Answer


I am going to ignore all the file-reading code, which is extraneous to the problem, and focus on simplifying the code that finds the consensus. That can be done in essentially a single line, the one beginning `consensus = ...`. Everything else just sets up the sample data.

`transpose` gives us the columns, and `max_by ... count` gives us the most frequently occurring nucleotide in each column:

```ruby
matrix = <<EOS.split("\n").map{|x| x.split(' ')}
A T C C A G C T
G G G C A A C T
A T G G A T C T
A A G C A A C C
T T G G A A C T
A T G C C A T T
A T G G C A C T
EOS

nucleotides = %w(A C G T)
consensus = matrix.transpose.map {|x| nucleotides.max_by {|n| x.count(n)}}
p consensus #=> ["A", "T", "G", "C", "A", "A", "C", "T"]
```
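The snippet above stops at the consensus, while the task also asks for the profile. A sketch of how the same `transpose` + `count` idea could produce the profile matrix as well and print both in the "consensus line, then `A: ...` rows" layout that the question's `to_s`/`stringify` emit (the variable names are illustrative):

```ruby
nucleotides = %w(A C G T)
matrix = %w(ATCCAGCT GGGCAACT ATGGATCT AAGCAACC TTGGAACT ATGCCATT ATGGCACT).map(&:chars)
columns = matrix.transpose                        # one array per position

# Profile: one row per nucleotide, one count per position.
profile = nucleotides.map { |n| columns.map { |col| col.count(n) } }

# Consensus: a most frequent nucleotide in each column
# (on ties any of the tied bases is an acceptable answer).
consensus = columns.map { |col| nucleotides.max_by { |n| col.count(n) } }

puts consensus.join                               # prints ATGCAACT
nucleotides.zip(profile) { |n, row| puts "#{n}: #{row.join(' ')}" }
```

Each column is scanned once per nucleotide, which is negligible at the problem's limits (at most 10 strings of at most 1 kbp).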