formatting, grammar.
Nick Udell

Uploading a .csv to a NoSQL cluster - batch beat out consumer/producer

I was tasked with making a program that uploads a .csv to a NoSQL cluster. We were doing 2-5 gig files, and it was taking 4-8 hours. Well when we hit 17 gig files that was just to slow.

When I remade my program I realized working in batch made is much much faster. I could get a 17 gig file done in 6 hours. So then I decided to make a consumer-producer multithreading structure. This caused it to be just as slow as before. Although my program is fast and working great, I want to know why the producer-consumer construct was slower than a batch produce, batch consume method.

As compaired to

The bottom half are infinite loops, but I just left it like that for a speed test.

Uploading a .csv to a NoSQL cluster - batch faster than consumer/producer

I was tasked with making a program that uploads a .csv to a NoSQL cluster. The files are large (typically 2-17GB). My program works in batch mode and can process a 17GB file in 6 hours.

I decided to make a consumer-producer multithreading structure. This caused it to be significantly slower. I want to know why the producer-consumer construct was slower than a batch produce, batch consume method.

As compared to

The bottom half are infinite loops for a speed test.
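
For reference, the two structures being compared can be sketched roughly as follows. This is only an illustrative sketch, not the code from the question: `upload_batch`, `BATCH_SIZE`, and the queue size are hypothetical stand-ins, and the "cluster" is simulated by appending to a list.

```python
import queue
import threading

BATCH_SIZE = 3        # hypothetical; a real batch would be much larger
_SENTINEL = object()  # tells the consumer that the input is exhausted


def upload_batch(store, batch):
    """Hypothetical stand-in for one bulk write to the cluster."""
    store.append(list(batch))


def batch_upload(rows, store, batch_size=BATCH_SIZE):
    """Batch produce, batch consume: collect rows, then write them in one call."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            upload_batch(store, batch)
            batch = []
    if batch:  # flush the final partial batch
        upload_batch(store, batch)


def producer_consumer_upload(rows, store, maxsize=100):
    """Producer-consumer: one thread reads rows, another uploads them one by one."""
    q = queue.Queue(maxsize=maxsize)

    def producer():
        for row in rows:
            q.put(row)  # blocks while the queue is full
        q.put(_SENTINEL)

    def consumer():
        while True:
            row = q.get()  # one synchronized hand-off per row
            if row is _SENTINEL:
                break
            upload_batch(store, [row])  # one write call per row

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In a sketch like this, the producer-consumer version pays for one queue hand-off and one write call per row, while the batch version makes one write call per `BATCH_SIZE` rows. Whether that is what slowed down the original program cannot be determined from this sketch alone.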

deleted 5 characters in body
Jamal

EDIT: The bottom half are infinite loops, but i just left it like that for speed test.

The bottom half are infinite loops, but I just left it like that for a speed test.

added 1 character in body; edited title
Jamal

Batch beat out consumer/producer, and i dont know why

I was tasked with making a program that uploads a csv to a nosql cluster. We were doing 2-5 gig files, and it was taking 4-8 hours. Well when we hit 17 gig files that was just to slow.

When I remade my program I realized working in batch made is much much faster. I could get a 17 gig file done in 6 hours. So then i decided to make a consumer-producer multithreading structure. This caused it to be just as slow as before. Although my program is fast and working great, I want to know why the producer-consumer construct was slower than a batch produce, batch consume method.

Uploading a .csv to a NoSQL cluster - batch beat out consumer/producer

I was tasked with making a program that uploads a .csv to a NoSQL cluster. We were doing 2-5 gig files, and it was taking 4-8 hours. Well when we hit 17 gig files that was just to slow.

When I remade my program I realized working in batch made is much much faster. I could get a 17 gig file done in 6 hours. So then I decided to make a consumer-producer multithreading structure. This caused it to be just as slow as before. Although my program is fast and working great, I want to know why the producer-consumer construct was slower than a batch produce, batch consume method.
