I am configuring an /etc/hosts file on my router, and have a list of over 135,000 domains that I need to block via this file, which takes up nearly 5MB of space. As it turns out, the available RAM on my router is extremely limited, so dnsmasq cannot load this file into memory without crashing.

At the moment, the hosts file looks something like this:

0.0.0.0 a.triggit.com
0.0.0.0 a.twiago.com
0.0.0.0 a.visualrevenue.com
0.0.0.0 a.websponsors.com
0.0.0.0 a.webwise.org
0.0.0.0 a15172379.alturo-server.de
0.0.0.0 a2.mediagra.com
0.0.0.0 a3.suntimes.com
0.0.0.0 a7cleaner.com
0.0.0.0 aa.agkn.com
0.0.0.0 aa.tweakers.nl
0.0.0.0 aaa-architecten.nl
0.0.0.0 aaa-arcobaleno.it
0.0.0.0 aads.treehugger.com
0.0.0.0 aan.amazon.com
0.0.0.0 aarth.net
0.0.0.0 aax-cpm.amazon-adsystem.com
0.0.0.0 aax-us-east.amazon-adsystem.com
0.0.0.0 aax-us-pdx.amazon-adsystem.com
0.0.0.0 aax.amazon-adsystem.com
0.0.0.0 ab.5.p2l.info
0.0.0.0 ab.tweakers.nl
0.0.0.0 ab913aa797e78b3.com
0.0.0.0 abclnks.com
0.0.0.0 abetterinternet.com
0.0.0.0 abi83-schramberg.de
0.0.0.0 aboardamusement.com

However, I would like to change it to become this:

0.0.0.0 a.triggit.com a.twiago.com a.visualrevenue.com a.websponsors.com a.webwise.org a15172379.alturo-server.de a2.mediagra.com a3.suntimes.com a7cleaner.com aa.agkn.com aa.tweakers.nl aaa-architecten.nl aaa-arcobaleno.it aads.treehugger.com
0.0.0.0 aan.amazon.com aarth.net aax-cpm.amazon-adsystem.com aax-us-east.amazon-adsystem.com aax-us-pdx.amazon-adsystem.com aax.amazon-adsystem.com ab.5.p2l.info ab.tweakers.nl ab913aa797e78b3.com abclnks.com abetterinternet.com abi83-schramberg.de
0.0.0.0 aboardamusement.com

None of these lines exceeds 256 characters, and no domain from the original list is split across two lines (i.e. a line never breaks in the middle of a domain at the 256th character).

How can I read the original hosts file and combine its entries into longer lines of at most 256 characters each, without dropping or truncating any domains (i.e. each line should break after the last whole domain that fits within the 256-character limit)?

  • I think you're missing the first line of expected output, the one starting with 0.0.0.0 a.kerg.net Commented Jul 22, 2022 at 18:12
  • If you need to block that many domains, I would assume that you're running a site in a commercial environment. You should consider blocking using the enterprise router's configuration management, such as this for Cisco. If this is a home or small business environment, you might look into a parental control solution. Commented Jul 22, 2022 at 18:16
  • This is not a commercial environment. I am downloading the host file provided by github.com/StevenBlack/hosts and simply don't want any device connected to my home router to resolve these addresses. These hosts include spam, malware, etc. Commented Jul 22, 2022 at 18:37
  • @JRogers97 that's a silly solution. Devices can use their own DNS, so they are completely unaffected by the host definitions on the router. As said, this is really an XY problem and you need to find a better solution, like installing a firewall in the gateway. Commented Jul 23, 2022 at 9:06
  • @phuclv They can, but they don't by default. When a friend comes to my apartment and uses my wifi with an iphone, I do not want them interacting with these domains. Unless I'm going to stop every single person entering my apartment and say "hey, let me install this custom profile for your iphone, which you'll need to disable when you leave (p.s tough luck if we forget and something doesn't work when you're not at my place)", this is the easiest solution for my old consumer router. I'm not interested in buying a raspberry pi for a pi-hole or something like this; it's unnecessary. End of story Commented Jul 23, 2022 at 18:53

1 Answer

This might be what you want, using any awk in any shell on every Unix box:

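$ cat tst.awk
# Start the first output line with the IP address.
NR == 1 { out = $1 }
{
    prev = out                  # the line as it was before adding this domain
    out = out OFS $2            # tentatively append the domain
    if ( length(out) > 256 ) {  # too long: emit the previous line
        print prev
        out = $0                # and start a new one from this entry
    }
}
END { print out }               # flush the last, partially filled line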

$ awk -f tst.awk file
0.0.0.0 a.kerg.net a.libertystmedia.com a.ligatus.com a.ligatus.de a.mktw.net a.o333o.com a.phormlabs.com a.predictvideo.com a.prisacom.com a.rad.live.com a.rad.msn.com a.spolecznosci.net a.ss34.on9mail.com a.total-media.net a.tribalfusion.com
0.0.0.0 a.triggit.com a.twiago.com a.visualrevenue.com a.websponsors.com a.webwise.org a15172379.alturo-server.de a2.mediagra.com a3.suntimes.com a7cleaner.com aa.agkn.com aa.tweakers.nl aaa-architecten.nl aaa-arcobaleno.it aads.treehugger.com
0.0.0.0 aan.amazon.com aarth.net aax-cpm.amazon-adsystem.com aax-us-east.amazon-adsystem.com aax-us-pdx.amazon-adsystem.com aax.amazon-adsystem.com ab.5.p2l.info ab.tweakers.nl ab913aa797e78b3.com abclnks.com abetterinternet.com abi83-schramberg.de
0.0.0.0 aboardamusement.com
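
If you want to double-check the result before copying it to the router, something like the following might help. It is only a sketch: check_packed is a made-up helper name, not part of the script above, and it assumes every input line has the form "IP domain [domain ...]" with no comments or blank lines. It checks that no packed line is longer than the limit and that the set of domains is unchanged.

```shell
#!/bin/sh
# check_packed ORIGINAL PACKED [LIMIT]
# Hypothetical helper: succeeds only if no line of PACKED is longer than
# LIMIT characters (default 256) and PACKED lists exactly the same domains
# as ORIGINAL. Assumes lines look like "IP domain [domain ...]".
check_packed() {
    orig=$1
    packed=$2
    limit=${3:-256}

    # 1. No packed line may be longer than the limit.
    awk -v limit="$limit" 'length($0) > limit { bad = 1 } END { exit bad + 0 }' "$packed" || {
        echo "a line exceeds $limit characters" >&2
        return 1
    }

    # 2. Every domain (fields 2..NF) must survive, with none added or dropped.
    a=$(awk '{ for (i = 2; i <= NF; i++) print $i }' "$orig" | sort)
    b=$(awk '{ for (i = 2; i <= NF; i++) print $i }' "$packed" | sort)
    [ "$a" = "$b" ] || {
        echo "domain lists differ" >&2
        return 1
    }
}
```

Assuming the input is called hosts and the output hosts.packed (both names are placeholders), you could run awk -f tst.awk hosts > hosts.packed && check_packed hosts hosts.packed and only install the new file if that succeeds.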
  • This is an answer to the original question but I think this may be an XY problem. You're giving an answer but the OP may need to take a completely different approach. Commented Jul 22, 2022 at 18:19
  • Thank you, this is perfect. Commented Jul 22, 2022 at 18:37