After British scientist Tim Berners-Lee invented the World Wide Web (WWW) in 1989, a web browser was released to the general public within a couple of years. The Internet became a platform with new commercial, social, cultural and technical opportunities.
Following the release of the web browser, the WWW grew rapidly, and so did the number of Internet users. Info.cern.ch, published in August 1991, was the first website [1]. There are currently about 1.7 billion websites hosted on the Internet [2], although not all of them may be active today. Here, a website means a unique hostname that can be resolved using the DNS service.
Most of the websites we access regularly are indexed in a web search engine’s database. But there are other legitimate websites that are not listed in any search engine index (hidden), and these cannot be accessed without knowing the URL.
Deep Web is a term mostly used in the IT industry to refer to websites that are not listed in a web search engine index [3]. Many people have the misconception that Deep Web sites are related to bad business, similar to the Dark Web (or Dark Net), which is incorrect. For example, wordpress.com, a popular blogging site, has a setting to hide a page from search engines. Selecting this option keeps that user’s webpage in the Deep Web category, meaning it is hidden from search engines. That doesn’t mean the user’s webpage is associated with criminal or anti-social activities.
Search engine companies use web crawlers (automated programs) to browse the WWW and store webpages and their URLs in a database. Web administrators can configure a robots.txt file to allow or disallow a search engine crawler from accessing specific pages. Also, if a webpage is password protected, crawlers cannot access it and cannot add it to the search engine database [4].
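As a rough illustration of how a well-behaved crawler might consult robots.txt before fetching a page, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the crawler names (AnyCrawler, ExampleBot) are hypothetical and chosen only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic crawler may fetch public pages but not anything under /private/.
print(parser.can_fetch("AnyCrawler", "https://example.com/blog/post"))      # True
print(parser.can_fetch("AnyCrawler", "https://example.com/private/notes"))  # False

# ExampleBot is disallowed from the whole site.
print(parser.can_fetch("ExampleBot", "https://example.com/blog/post"))      # False
```

Note that robots.txt is a request, not an access control: it only works for crawlers that choose to honour it, which is why password protection is the stronger way to keep a page out of reach.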
At a high level, the Dark Web is similar to the Deep Web: it is an industry term for a part of the Internet that is not indexed by web search engines, but it additionally needs special applications to access its webpages. The Dark Web consists of anonymity overlay networks on the Internet that are built for privacy reasons, mainly to avoid surveillance by governments or national agencies. A few well-known anonymous networks are active on the Internet today.
The Freenet Project is a free and open source anonymous network started by Ian Clarke [5]. It has a dynamic, decentralised peer-to-peer architecture in which communications are encrypted and routed via multiple nodes to make them difficult to trace. Every computer (node) has to install the application to access the resources in Freenet. Each node contributes network bandwidth and some storage space. Every file added to the Freenet peer-to-peer network has a Globally Unique Identifier (GUID). A file may be stored on a few nodes in a distributed way and, during its lifetime, it may be copied or migrated to other nodes. A user can access a file via the Freenet application by requesting the file’s GUID.
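The idea of storing a file on a few nodes and retrieving it by identifier can be sketched with a toy in-memory model. This is not Freenet's actual protocol: the Node class, the make_guid, insert_file and fetch_file helpers, the use of a SHA-256 hash of the contents as the GUID, and the way nodes are picked are all assumptions made for illustration.

```python
import hashlib


class Node:
    """A toy node that contributes some local storage (hypothetical model)."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}  # maps GUID -> file bytes


def make_guid(data: bytes) -> str:
    # Assumption: derive the identifier from the file contents.
    return hashlib.sha256(data).hexdigest()


def insert_file(nodes, data: bytes, replicas: int = 2) -> str:
    """Store the file on a few nodes and return its GUID."""
    guid = make_guid(data)
    # Rank nodes by a hash of (node name + GUID) so placement looks scattered
    # but stays deterministic for this example.
    ranked = sorted(
        nodes,
        key=lambda n: hashlib.sha256((n.name + guid).encode()).hexdigest(),
    )
    for node in ranked[:replicas]:
        node.store[guid] = data
    return guid


def fetch_file(nodes, guid: str):
    """Ask each node for the GUID and verify the contents before returning."""
    for node in nodes:
        data = node.store.get(guid)
        if data is not None and make_guid(data) == guid:
            return data
    return None


nodes = [Node(f"node-{i}") for i in range(5)]
guid = insert_file(nodes, b"hello freenet")
print(guid[:16], fetch_file(nodes, guid))
```

The requester only needs the GUID, not the location of the file, which is what lets copies move between nodes without breaking access.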
The Invisible Internet Project (I2P) creates an anonymous network layer. The I2P application has to be installed on a computer to access the I2P network. During installation, the I2P application generates a unique cryptographic identity (node identity) for each computer. I2P applications use the anonymous network layer to exchange messages between these cryptographic identities.
I2P uses garlic routing, which bundles multiple encrypted messages, each called a “clove”, and applies layered encryption to them. Every message is exchanged between node identities using unidirectional tunnels: an inbound tunnel to receive messages and an outbound tunnel to send messages. An I2P website is called an “Eepsite” and has a .i2p extension, similar to .com; it can be accessed only via I2P applications and the I2P network. Example: elgoog.i2p
The Tor Project is one of the best-known anonymous networks. It was initially created as part of a U.S. Naval Research Laboratory (NRL) project to access the Internet with privacy [6]. The main purpose of the project is to keep the communication between the user (originator) and the destination (responder) anonymous. Tor can be accessed simply by installing the Tor Browser.
Tor uses the onion routing concept, which wraps the data in layers of encryption, one for each node on the route. To avoid tracing, traffic is routed through multiple Tor nodes. Onion routing is designed to hide the header information and makes it extremely hard for anyone to identify the originator’s source, traffic pattern, location information and so on. Tor has grown from a few nodes to thousands of nodes run by volunteers today, and the number of users accessing Tor has grown to millions [7].
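The layering idea behind onion routing can be shown with a toy sketch: the sender wraps the message once per relay, and each relay removes exactly one layer. This is not Tor's real cryptography or key exchange. The relay keys, the keystream, xor_layer, wrap and peel helpers, and the hash-based stream cipher are assumptions made for illustration and must not be used for real security.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter (illustrative, not secure)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XOR with the same keystream undoes itself."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(message: bytes, relay_keys) -> bytes:
    """Sender side: apply one layer per relay, exit relay's layer first."""
    data = message
    for key in reversed(relay_keys):
        data = xor_layer(key, data)
    return data


def peel(onion: bytes, relay_key: bytes) -> bytes:
    """Relay side: remove only this relay's layer."""
    return xor_layer(relay_key, onion)


# Hypothetical per-relay keys the sender is assumed to share with each relay.
relay_keys = [b"entry-relay-key", b"middle-relay-key", b"exit-relay-key"]
message = b"GET /index.html"

onion = wrap(message, relay_keys)
after_entry = peel(onion, relay_keys[0])         # still unreadable
after_middle = peel(after_entry, relay_keys[1])  # still unreadable
after_exit = peel(after_middle, relay_keys[2])   # original message
print(after_exit)
```

In this sketch no single relay before the exit sees the plaintext, and each relay only needs its own key, which is the property the paragraph above describes.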
Tor’s popularity and large user base may be the reason it attracts more anti-social elements and criminals. The anonymity it provides for accessing hidden services inside the Tor network is exactly what bad actors want. There is no doubt that the Dark Web hosts large online black markets. The Silk Road site (silkroad6ownowfk[.]onion) was one of the best-known black markets on Tor. According to the FBI, the Silk Road hidden service was used by hundreds of drug dealers and others to sell unlawful goods and services. Movies, TV shows and news reports cover the harmful and negative side of the Dark Web. But it is not true that the Dark Web is associated only with bad business.
Arguably, there are legitimate purposes for using an anonymous network. Whistle-blowers can share secret stories anonymously or submit a confidential tip to a news agency. Investigative journalists can share and exchange information. In countries where the Internet is forbidden or strictly controlled, an anonymous network helps people get around website bans.
Many news agencies and online media outlets have created secure drop sites on the Tor network to collect confidential information from whistle-blowers. Wired UK, a magazine, has a Tor site (k5ri3fdr232d36nb[.]onion) hosted on the Dark Web, aiming to help people share confidential information anonymously. Anyone can send information using the link without revealing their identity.
For investigative work, companies like BrightPlanet use the Deep Web and Dark Web to investigate fraudulent trademark usage for their clients, to combat pharmaceutical fraud, or to analyze news focusing on terror groups [8].
To fight media censorship, well-known websites such as Facebook (www.facebookcorewwwi[.]onion) launched a Tor website in 2014. In 2017, the New York Times launched its news website (www.nytimes3xbfgragh[.]onion) on Tor. The BBC (www.bbcnewsv2vjtpsuy[.]onion) has had an international news website on Tor since last year.
Even service providers are slowly starting to support Tor services. Cloudflare recently announced a DNS service for the Tor onion network: a privacy-first DNS resolver for Tor users. This is the first of its kind and a welcome move for people who are looking forward to using Tor [9].
[1] “First URL active once more,” [Online]. Available: https://first-website.web.cern.ch/blog/first-url-active-once-more.
[2] Internet Live Stats, “Total number of Websites,” [Online]. Available: https://www.internetlivestats.com/total-number-of-websites/#ref-2.
[3] Google, “Introduction to robots.txt,” [Online]. Available: https://support.google.com/webmasters/answer/6062608?hl=en.
[4] The Journal of Electronic Publishing, “White Paper: The Deep Web: Surfacing Hidden Value,” [Online]. Available: https://quod.lib.umich.edu/cgi/t/text/idx/j/jep/3336451.0007.104/--white-paper-the-deep-web-surfacing-hidden-value?rgn=main;view=fulltext.
[5] Freenet Project, “What is Freenet?,” [Online]. Available: https://freenetproject.org/pages/about.html.
[6] Tor Project, “History,” [Online]. Available: https://www.torproject.org/about/history/.
[7] Tor Project, “Tor Metrics,” [Online]. Available: https://metrics.torproject.org/userstats-relay-table.html.
[8] BrightPlanet, [Online]. Available: https://brightplanet.com/2018/03/05/visualizing-terror-groups-named-entity-tagging/.
[9] Cloudflare, “Introducing DNS Resolver for Tor,” [Online]. Available: https://blog.cloudflare.com/welcome-hidden-resolver/.