What Is The Deep Web?
The deep web is also referred to as the hidden web or the invisible web. It consists of the parts of the World Wide Web whose contents are not indexed by standard search engines. Its antonym is the 'surface web', which can be accessed by almost everyone who uses the internet. The term was coined by the computer scientist Michael K. Bergman.
The contents of the deep web include pages hidden behind HTTP forms and cover many common uses such as webmail, online banking, and private or restricted social media pages. Services that sit behind a paywall, such as some online magazines and newspapers, are also part of the deep web.
The content of a deep web page can be located through a direct URL or IP address, but reaching it may still require a password or some other form of authentication to get past the publicly visible pages.
Terms
The terms "deep web" and "dark web" were first used together around 2009, when dark web search terminology was being discussed alongside the illegal activities taking place on Freenet and other darknets. The two terms have been conflated for a long time, and some people still use them to refer to the same thing.
This usage is inaccurate and has become a noticeable source of confusion, and it has been argued that the two terms should be kept distinct. The dark web, unlike most of the deep web, cannot be reached with standard browsers and methods.
Speaking about indexing
Since the defining feature of the deep web is that its content is not indexed by standard search engines, it is worth looking at why content goes unindexed. The reasons can be divided into the following categories:
1. Limited access to content:
Sites that limit access to their pages in a technical way fall under this category, for example by using the Robots Exclusion Standard (robots.txt) or CAPTCHAs to stop search engines from browsing them, which also prevents cached copies from being created. (A short robots.txt sketch appears after this list.)
2. Content of a scripted nature:
Pages that can only be reached through links produced by JavaScript, as well as pages whose content is downloaded dynamically through Flash or Ajax.
3. Software:
Some content is intentionally hidden from the regular internet and is only accessible with special software such as Tor, I2P, or other darknet software. A common example is a Tor hidden service, whose .onion address conceals the server's location and keeps the IP addresses of both ends hidden. (A sketch of fetching a .onion address through Tor appears after this list.)
4. Contextual web:
Pages whose content varies with the context in which they are accessed, for example the visitor's IP address range or previous navigation sequence.
5. Dynamic content:
Pages that are returned in response to a submitted query, or that can only be reached through a form, especially when open-domain input elements such as text fields are involved; such fields are hard for a crawler to navigate without knowledge of the domain. (A form-submission sketch appears after this list.)
6. Non-HTML or text content:
Textual content encoded in multimedia files such as images or video, or content in file formats that search engines do not handle.
7. Private web:
Sites that require registration and a login.
8. Content which is not linked:
Pages that are not linked from any other page cannot be reached by web-crawling programs. Such pages are said to lack inlinks (also known as backlinks), and search engines do not detect backlinks from every page they index.
9. Web archives:
Web archival services such as the Wayback Machine let users view versions of web pages as they were archived over time, and these archived copies are not indexed by search engines such as Google. (A Wayback Machine lookup sketch appears after this list.)
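To make the first category concrete, here is a minimal sketch of how a polite crawler checks the Robots Exclusion Standard before fetching a page, using Python's standard urllib.robotparser module. The domain, crawler name, and path are placeholders, not taken from any real site.

```python
from urllib import robotparser

# Parse the site's robots.txt (example.com is only a placeholder domain).
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# A polite crawler skips any URL the file disallows for its user agent,
# which is one way deep web pages stay out of search engine indexes.
url = "https://example.com/private/report.html"
if rp.can_fetch("MyCrawler", url):
    print("allowed to fetch", url)
else:
    print("robots.txt disallows", url)
```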
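For the software category, a .onion address cannot be resolved through ordinary DNS, so requests to it have to be routed through Tor. A hedged sketch, assuming a local Tor client is listening on its default SOCKS port 9050 and that the requests library is installed with SOCKS support (requests[socks]); the .onion address shown is a made-up placeholder, not a real service.

```python
import requests

# Route the request through a local Tor SOCKS proxy (default port 9050).
# The "socks5h" scheme makes the proxy resolve the .onion name itself.
proxies = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

# Placeholder onion address, for illustration only.
onion_url = "http://exampleonionaddressplaceholder.onion/"

response = requests.get(onion_url, proxies=proxies, timeout=60)
print(response.status_code)
```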
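For the dynamic content category, a page that exists only as the response to a form can still be fetched directly by a client that knows what to put into the form. A small sketch with the requests library; the URL and field names here are hypothetical.

```python
import requests

# Hypothetical search form: the result page exists only for a given query,
# so there is no static URL for a link-following crawler to stumble onto.
form_url = "https://example.com/library/search"
payload = {"query": "deep web", "year": "2019"}  # open-domain text field plus a filter

response = requests.post(form_url, data=payload, timeout=30)
print(response.status_code)
print(len(response.text), "bytes of dynamically generated content")
```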
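Finally, for the web archives category, the Wayback Machine offers a public availability endpoint that reports whether an archived snapshot of a URL exists. A short sketch of querying it; the site being looked up is just an example, and the response fields are those the endpoint returned at the time of writing.

```python
import requests

# Ask the Wayback Machine whether it holds an archived copy of a page.
api = "https://archive.org/wayback/available"
resp = requests.get(api, params={"url": "example.com"}, timeout=30)
data = resp.json()

closest = data.get("archived_snapshots", {}).get("closest")
if closest:
    # Archived copies live in the archive, not in a search engine's index.
    print("snapshot:", closest["url"], "taken at", closest["timestamp"])
else:
    print("no archived snapshot found")
```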
Types of content
It is not always possible to directly discover the content of a specific web server so that it can be indexed, although a site can sometimes be accessed indirectly, for example through a computer vulnerability.
To discover content on the web, search engines use web crawlers that follow hyperlinks through the known virtual port numbers of standard protocols. This technique is well suited to discovering content on the surface web, but it is largely ineffective at finding deep web content.
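A rough sketch of such a link-following crawler in Python makes the limitation concrete: it only ever requests URLs found in href attributes, so a page that exists only as the response to a form or database query never enters its queue. The seed URL is a placeholder, and real crawlers handle politeness, robots.txt, and parsing far more carefully.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

import requests


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, limit=20):
    # Breadth-first traversal of hyperlinks only: the crawler never submits
    # a form, so query-driven (dynamic) pages are simply never generated.
    seen, queue = {seed}, deque([seed])
    while queue and len(seen) < limit:
        url = queue.popleft()
        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        extractor = LinkExtractor()
        extractor.feed(response.text)
        for href in extractor.links:
            absolute = urljoin(url, href)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen


print(crawl("https://example.com/"))  # placeholder seed URL
```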
These crawlers do not attempt to find dynamic pages that exist only as the result of database queries, because the number of possible queries is indeterminate. This issue can be partially addressed by providing links to query results.
Doing so, however, could unintentionally inflate the popularity of a site on the deep web. A handful of search engines have nonetheless been able to access parts of the deep web, among them:
- DeepPeep
- Ahmia.fi
- Scirus
- Deep Web Technologies
Not all such services are still active: Scirus was retired around 2013, and Intute, a similar academic search service, ran out of funding. In any case, the deep web should not be confused with the dark web, and the distinction between them follows from everything described above.