Making a toy Recursive DNS Resolver for fun

June 28, 2024

Background

I’ve been taking CS6250: Computer Networks in the OMSCS program this summer semester, which has refined my understanding of the subject. Meanwhile, needing to ramp up my Python programming skills for backend systems at work, I decided to get my hands dirty by making a recursive DNS resolver. Why DNS? It’s based on specific specs, and I thought it could be manageable in a few hours. As a result, this project has deepened my knowledge of both Computer Networks and Python. This article describes how I built a toy recursive DNS resolver, step by step, translating human-readable domains into machine-readable IP addresses.

Disclaimer

Since I’m not an expert in the field of Computer Networks, my explanations might not always hit the mark due to my limited understanding. If you notice any inaccuracies or points that need correction, please feel free to reach out to me on social media or send me an email.

What is DNS?

Before diving into recursive DNS resolvers, it’s worth talking a bit about the Domain Name System (DNS). DNS is a hierarchical naming system used to resolve domain names into IP addresses, and nearly every networked computer relies on it. For example, when you type “example.com” into your browser, the browser uses DNS to resolve the domain to an IP address such as 192.0.0.1. That IP address is then used to fetch resources, such as HTML, from the host server.

DNS is structured in three main levels: the root servers, the Top-Level Domain (TLD) servers, and the authoritative servers. By talking to those servers step by step, the recursive DNS resolver translates a human-readable domain into an IP address. As an analogy, imagine searching for a book in a bookstore. First, you enter the store and look at a floor map to find the section where your book’s genre is located (akin to querying the root DNS). Once you find the right floor, you search for the specific shelf (similar to the TLD DNS). Finally, you might ask a staff member for help to locate the book on that shelf (like the authoritative DNS). I hope this analogy makes it easier to grasp how DNS functions.

Query to DNS

Since this tool works as a CLI, usage looks like this:

python main.py example.com

The domain name passed as the sole argument is what goes into the DNS query. Because DNS is used by so many different clients, the way to form a query message is standardized in RFC 1035; section 4, which covers the message format, is the relevant part. All communication between clients and servers uses a single message format. For a simple query, only the header and question sections need to be filled in. Let’s look at the header spec first.

DNS Header

Based on the spec above, I defined the following class in Python. All properties except flags map directly to the fields in the figure above.

@dataclass
class DNSHeader:
	xid: int  # Randomly chosen identifier
	flags: int  # Bit-mask to indicate request/response
	qdcount: int = 0  # Number of questions
	ancount: int = 0  # Number of answers
	nscount: int = 0  # Number of authority records
	arcount: int = 0  # Number of additional records

Note: “flags” is a 16-bit bitmask including QR, Opcode, AA, TC, RD, RA, Z, and RCode. For simplicity in this example, all queries contain the value 0x0100 (0000 0001 0000 0000), in which only Recursion Desired bit is set.
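To make the bit layout concrete, here is a minimal sketch that assembles the 16-bit flags value from its individual fields. The helper name build_flags and its parameters are my own illustration rather than part of the project; the bit positions follow the header layout in RFC 1035.

```python
def build_flags(qr: int = 0, opcode: int = 0, aa: int = 0, tc: int = 0,
                rd: int = 0, ra: int = 0, z: int = 0, rcode: int = 0) -> int:
    """Pack the header flag fields into one 16-bit integer (hypothetical helper)."""
    return (
        ((qr & 0x1) << 15)        # QR: 0 = query, 1 = response
        | ((opcode & 0xF) << 11)  # Opcode: 0 = standard query
        | ((aa & 0x1) << 10)      # AA: authoritative answer
        | ((tc & 0x1) << 9)       # TC: message was truncated
        | ((rd & 0x1) << 8)       # RD: recursion desired
        | ((ra & 0x1) << 7)       # RA: recursion available
        | ((z & 0x7) << 4)        # Z: reserved
        | (rcode & 0xF)           # RCODE: response code
    )


print(hex(build_flags(rd=1)))
```

Setting only rd=1 yields the 0x0100 value used for every query in this article.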

Question

DNS Question

DNSQuestion represents a question to a DNS server and contains the following items:

QNAME: A domain name.
QTYPE: Specifies the type of query; in this example, only type A is used.
QCLASS: Specifies the class of query; this example uses only the Internet (IN) class.

This format can be represented as a DNSQuestion class, as shown below:

@dataclass
class DNSQuestion:
	qname: str
	qtype: int = 1  # The QType (1 = A)
	qclass: int = 1  # The QCLASS (1 = IN)

This implementation includes a to_bytes method to convert the query into a binary format suitable for transmission to a DNS server over the network.

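def to_bytes(self):
	# Encode each label as <length byte><ASCII bytes>, then terminate with a zero byte
	parts = self.qname.split('.')
	name_bytes = b''.join(
		len(part).to_bytes(1, byteorder='big') + part.encode('ascii') for part in parts
	) + b'\x00'
	# Append QTYPE and QCLASS as big-endian 16-bit integers
	return name_bytes + struct.pack('!HH', self.qtype, self.qclass)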

Domain handling

In DNS, a domain name is composed of labels. Each label consists of a length byte followed by the actual data, and the name ends with a zero-length byte. For example, the domain “example.com” could be represented as “7example3com” followed by a terminating 0, where 7 and 3 are byte values rather than ASCII digits. While this might seem straightforward, it becomes more complex when considering pointer handling, which will be discussed later.
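The label encoding can be sketched in a few lines. The article’s send_query calls a build_query function whose body isn’t shown here, so the version below is only my guess at its shape: a 12-byte header with the RD flag set and QDCOUNT = 1, followed by a single A/IN question. The encode_name helper and the optional xid parameter are assumptions added for illustration.

```python
import random
import struct
from typing import Optional


def encode_name(domain: str) -> bytes:
    """Encode 'example.com' as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:  # RFC 1035 caps a label at 63 bytes
            raise ValueError(f"invalid label: {label!r}")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(domain: str, xid: Optional[int] = None) -> bytes:
    """Assumed shape of build_query: a header plus one A/IN question."""
    if xid is None:
        xid = random.randint(0, 0xFFFF)  # random 16-bit identifier
    # ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", xid, 0x0100, 1, 0, 0, 0)
    question = encode_name(domain) + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


print(encode_name("example.com"))
```

For “example.com” the encoded name is 13 bytes, so the whole query comes to 12 + 13 + 4 = 29 bytes.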

Socket

To make actual network requests, this program utilizes the socket library. A socket is an interface that allows applications to communicate with another host over networks. This abstraction enables programs to send and receive data over networks. Since DNS typically uses UDP instead of TCP to reduce overhead, this example also uses UDP, as shown below:

def send_query(domain: str, server: str, port: int = 53):
	query = build_query(domain)
	sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

	try:
		sock.sendto(query, (server, port))
		# RFC 1035 limits DNS messages carried over UDP to 512 bytes
		response, _ = sock.recvfrom(512)
		print(response)
	finally:
		sock.close()

Okay, so your console should print some binary data like the output below, which is a clear sign of progress :)

b'09\x83\x00\x00\x01\x00\x00\x00\r\x00\x0b\x07example\x03com\x00\x00\x01\x00\x01\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x14\x01l\x0cgtld-servers\x03net\x00\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01j\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01h\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01d\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01b\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01f\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01k\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01m\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01i\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01g\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01a\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01c\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01e\xc0+\xc0)\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc0)\xa2\x1e\xc0)\x00\x1c\x00\x01\x00\x02\xa3\x00\x00\x10 \x01\x05\x00\xd97\x00\x00\x00\x00\x00\x00\x00\x00\x000\xc0I\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc00O\x1e\xc0I\x00\x1c\x00\x01\x00\x02\xa3\x00\x00\x10 \x01\x05\x02p\x94\x00\x00\x00\x00\x00\x00\x00\x00\x000\xc0Y\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc06p\x1e\xc0Y\x00\x1c\x00\x01\x00\x02\xa3\x00\x00\x10 \x01\x05\x02\x08\xcc\x00\x00\x00\x00\x00\x00\x00\x00\x000\xc0i\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc0\x1fP\x1e\xc0i\x00\x1c\x00\x01\x00\x02\xa3\x00\x00\x10 \x01\x05\x00\x85n\x00\x00\x00\x00\x00\x00\x00\x00\x000\xc0y\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc0!\x0e\x1e\xc0y\x00\x1c\x00\x01\x00\x02\xa3\x00\x00\x10 \x01\x05\x03#\x1d\x00\x00\x00\x00\x00\x00\x00\x02\x000\xc0\x89\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc0#3\x1e'

Parse DNS Response

In the previous section, I explained how to make a query and successfully receive a response from a DNS server. Next, the DNS response needs to be parsed into a human-readable format.

The DNS message format consists of five sections: header, question, answer, authority, and additional. The header and question have already been defined. The next step is to define the answer section.

DNS message format

The answer, authority, and additional sections all share the following resource record format:

DNS record

So the DNSRecord class should be defined as below:

@dataclass
class DNSRecord:
	name: str
	type: int
	class_: int
	ttl: int
	length: int
	data: str

As a result, the DNSHeader, DNSQuestion, and DNSRecord data structures are defined and ready to be used for parsing the binary response.

BytesIO

To parse the binary response, this program uses the BytesIO class from Python’s io module to manage the binary stream. BytesIO provides methods such as seek and tell for moving to and reporting the current position, which simplifies the parsing process. The parser starts by creating a BytesIO object named reader and then passes it to subsequent parser functions, as shown below:

def parse_response(response: bytes):
	reader = BytesIO(response)
	header = parse_header(reader)
	questions = parse_questions(reader, header.qdcount)
	records = parse_records(reader)
	return header, questions, records

Parse header

Parsing the header is straightforward using the struct.unpack function, as the header consists of six 16-bit fields. The parser can be implemented as follows:

def parse_header(reader: BytesIO):
	header_fields = struct.unpack('!HHHHHH', reader.read(12))
	return DNSHeader(*header_fields)

Note: The read function requires the byte size, which is calculated as 16 bits * 6 / 8 = 12 bytes.
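As a worked example, the first 12 bytes of the response printed in the Socket section can be unpacked by hand. This is just a sanity check of the format string, not part of the project code; the flag masks used here are the QR, TC, and RCODE positions from RFC 1035.

```python
import struct

# First 12 bytes of the response printed in the Socket section
header_bytes = b"09\x83\x00\x00\x01\x00\x00\x00\r\x00\x0b"
xid, flags, qdcount, ancount, nscount, arcount = struct.unpack("!HHHHHH", header_bytes)

is_response = bool(flags & 0x8000)   # QR bit
is_truncated = bool(flags & 0x0200)  # TC bit
rcode = flags & 0x000F               # 0 means no error

print(xid, hex(flags), qdcount, ancount, nscount, arcount)
# 12345 0x8300 1 0 13 11
```

So this response carries 1 question, no answers, 13 authority records, and 11 additional records. The TC bit is also set, which suggests the server had more data than fits in a 512-byte UDP message.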

Domain name handling

Parsing the question section and DNS records requires parsing domain names. However, handling a domain name is not straightforward because DNS employs a compression scheme that uses pointers to avoid repeating names within a message. For example, a domain name may be partially or fully replaced by a pointer to a prior occurrence of the same name in the message. A pointer is indicated by the two most significant bits of a byte being set to 1; the remaining 6 bits and the following byte together form a 14-bit offset.

To handle this, the parser needs to recognize these pointers and decode the domain names appropriately. Here’s how this can be approached:

Normal Labels: A label typically starts with a length byte followed by the label content. The length byte tells how many bytes to read for the label.
Compressed Labels: When the length byte has the two most significant bits set to 1, it indicates a pointer. The remaining 14 bits of this and the next byte represent the offset from the start of the message where the full domain name can be found.

def parse_domain_name(reader: BytesIO):
	labels = []
	while True:
		length_byte = reader.read(1)
		length = length_byte[0]
		if length == 0:
			break
		if length >= 192:  # 11000000
			# Handle compression
			pointer_byte = reader.read(1)
			pointer = struct.unpack('!H', length_byte + pointer_byte)[0]
			pointer &= 0x3FFF  # Remove the two most significant bits
			current_position = reader.tell()
			reader.seek(pointer)
			subdomain = parse_domain_name(reader)
			labels.append(subdomain)
			reader.seek(current_position)
			break
		labels.append(reader.read(length).decode('ascii'))
	return ".".join(labels)
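To see the pointers in action, here is a self-contained sketch that decodes names from the first 77 bytes of the response printed in the Socket section. It is an offset-based variant I wrote for illustration rather than the BytesIO version above: read_name, SAMPLE, and MAX_JUMPS are hypothetical names, and the jump limit is an arbitrary guard I added against malformed messages whose pointers form a loop.

```python
import struct
from typing import List, Optional, Tuple

MAX_JUMPS = 16  # arbitrary safety limit (my assumption, not from the RFC)

SAMPLE = (
    b"09\x83\x00\x00\x01\x00\x00\x00\r\x00\x0b"          # header, offsets 0-11
    b"\x07example\x03com\x00\x00\x01\x00\x01"            # question, name at offset 12
    b"\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x14"  # NS record, name points to 20
    b"\x01l\x0cgtld-servers\x03net\x00"                  # its RDATA, at offset 41
    b"\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04"  # next NS record
    b"\x01j\xc0+"                                        # RDATA at 73: 'j' + pointer to 43
)


def read_name(message: bytes, offset: int) -> Tuple[str, int]:
    """Return (name, offset just past the name where it was first read)."""
    labels: List[str] = []
    resume_at: Optional[int] = None  # position right after the first pointer
    jumps = 0
    while True:
        length = message[offset]
        if length == 0:  # zero-length label ends the name
            offset += 1
            break
        if (length & 0xC0) == 0xC0:  # top two bits set: compression pointer
            pointer = struct.unpack("!H", message[offset:offset + 2])[0] & 0x3FFF
            if resume_at is None:
                resume_at = offset + 2
            jumps += 1
            if jumps > MAX_JUMPS:
                raise ValueError("too many compression pointers")
            offset = pointer
            continue
        offset += 1
        labels.append(message[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels), (resume_at if resume_at is not None else offset)


print(read_name(SAMPLE, 29))  # ('com', 31)
print(read_name(SAMPLE, 73))  # ('j.gtld-servers.net', 77)
```

The two bytes \xc0\x14 decode to offset 0x14 = 20, which is where the “com” label sits inside the question, and \xc0+ decodes to offset 43, the start of “gtld-servers.net” in the previous record.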

Parse question and record

Using the parse_domain_name function significantly simplifies the parsing of both the question and the record. The code to handle this could look like the following:

def parse_question(reader: BytesIO):
	qname = parse_domain_name(reader)
	data = reader.read(4)
	qtype, qclass = struct.unpack("!HH", data)
	return DNSQuestion(qname, qtype, qclass)

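def parse_record(reader: BytesIO):
	name = parse_domain_name(reader)
	# TYPE (2 bytes), CLASS (2), TTL (4), RDLENGTH (2) = 10 bytes
	data = reader.read(10)
	type, class_, ttl, length = struct.unpack("!HHIH", data)
	if type == 1:  # A: 4-byte IPv4 address
		data = socket.inet_ntoa(reader.read(length))
	elif type == 2:  # NS: a (possibly compressed) domain name
		data = parse_domain_name(reader)
	else:
		# Other types (e.g. AAAA) are not domain names; consume exactly
		# `length` bytes so the reader stays aligned with the next record
		data = reader.read(length).hex()
	return DNSRecord(name, type, class_, ttl, length, data)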

Both parsers unpack the fixed-size binary fields to construct the target object. The function socket.inet_ntoa converts a 32-bit packed IPv4 address (a 4-byte binary string) into its standard dotted-quad string representation (e.g., 192.168.0.1). This is necessary because type A records, which represent IPv4 addresses, store the address in this packed form.
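As a quick check of the "!HHIH" layout, the snippet below decodes one A record copied from the additional section of the response printed in the Socket section. The slicing offsets assume the record’s name is a 2-byte compression pointer, which is the case for this particular record.

```python
import socket
import struct

# NAME (2-byte pointer) + TYPE, CLASS, TTL, RDLENGTH (10 bytes) + RDATA (4 bytes)
record = b"\xc0)\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc0)\xa2\x1e"

rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", record[2:12])
address = socket.inet_ntoa(record[12:12 + rdlength])

print(rtype, rclass, ttl, rdlength, address)
# 1 1 172800 4 192.41.162.30
```

The TTL of 172800 seconds is two days, and the address is glue for one of the gtld-servers.net name servers listed in the authority section.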

Ref: https://github.com/yayoc/rdnsr/commit/ca80e01815d78bc16e962dbcfc19598ca91c3e34

Make it recursive

So far, the DNS resolver only queries a single arbitrary DNS server. The final step is to make this query recursive, going through the root, TLD, and authoritative servers to resolve the domain name. Since the fundamental functions are already in place, updating the program is straightforward. You can refer to the relevant commit for these changes: https://github.com/yayoc/rdnsr/commit/fa5142f3d7fc9bdfa760a9fddb20586e87441ea3

The main function sequentially runs send_query functions from the root server to the authoritative servers, retrieving name servers along the way. While this simple approach may not be the most performant, it effectively demonstrates how DNS works.
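The referral-following loop can be sketched independently of the networking code. The version below is not the code from the linked commit: resolve, Record, QueryFn, and max_hops are names I made up, and the query function is passed in so the loop can be exercised without touching the network. In the real program it would presumably wrap send_query and parse_response.

```python
from typing import Callable, List, Optional, Tuple

# Simplified record for this sketch: (name, type, data). Types: 1 = A, 2 = NS.
Record = Tuple[str, int, str]
QueryFn = Callable[[str, str], List[Record]]

ROOT_SERVER = "198.41.0.4"  # a.root-servers.net


def resolve(domain: str, query: QueryFn, server: str = ROOT_SERVER,
            max_hops: int = 10) -> Optional[str]:
    """Follow referrals from the root until an A record for `domain` appears."""
    for _ in range(max_hops):
        records = query(domain, server)
        # An A record for the name we asked about ends the search
        for name, rtype, data in records:
            if rtype == 1 and name == domain:
                return data
        # Otherwise the response should be a referral listing name servers
        ns_names = [data for _, rtype, data in records if rtype == 2]
        if not ns_names:
            return None
        # Prefer a glue A record for a listed name server, else use its host name
        glue = {name: data for name, rtype, data in records if rtype == 1}
        server = next((glue[ns] for ns in ns_names if ns in glue), ns_names[0])
    return None  # gave up after max_hops referrals
```

Capping the number of hops is a design choice of this sketch; it keeps a misconfigured zone that refers back to itself from looping forever.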

Finally, I got the A record for example.com with the following command. Yay!

$ python main.py example.com
query to  198.41.0.4
query to  l.gtld-servers.net
query to  a.iana-servers.net
DNSRecord(name='example.com', type=1, class_=1, ttl=3600, length=4, data='93.184.215.14')

Conclusion

While it’s a simplified approach, creating a recursive DNS resolver has greatly enhanced my understanding of DNS. In particular, I wasn’t aware of domain name compression until I started writing the parser. Additionally, the coding has prompted me to consider other issues, such as error handling and response verification, which were not obvious before starting this project.

References

RFC 1035
UDP Socket Programming: DNS
Implement DNS in a weekend
- I wasn’t aware of this project before starting my own. It would have been better to read this article first, as it contains more detailed info.