Making a toy Recursive DNS Resolver for fun
Background
I’ve been taking CS6250: Computer Networks in the OMSCS program this summer semester, which has refined my understanding of the subject. Meanwhile, since I needed to ramp up my Python skills for backend systems at work, I decided to get my hands dirty by building a recursive DNS resolver. Why DNS? It’s defined by concrete specifications, and I thought it would be manageable in a few hours. In the end, this project deepened my knowledge of both Computer Networks and Python. This article describes, step by step, how I built a toy recursive DNS resolver that translates human-readable domain names into machine-readable IP addresses.
Disclaimer
Since I’m not an expert in the field of Computer Networks, my explanations might not always hit the mark due to my limited understanding. If you notice any inaccuracies or points that need correction, please feel free to reach out to me on social media or send me an email.
What is DNS?
Before diving into recursive DNS resolvers, let’s briefly talk about the Domain Name System (DNS). DNS is a hierarchical naming system that resolves domain names into IP addresses, and almost every networked application relies on it. For example, when you type “example.com” into your browser, the browser uses DNS to resolve that domain to an IP address, such as 192.0.0.1. The browser then uses this IP address to fetch resources, such as HTML, from the host server.
DNS is structured in three main levels: the root servers, the Top-Level Domain (TLD) servers, and the authoritative servers. By querying these servers step by step, a recursive DNS resolver translates a human-readable domain name into an IP address. As an analogy, imagine searching for a book in a bookstore. First, you enter the store and look at a floor map to find the floor where your book’s genre is located (akin to querying the root DNS server). Once you reach the right floor, you search for the specific shelf (similar to the TLD DNS server). Finally, you ask a staff member to help you locate the book on that shelf (like the authoritative DNS server). I hope this analogy makes it easier to grasp how DNS works.
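To make the three levels a bit more concrete, the sketch below lists the zones a resolver conceptually walks through for a name, from the root down. zone_chain is a hypothetical helper, and it assumes one zone per label, which real DNS does not guarantee; an actual resolver simply follows whatever referrals the servers return.

```python
from typing import List


def zone_chain(domain: str) -> List[str]:
    """List candidate zones for a domain, from the root to the full name.

    Hypothetical helper: assumes a zone boundary at every label,
    which real DNS does not guarantee.
    """
    labels = domain.rstrip(".").split(".")
    chain = ["."]  # the root zone
    # Walk the labels right to left: "com.", then "example.com.", ...
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(zone_chain("www.example.com"))
# ['.', 'com.', 'example.com.', 'www.example.com.']
```

In the bookstore terms above: the root servers know who serves com., the com. TLD servers know who serves example.com., and the authoritative servers for example.com. hold the records for www.example.com.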
Query to DNS
Since this tool is a command-line interface (CLI), it is used like this:
python main.py example.com
The domain name passed as the only argument is placed into the query sent to the DNS server. Because DNS is used by many different clients, the format of a query message is defined by a protocol specification, RFC 1035. In particular, section 4 (Messages) describes the message format needed to build a query. All communication between clients and servers uses this single message format. For a simple query, only the header and question sections need to be filled in. Let’s look at the header first.
Header
Based on the header format in RFC 1035 (section 4.1.1), I defined the following class in Python. All fields except flags map one-to-one to the fields in the spec; flags holds the entire second 16-bit word of the header.
```python
from dataclasses import dataclass


@dataclass
class DNSHeader:
    xid: int          # Randomly chosen 16-bit identifier
    flags: int        # 16-bit field packing QR, Opcode, AA, TC, RD, RA, Z, RCODE
    qdcount: int = 0  # Number of questions
    ancount: int = 0  # Number of answers
    nscount: int = 0  # Number of authority records
    arcount: int = 0  # Number of additional records
```
Note: “flags” is a 16-bit bitmask containing QR, Opcode, AA, TC, RD, RA, Z, and RCODE. For simplicity, all queries in this example use the value 0x0100 (0000 0001 0000 0000), in which only the Recursion Desired (RD) bit is set.
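To see where 0x0100 comes from, here is a small sketch that packs and unpacks the flags word bit by bit, following the RFC 1035 header layout (QR is the most significant bit, RCODE the lowest four bits). make_flags and split_flags are hypothetical helpers; the resolver itself simply hard-codes 0x0100.

```python
def make_flags(qr=0, opcode=0, aa=0, tc=0, rd=0, ra=0, rcode=0) -> int:
    """Pack the individual header flag fields into one 16-bit integer."""
    return ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
            | (rd << 8) | (ra << 7) | rcode)  # Z (bits 6-4) stays 0


def split_flags(flags: int) -> dict:
    """Unpack a 16-bit flags word into its named fields."""
    return {
        "qr": (flags >> 15) & 0x1,     # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 0x1,     # authoritative answer
        "tc": (flags >> 9) & 0x1,      # truncated
        "rd": (flags >> 8) & 0x1,      # recursion desired
        "ra": (flags >> 7) & 0x1,      # recursion available
        "z": (flags >> 4) & 0x7,       # reserved
        "rcode": flags & 0xF,          # response code
    }


print(hex(make_flags(rd=1)))  # 0x100
```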
Question
DNSQuestion represents a question sent to a DNS server. It contains the following fields:
- QNAME: the domain name being queried.
- QTYPE: the type of query; in this example, only type A (1) is used.
- QCLASS: the class of query; this example uses only the Internet (IN, 1) class.
This format can be represented as a DNSQuestion class. It also includes a to_bytes method that converts the question into the binary format sent to a DNS server over the network:
```python
import struct
from dataclasses import dataclass


@dataclass
class DNSQuestion:
    qname: str
    qtype: int = 1   # The QTYPE (1 = A)
    qclass: int = 1  # The QCLASS (1 = IN)

    def to_bytes(self) -> bytes:
        # Each label is prefixed with its length; a zero byte ends the name.
        # Empty parts (e.g. from a trailing dot) are skipped.
        parts = [part for part in self.qname.split('.') if part]
        name_bytes = b''.join(
            len(part).to_bytes(1, byteorder='big') + part.encode('ascii')
            for part in parts
        ) + b'\x00'
        # QTYPE and QCLASS follow as two unsigned 16-bit big-endian integers
        return name_bytes + struct.pack('!HH', self.qtype, self.qclass)
```
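send_query, shown in the Socket section, calls a build_query helper whose body isn't shown in this article. Here is a minimal sketch of what it might look like, assuming it packs a header with flags 0x0100 and QDCOUNT=1 followed by a single A/IN question. To stay self-contained it packs the fields with struct directly instead of going through DNSHeader and DNSQuestion, and the optional xid argument (handy for testing) is my own addition.

```python
import random
import struct
from typing import Optional


def build_query(domain: str, xid: Optional[int] = None) -> bytes:
    """Build a DNS query message for an A record (sketch)."""
    if xid is None:
        xid = random.randint(0, 0xFFFF)  # 16-bit ID, echoed back by the server
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", xid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte
    qname = b"".join(
        len(label).to_bytes(1, "big") + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


print(len(build_query("example.com")))  # 29
```

For example.com, the 12-byte header, the 13-byte encoded name, and 4 bytes of QTYPE/QCLASS add up to a 29-byte query.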
Domain handling
In DNS, a domain name is encoded as a sequence of labels. Each label is a length byte followed by that many bytes of data, and a zero-length byte ends the name. For example, “example.com” is encoded as the bytes \x07example\x03com\x00. While this might seem straightforward, it becomes more complex once pointers (message compression) come into play, which will be discussed later.
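The encoding can be sketched as a standalone function that also enforces the size limits from RFC 1035 (section 2.3.4): each label is 1 to 63 bytes and the whole encoded name is at most 255 bytes. encode_name is a hypothetical helper, not part of the resolver's code, and for simplicity it rejects the bare root name ".".

```python
def encode_name(domain: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = bytearray()
    for label in domain.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:  # RFC 1035: labels are 1-63 octets
            raise ValueError(f"invalid label length: {label!r}")
        out.append(len(raw))  # length byte
        out += raw            # label data
    out.append(0)  # zero-length root label terminates the name
    if len(out) > 255:  # RFC 1035: names are at most 255 octets
        raise ValueError("domain name too long")
    return bytes(out)


print(encode_name("example.com"))  # b'\x07example\x03com\x00'
```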
Socket
To make actual network requests, this program uses Python’s socket module. A socket is an interface that lets an application send data to and receive data from another host over a network. Since DNS typically uses UDP rather than TCP to reduce overhead, this example also uses UDP, as shown below:
```python
import socket


def send_query(domain: str, server: str, port: int = 53) -> bytes:
    # build_query packs the DNSHeader and DNSQuestion into bytes
    query = build_query(domain)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(5)  # don't block forever if no reply arrives
        sock.sendto(query, (server, port))
        # RFC 1035 limits DNS messages carried over UDP to 512 bytes
        response, _ = sock.recvfrom(512)
    print(response)
    return response
```
Okay, so your console should print some binary data like the following, which is not very readable yet :)
b'09\x83\x00\x00\x01\x00\x00\x00\r\x00\x0b\x07example\x03com\x00\x00\x01\x00\x01\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x14\x01l\x0cgtld-servers\x03net\x00\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01j\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01h\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01d\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01b\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01f\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01k\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01m\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01i\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01g\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01a\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01c\xc0+\xc0\x14\x00\x02\x00\x01\x00\x02\xa3\x00\x00\x04\x01e\xc0+\xc0)\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc0)\xa2\x1e\xc0)\x00\x1c\x00\x01\x00\x02\xa3\x00\x00\x10 \x01\x05\x00\xd97\x00\x00\x00\x00\x00\x00\x00\x00\x000\xc0I\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc00O\x1e\xc0I\x00\x1c\x00\x01\x00\x02\xa3\x00\x00\x10 \x01\x05\x02p\x94\x00\x00\x00\x00\x00\x00\x00\x00\x000\xc0Y\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc06p\x1e\xc0Y\x00\x1c\x00\x01\x00\x02\xa3\x00\x00\x10 \x01\x05\x02\x08\xcc\x00\x00\x00\x00\x00\x00\x00\x00\x000\xc0i\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc0\x1fP\x1e\xc0i\x00\x1c\x00\x01\x00\x02\xa3\x00\x00\x10 \x01\x05\x00\x85n\x00\x00\x00\x00\x00\x00\x00\x00\x000\xc0y\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc0!\x0e\x1e\xc0y\x00\x1c\x00\x01\x00\x02\xa3\x00\x00\x10 \x01\x05\x03#\x1d\x00\x00\x00\x00\x00\x00\x00\x02\x000\xc0\x89\x00\x01\x00\x01\x00\x02\xa3\x00\x00\x04\xc0#3\x1e'
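Before parsing, it can be worth sanity-checking the reply, which is the kind of response verification mentioned in the conclusion. Here is a minimal sketch, assuming you keep the query bytes around. check_response is a hypothetical helper: it confirms that the 16-bit ID was echoed back, that the QR bit marks the message as a response, and that RCODE is 0, and it reports whether the TC (truncated) bit is set.

```python
import struct


def check_response(query: bytes, response: bytes) -> bool:
    """Validate a reply against its query; return True if it was truncated."""
    if len(response) < 12:
        raise ValueError("response is shorter than a DNS header")
    (query_id,) = struct.unpack("!H", query[:2])
    response_id, flags = struct.unpack("!HH", response[:4])
    if response_id != query_id:
        raise ValueError(f"ID mismatch: sent {query_id}, got {response_id}")
    if not flags & 0x8000:  # QR bit: 1 means this message is a response
        raise ValueError("QR bit not set: not a response")
    rcode = flags & 0x000F
    if rcode != 0:  # e.g. 3 = NXDOMAIN (name does not exist)
        raise ValueError(f"server returned RCODE {rcode}")
    return bool(flags & 0x0200)  # TC bit: the message was truncated


query_header = b"09\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00"
sample_header = b"09\x83\x00\x00\x01\x00\x00\x00\r\x00\x0b"
print(check_response(query_header, sample_header))  # True
```

Fed the first 12 bytes of the sample above (ID `09`, i.e. 0x3039, and flags 0x8300) together with a query using the same ID, it returns True: the TC bit is set, so the root server's reply did not fit in 512 bytes and was truncated. A fuller resolver would typically retry such a query over TCP.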
Parse DNS Response
In the previous section, I explained how to make a query and successfully receive a response from a DNS server. Next, the DNS response needs to be parsed into a human-readable format.
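A DNS message consists of five sections: header, question, answer, authority, and additional (RFC 1035, section 4.1). The header and question have already been defined, so the next step is to define the records that fill the remaining sections.
The answer, authority, and additional sections all share the same resource record format: NAME, TYPE, CLASS, TTL, RDLENGTH, and RDATA (RFC 1035, section 4.1.3).
So the DNSRecord class can be defined as follows: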
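```python
from dataclasses import dataclass


@dataclass
class DNSRecord:
    name: str    # Owner name of the record
    type: int    # Record type (1 = A, 2 = NS, ...)
    class_: int  # Record class ("class" is a Python keyword, hence the underscore)
    ttl: int     # Time to live, in seconds
    length: int  # RDLENGTH: size of the record data in bytes
    data: str    # Decoded RDATA (e.g. an IP address or a domain name)
```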
As a result, the DNSHeader, DNSQuestion, and DNSRecord data structures are now defined for parsing the binary response.
BytesIO
To parse the binary response, this program uses io.BytesIO to manage the binary stream. BytesIO provides methods for reading, seeking, and reporting the current position (read, seek, and tell), which simplifies the parsing process. The parser starts by creating a BytesIO object named reader and then passes it to the individual parser functions, as shown below:
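```python
from io import BytesIO


def parse_response(data: bytes):
    reader = BytesIO(data)
    header = parse_header(reader)
    questions = [parse_question(reader) for _ in range(header.qdcount)]
    # Answer, authority and additional records share one format,
    # so they can be read back to back.
    record_count = header.ancount + header.nscount + header.arcount
    records = [parse_record(reader) for _ in range(record_count)]
    return header, questions, records
```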
Parse header
Parsing the header is straightforward with struct.unpack, since the header consists of six 16-bit fields. The parser can be implemented as follows:
```python
import struct
from io import BytesIO

def parse_header(reader: BytesIO) -> DNSHeader:
    header_fields = struct.unpack('!HHHHHH', reader.read(12))
    return DNSHeader(*header_fields)
```
Note: The read function requires the byte size, which is calculated as 16 bits * 6 / 8 = 12 bytes.
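As a quick check of that arithmetic, struct.calcsize reports the size of the format string, and unpacking the first 12 bytes of the sample response shown earlier recovers the header fields. This standalone snippet unpacks into plain variables rather than a DNSHeader.

```python
import struct

HEADER_FORMAT = "!HHHHHH"  # network byte order, six unsigned 16-bit fields
print(struct.calcsize(HEADER_FORMAT))  # 12

# The first 12 bytes of the sample response shown earlier
sample_header = b"09\x83\x00\x00\x01\x00\x00\x00\r\x00\x0b"
xid, flags, qdcount, ancount, nscount, arcount = struct.unpack(
    HEADER_FORMAT, sample_header
)
print(xid, hex(flags), qdcount, ancount, nscount, arcount)
# 12345 0x8300 1 0 13 11
```

In other words, the root server sent no answers, 13 authority records (the NS records for com), and 11 additional records. That referral is what the recursive step follows.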
Domain name handling
To parse the question section and the DNS records, the parser must be able to read domain names. However, handling domain names is not straightforward because DNS uses a compression scheme (RFC 1035, section 4.1.4) to avoid repeating the same name throughout a message: a domain name, or its trailing labels, may be replaced by a pointer to a prior occurrence of the same name in the message. A pointer is indicated by the two most significant bits of the length byte being set to 1, which leaves 14 bits for the offset.
To handle this, the parser needs to recognize these pointers and decode the domain names appropriately. Here’s how this can be approached:
- Normal Labels: A label typically starts with a length byte followed by the label content. The length byte tells how many bytes to read for the label.
- Compressed Labels: When the length byte has the two most significant bits set to 1, it indicates a pointer. The remaining 14 bits of this and the next byte represent the offset from the start of the message where the full domain name can be found.
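The pointer arithmetic is easiest to see with bytes from the sample response. pointer_offset is a hypothetical helper for illustration: it masks off the two flag bits and combines the remaining 6 bits with the next byte into a 14-bit offset.

```python
def pointer_offset(first: int, second: int) -> int:
    """Decode a 2-byte compression pointer into an offset from the message start."""
    if (first & 0xC0) != 0xC0:  # both top bits must be set (0b11xxxxxx)
        raise ValueError("not a compression pointer")
    return ((first & 0x3F) << 8) | second


# b"\xc0\x14" is the owner name of each authority record in the sample
print(pointer_offset(0xC0, 0x14))  # 20
# b"\xc0+" ('+' is 0x2B) appears inside the NS names, e.g. b"\x01j\xc0+"
print(pointer_offset(0xC0, ord("+")))  # 43
```

Offset 20 lands inside the question: after the 12-byte header come \x07example (offsets 12 to 19) and then \x03com\x00 at offset 20, so every authority record's name decodes to "com". Offset 43 points at \x0cgtld-servers\x03net\x00 inside the first NS record's data, so b"\x01j\xc0+" decodes to j.gtld-servers.net.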
```python
import struct
from io import BytesIO


def parse_domain_name(reader: BytesIO) -> str:
    labels = []
    while True:
        length_byte = reader.read(1)
        if not length_byte:
            raise ValueError("unexpected end of message while reading a name")
        length = length_byte[0]
        if length == 0:
            # A zero-length label marks the end of the name
            break
        if (length & 0xC0) == 0xC0:  # 11xxxxxx: compression pointer
            pointer_byte = reader.read(1)
            pointer = struct.unpack('!H', length_byte + pointer_byte)[0]
            pointer &= 0x3FFF  # Remove the two most significant bits
            current_position = reader.tell()
            reader.seek(pointer)
            # The pointed-to name may itself end in another pointer
            labels.append(parse_domain_name(reader))
            reader.seek(current_position)
            # A pointer always ends the name
            break
        labels.append(reader.read(length).decode('ascii'))
    return ".".join(labels)
```
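One weakness of the recursive version is that a malformed or malicious message can contain pointers that point at each other (or at themselves), which would recurse until Python raises RecursionError. A common alternative, sketched below as a swapped-in technique rather than the resolver's actual code, is an iterative parser that follows pointers in a loop, caps the number of jumps (MAX_JUMPS is an arbitrary, hypothetical limit), and restores the reader to just after the first pointer.

```python
from io import BytesIO

MAX_JUMPS = 16  # hypothetical cap on pointer hops per name


def read_name(reader: BytesIO) -> str:
    labels = []
    jumps = 0
    resume_at = None  # position just after the first pointer, if any
    while True:
        length_byte = reader.read(1)
        if not length_byte:
            raise ValueError("truncated domain name")
        length = length_byte[0]
        if length == 0:  # zero-length label: end of the name
            break
        if (length & 0xC0) == 0xC0:  # compression pointer
            next_byte = reader.read(1)
            if not next_byte:
                raise ValueError("truncated compression pointer")
            offset = ((length & 0x3F) << 8) | next_byte[0]
            if resume_at is None:
                resume_at = reader.tell()
            jumps += 1
            if jumps > MAX_JUMPS:
                raise ValueError("too many compression pointers (loop?)")
            reader.seek(offset)
            continue
        labels.append(reader.read(length).decode("ascii"))
    if resume_at is not None:
        reader.seek(resume_at)  # continue parsing right after the pointer
    return ".".join(labels)


# "example.com" at offset 0, then "www" followed by a pointer back to offset 0
message = b"\x07example\x03com\x00" + b"\x03www\xc0\x00"
reader = BytesIO(message)
reader.seek(13)
print(read_name(reader))  # www.example.com
```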
Parse question and record
Using the parse_domain_name function significantly simplifies the parsing of both the question and the record. The code to handle this could look like the following:
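```python
import socket
import struct
from io import BytesIO


def parse_question(reader: BytesIO) -> DNSQuestion:
    qname = parse_domain_name(reader)
    qtype, qclass = struct.unpack("!HH", reader.read(4))
    return DNSQuestion(qname, qtype, qclass)


def parse_record(reader: BytesIO) -> DNSRecord:
    name = parse_domain_name(reader)
    # TYPE (2) + CLASS (2) + TTL (4) + RDLENGTH (2) = 10 bytes
    type_, class_, ttl, length = struct.unpack("!HHIH", reader.read(10))
    rdata_start = reader.tell()
    if type_ == 1:  # A: 4-byte IPv4 address
        data = socket.inet_ntoa(reader.read(length))
    elif type_ == 28:  # AAAA: 16-byte IPv6 address
        data = socket.inet_ntop(socket.AF_INET6, reader.read(length))
    elif type_ in (2, 5):  # NS, CNAME: a (possibly compressed) domain name
        data = parse_domain_name(reader)
    else:  # Other types: keep the raw bytes as hex
        data = reader.read(length).hex()
    # Always continue right after RDATA so the next record stays aligned
    reader.seek(rdata_start + length)
    return DNSRecord(name, type_, class_, ttl, length, data)
```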
Both functions unpack the fixed-size fields with struct and then construct the target object. For type A records (IPv4 addresses), RDATA holds the address as 4 raw bytes, so socket.inet_ntoa is used to convert this 32-bit packed address into its standard dotted-quad string (e.g., 192.168.0.1).
Ref: https://github.com/yayoc/rdnsr/commit/ca80e01815d78bc16e962dbcfc19598ca91c3e34
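The sample response's additional section contains glue records of both type A (1) and AAAA (28), and their RDATA is a raw address rather than a domain name. Here is a small sketch of decoding both, using socket.inet_ntoa for IPv4 and socket.inet_ntop for IPv6; decode_address is a hypothetical helper, and the RDATA bytes are copied from the sample response.

```python
import socket


def decode_address(rtype: int, rdata: bytes) -> str:
    """Turn A (type 1) or AAAA (type 28) RDATA into a printable address."""
    if rtype == 1 and len(rdata) == 4:
        return socket.inet_ntoa(rdata)
    if rtype == 28 and len(rdata) == 16:
        return socket.inet_ntop(socket.AF_INET6, rdata)
    raise ValueError(f"unexpected RDATA for type {rtype}: {rdata!r}")


# RDATA of the first A and AAAA glue records in the sample response
print(decode_address(1, b"\xc0)\xa2\x1e"))  # 192.41.162.30
print(decode_address(28, b" \x01\x05\x00\xd97" + b"\x00" * 9 + b"0"))  # 2001:500:d937::30
```

Both addresses belong to l.gtld-servers.net, the server queried second in the run shown at the end of this article. Sixteen raw address bytes would not parse as a domain name, so RDATA has to be decoded according to the record type.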
Making it recursive
So far, the DNS resolver only queries a single arbitrary DNS server. The final step is to make this query recursive, going through the root, TLD, and authoritative servers to resolve the domain name. Since the fundamental functions are already in place, updating the program is straightforward. You can refer to the relevant commit for these changes: https://github.com/yayoc/rdnsr/commit/fa5142f3d7fc9bdfa760a9fddb20586e87441ea3
The main function calls send_query repeatedly, starting from a root server and following the name servers returned in each referral through the TLD servers down to the authoritative server. While this simple approach may not be the most performant, it effectively demonstrates how DNS resolution works.
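The referral loop can be sketched as follows. This is not the code from the commit: it's a minimal sketch assuming a query function that sends a question to a server and returns the parsed sections (Record and Response are simplified, hypothetical stand-ins for DNSRecord and the parsed message), and it assumes glue A records are used when present. Injecting query keeps the logic testable without network access. 198.41.0.4 is a.root-servers.net, the root server used in the run below.

```python
from dataclasses import dataclass, field
from typing import Callable, List

ROOT_SERVER = "198.41.0.4"  # a.root-servers.net
A, NS = 1, 2                # record type codes


@dataclass
class Record:  # simplified stand-in for DNSRecord
    name: str
    type: int
    data: str


@dataclass
class Response:  # parsed answer/authority/additional sections
    answers: List[Record] = field(default_factory=list)
    authorities: List[Record] = field(default_factory=list)
    additionals: List[Record] = field(default_factory=list)


def resolve(domain: str, query: Callable[[str, str], Response],
            max_steps: int = 10) -> str:
    """Follow referrals from the root until an A record is found.

    Sketch only: CNAMEs, errors and IPv6-only name servers are not handled.
    """
    server = ROOT_SERVER
    for _ in range(max_steps):
        response = query(domain, server)
        for record in response.answers:
            if record.type == A:
                return record.data
        ns_names = [r.data for r in response.authorities if r.type == NS]
        if not ns_names:
            raise RuntimeError(f"no answer and no referral from {server}")
        # Prefer glue: an A record for one of the referred name servers
        glue = [r.data for r in response.additionals
                if r.type == A and r.name in ns_names]
        # Without glue, resolve the name server's own address from the root
        server = glue[0] if glue else resolve(ns_names[0], query, max_steps)
    raise RuntimeError(f"gave up after {max_steps} referrals")
```

Each pass of the loop corresponds to one "query to ..." line in the output below: the root refers the resolver to the com servers, those refer it to example.com's name servers, and the authoritative server finally answers.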
Finally, I got the A record for example.com with the following command. Yay!
$ python main.py example.com
query to 198.41.0.4
query to l.gtld-servers.net
query to a.iana-servers.net
DNSRecord(name='example.com', type=1, class_=1, ttl=3600, length=4, data='93.184.215.14')
Conclusion
While it’s a simplified approach, creating a recursive DNS resolver has greatly enhanced my understanding of DNS. In particular, I wasn’t aware of domain name compression until I started writing the parser. Writing the code also made me think about other issues, such as error handling and response verification, which were not obvious to me before I started this project.
References
- RFC 1035
- UDP Socket Programming: DNS
- Implement DNS in a weekend
- I wasn’t aware of this project before starting my own. It would have been better to read this article first, as it contains more detailed info.