URL Validator
Validate URL format and parse URL components
About URL Validation
A URL (Uniform Resource Locator) is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. URLs are fundamental to how the web works, and proper URL validation is essential for web applications, APIs, and data processing systems. Understanding URL structure and validation helps prevent security vulnerabilities, improve user experience, and ensure data quality.
URL Structure
A complete URL consists of several components:
https://user:password@www.example.com:8080/path/to/page?query=value&foo=bar#section
├─ Scheme: https
├─ User Info: user:password (optional, rarely used)
├─ Host: www.example.com
├─ Port: 8080 (optional, defaults: 80 for HTTP, 443 for HTTPS)
├─ Path: /path/to/page
├─ Query: query=value&foo=bar
└─ Fragment: section
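As an illustration of the breakdown above, the same URL can be split with Python's standard `urllib.parse` module. This is a minimal sketch. The `components` dictionary and its key names are chosen here for readability and are not part of any API.

```python
from urllib.parse import urlsplit

url = "https://user:password@www.example.com:8080/path/to/page?query=value&foo=bar#section"
parts = urlsplit(url)

# Map the labels used in the diagram to the attributes urlsplit exposes.
components = {
    "scheme": parts.scheme,
    "username": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,          # int, or None when the URL has no explicit port
    "path": parts.path,
    "query": parts.query,        # without the leading "?"
    "fragment": parts.fragment,  # without the leading "#"
}

for name, value in components.items():
    print(f"{name}: {value}")
```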
URL Components Explained
- Scheme (Protocol): Defines how to access the resource (http, https, ftp, mailto, etc.)
- Domain (Host): The server address where the resource is located
- Port: The network port to connect to (optional, has defaults)
- Path: The specific location of the resource on the server
- Query String: Key-value pairs for passing data to the server
- Fragment (Hash): Points to a specific section within the resource
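Two of the components above usually need a little extra handling: the port, which falls back to a scheme default when omitted, and the query string, which has to be split into key-value pairs. The sketch below shows one way to do both with `urllib.parse`. The helper names `effective_port` and `query_params` are hypothetical, and only the HTTP and HTTPS defaults mentioned earlier are included.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: only the two defaults named in this document are covered.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str):
    """Return the explicit port, or the scheme default when none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


def query_params(url: str) -> dict[str, list[str]]:
    """Parse the query string into a dict; each key maps to a list of values."""
    return parse_qs(urlsplit(url).query)


print(effective_port("https://example.com/page"))
print(effective_port("http://localhost:3000"))
print(query_params("https://example.com/search?q=test&page=1"))
```

Values come back as lists because a key may legitimately appear more than once in a query string.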
Common URL Schemes
| Scheme | Purpose | Example |
|---|---|---|
| http | Hypertext Transfer Protocol | http://example.com |
| https | HTTP over TLS (formerly SSL) | https://example.com |
| ftp | File Transfer Protocol | ftp://ftp.example.com |
| mailto | Email address | mailto:user@example.com |
| file | Local file system | file:///path/to/file |
URL Encoding
Special characters in URLs must be encoded using percent-encoding (also called URL encoding):
| Character | Encoded | Purpose |
|---|---|---|
| Space | %20 (or + in form-encoded query strings only) | Separates words in queries |
| & | %26 | Reserved for query parameter separator |
| = | %3D | Reserved for query key-value separator |
| # | %23 | Reserved for fragments |
| ? | %3F | Reserved for query string start |
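The encodings in the table can be reproduced with Python's `urllib.parse` functions. This is a small sketch: the sample strings are made up for illustration, and `safe=""` is passed so that every reserved character is encoded.

```python
from urllib.parse import quote, quote_plus, unquote, urlencode

raw = "a b&c=d#e?f"

# Percent-encode every reserved character, including "/" (safe="").
encoded = quote(raw, safe="")
print(encoded)        # a%20b%26c%3Dd%23e%3Ff

# Form-style encoding uses "+" for spaces; appropriate for query strings only.
form_encoded = quote_plus("rock & roll")
print(form_encoded)   # rock+%26+roll

# urlencode builds a whole query string and encodes each key and value.
query = urlencode({"q": "rock & roll", "page": 1})
print(query)          # q=rock+%26+roll&page=1

# Decoding reverses percent-encoding.
decoded = unquote(encoded)
print(decoded == raw)
```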
Valid URL Examples
- https://www.example.com - Simple HTTPS URL
- https://example.com:8080/path - With custom port and path
- https://example.com/search?q=test&page=1 - With query parameters
- https://example.com/page#section - With fragment
- http://192.168.1.1/admin - IP address as host
- http://localhost:3000 - Local development server
Common URL Validation Errors
- Missing scheme: URLs must start with a protocol (http://, https://)
- Invalid characters: Some characters must be percent-encoded
- Malformed domain: Domain names must follow DNS naming rules
- Invalid port: Port numbers must be between 1 and 65535
- Incorrect encoding: Special characters improperly encoded
- Misplaced question marks: The first ? starts the query string; any later ? is allowed by RFC 3986 but is treated as query data, not as a second separator
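Several of the checks above can be combined into one function. The following is a rough sketch, not a complete validator. The function name `validate_url`, the error messages, and the restriction to `http` and `https` are all assumptions made for this example, and the hostname rule is a simplified reading of DNS label rules (letters, digits, and hyphens, 1 to 63 characters per label, no leading or trailing hyphen).

```python
import ipaddress
import re
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: web URLs only
# One DNS label: 1-63 letters, digits, or hyphens; no hyphen at either end.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def _is_ip(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def validate_url(url: str) -> list[str]:
    """Return a list of problems found; an empty list means no check failed."""
    errors = []
    if any(ch.isspace() for ch in url):
        errors.append("invalid characters: whitespace must be percent-encoded")
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError if non-numeric or above 65535
    except ValueError as exc:
        return errors + [f"malformed URL: {exc}"]

    if not parts.scheme:
        errors.append("missing scheme")
    elif parts.scheme not in ALLOWED_SCHEMES:
        errors.append(f"unsupported scheme: {parts.scheme}")

    host = parts.hostname
    if not host:
        errors.append("missing host")
    elif not _is_ip(host):
        labels = host.rstrip(".").split(".")
        if len(host) > 253 or not all(_LABEL.fullmatch(label) for label in labels):
            errors.append("malformed domain")

    if port == 0:
        errors.append("invalid port: must be between 1 and 65535")
    return errors


print(validate_url("https://example.com:8080/path"))
print(validate_url("www.example.com"))
```

Returning a list of messages rather than a single boolean makes it easier to show the user which rule failed.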
When to Use URL Validation
- Form Validation: Ensure users enter valid URLs in web forms
- API Development: Validate URL parameters and webhook URLs
- Web Scraping: Verify URLs before making HTTP requests
- Link Checkers: Validate URLs in content management systems
- Configuration Files: Verify API endpoints and service URLs
- Security: Prevent URL injection and SSRF attacks
Security Considerations
Security Warning: URL validation should include:
- Whitelist allowed schemes (typically only http and https)
- Block localhost and private IP addresses in user-submitted URLs
- Validate against SSRF (Server-Side Request Forgery) attacks
- Check for open redirects in URL parameters
- Sanitize URLs before displaying or storing
- Use HTTPS whenever possible for sensitive data
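The first three points can be sketched as a pre-flight check that runs before a server fetches a user-submitted URL. This is an illustrative outline under stated assumptions, not a complete SSRF defense: the names `is_public_address` and `is_safe_to_fetch` are hypothetical, the scheme allowlist is limited to `http` and `https`, and the `resolve` parameter exists only so the DNS lookup can be replaced in tests.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: web-only allowlist


def is_public_address(ip_text: str) -> bool:
    """True unless the address is loopback, private, link-local, or otherwise special."""
    ip = ipaddress.ip_address(ip_text.split("%")[0])  # drop any IPv6 zone id
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def is_safe_to_fetch(url: str, resolve=socket.getaddrinfo) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    try:
        parts = urlsplit(url)
        port = parts.port
    except ValueError:
        return False
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    if port is None:
        port = 443 if parts.scheme == "https" else 80
    try:
        infos = resolve(parts.hostname, port, type=socket.SOCK_STREAM)
    except (OSError, UnicodeError):
        return False  # unresolvable hosts are rejected
    # Every resolved address must be public; info[4][0] is the IP string.
    return bool(infos) and all(is_public_address(info[4][0]) for info in infos)
```

A check like this only covers the moment of validation. The later request may resolve the name differently (DNS rebinding) or follow a redirect to an internal address, so the same check would need to be applied to the connection that is actually made.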
URL vs URI vs URN
| Term | Meaning | Example |
|---|---|---|
| URI | Uniform Resource Identifier (generic term) | https://example.com/page or urn:isbn:0-486-27557-4 |
| URL | Uniform Resource Locator (specifies location) | https://example.com/page |
| URN | Uniform Resource Name (identifies by name) | urn:isbn:0-486-27557-4 |
Best Practices
- Always use HTTPS for production websites and APIs
- Keep URLs short and readable when possible
- Use hyphens (not underscores) in URL paths for SEO
- Implement proper URL encoding for all user input
- Validate URLs on both client and server side
- Use canonical URLs to avoid duplicate content issues
- Implement proper redirects (301/302) when URLs change
- Consider internationalized domain names (IDN) for global audiences
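For the IDN point, a validator that only accepts ASCII hostnames can first convert the host to its ASCII-compatible (Punycode) form. The sketch below uses Python's built-in `idna` codec, which implements the older IDNA 2003 rules. The helper name `to_ascii_host` and the sample domain are assumptions for illustration.

```python
def to_ascii_host(host: str) -> str:
    """Convert an internationalized hostname to its ASCII (xn--) form."""
    # The built-in codec follows IDNA 2003; newer rules need a third-party library.
    return host.encode("idna").decode("ascii")


print(to_ascii_host("bücher.example"))  # xn--bcher-kva.example
print(to_ascii_host("example.com"))     # ASCII hosts pass through unchanged
```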
SEO Considerations
- Use descriptive, keyword-rich URLs
- Keep URLs simple and easy to understand
- Avoid excessive parameters and session IDs
- Use lowercase letters consistently
- Implement proper URL structure for site hierarchy
- Use trailing slashes consistently
Additional Resources
- RFC 3986 (URI Generic Syntax): tools.ietf.org/html/rfc3986
- URL Standard (WHATWG): url.spec.whatwg.org
- MDN URL API: developer.mozilla.org/en-US/docs/Web/API/URL