Can we automatically check that a set of pages has valid HTML markup? This can be useful as part of a CI process or for an automated audit.
The W3C provides an online validation tool. It turns out that it is also packaged as a Docker container, and there are Ruby bindings to communicate with this service. Here is an example.
docker run -it --rm -p 8888:8888 ghcr.io/validator/validator:latest
# Gemfile
source "https://rubygems.org"
gem 'w3c_validators'
bundle install
In the script below, results.errors is an array, and it contains the error messages and their locations.
# audit.rb
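require 'w3c_validators'
require 'yaml'
include W3CValidators

# Point the client at the local validator container instead of the public service.
validator = NuValidator.new(:validator_uri => 'http://localhost:8888/')

# data/urls.yml holds a 'urls' list whose entries each have a 'url' key.
url_db = YAML.load_file('data/urls.yml')

# Collect every URL for which the validator reports at least one error.
invalid = []
url_db['urls'].each do |se|
  uri = se['url']
  results = validator.validate_uri(uri)
  if results.errors.length > 0
    invalid << uri
  end
end

puts invalid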
ruby audit.rb
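The script reads data/urls.yml, but the file itself is not shown. Judging only from how audit.rb accesses it (a top-level urls list whose entries each carry a url key), it might look like the sample embedded below. The two example.com addresses are placeholders, and the sketch parses an inline string rather than a file so that it runs on its own.

```ruby
require 'yaml'

# Hypothetical contents of data/urls.yml, inferred from how audit.rb reads it.
SAMPLE_URLS_YML = <<~YAML
  urls:
    - url: https://example.com/
    - url: https://example.com/about
YAML

# Parse the sample; YAML.load_file would yield the same structure from disk.
url_db = YAML.safe_load(SAMPLE_URLS_YML)

# Extract the URLs in the order the loop in audit.rb would visit them.
urls = url_db['urls'].map { |entry| entry['url'] }
puts urls
```

Keeping each entry as a small hash, rather than a bare string, leaves room for extra per-page fields later without changing the loop.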
Running these steps by hand is not especially convenient, but all of it can be automated as part of a test suite in a CI pipeline.
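For a CI job, two things are usually wanted: a readable report and a non-zero exit status when something is invalid. Here is a minimal sketch of both. The helper names report_lines and exit_status are made up for this example, and FakeMessage is a stand-in for the objects in results.errors, on the assumption that they respond to line, col and message. The sketch does not contact the validator, so it can be run without the container.

```ruby
# Stand-in for the entries of results.errors (assumption: the real objects
# respond to line, col and message).
FakeMessage = Struct.new(:line, :col, :message)

# Build one report line per error, including its location in the page.
def report_lines(errors_by_url)
  errors_by_url.flat_map do |url, errors|
    errors.map { |e| "#{url} (line #{e.line}, col #{e.col}): #{e.message}" }
  end
end

# Exit status for CI: 0 when every page is valid, 1 otherwise.
def exit_status(errors_by_url)
  errors_by_url.values.all?(&:empty?) ? 0 : 1
end

# Hypothetical results: one valid page and one page with a single error.
errors_by_url = {
  'https://example.com/' => [],
  'https://example.com/about' => [FakeMessage.new(12, 5, 'Stray end tag "div".')]
}

lines = report_lines(errors_by_url)
status = exit_status(errors_by_url)
puts lines
puts "exit status: #{status}"
```

In audit.rb, the loop could store results.errors under each URL and the script could end with exit exit_status(errors_by_url); most CI systems treat a non-zero exit status as a failed step.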