PDF Accessibility Auto-Tag API
PDF Accessibility Auto-Tag API Output Format
The output of the PDF Accessibility Auto-Tag API contains the following:
- A tagged PDF file (PDF version 1.7), with headings shifted if the shift-headings option is set.
- A report in XLSX format with information about how the document was tagged. The report is generated only if report generation is enabled.
API limitations
- File size: Files up to a maximum of 100 MB are supported.
- Number of Pages: Non-scanned PDFs of up to 200 pages and scanned PDFs of up to 100 pages are supported; limits may be lower for files with a large number of tables.
- Rate limits: Keep the request rate below 25 requests per minute.
- Page Size: The API supports standard page sizes that are no smaller than 6” and no larger than 17.5” in either dimension.
- Hidden Objects: PDF files that contain content that is not visible on the page, such as JavaScript or OCGs (optional content groups), are not supported. Files that contain such hidden content may fail to process. In such cases, removing the hidden content and processing the file again may return a successful result.
- Language: The API is currently optimized for English-language content. Files containing content in French, German, Spanish, Danish, Dutch, Norwegian (Bokmål), Galician, Catalan, Finnish, Italian, Swedish, Portuguese, and Romanian should return good results most of the time. Files containing content in Afrikaans, Bosnian, Croatian, Czech, Hungarian, Indonesian, Malay, Polish, Russian, Serbian, Turkish, Hindi, Marathi, and other similar languages should often return good results. Non-English files may have issues with non-English punctuation. OCR is configured for English content.
- OCR and Scan quality: The quality of text extracted from scanned files is dependent on the clarity of content in the input file and is currently configured for English content. Conditions like skewed pages, shadowing, obscured or overlapping fonts, and page resolution less than 200 DPI can all result in lower quality text output.
- Form fields: Files containing XFA and other fillable form elements are not supported.
- Unprotected files: The API supports files that are unprotected or where security restrictions allow editing of content. Files that are secured and do not allow editing of content will not be processed.
- Annotations: Content in PDF files containing annotations such as highlights and sticky notes will be processed, but annotations that obscure text could impact output quality. Text within annotations will not be included in the output.
- PDF Producers: The PDF Accessibility Auto-Tag API is designed to add tags to PDF files so that they are easier to make accessible. Files created by applications that produce other types of content, such as illustrations, CAD drawings, or other vector art, may not return high-quality results.
- PDF Collections: PDFs assembled from a collection of files, including PDF Portfolios, are not currently supported.
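Several of the limits above can be checked before a file is submitted. The following is a minimal sketch, not part of the SDK: `PdfFacts` and `find_limit_violations` are hypothetical names, the facts about the file (page count, page dimensions, whether it is scanned) are assumed to have been gathered by some other tool, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the list above.
# Assumption: 1 MB is counted as 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024
MAX_PAGES_NON_SCANNED = 200
MAX_PAGES_SCANNED = 100
MIN_PAGE_DIM_INCHES = 6.0
MAX_PAGE_DIM_INCHES = 17.5


@dataclass
class PdfFacts:
    """Hypothetical container for facts gathered about a PDF before upload."""
    size_bytes: int
    page_count: int
    is_scanned: bool
    page_width_inches: float
    page_height_inches: float
    has_xfa_form: bool = False
    is_portfolio: bool = False


def find_limit_violations(pdf: PdfFacts) -> List[str]:
    """Return one message for each documented limit that the file violates."""
    problems = []
    if pdf.size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file size exceeds 100 MB")
    page_limit = MAX_PAGES_SCANNED if pdf.is_scanned else MAX_PAGES_NON_SCANNED
    if pdf.page_count > page_limit:
        problems.append(f"page count {pdf.page_count} exceeds limit of {page_limit}")
    for name, dim in (("width", pdf.page_width_inches),
                      ("height", pdf.page_height_inches)):
        if not MIN_PAGE_DIM_INCHES <= dim <= MAX_PAGE_DIM_INCHES:
            problems.append(f"page {name} of {dim} in is outside 6 to 17.5 in")
    if pdf.has_xfa_form:
        problems.append("file contains an XFA form")
    if pdf.is_portfolio:
        problems.append("file is a PDF Portfolio")
    return problems


if __name__ == "__main__":
    # A 150-page scanned letter-size file: over the 100-page scanned limit.
    sample = PdfFacts(size_bytes=5_000_000, page_count=150, is_scanned=True,
                      page_width_inches=8.5, page_height_inches=11.0)
    for problem in find_limit_violations(sample):
        print(problem)  # prints: page count 150 exceeds limit of 100
```

An empty list means none of the checked limits is exceeded; it does not guarantee that the API will accept the file, since limits may be lower for table-heavy files and other restrictions (hidden content, language, scan quality) are not covered here.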
Error codes
Scenario | Error code | Error message |
---|---|---|
Unknown error/failure | ERROR | Unexpected error |
Timeout | TIMEOUT | Unexpected error: Processing timeout |
Disqualified | DISQUALIFIED | File is not suitable for conversion |
Unsupported XFA file | DISQUALIFIED_XFA | File is not suitable for conversion: File contains an XFA form |
Page limit violation | DISQUALIFIED_PAGE_LIMIT | File is not suitable for conversion: File exceeds page limit |
Scan page limit violation | DISQUALIFIED_SCAN_PAGE_LIMIT | File is not suitable for conversion: Scanned file exceeds page limit |
File size violation | DISQUALIFIED_FILE_SIZE | File is not suitable for conversion: File exceeds size limit |
Encryption permission | DISQUALIFIED_PERMISSIONS | File is not suitable for conversion: File permissions do not allow conversion |
Complex file | DISQUALIFIED_COMPLEX_FILE | File is not suitable for conversion: File content is too complex |
Unsupported language | DISQUALIFIED_LANGUAGE | File is not suitable for conversion: File content language is unsupported |
Bad PDF | BAD_PDF | The PDF file is damaged or its content is too complex |
Bad PDF file type | BAD_PDF_FILE_TYPE | The input file is not a PDF file |
Damaged input file | BAD_PDF_DAMAGED | The input file is damaged |
Complex table | BAD_PDF_COMPLEX_TABLE | The input file contains a table that is too complex to process |
Complex content | BAD_PDF_COMPLEX_INPUT | The input file contains content that is too complex to process |
Unsupported font | BAD_PDF_UNSUPPORTED_FONT | The input file contains font data that is corrupted or not supported |
Large PDF file | BAD_PDF_LARGE_FILE | The input file size exceeds the maximum allowed |
Protected PDF | PROTECTED_PDF | PDF is encrypted or password-protected |
Empty or corrupted input | BAD_INPUT | Input is corrupted or empty |
Invalid input parameters | BAD_INPUT_PARAMS | Invalid input parameters |
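The error codes in the table share prefixes (`DISQUALIFIED`, `BAD_PDF`, `BAD_INPUT`), so a caller can group them when deciding how to react. The sketch below is illustrative only: `ErrorFamily`, `classify_error`, and `may_be_worth_retrying` are hypothetical names, the error code is assumed to be available as a plain string, and the idea that only `ERROR` and `TIMEOUT` might succeed on a retry is an assumption rather than documented behavior.

```python
from enum import Enum


class ErrorFamily(Enum):
    """Hypothetical grouping of the error codes listed in the table above."""
    UNEXPECTED = "unexpected"      # ERROR, TIMEOUT
    DISQUALIFIED = "disqualified"  # file is not suitable for conversion
    BAD_PDF = "bad_pdf"            # file is damaged, too complex, or not a PDF
    PROTECTED = "protected"        # file is encrypted or password-protected
    BAD_INPUT = "bad_input"        # input or input parameters are invalid
    UNKNOWN = "unknown"            # code not present in the table


# Prefixes shared by several codes in the table.
_PREFIX_FAMILIES = (
    ("DISQUALIFIED", ErrorFamily.DISQUALIFIED),
    ("BAD_PDF", ErrorFamily.BAD_PDF),
    ("BAD_INPUT", ErrorFamily.BAD_INPUT),
)


def classify_error(code: str) -> ErrorFamily:
    """Map an error code string to the family it belongs to."""
    if code in ("ERROR", "TIMEOUT"):
        return ErrorFamily.UNEXPECTED
    if code == "PROTECTED_PDF":
        return ErrorFamily.PROTECTED
    for prefix, family in _PREFIX_FAMILIES:
        # Match the bare prefix (e.g. BAD_PDF) or a more specific code (e.g. BAD_PDF_DAMAGED).
        if code == prefix or code.startswith(prefix + "_"):
            return family
    return ErrorFamily.UNKNOWN


def may_be_worth_retrying(code: str) -> bool:
    """Assumption: only unexpected errors and timeouts could succeed on an unchanged file."""
    return classify_error(code) is ErrorFamily.UNEXPECTED


if __name__ == "__main__":
    for code in ("TIMEOUT", "DISQUALIFIED_XFA", "BAD_PDF_COMPLEX_TABLE", "PROTECTED_PDF"):
        print(code, classify_error(code).value, may_be_worth_retrying(code))
```

Codes in the other families describe a property of the file or the request, so the input would need to change (for example, removing an XFA form or unlocking the file) before another attempt.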
Generate tagged PDF with version 1.7 along with an XLSX report and shift the headings in the output PDF file
The sample below generates a tagged PDF (version 1.7) along with an XLSX report and shifts the headings in the output PDF file.
Java

```java
// Get the samples from https://git.corp.adobe.com/dc/dc-cpf-sdk-java-samples/tree/beta
// Run the sample:
// mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.pdfservices.operation.samples.autotagpdf.AutotagPDFWithOptions

package com.adobe.pdfservices.operation.samples.autotagpdf;

// PDF Services SDK imports (Credentials, ExecutionContext, FileRef, AutotagPDFOperation,
// AutotagPDFOptions, AutotagOutputFiles, ServiceApiException, ServiceUsageException)
// are omitted here; see the sample in the repository above.
import java.io.IOException;

import org.slf4j.LoggerFactory;

public class AutotagPDFWithOptions {

    private static final org.slf4j.Logger LOGGER = LoggerFactory.getLogger(AutotagPDFWithOptions.class);

    public static void main(String[] args) {

        try {
            // Initial setup, create credentials instance.
            Credentials credentials = Credentials.serviceAccountCredentialsBuilder()
                    .fromFile("pdfservices-api-credentials.json")
                    .build();

            // Create an ExecutionContext using credentials and create a new operation instance.
            ExecutionContext executionContext = ExecutionContext.create(credentials);

            AutotagPDFOperation autotagPDFOperation = AutotagPDFOperation.createNew();

            // Provide an input FileRef for the operation
            autotagPDFOperation.setInput(FileRef.createFromLocalFile("src/main/resources/autotagPdfInput.pdf"));

            // Build AutotagPDF options and set them into the operation
            AutotagPDFOptions autotagPDFOptions = AutotagPDFOptions.autotagPDFOptionsBuilder()
                    .shiftHeadings()
                    .generateReport()
                    .build();
            autotagPDFOperation.setOptions(autotagPDFOptions);

            // Execute the operation
            AutotagOutputFiles autotagOutputFiles = autotagPDFOperation.execute(executionContext);

            // Save the output files at the specified location
            autotagOutputFiles.saveTaggedPDF("output/AutotagPDFWithOptions-tagged.pdf");
            autotagOutputFiles.saveReport("output/AutotagPDFWithOptions-report.xlsx");

        } catch (ServiceApiException | IOException | ServiceUsageException e) {
            LOGGER.error("Exception encountered while executing operation", e);
        }
    }
}
```

Python

```python
# Get the samples from https://git.corp.adobe.com/dc/dc-cpf-python-sdk-samples/tree/beta
# Run the sample:
# python src/autotagpdf/autotag_pdf_with_options.py

# PDF Services SDK imports (Credentials, ExecutionContext, FileRef, AutotagPDFOperation,
# AutotagPDFOptions, AutotagPDFOutputFiles, ServiceApiException, ServiceUsageException,
# SdkException) are omitted here; see the sample in the repository above.
import logging
import os
from pathlib import Path

logging.basicConfig(level=os.environ.get('LOGLEVEL', 'INFO'))

try:
    # get base path.
    base_path = str(Path(__file__).parents[2])

    # Initial setup, create credentials instance.
    credentials = Credentials.service_account_credentials_builder() \
        .from_file(base_path + '/pdfservices-api-credentials.json') \
        .build()

    # Create an ExecutionContext using credentials and create a new operation instance.
    execution_context = ExecutionContext.create(credentials)
    autotag_pdf_operation = AutotagPDFOperation.create_new()

    # Set operation input from a source file.
    input_file_path = 'autotagPdfInput.pdf'
    source = FileRef.create_from_local_file(base_path + '/resources/' + input_file_path)
    autotag_pdf_operation.set_input(source)

    # Build AutotagPDF options and set them into the operation
    autotag_pdf_options: AutotagPDFOptions = AutotagPDFOptions.builder() \
        .with_shift_headings() \
        .with_generate_report() \
        .build()
    autotag_pdf_operation.set_options(autotag_pdf_options)

    # Execute the operation.
    autotag_output_files: AutotagPDFOutputFiles = autotag_pdf_operation.execute(execution_context)

    input_file_name = Path(input_file_path).stem
    base_output_path = base_path + '/output/AutotagPDFWithOptions/'

    Path(base_output_path).mkdir(parents=True, exist_ok=True)
    tagged_pdf_path = f'{base_output_path}{input_file_name}-tagged.pdf'
    report_path = f'{base_output_path}{input_file_name}-report.xlsx'

    # Save the result to the specified location.
    autotag_output_files.save_pdf_file(tagged_pdf_path)
    autotag_output_files.save_xls_file(report_path)

except (ServiceApiException, ServiceUsageException, SdkException) as e:
    logging.exception(f'Exception encountered while executing operation: {e}')
```
Generate tagged PDF from a PDF
The sample below generates a tagged PDF from a PDF.
Java

```java
// Get the samples from https://git.corp.adobe.com/dc/dc-cpf-sdk-java-samples/tree/beta
// Run the sample:
// mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.pdfservices.operation.samples.autotagpdf.AutotagPDF

package com.adobe.pdfservices.operation.samples.autotagpdf;

// PDF Services SDK imports (Credentials, ExecutionContext, FileRef, AutotagPDFOperation,
// AutotagOutputFiles, ServiceApiException, ServiceUsageException) are omitted here;
// see the sample in the repository above.
import java.io.IOException;

import org.slf4j.LoggerFactory;

public class AutotagPDF {

    private static final org.slf4j.Logger LOGGER = LoggerFactory.getLogger(AutotagPDF.class);

    public static void main(String[] args) {

        try {
            // Initial setup, create credentials instance.
            Credentials credentials = Credentials.serviceAccountCredentialsBuilder()
                    .fromFile("pdfservices-api-credentials.json")
                    .build();

            // Create an ExecutionContext using credentials and create a new operation instance.
            ExecutionContext executionContext = ExecutionContext.create(credentials);

            AutotagPDFOperation autotagPDFOperation = AutotagPDFOperation.createNew();

            // Provide an input FileRef for the operation
            autotagPDFOperation.setInput(FileRef.createFromLocalFile("src/main/resources/autotagPdfInput.pdf"));

            // Execute the operation
            AutotagOutputFiles autotagOutputFiles = autotagPDFOperation.execute(executionContext);

            // Save the output files at the specified location
            autotagOutputFiles.saveTaggedPDF("output/AutotagPDF-tagged.pdf");

        } catch (ServiceApiException | IOException | ServiceUsageException e) {
            LOGGER.error("Exception encountered while executing operation", e);
        }
    }
}
```

Python

```python
# Get the samples from https://git.corp.adobe.com/dc/dc-cpf-python-sdk-samples/tree/beta
# Run the sample:
# python src/autotagpdf/autotag_pdf.py

# PDF Services SDK imports (Credentials, ExecutionContext, FileRef, AutotagPDFOperation,
# AutotagPDFOutputFiles, ServiceApiException, ServiceUsageException, SdkException)
# are omitted here; see the sample in the repository above.
import logging
import os
from pathlib import Path

logging.basicConfig(level=os.environ.get('LOGLEVEL', 'INFO'))

try:
    # get base path.
    base_path = str(Path(__file__).parents[2])

    # Initial setup, create credentials instance.
    credentials = Credentials.service_account_credentials_builder() \
        .from_file(base_path + '/pdfservices-api-credentials.json') \
        .build()

    # Create an ExecutionContext using credentials and create a new operation instance.
    execution_context = ExecutionContext.create(credentials)
    autotag_pdf_operation = AutotagPDFOperation.create_new()

    # Set operation input from a source file.
    input_file_path = 'autotagPdfInput.pdf'
    source = FileRef.create_from_local_file(base_path + '/resources/' + input_file_path)
    autotag_pdf_operation.set_input(source)

    # Execute the operation.
    autotag_output_files: AutotagPDFOutputFiles = autotag_pdf_operation.execute(execution_context)

    input_file_name = Path(input_file_path).stem
    base_output_path = base_path + '/output/AutotagPDF/'

    Path(base_output_path).mkdir(parents=True, exist_ok=True)
    tagged_pdf_path = f'{base_output_path}{input_file_name}-tagged.pdf'

    # Save the result to the specified location.
    autotag_output_files.save_pdf_file(tagged_pdf_path)

except (ServiceApiException, ServiceUsageException, SdkException) as e:
    logging.exception(f'Exception encountered while executing operation: {e}')
```
Generate tagged PDF by setting options with command line arguments
The sample below generates a tagged PDF by setting options through command line arguments.
Here is a sample list of command line arguments and their descriptions:
- --input <input file path>
- --output <output directory path>
- --report (if this argument is present, the report is generated along with the output)
- --shift_headings (if this argument is present, the headings are shifted in the output PDF file)
Java

```java
// Get the samples from https://git.corp.adobe.com/dc/dc-cpf-sdk-java-samples/tree/beta
// Run the sample:
// mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.pdfservices.operation.samples.autotagpdf.AutotagPDFParamaterised -Dexec.args="--report --shift_headings --input src/main/resources/autotagPdfInput.pdf --output output/"

package com.adobe.pdfservices.operation.samples.autotagpdf;

// PDF Services SDK imports (Credentials, ExecutionContext, FileRef, AutotagPDFOperation,
// AutotagPDFOptions, AutotagOutputFiles, ServiceApiException, ServiceUsageException)
// are omitted here; see the sample in the repository above.
import java.io.IOException;
import java.util.Arrays;

import org.slf4j.LoggerFactory;

public class AutotagPDFParamaterised {

    private static final org.slf4j.Logger LOGGER = LoggerFactory.getLogger(AutotagPDFParamaterised.class);

    public static void main(String[] args) {
        LOGGER.info("--input " + getInputFilePathFromCmdArgs(args));
        LOGGER.info("--output " + getOutputFilePathFromCmdArgs(args));
        LOGGER.info("--report " + getGenerateReportFromCmdArgs(args));
        LOGGER.info("--shift_headings " + getShiftHeadingsFromCmdArgs(args));

        try {
            // Initial setup, create credentials instance.
            Credentials credentials = Credentials.serviceAccountCredentialsBuilder()
                    .fromFile("pdfservices-api-credentials.json")
                    .build();

            // Create an ExecutionContext using credentials and create a new operation instance.
            ExecutionContext executionContext = ExecutionContext.create(credentials);

            AutotagPDFOperation autotagPDFOperation = AutotagPDFOperation.createNew();

            // Set input for operation from command line args
            autotagPDFOperation.setInput(FileRef.createFromLocalFile(getInputFilePathFromCmdArgs(args)));

            // Get and Build AutotagPDF options from command line args and set them into the operation
            AutotagPDFOptions autotagPDFOptions = getOptionsFromCmdArgs(args);
            autotagPDFOperation.setOptions(autotagPDFOptions);

            // Execute the operation
            AutotagOutputFiles autotagOutputFiles = autotagPDFOperation.execute(executionContext);

            // Save the output files at the specified location
            String outputPath = getOutputFilePathFromCmdArgs(args);
            autotagOutputFiles.saveTaggedPDF(outputPath + "AutotagPDFParameterised-tagged.pdf");
            if (autotagPDFOptions != null && autotagPDFOptions.isGenerateReport()) {
                autotagOutputFiles.saveReport(outputPath + "AutotagPDFParameterised-report.xlsx");
            }

        } catch (ServiceApiException | IOException | ServiceUsageException e) {
            LOGGER.error("Exception encountered while executing operation", e);
        }
    }

    // Build the options from the --report and --shift_headings flags.
    private static AutotagPDFOptions getOptionsFromCmdArgs(String[] args) {
        Boolean generateReport = getGenerateReportFromCmdArgs(args);
        Boolean shiftHeadings = getShiftHeadingsFromCmdArgs(args);

        AutotagPDFOptions.Builder builder = AutotagPDFOptions.autotagPDFOptionsBuilder();

        if (generateReport) {
            builder.generateReport();
        }
        if (shiftHeadings) {
            builder.shiftHeadings();
        }

        return builder.build();
    }

    private static Boolean getShiftHeadingsFromCmdArgs(String[] args) {
        return Arrays.asList(args).contains("--shift_headings");
    }

    private static Boolean getGenerateReportFromCmdArgs(String[] args) {
        return Arrays.asList(args).contains("--report");
    }

    // Return the value following --input, or the default input file if it is absent.
    private static String getInputFilePathFromCmdArgs(String[] args) {
        String inputFilePath = "src/main/resources/autotagPdfInput.pdf";
        int inputFilePathIndex = Arrays.asList(args).indexOf("--input");
        if (inputFilePathIndex >= 0 && inputFilePathIndex < args.length - 1) {
            inputFilePath = args[inputFilePathIndex + 1];
        } else {
            LOGGER.info("input file not specified, using default value : autotagPdfInput.pdf");
        }

        return inputFilePath;
    }

    // Return the value following --output, or the default output path if it is absent.
    private static String getOutputFilePathFromCmdArgs(String[] args) {
        String outputFilePath = "output/";
        int outputFilePathIndex = Arrays.asList(args).indexOf("--output");
        if (outputFilePathIndex >= 0 && outputFilePathIndex < args.length - 1) {
            outputFilePath = args[outputFilePathIndex + 1];
        } else {
            LOGGER.info("output path not specified, using default value : output/");
        }

        return outputFilePath;
    }
}
```

Python

```python
# Get the samples from https://git.corp.adobe.com/dc/dc-cpf-python-sdk-samples/tree/beta
# Run the sample:
# python src/autotagpdf/autotag_pdf_parameterised.py --report --shift_headings --input resources/autotagPdfInput.pdf --output output/

# PDF Services SDK imports (Credentials, ExecutionContext, FileRef, AutotagPDFOperation,
# AutotagPDFOptions, AutotagPDFOutputFiles, ServiceApiException, ServiceUsageException,
# SdkException) are omitted here; see the sample in the repository above.
import argparse
import logging
import os
import sys
from pathlib import Path

logging.basicConfig(level=os.environ.get('LOGLEVEL', 'INFO'))


class AutotagPDFParameterised:

    _input_path: str
    _output_path: str
    _generate_report: bool
    _shift_headings: bool

    base_path = str(Path(__file__).parents[2])

    def __init__(self):
        pass

    @staticmethod
    def parse_args(*args: str):
        # Fall back to the process arguments when none are passed explicitly.
        if not args:
            args = sys.argv[1:]
        parser = argparse.ArgumentParser(description='Autotag PDF')

        parser.add_argument('--input', help='Input file path', type=Path, metavar='input')
        parser.add_argument('--output', help='Output path', type=Path, dest='output')
        parser.add_argument('--report', dest='report', action='store_true', help='Generate report(in XLSX format)',
                            default=False)
        parser.add_argument('--shift_headings', dest='shift_headings', action='store_true', help='Shift headings',
                            default=False)

        return parser.parse_args(args)

    def get_default_input_file_path(self) -> str:
        return self.base_path + '/resources/autotagPdfInput.pdf'

    def get_default_output_file_path(self) -> str:
        return self.base_path + '/output/AutotagPDFParameterised'

    def get_autotag_pdf_options(self) -> AutotagPDFOptions:
        shift_headings = self._shift_headings
        generate_report = self._generate_report

        builder: AutotagPDFOptions.Builder = AutotagPDFOptions.builder()
        if shift_headings:
            builder.with_shift_headings()
        if generate_report:
            builder.with_generate_report()
        return builder.build()

    def execute(self, *args: str) -> None:
        args = self.parse_args(*args)
        self._input_path = args.input if args.input else self.get_default_input_file_path()
        self._output_path = args.output if args.output else self.get_default_output_file_path()
        self._generate_report = args.report
        self._shift_headings = args.shift_headings

        self.autotag_pdf()

    def autotag_pdf(self):
        try:
            # Initial setup, create credentials instance.
            credentials = Credentials.service_account_credentials_builder() \
                .from_file(self.base_path + '/pdfservices-api-credentials.json') \
                .build()

            # Create an ExecutionContext using credentials and create a new operation instance.
            execution_context = ExecutionContext.create(credentials)
            autotag_pdf_operation = AutotagPDFOperation.create_new()

            # Set operation input from a source file.
            source = FileRef.create_from_local_file(self._input_path)
            autotag_pdf_operation.set_input(source)

            # Build AutotagPDF options and set them into the operation
            autotag_pdf_operation.set_options(self.get_autotag_pdf_options())

            # Execute the operation.
            autotag_output_files: AutotagPDFOutputFiles = autotag_pdf_operation.execute(execution_context)

            input_file_name = Path(self._input_path).stem
            base_output_path = self._output_path

            Path(base_output_path).mkdir(parents=True, exist_ok=True)

            # Save the result to the specified location.
            tagged_pdf_path = f'{base_output_path}/{input_file_name}-tagged.pdf'
            autotag_output_files.save_pdf_file(tagged_pdf_path)
            if self._generate_report:
                report_path = f'{base_output_path}/{input_file_name}-report.xlsx'
                autotag_output_files.save_xls_file(report_path)

        except (ServiceApiException, ServiceUsageException, SdkException) as e:
            logging.exception(f'Exception encountered while executing operation: {e}')


if __name__ == "__main__":
    autotag_pdf_parameterised = AutotagPDFParameterised()
    autotag_pdf_parameterised.execute()
```