
Commit a910410

Verify that we get errors if the files contain a DOCTYPE
Tests that show this error detection for each of:

* DFDL schemas
* included DFDL schemas
* imported DFDL schemas
* TDML files
* XML Infoset files used in TDML tests
* Config files used in TDML tests
* External variable binding files
* XML input to XML Text Infoset Inputters (using Woodstox library)
* XML input to JDOM Infoset Inputters
* XML input to Scala XML Infoset Inputters

Eliminating use of DOCTYPE also eliminates any possibility of XML General or Parameter Entities.

Consolidated loaders/validators of XML and XSD. Everything that loads XML, XSD, TDML, config, external vars, or ".dfdl.xsd" now uses DaffodilXMLLoader to do it. Specific validators that were for Config files or External Variable Binding files are now gone.

DaffodilXMLLoader was simplified and made more uniform. Single-purpose mixin traits and adapters were eliminated.

The validation in DaffodilXMLLoader uses two methods. First it uses the XercesValidator (used by the validation feature). Second it does a validating load with Xerces. These seem to catch/report different validation problems. (Instrumentation to prove this may be worth it, so that we can get rid of redundant work if possible.)

DFDL schemas are validated by constructing an XML Schema object from them, as well as by loading them and validating against the XML Schema for DFDL schemas.

The DaffodilXMLLoader validates (if requested) using a supplied XML Schema, but then always loads using the DaffodilConstructingParser, which uses the underlying ConstructingParser. This is needed because in many cases we depend on proper handling of CDATA regions (e.g., TDML files, test infoset XML files, etc.), which Xerces doesn't do properly.

I attempted to implement DAFFODIL-288 to validate the infoset XML (before unparsing) as well, but was unsuccessful; a TODO DAFFODIL-288 marks the place where that fix goes.

New validation is more uniform and thorough. This caught a number of small issues, like missing "tdml:" prefixes on numerous files' testSuite elements. There were various adjustments to accommodate the stricter validation.

Changes to SAX due to simple types now being a series of Text, Atom, and EntityRef.

Various other small fixes to TDML runner to ensure no diagnostic errors are being hidden.

Fix MS-Windows failure due to CRLF issues. There should be less CRLF sensitivity now. The new unified DaffodilXMLLoader, which we use everywhere, always normalizes CRLF to LF and isolated CR to LF. This is done in Text, CDATA, and COMMENT objects.

The only reason we use the constructing parser now is the behavior around CDATA/PCDATA nodes, which is broken in Xerces. There are tests to characterize this behavior in Xerces so if it does get fixed we can adapt. However, if it did get fixed it would require a mode switch to turn this different behavior on, so we probably just won't notice.

Upgraded scala-xml library to version 2.0.0.

TDMLRunner no longer gets NPE in one situation.

Also documented why we need the 2nd Xerces load beyond just the regular XercesValidator call, which is for xsi:schemaLocation. Note added about xsi:noNamespaceSchemaLocation.

Added comments about useDefaultNamespace in tdml.xsd. Also added a TODO in the code. We really want this to be false by default, but 81 tests in daffodil-test fail if you change that, so not doing it in this change set.

DAFFODIL-1422, DAFFODIL-1659, DAFFODIL-1816
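
The line-ending normalization described in the commit message (CRLF to LF, and isolated CR to LF) can be illustrated with a minimal sketch. This is not the DaffodilXMLLoader code, which this page does not show; `LineEndings` and `normalize` are hypothetical names chosen for illustration.

```scala
// Illustrative only: a hypothetical helper, not part of Daffodil.
object LineEndings {
  // Replace every CRLF pair, and every remaining lone CR, with a single LF.
  // The alternation tries CRLF first, so a CRLF pair never becomes two LFs.
  def normalize(s: String): String = s.replaceAll("\r\n|\r", "\n")
}
```

Applying the same function to text, CDATA, and comment content would give every node the same line-ending convention regardless of the platform the file was written on.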
1 parent 41cf56f commit a910410

74 files changed: 2,024 additions and 772 deletions.


daffodil-cli/src/it/scala/org/apache/daffodil/debugger/TestCLIDebugger.scala

Lines changed: 15 additions & 10 deletions
@@ -1265,20 +1265,22 @@ class TestCLIdebugger {
       shell.sendLine("step")
       shell.sendLine("step")
       shell.sendLine("step")
-      shell.sendLine("step")
-      shell.sendLine("step")
       shell.expect(contains("bitPosition: 0 -> 8"))
       shell.expect(contains("foundDelimiter: (no value) -> ,"))
       shell.expect(contains("foundField: (no value) -> 0"))
       shell.sendLine("step")
       shell.sendLine("step")
+      shell.expect(contains("bitPosition: 8 -> 16"))
       shell.expect(contains("childIndex: 1 -> 2"))
+      shell.expect(contains("foundDelimiter: , -> (no value)"))
+      shell.expect(contains("foundField: 0 -> (no value)"))
       shell.expect(contains("groupIndex: 1 -> 2"))
       shell.expect(contains("occursIndex: 1 -> 2"))
       shell.sendLine("step")
-      shell.expect(contains("bitPosition: 8 -> 16"))
-      shell.expect(contains("foundDelimiter: , -> (no value)"))
-      shell.expect(contains("foundField: 0 -> (no value)"))
+      shell.sendLine("step")
+      shell.expect(contains("bitPosition: 16 -> 24"))
+      shell.expect(contains("foundDelimiter: (no value) -> ,"))
+      shell.expect(contains("foundField: (no value) -> 1"))
       shell.sendLine("quit")
     } finally {
       shell.close()
@@ -1331,26 +1333,29 @@ class TestCLIdebugger {
       shell.expect(contains("(debug)"))
       shell.sendLine("display info diff")
       shell.expect(contains("(debug)"))
+      shell.sendLine("set diffExcludes childIndex")
+      shell.expect(contains("(debug)"))
       shell.sendLine("step")
+      shell.expect(contains("bitPosition: 0 -> 8"))
       shell.sendLine("step")
       shell.sendLine("step")
       shell.sendLine("step")
-      shell.expect(contains("bitPosition: 0 -> 8"))
-      shell.sendLine("set diffExcludes childIndex")
       shell.sendLine("step")
       shell.sendLine("step")
       shell.sendLine("step")
+      shell.expect(regexp("\\+ Suppressable.* for cell"))
+      shell.sendLine("step")
+      shell.sendLine("step")
       shell.sendLine("step")
       shell.sendLine("step")
       shell.sendLine("step")
       shell.sendLine("step")
       shell.sendLine("step")
-      shell.expect(regexp("\\+ Suppressable.* for cell"))
       shell.sendLine("step")
-      shell.expect(regexp("\\+ Alignment.* for cell"))
+      shell.expect(regexp("RegionSplit.* for cell"))
       shell.sendLine("info suspensions")
       shell.expect(regexp("Suppressable.* for cell"))
-      shell.expect(regexp("Alignment.* for cell"))
+      shell.expect(regexp("RegionSplit.* for cell"))
       shell.sendLine("quit")
     } finally {
       shell.close()

daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala

Lines changed: 21 additions & 8 deletions
@@ -28,7 +28,6 @@ import java.nio.channels.Channels
 import java.nio.file.Paths
 import java.util.Scanner
 import java.util.concurrent.Executors
-
 import com.typesafe.config.ConfigFactory
 
 import scala.concurrent.Await
@@ -37,9 +36,7 @@ import scala.concurrent.Future
 import scala.concurrent.duration.Duration
 import scala.xml.Node
 import scala.xml.SAXParseException
-
 import javax.xml.parsers.DocumentBuilderFactory
-import javax.xml.parsers.SAXParserFactory
 import javax.xml.transform.TransformerFactory
 import javax.xml.transform.dom.DOMSource
 import javax.xml.transform.stream.StreamResult
@@ -55,7 +52,6 @@ import org.apache.daffodil.api.ValidationMode
 import org.apache.daffodil.api.WithDiagnostics
 import org.apache.daffodil.compiler.Compiler
 import org.apache.daffodil.compiler.InvalidParserException
-import org.apache.daffodil.configuration.ConfigurationLoader
 import org.apache.daffodil.debugger.CLIDebuggerRunner
 import org.apache.daffodil.debugger.InteractiveDebugger
 import org.apache.daffodil.debugger.TraceDebuggerRunner
@@ -98,6 +94,7 @@ import org.apache.daffodil.util.LoggingDefaults
 import org.apache.daffodil.util.Misc
 import org.apache.daffodil.util.Timer
 import org.apache.daffodil.validation.Validators
+import org.apache.daffodil.xml.DaffodilSAXParserFactory
 import org.apache.daffodil.xml.QName
 import org.apache.daffodil.xml.RefQName
 import org.apache.daffodil.xml.DaffodilXMLLoader
@@ -106,8 +103,10 @@ import org.rogach.scallop
 import org.rogach.scallop.ArgType
 import org.rogach.scallop.ScallopOption
 import org.rogach.scallop.ValueConverter
+import org.xml.sax.XMLReader
 
 import scala.util.matching.Regex
+import scala.xml.SAXParser
 
 class CommandLineSAXErrorHandler() extends org.xml.sax.ErrorHandler with Logging {
 
@@ -576,7 +575,7 @@ object Main extends Logging {
    */
   def loadConfigurationFile(file: File) = {
     val loader = new DaffodilXMLLoader()
-    val node = ConfigurationLoader.getConfiguration(loader, file.toURI)
+    val node = loader.load(URISchemaSource(file.toURI), Some(XMLUtils.dafextURI))
     node
   }

@@ -803,14 +802,27 @@ object Main extends Logging {
           case Left(bytes) => new ByteArrayInputStream(bytes)
           case Right(is) => is
         }
-        scala.xml.XML.load(is)
+        val parser: SAXParser = {
+          val f = DaffodilSAXParserFactory()
+          f.setNamespaceAware(false)
+          val p = f.newSAXParser()
+          p
+        }
+        scala.xml.XML.withSAXParser(parser).load(is)
       }
       case InfosetType.JDOM => {
         val is = data match {
           case Left(bytes) => new ByteArrayInputStream(bytes)
           case Right(is) => is
         }
-        new org.jdom2.input.SAXBuilder().build(is)
+        val builder = new org.jdom2.input.SAXBuilder() {
+          override protected def createParser(): XMLReader = {
+            val rdr = super.createParser()
+            XMLUtils.setSecureDefaults(rdr)
+            rdr
+          }
+        }
+        builder.build(is)
       }
       case InfosetType.W3CDOM => {
         val byteArr = data match {
@@ -821,6 +833,7 @@ object Main extends Logging {
           override def initialValue = {
             val dbf = DocumentBuilderFactory.newInstance()
             dbf.setNamespaceAware(true)
+            dbf.setFeature(XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE, true)
             val db = dbf.newDocumentBuilder()
             db.parse(new ByteArrayInputStream(byteArr))
           }
@@ -1484,7 +1497,7 @@ object Main extends Logging {
   private def unparseWithSAX(
     is: InputStream,
     contentHandler: DFDL.DaffodilUnparseContentHandler): UnparseResult = {
-    val xmlReader = SAXParserFactory.newInstance.newSAXParser.getXMLReader
+    val xmlReader = DaffodilSAXParserFactory().newSAXParser.getXMLReader
     xmlReader.setContentHandler(contentHandler)
     xmlReader.setFeature(XMLUtils.SAX_NAMESPACES_FEATURE, true)
     xmlReader.setFeature(XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE, true)

daffodil-core/src/main/scala/org/apache/daffodil/dsom/DFDLSchemaFile.scala

Lines changed: 9 additions & 12 deletions
@@ -20,13 +20,13 @@ package org.apache.daffodil.dsom
 import org.xml.sax.SAXParseException
 import org.apache.daffodil.xml.DaffodilXMLLoader
 import org.apache.daffodil.xml.NS
-import org.apache.daffodil.xml.XMLUtils
 import org.apache.daffodil.api._
 import org.apache.daffodil.dsom.IIUtils._
 import org.apache.daffodil.api.Diagnostic
 import org.apache.daffodil.oolag.OOLAG
 import org.apache.daffodil.util.LogLevel
 import org.apache.daffodil.util.Misc
+import org.apache.daffodil.xml.XMLUtils
 
 /**
  * represents one schema document file
@@ -114,14 +114,14 @@ final class DFDLSchemaFile(
     }
     val node = try {
       log(LogLevel.Resolver, "Loading %s.", diagnosticDebugName)
-      val ldr = new DaffodilXMLLoader(this)
       //
       // We do not want to validate here ever, because we have to examine the
-      // root xs:schema eleemnt of a schema to decide if it is a DFDL schema
+      // root xs:schema element of a schema to decide if it is a DFDL schema
       // at all that we're even supposed to compile.
       //
-      ldr.setValidation(false)
-      val node = ldr.load(schemaSource)
+      val loader = new DaffodilXMLLoader(this)
+      // need line numbers for diagnostics
+      val node = loader.load(schemaSource, None, addPositionAttributes = true)
       schemaDefinitionUnless(node != null, "Unable to load XML from %s.", diagnosticDebugName)
       node
     } catch {
@@ -134,20 +134,17 @@ final class DFDLSchemaFile(
 
   lazy val isDFDLSchemaFile = iiXMLSchemaDocument.isDFDLSchema
 
+  private lazy val loader = new DaffodilXMLLoader(this)
+
   lazy val iiXMLSchemaDocument = LV('iiXMLSchemaDocument) {
     val res = loadXMLSchemaDocument(seenBefore, Some(this))
     if (res.isDFDLSchema && sset.validateDFDLSchemas) {
       //
       // We validate DFDL schemas, only if validation is requested.
       // Some things, tests generally, want to turn this validation off.
       //
-
-      val ldr = new DaffodilXMLLoader(this)
-      ldr.setValidation(true)
-      try {
-        ldr.load(schemaSource) // validate as XML file with XML Schema for DFDL Schemas
-        ldr.validateSchema(schemaSource) // validate as XSD (catches UPA errors for example)
-      } catch {
+      try loader.validateAsDFDLSchema(schemaSource) // validate as XSD (catches UPA errors for example)
+      catch {
         // ok to absorb SAX Parse Exception as we've captured those errors in error handling
         // elsewhere.
         case _: org.xml.sax.SAXParseException => // ok
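
The commit message says DFDL schemas are validated in part "by constructing an XML Schema object from them". A minimal sketch of that general technique, using only the standard JAXP `SchemaFactory`, might look like the following. The body of `validateAsDFDLSchema` is not shown on this page, so this is an assumption about the approach and not its implementation; `XsdCheck` and `compiles` are hypothetical names.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.SAXException

// Illustrative only: a hypothetical helper, not part of Daffodil.
object XsdCheck {
  // Try to compile the text as an XML Schema.
  // Returns Right(()) on success, or Left(message) if the schema is rejected.
  def compiles(xsd: String): Either[String, Unit] = {
    val factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    try {
      // With no ErrorHandler installed, schema errors are thrown as SAXExceptions.
      factory.newSchema(new StreamSource(new StringReader(xsd)))
      Right(())
    } catch {
      case e: SAXException => Left(e.getMessage)
    }
  }
}
```

Compiling the schema this way surfaces schema-level errors (an unresolvable type reference, for instance) that validating the schema document as plain XML against another schema would not necessarily report.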

daffodil-core/src/main/scala/org/apache/daffodil/runtime1/SchemaSetRuntime1Mixin.scala

Lines changed: 57 additions & 31 deletions
@@ -31,6 +31,8 @@ import org.apache.daffodil.processors.parsers.NotParsableParser
 import org.apache.daffodil.processors.unparsers.NotUnparsableUnparser
 import org.apache.daffodil.util.LogLevel
 
+import java.io.ObjectOutputStream
+
 trait SchemaSetRuntime1Mixin { self : SchemaSet =>
 
   requiredEvaluationsAlways(parser)
@@ -72,37 +74,61 @@ trait SchemaSetRuntime1Mixin { self : SchemaSet =>
   }.value
 
   def onPath(xpath: String): DFDL.DataProcessor = {
-      Assert.usage(!isError)
-      if (xpath != "/") root.notYetImplemented("""Path must be "/". Other path support is not yet implemented.""")
-      val rootERD = root.elementRuntimeData
-      root.schemaDefinitionUnless(
-        rootERD.outputValueCalcExpr.isEmpty,
-        "The root element cannot have the dfdl:outputValueCalc property.")
-      val validationMode = ValidationMode.Off
-      val p = if (!root.isError) parser else null
-      val u = if (!root.isError) unparser else null
-      val ssrd = new SchemaSetRuntimeData(
-        p,
-        u,
-        this.diagnostics,
-        rootERD,
-        variableMap,
-        typeCalcMap)
-      if (root.numComponents > root.numUniqueComponents)
-        log(LogLevel.Info, "Compiler: component counts: unique %s, actual %s.",
-          root.numUniqueComponents, root.numComponents)
-      val dataProc = new DataProcessor(ssrd, tunable, self.compilerExternalVarSettings)
-      if (dataProc.isError) {
-        // NO longer printing anything here. Callers must do this.
-        // val diags = dataProc.getDiagnostics
-        // log(LogLevel.Error,"Compilation (DataProcessor) reports %s compile errors/warnings.", diags.length)
-        // diags.foreach { diag => log(LogLevel.Error, diag.toString()) }
-      } else {
-        log(LogLevel.Compile, "Parser = %s.", ssrd.parser.toString)
-        log(LogLevel.Compile, "Unparser = %s.", ssrd.unparser.toString)
-        log(LogLevel.Compile, "Compilation (DataProcesor) completed with no errors.")
-      }
-      dataProc
+    Assert.usage(!isError)
+    if (xpath != "/") root.notYetImplemented("""Path must be "/". Other path support is not yet implemented.""")
+    val rootERD = root.elementRuntimeData
+    root.schemaDefinitionUnless(
+      rootERD.outputValueCalcExpr.isEmpty,
+      "The root element cannot have the dfdl:outputValueCalc property.")
+    val validationMode = ValidationMode.Off
+    val p = if (!root.isError) parser else null
+    val u = if (!root.isError) unparser else null
+    val ssrd = new SchemaSetRuntimeData(
+      p,
+      u,
+      this.diagnostics,
+      rootERD,
+      variableMap,
+      typeCalcMap)
+    if (root.numComponents > root.numUniqueComponents)
+      log(LogLevel.Info, "Compiler: component counts: unique %s, actual %s.",
+        root.numUniqueComponents, root.numComponents)
+    val dataProc = new DataProcessor(ssrd, tunable, self.compilerExternalVarSettings)
+    //
+    // now we fake serialize to a dev/null-type output stream which forces
+    // any lazy evaluation that hasn't completed to complete.
+    // Those things could signal errors, so we do this before we check for errors.
+    //
+    // Note that calling preSerialization is not sufficient, since that's only mixed into
+    // objects with lazy evaluation. A SSRD is just a tuple-like object, does not mixin
+    // preSerialization, and shouldn't need to. We need to
+    // serialize all its substructure to insure all preSerializations, that force
+    // all lazy evaluations, are done.
+    //
+    // Overhead-wise, this is costly, if the caller is about to save the processor themselves
+    // But as there have been cases of Runtime1 processors which end up doing lazy evaluation
+    // that ends up happening late, this eliminates a source of bugs, albeit, by masking them
+    // so they are not detectable.
+    //
+    // Best to address this for real when we refactor Runtime1 to fully separate it from
+    // the schema compiler. At that point we can draw a firmer line about the compiler's output
+    // being fully realized before runtime objects are constructed.
+    //
+    // We don't call save() here, because that does a few other things than just serialize.
+    val oos = new ObjectOutputStream(org.apache.commons.io.output.NullOutputStream.NULL_OUTPUT_STREAM)
+    oos.writeObject(dataProc)
+
+    if (dataProc.isError) {
+      // NO longer printing anything here. Callers must do this.
+      // val diags = dataProc.getDiagnostics
+      // log(LogLevel.Error,"Compilation (DataProcessor) reports %s compile errors/warnings.", diags.length)
+      // diags.foreach { diag => log(LogLevel.Error, diag.toString()) }
+    } else {
+      log(LogLevel.Compile, "Parser = %s.", ssrd.parser.toString)
+      log(LogLevel.Compile, "Unparser = %s.", ssrd.unparser.toString)
+      log(LogLevel.Compile, "Compilation (DataProcesor) completed with no errors.")
+    }
+    dataProc
   }
 
 }
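
The comment added to `onPath` describes forcing pending lazy evaluation by serializing the processor to a discarding output stream. A self-contained sketch of that idea follows, under stated assumptions: `Part`, `Container`, and `ForceEval` are hypothetical stand-ins (not Daffodil classes), the `writeObject` hook plays the role the comment attributes to `preSerialization`, and a hand-written null stream replaces the commons-io `NullOutputStream` so the example has no dependencies.

```scala
import java.io.{ObjectOutputStream, OutputStream}

// Hypothetical runtime object: `checked` is lazy and may signal an error when forced.
class Part(val n: Int) extends Serializable {
  lazy val checked: Int = {
    require(n >= 0, "n must be non-negative")
    n * 2
  }

  // Java serialization hook: force the lazy state before the fields are written.
  private def writeObject(out: ObjectOutputStream): Unit = {
    val _ = checked
    out.defaultWriteObject()
  }
}

// Tuple-like container with no hook of its own; serializing it still visits each Part.
case class Container(parts: List[Part])

object ForceEval {
  // An output stream that discards every byte written to it.
  private object NullOut extends OutputStream {
    override def write(b: Int): Unit = ()
  }

  // Serialize the whole object graph to the discarding stream. Any error raised
  // while a hook forces lazy state propagates to the caller.
  def force(root: AnyRef): Unit = {
    val oos = new ObjectOutputStream(NullOut)
    try oos.writeObject(root) finally oos.close()
  }
}
```

Because serialization walks the entire object graph, the container itself needs no hook: `ForceEval.force(Container(List(new Part(1), new Part(-1))))` fails with the `require` error from the second part, before any later code inspects the container.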
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!--
+  This is a bad DTD, on purpose. The external DTD will not be found
+  and an error to that effect tells us if the XML processor was processing
+  the DTD, or ignoring it.
+-->
+<!DOCTYPE root SYSTEM "notFound.dtd">
+<root xmlns="http://example.com">
+  <foo xmlns="">bar</foo>
+</root>
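
A document like the one above can be used to check that a parser refuses DOCTYPE declarations outright. The sketch below uses the standard JAXP SAX API with the Xerces feature `http://apache.org/xml/features/disallow-doctype-decl`; that this is the feature behind `XMLUtils.XML_DISALLOW_DOCTYPE_FEATURE` is an assumption, since its value is not shown on this page. `DoctypeCheck` and `rejection` are hypothetical names.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Illustrative only: a hypothetical helper, not part of Daffodil.
object DoctypeCheck {
  // Xerces feature: when true, any DOCTYPE declaration is a fatal error.
  val DisallowDoctype = "http://apache.org/xml/features/disallow-doctype-decl"

  // Returns None if the document parses, or Some(message) if the parser rejects it.
  def rejection(xml: String): Option[String] = {
    val factory = SAXParserFactory.newInstance()
    factory.setNamespaceAware(true)
    factory.setFeature(DisallowDoctype, true)
    try {
      // DefaultHandler rethrows fatal errors, so a rejected DOCTYPE surfaces here.
      factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler)
      None
    } catch {
      case e: SAXParseException => Some(e.getMessage)
    }
  }
}
```

With the feature enabled, the parser fails at the DOCTYPE declaration itself and never tries to fetch `notFound.dtd`, which is the distinction the test file's comment relies on.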

daffodil-core/src/test/scala/org/apache/daffodil/infoset/TestInfoset.scala

Lines changed: 0 additions & 2 deletions
@@ -19,7 +19,6 @@ package org.apache.daffodil.infoset
 
 import org.apache.daffodil.xml.XMLUtils
 import org.apache.daffodil.util._
-import org.apache.daffodil.Implicits._
 import org.apache.daffodil.compiler._
 import org.junit.Assert._
 import org.junit.Test
@@ -90,7 +89,6 @@ object TestInfoset {
       val msgs = pf.getDiagnostics.map { _.getMessage() }.mkString("\n")
       fail("pf compile errors: " + msgs)
     }
-    pf.sset.root.erd.preSerialization // force evaluation of all compile-time constructs
     val dp = pf.onPath("/").asInstanceOf[DataProcessor]
     if (dp.isError) {
       val msgs = dp.getDiagnostics.map { _.getMessage() }.mkString("\n")
