public class ScrapeableFile extends java.lang.Object implements ScriptInstancesContainer, Deleteable, RunningScrapeableFile, java.lang.Comparable, Uniqueable
Modifier and Type | Class and Description |
---|---|
static class |
ScrapeableFile.RequestType
Types of requests a scrapeable file can issue
|
Modifier and Type | Field and Description |
---|---|
(package private) boolean |
extractorPatternsMatched |
static org.apache.log4j.Logger |
log
Used for logging.
|
Constructor and Description |
---|
ScrapeableFile()
Needed by castor.
|
ScrapeableFile(int scrapeableFileID)
Generates and loads a ScrapeableFile based on the ID.
|
ScrapeableFile(int scrapeableFileID,
ScrapingSession scrapingSession)
Generates a new ScrapeableFile.
|
ScrapeableFile(ScrapeableFile scrapeableFile)
Generates a new scrapeable file based on the given scrapeable file.
|
ScrapeableFile(java.lang.String name)
Generates a new ScrapeableFile.
|
ScrapeableFile(java.lang.String name,
HTTPTransaction httpTransaction)
Generates a new ScrapeableFile with the given name, based on the given HTTPTransaction.
|
ScrapeableFile(java.lang.String name,
java.lang.String url,
int sequence)
Generates a new ScrapeableFile.
|
ScrapeableFile(java.lang.String name,
java.net.URL url)
Generates a new ScrapeableFile.
|
ScrapeableFile(java.lang.String name,
java.net.URL url,
java.lang.String postData)
Generates a new ScrapeableFile.
|
Modifier and Type | Method and Description |
---|---|
void |
addBasicHeaders()
Adds request headers to the custom headers, so they can be removed
|
void |
addExtractorPattern(ExtractorPattern extractorPattern)
Adds a pattern to the scrapeable file.
|
void |
addGETHTTPParameter(java.lang.String key,
java.lang.String value)
Adds a GET HTTP parameter with the given parameters to the end of the existing parameters.
|
void |
addGETHTTPParameter(java.lang.String key,
java.lang.String value,
int sequence)
Adds a GET HTTP parameter with the given parameters.
|
void |
addHTTPHeader(java.lang.String key,
java.lang.String value)
Adds a custom header.
|
void |
addHTTPParameter(HTTPParameter httpParameter)
Adds a parameter to the scrapeable file.
|
void |
addNavigationAction(NavigationAction navigationAction)
Adds a NavigationCriteria to the set.
|
void |
addPOSTHTTPParameter(java.lang.String key,
java.lang.String value)
Adds a POST HTTP parameter with the given parameters to the end of the existing parameters.
|
void |
addPOSTHTTPParameter(java.lang.String key,
java.lang.String value,
int sequence)
Adds a POST HTTP parameter with the given parameters.
|
protected void |
addRedirectURL(java.lang.String url)
Adds a redirect URL to the list.
|
XmlNode |
applyXPathExpression(java.lang.String expression)
Applies an XPath expression to the current response.
|
Form |
buildForm(java.lang.String formText)
Builds a form using the text given, which should contain all the form values (including the form open and close tags)
|
void |
bumpHTTPParameterSequences(int bottomSequence,
int topSequence,
int bumpModifier)
Bumps sequences up or down between the bottom and top sequences.
|
int |
compareTo(java.lang.Object scrapeableFile)
Required by the Comparable interface.
|
boolean |
containsFileUpload()
Indicates whether or not the file contains POST parameters.
|
void |
delete()
Deletes the scrapeable file from the database.
|
void |
dumpHTTPParametersToLog()
Dumps all of the HTTP parameters to the log.
|
void |
dumpToLog()
Dumps the scrapeable file to the log.
|
protected void |
ensureNavigationActionsHaveScrapeableFileReference()
Ensures that all of the navigation actions held by this object have a reference to the object.
|
boolean |
equals(java.lang.Object object)
Overriding equals...
|
DataSet |
extractData(ScriptContext scriptContext,
ExtractorPattern extractorPattern,
DataSet dataSet,
boolean scripts)
Extracts data by applying the given ExtractorPattern to the supplied text.
|
DataSet |
extractData(ScriptContext scriptContext,
ExtractorPattern extractorPattern,
DataSet dataSet,
boolean scripts,
boolean isApplyPattern)
Extracts data from an extractor pattern, handling any mapping, resolving of URLs, or saving of session variables that needs to be done.
|
DataSet |
extractData(java.lang.String text,
java.lang.String name)
Manually extracts data using an existing ExtractorPattern.
|
java.lang.String |
extractOneValue(java.lang.String text,
java.lang.String name)
Manually extracts one value from data using an existing ExtractorPattern.
|
java.lang.String |
extractOneValue(java.lang.String text,
java.lang.String name,
java.lang.String token)
Manually extracts one value from data using an existing ExtractorPattern.
|
void |
flagAsLoaded()
Flags it as being fully loaded.
|
DataRecord |
getASPXValues(boolean onlyStandard)
Gets the ASPX .NET values from the string.
|
boolean |
getAuthenticationPreemptive()
Indicates whether or not authentication should be done preemptively.
|
java.lang.String |
getBASICAuthenticationPassword()
Gets the BASIC authentication password of the file to be scraped.
|
java.lang.String |
getBASICAuthenticationUsername()
Gets the BASIC authentication username of the file to be scraped.
|
java.lang.String |
getCharacterSet()
Gets the character encoding of the content given by the server.
|
java.lang.String |
getContentAsString()
Gets the scraped content as a string.
|
java.lang.String |
getContentBodyOnly()
Gets only the body of the response (no headers or status line) as a string.
|
java.lang.String |
getContentType()
Gets the custom "Content-Type" header to be sent.
|
java.lang.String |
getCurrentPOSTData()
Gets the current value of the POST data.
|
java.lang.String |
getCurrentURL()
Depending on when this value is requested it will return either the unresolved or resolved URL of the file to be scraped.
|
java.util.Set<ScrapingHttpHeader> |
getCustomHeaders()
Gets the custom headers hash set (not a copy of it)
|
boolean |
getEnableJavaScript()
Indicates whether or not JavaScript embedded in the HTML page should be processed.
|
ExtractorPattern |
getExtractorPattern(int sequence)
Gets an extractor pattern from the ScrapeableFile.
|
ExtractorPattern |
getExtractorPattern(java.lang.String name)
Gets an extractor pattern from the ScrapeableFile.
|
ExtractorPattern |
getExtractorPatternByID(int extractorPatternID)
Gets the extractor pattern by its ID
|
java.util.HashSet |
getExtractorPatterns()
Returns all extractor patterns held by this scrapeable file.
|
java.util.Iterator |
getExtractorPatternsIterator()
Gets an Iterator that can be used to cycle through the ExtractorPattern objects held by this object.
|
boolean |
getExtractorPatternTimedOut()
Indicates whether or not the extractor pattern timed out on the most recent try.
|
boolean |
getForceMultiPart()
Indicates whether or not a multi-part request should be forced.
|
boolean |
getForceNonBinary()
Indicates whether or not the contents of this response should be forced to be treated as non-binary.
|
boolean |
getForcePOST()
Indicates whether or not a POST request should be forced.
|
com.gargoylesoftware.htmlunit.html.HtmlPage |
getHtmlPage()
Gets an HtmlPage first by attempting to see if one was already given from the last request.
|
HTTPParameter |
getHTTPParameter(int sequence)
Gets an HTTP parameter from the session.
|
HTTPParameter |
getHTTPParameter(java.lang.String key)
Gets an HTTP parameter from the session.
|
HTTPParameter |
getHTTPParameterByName(java.lang.String key)
Gets an HTTP parameter by its name.
|
java.util.HashSet<HTTPParameter> |
getHTTPParameters()
Gets all of the HTTP parameters held by this scrapeable file.
|
java.lang.String |
getHTTPResponseHeader(java.lang.String header)
Gets the value of the header in the response of the scrapeable file, or returns null if it couldn't be found
|
java.util.Map<java.lang.String,java.lang.String> |
getHTTPResponseHeaders()
Gets the headers of the HTTP Response as a map, and returns them.
|
java.lang.String |
getHTTPResponseHeaderSection()
Gets the header section of the HTTP Response
|
int |
getHTTPTransactionID()
Gets the ID of the HTTP transaction associated with this scrapeable file.
|
int |
getID()
Gets the ID of the scrapeable file.
|
java.lang.String |
getIdentifier()
Gets the identifier for this object.
|
ScrapingResponse |
getLastHttpResponse()
Gets the last http response received
|
java.lang.String |
getLastRequest()
Gets the request from the last time it was scraped.
|
java.lang.String |
getLastRequestShort()
Returns the last request, truncating it if it's too long.
|
java.net.URL |
getLastRequestURL()
Gets the last request URL used for this scrapeable file
|
java.lang.String |
getLastScrapedData()
Gets the data from the file from the last time it was scraped.
|
java.lang.String |
getLastScrapedDataShort()
Gets the data from the file from the last time it was scraped, truncating it, if necessary.
|
boolean |
getLastTidyAttemptFailed()
Indicates whether or not the last attempt to tidy the response failed.
|
boolean |
getMaxRequestAttemptsReached()
Indicates whether or not the maximum requests attempts were reached on the most recent try.
|
int |
getMaxResponseLength()
Gets the maximum response length, in kilobytes.
|
java.lang.String |
getName()
Gets the name.
|
NavigationAction |
getNavigationAction(int sequence)
Gets a navigation action from the ScrapeableFile.
|
protected NavigationActionDoer |
getNavigationActionDoer()
Gets the current navigation action doer to be processed.
|
java.util.TreeSet |
getNavigationActions()
Gets a reference to the TreeSet containing the NavigationAction objects.
|
java.util.Iterator |
getNavigationActionsIterator()
Gets an Iterator that can be used to cycle through the NavigationAction objects held by this object.
|
javax.swing.tree.DefaultMutableTreeNode |
getNode()
Gets the node representing this scraping file.
|
java.lang.String |
getNonTidiedHTML()
Gets the non-tidied HTML.
|
java.lang.String |
getNTLMAuthenticationDomain()
Gets the domain for NTLM authentication.
|
java.lang.String |
getNTLMAuthenticationHost(java.net.URL url) |
int |
getNumExtractorPatterns()
Returns the number of extractor patterns currently held by the file.
|
int |
getNumHTTPParameters()
Returns the number of parameters currently held by the file.
|
int |
getNumNavigationActions()
Gets the number of NavigationAction objects in the set.
|
java.lang.String[] |
getRedirectURLs()
Gets an array of strings containing the redirect URLs for the current scrapeable file request attempt.
|
java.net.URL |
getReferer()
Gets the URL that should be used for the referer HTTP header when scraping this file.
|
java.lang.String |
getRequestEntity()
Gets the request entity to be used in lieu of POST data.
|
ScrapeableFile.RequestType |
getRequestType()
Returns the type of the request that has been set
|
java.lang.String |
getResolvedURL()
Gets the URL after it's been resolved but before any redirects.
|
boolean |
getRetainNonTidiedHTML()
Gets whether or not the non-tidied HTML should be retained.
|
RetryPolicy |
getRetryPolicy()
Returns the retry policy.
|
RunningScrapingSession |
getRunningScrapingSession()
Gets the current running scraping session.
|
ScrapeableFileNotifiable |
getScrapeableFileNotifiable()
Gets the ScrapeableFileNotifiable object associated with this scrapeable file.
|
ScrapingSession |
getScrapingSession()
Gets the scraping session associated with this scrapeable file.
|
ScriptInstances |
getScriptInstances()
Returns a ScriptInstances object containing all script instances associated with this scrapeable file.
|
int |
getSequence()
Gets the sequence with which the parameter should be sent.
|
boolean |
getShouldHighlightSyntax() |
int |
getStatusCode()
Gets the status code that was returned from the server when the request was made.
|
java.lang.String |
getStrippedContentAsString()
Gets the scraped content as a string, with all new line characters removed
|
java.lang.String |
getTidyHTML() |
int |
getTidyHTMLAsInt()
Indicates how the HTML should be tidied.
|
java.lang.String |
getURL()
Gets the URL of the file to be scraped.
|
boolean |
getUsePlusForEncodedSpaces()
Indicates whether or not + instead of %20 should be used to encode space characters.
|
java.lang.String |
getUserAgent()
Gets the user-agent to be used in HTTP requests.
|
boolean |
getWillBeInvokedManually()
Gets whether or not this file will be invoked manually.
|
boolean |
inputOutputErrorOccurred()
Indicates whether or not an input/output error occurred while requesting the file.
|
void |
invokeAction(java.lang.String name)
Invokes a navigation action identified by the given name.
|
protected void |
load()
Loads the scrapeable file with all of its data.
|
protected void |
loadNavigationActions()
Loads navigation actions related to this scrapeable file.
|
boolean |
noExtractorPatternsMatched()
Indicates whether or not none of the extractor patterns for this scrapeable file found a match.
|
void |
removeAllHTTPHeaders()
Removes all custom HTTP headers that were previously added.
|
void |
removeAllHTTPParameters()
Removes all HTTP parameters from the scrapeable file.
|
void |
removeExtractorPattern(ExtractorPattern extractorPattern)
Removes an extractor pattern.
|
void |
removeHTTPHeader(java.lang.String key)
Removes all HTTP headers with the given key.
|
void |
removeHTTPHeader(java.lang.String key,
java.lang.String value)
Removes all HTTP headers with the corresponding key/value pair.
|
void |
removeHTTPParameter(int sequence)
Removes an HTTP parameter, automatically re-sequencing.
|
void |
removeHTTPParameter(int sequence,
boolean doResequence)
Removes an HTTP parameter.
|
void |
removeHTTPParameter(java.lang.String key)
Removes an HTTP parameter by its name, automatically re-sequencing.
|
void |
removeNavigationAction(NavigationAction navigationAction)
Removes a navigation action.
|
void |
resequenceHTTPParameter(java.lang.String key,
int sequence)
Gives an HTTP parameter a new sequence.
|
void |
reset()
Resets the object after a scrape to ensure that nothing is being retained in memory that shouldn't be.
|
void |
resetContainsNullableTokens()
Resets the containsNullableTokens value for each of the extractor patterns.
|
protected void |
resetCustomHeaders()
Resets the custom headers hash set.
|
void |
resetNonPersistentValues()
Resets values that should not persist between runs of the scrape. Because the session is stored in memory, such values could otherwise persist in error.
|
void |
resetReferer()
Resets the referer value to null.
|
java.lang.String |
resolveRelativeURL(java.lang.String urlToResolve)
Resolve a relative URL
|
void |
restoreHTTPParametersAfterScraping()
Indicates that HTTP parameters should be restored after the scraping session finishes scraping.
|
void |
restoreOriginalHTTPParameters()
Restores the original HTTP parameters.
|
void |
save()
Saves this scraping session to the database.
|
void |
saveFileBeforeTidying(java.lang.String fileToSaveToBeforeTidying)
Sets the path to save the file to just before it gets tidied.
|
void |
saveFileOnRequest(java.lang.String fileToSaveToOnRequest)
Sets the path to save the file to just after it gets requested.
|
protected void |
scrape(ScrapingHttpClient httpClient)
Causes the scrapeable file to scrape its data, extract the needed information and run any scripts.
|
(package private) void |
scrapeFromString(java.lang.String content) |
void |
setAuthenticationPreemptive(boolean authenticationPreemptive)
Sets whether or not authentication should be done preemptively.
|
void |
setBASICAuthenticationPassword(java.lang.String BASICAuthenticationPassword)
Sets the BASIC authentication password of the file to be scraped.
|
void |
setBASICAuthenticationUsername(java.lang.String BASICAuthenticationUsername)
Sets the BASIC authentication username of the file to be scraped.
|
void |
setCharacterSet(java.lang.String characterSet)
Sets the character encoding of the content given by the server.
|
void |
setContentType(java.lang.String contentType)
Sets a custom "Content-Type" header to be sent.
|
protected void |
setCurrentURL(java.lang.String currentURL)
Sets the resolved or unresolved URL.
|
void |
setEnableJavaScript(boolean enableJavaScript)
Determines whether or not JavaScript embedded in the HTML page should be processed.
|
void |
setForcedRequestType(ScrapeableFile.RequestType type)
Sets the request type to use.
|
void |
setForceMultiPart(boolean forceMultiPart)
Determines whether or not a multi-part request should be forced.
|
void |
setForceNonBinary(boolean forceNonBinary)
Determines whether or not the contents of this response should be forced to be treated as non-binary.
|
void |
setForcePOST(boolean forcePOST)
Determines whether or not a POST request should be forced.
|
void |
setHTTPTransactionID(int httpTransactionID)
Sets the ID of the HTTP transaction associated with this scrapeable file.
|
void |
setIdentifier(java.lang.String identifier)
Sets the identifier for this object.
|
void |
setLastHttpResponse(ScrapingResponse lastHttpResponse)
Sets the last http response received
|
void |
setLastRequest(java.lang.String lastRequest)
Sets the request from the last time it was scraped.
|
void |
setLastRequestURL(java.net.URL lastRequestURL)
Sets the last request URL used for this scrapeable file
|
void |
setLastScrapedData(java.lang.String lastScrapedData)
Sets the data from the file from the last time it was scraped.
|
void |
setLastTidyAttemptFailed(boolean lastTidyAttemptFailed)
Sets whether or not the last attempt to tidy the response failed.
|
void |
setMaxResponseLength(int maxKBytes)
Sets the maximum response length, in kilobytes.
|
void |
setName(java.lang.String name)
Sets the name.
|
protected void |
setNavigationActionDoer(NavigationActionDoer navigationActionDoer)
Sets the current navigation action doer to be processed.
|
void |
setNode(javax.swing.tree.DefaultMutableTreeNode node)
Sets the node representing this scraping file.
|
void |
setNTLMAuthenticationDomain(java.lang.String domain)
Sets the domain for NTLM authentication; the default is localhost.
|
void |
setNTLMAuthenticationHost(java.lang.String host)
Sets the host for NTLM authentication, default is
|
void |
setParametersFromForm(Form form)
Sets the parameters for a ScrapeableFile to the values in the form object.
|
void |
setReferer(java.lang.String strReferer)
Sets the URL as a string that should be used for the referer HTTP header when scraping this file.
|
void |
setReferer(java.net.URL referer)
Sets the URL that should be used for the referer HTTP header when scraping this file.
|
void |
setRequestEntity(java.lang.String requestEntity)
Sets the request entity to be used in lieu of POST data.
|
void |
setRetainNonTidiedHTML(boolean retainNonTidiedHTML)
Sets whether or not the non-tidied HTML should be retained.
|
void |
setRetryPolicy(RetryPolicy policy)
Sets a Retry Policy that will be run to check if a page should be re-downloaded or not.
|
void |
setRunningScrapingSession(RunningScrapingSession runningScrapingSession)
Sets the current running scraping session.
|
void |
setScrapeableFileNotifiable(ScrapeableFileNotifiable scrapeableFileNotifiable)
Sets the file that will be notified as specific pieces of data in this object get updated.
|
void |
setScrapingSession(ScrapingSession scrapingSession)
Sets the scraping session to be associated with this scrapeable file.
|
protected void |
setScrapingSessionNotifiable(ScrapingSessionNotifiable scrapingSessionNotifiable)
Sets the object that the scrapeable file will notify of its progress.
|
void |
setScriptInstances(ScriptInstances scriptInstances)
Sets the script instances to be associated with this scrapeable file.
|
void |
setSequence(int sequence)
Sets the sequence with which the parameter should be sent.
|
void |
setShouldHighlightSyntax(boolean b)
Sets the boolean representing whether or not the HTML on the Last Response pane should be syntax highlighted to the given value.
|
void |
setTidyHTML(java.lang.String tidyHTML) |
void |
setTidyHTMLAsInt(int tidyHTML)
Determines how the HTML is to be tidied.
|
void |
setURL(java.lang.String url)
Sets the URL of the file to be scraped.
|
void |
setUsePlusForEncodedSpaces(boolean usePlusForEncodedSpaces)
Sets whether or not + instead of %20 should be used to encode space characters.
|
void |
setUserAgent(java.lang.String userAgent)
Sets the user-agent to be used in HTTP requests.
|
void |
setWillBeInvokedManually(boolean willBeInvokedManually)
Sets whether or not this file will be invoked manually.
|
java.lang.String |
toString()
Overridden so that the name appears as the title in the main tree.
|
boolean |
wasErrorOnRequest()
Determines whether or not an error occurred in requesting the file.
|
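The getUsePlusForEncodedSpaces and setUsePlusForEncodedSpaces entries in the summary above choose between "+" and "%20" as the encoding of a space. How screen-scraper applies that flag internally is not described here, so the following is only a minimal sketch of the two encodings using the standard library. The class name SpaceEncodingSketch and the method encode are hypothetical and are not part of the ScrapeableFile API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical illustration only; not part of the screen-scraper API.
final class SpaceEncodingSketch {

    /** Encodes a parameter value; spaces become "+" or "%20" depending on the flag. */
    static String encode(String value, boolean usePlusForEncodedSpaces) {
        try {
            // URLEncoder emits form-style encoding, in which a space is written as "+".
            String encoded = URLEncoder.encode(value, "UTF-8");
            // A literal "+" in the input has already become "%2B",
            // so every "+" still present stands for a space.
            return usePlusForEncodedSpaces ? encoded : encoded.replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("new york+nj", true));
        System.out.println(encode("new york+nj", false));
    }
}
```

Some servers accept only one of the two forms in a query string, which is presumably why the choice is exposed as a per-file setting.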
public static org.apache.log4j.Logger log
boolean extractorPatternsMatched
public ScrapeableFile()
public ScrapeableFile(int scrapeableFileID)
Generates and loads a ScrapeableFile based on the ID. Note that this constructor should be used only for testing purposes.
scrapeableFileID - The ID of a scrapeable file.
public ScrapeableFile(@NotNull ScrapeableFile scrapeableFile)
scrapeableFile - A ScrapeableFile.
public ScrapeableFile(java.lang.String name)
name - A name to identify the file.
public ScrapeableFile(java.lang.String name, java.net.URL url)
name - A name to identify the file.
url - The URL that should be used to populate the file.
public ScrapeableFile(java.lang.String name, @Nullable java.net.URL url, @Nullable java.lang.String postData)
name - A name to identify the file.
url - The URL that should be used to populate the file.
postData - A string representation of the POST parameters.
public ScrapeableFile(int scrapeableFileID, ScrapingSession scrapingSession)
scrapeableFileID - The ID of the scrapeable file.
scrapingSession - The ScrapingSession associated with this scrapeable file.
public ScrapeableFile(java.lang.String name, @Nullable HTTPTransaction httpTransaction)
Generates a new ScrapeableFile with the given name, based on the given HTTPTransaction.
name - A name to identify the file.
httpTransaction - The HTTPTransaction on which it's to be based.
public ScrapeableFile(java.lang.String name, java.lang.String url, int sequence)
name - A name to identify the file.
url - The URL of the file to be scraped.
sequence - The sequence.
@InternalOnly public int getID()
getID in interface ScriptInstancesContainer
@Nullable public java.lang.String getName()
getName in interface RunningScrapeableFile
getName in interface ScriptInstancesContainer
@InternalOnly public void setName(java.lang.String name)
name - The name.
@Nullable public java.lang.String getURL()
public void setURL(@Nullable java.lang.String url)
url - The URL.
@Nullable public java.lang.String getCurrentURL()
getCurrentURL in interface RunningScrapeableFile
protected void setCurrentURL(java.lang.String currentURL)
currentURL - The URL.
@Nullable public java.lang.String getResolvedURL()
public int getSequence()
@InternalOnly public void setSequence(int sequence)
sequence - The sequence.
@Nullable public java.lang.String getBASICAuthenticationUsername()
public void setBASICAuthenticationUsername(java.lang.String BASICAuthenticationUsername)
BASICAuthenticationUsername - The BASIC authentication username.
@Nullable public java.lang.String getBASICAuthenticationPassword()
public void setBASICAuthenticationPassword(java.lang.String BASICAuthenticationPassword)
BASICAuthenticationPassword - The BASIC authentication password.
@Nullable public java.lang.String getNTLMAuthenticationDomain()
public void setNTLMAuthenticationDomain(java.lang.String domain)
domain - The domain for NTLM authentication.
public void setNTLMAuthenticationHost(java.lang.String host)
host - The host for NTLM authentication.
@Nullable @InternalOnly public java.lang.String getNTLMAuthenticationHost(@NotNull java.net.URL url)
@InternalOnly public void setLastHttpResponse(@Nullable ScrapingResponse lastHttpResponse)
lastHttpResponse - The last HTTP response.
@Nullable @InternalOnly public ScrapingResponse getLastHttpResponse()
@Nullable public java.lang.String getLastRequest()
@Nullable public java.lang.String getLastRequestShort()
@InternalOnly public void setLastRequest(@Nullable java.lang.String lastRequest)
lastRequest - The data.
public boolean wasErrorOnRequest()
wasErrorOnRequest in interface RunningScrapeableFile
public boolean inputOutputErrorOccurred()
public boolean noExtractorPatternsMatched()
noExtractorPatternsMatched in interface RunningScrapeableFile
@Nullable public java.lang.String getContentAsString()
RunnableScrapeableFile facade.
getContentAsString in interface RunningScrapeableFile
@Nullable public java.lang.String getStrippedContentAsString()
RunningScrapeableFile
getStrippedContentAsString in interface RunningScrapeableFile
@Nullable public java.lang.String getLastScrapedData()
@Nullable public java.lang.String getContentBodyOnly()
@Nullable public java.lang.String getLastScrapedDataShort()
@InternalOnly public void setLastScrapedData(@Nullable java.lang.String lastScrapedData)
lastScrapedData - The data.
@InternalOnly public void setLastRequestURL(@Nullable java.net.URL lastRequestURL)
lastRequestURL - The last request URL used, or null if it is unknown or wasn't a valid URL.
@Nullable @InternalOnly public java.net.URL getLastRequestURL()
@Nullable @InternalOnly public ScrapingSession getScrapingSession()
@InternalOnly public void setScrapingSession(ScrapingSession scrapingSession)
scrapingSession - The scraping session.
@InternalOnly public void setNode(javax.swing.tree.DefaultMutableTreeNode node)
node - A DefaultMutableTreeNode.
@Nullable @InternalOnly public javax.swing.tree.DefaultMutableTreeNode getNode()
DefaultMutableTreeNode.
@Nullable @InternalOnly public RunningScrapingSession getRunningScrapingSession()
@InternalOnly public void setRunningScrapingSession(RunningScrapingSession runningScrapingSession)
runningScrapingSession - The running scraping session.
@NotNull public java.util.HashSet<HTTPParameter> getHTTPParameters()
HashSet of HTTPParameters.
@Nullable public HTTPParameter getHTTPParameterByName(java.lang.String key)
key - The name of the parameter.
@RequiredVersion(value=1) public void addGETHTTPParameter(java.lang.String key, java.lang.String value, int sequence)
key - The name of the parameter.
value - The value of the parameter.
sequence - The sequence of the parameter.
@RequiredVersion(value=1) public void addGETHTTPParameter(java.lang.String key, java.lang.String value)
key - The name of the parameter.
value - The value of the parameter.
@RequiredVersion(value=1) public void addPOSTHTTPParameter(java.lang.String key, java.lang.String value, int sequence)
key - The name of the parameter.
value - The value of the parameter.
sequence - The sequence of the parameter.
@RequiredVersion(value=1) public void addPOSTHTTPParameter(java.lang.String key, java.lang.String value)
key - The name of the parameter.
value - The value of the parameter.
public void addHTTPParameter(@Nullable HTTPParameter httpParameter)
addHTTPParameter in interface RunningScrapeableFile
httpParameter - An HTTPParameter.
@RequiredVersion(value=1) public void resequenceHTTPParameter(java.lang.String key, int sequence)
key - The name of the parameter.
sequence - The new sequence.
@InternalOnly public void bumpHTTPParameterSequences(int bottomSequence, int topSequence, int bumpModifier)
bottomSequence - The bottom sequence.
topSequence - The top sequence.
bumpModifier - Modifies each sequence.
public void removeHTTPParameter(int sequence)
sequence - The sequence.
@RequiredVersion(value=1) public void removeHTTPParameter(java.lang.String key)
key - The name of the parameter.
@InternalOnly public void removeHTTPParameter(int sequence, boolean doResequence)
sequence - The sequence.
doResequence - Indicates whether or not the parameters should be re-sequenced afterward.
@RequiredVersion(value=1) public void removeAllHTTPParameters()
removeAllHTTPParameters in interface RunningScrapeableFile
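The sequence-based parameter methods above (adding at the end, bumping a range of sequences, and removing with or without re-sequencing) can be pictured with a small stand-alone model. This is a sketch under stated assumptions, not screen-scraper's implementation: the class ParameterSequenceSketch and its members are hypothetical, sequences are assumed to start at 1, the bump range is assumed to be inclusive at both ends, and re-sequencing is assumed to mean closing the gap left by the removed parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model only; none of these names exist in the screen-scraper API.
final class ParameterSequenceSketch {

    /** Stand-in for an HTTP parameter: a key plus the order in which it is sent. */
    static final class Param {
        final String key;
        int sequence;

        Param(String key, int sequence) {
            this.key = key;
            this.sequence = sequence;
        }
    }

    private final List<Param> params = new ArrayList<>();

    /** Appends a parameter after the existing ones (assumed 1-based sequences). */
    void add(String key) {
        params.add(new Param(key, params.size() + 1));
    }

    /** Shifts every sequence within [bottom, top] by modifier (inclusive bounds assumed). */
    void bump(int bottom, int top, int modifier) {
        for (Param p : params) {
            if (p.sequence >= bottom && p.sequence <= top) {
                p.sequence += modifier;
            }
        }
    }

    /** Removes the parameter at the given sequence and optionally closes the gap. */
    void remove(int sequence, boolean doResequence) {
        params.removeIf(p -> p.sequence == sequence);
        if (doResequence) {
            // Everything that followed the removed parameter moves up by one.
            bump(sequence + 1, Integer.MAX_VALUE, -1);
        }
    }

    /** Returns the sequence of the parameter with the given key, or -1 if absent. */
    int sequenceOf(String key) {
        for (Param p : params) {
            if (p.key.equals(key)) {
                return p.sequence;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ParameterSequenceSketch sketch = new ParameterSequenceSketch();
        sketch.add("a");
        sketch.add("b");
        sketch.add("c");
        sketch.remove(2, true);
        System.out.println(sketch.sequenceOf("c"));
    }
}
```

In this model, removing without re-sequencing leaves a hole in the numbering, which may be the reason the two-argument removeHTTPParameter is marked for internal use.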
@InternalOnly public void dumpHTTPParametersToLog()
@Nullable public HTTPParameter getHTTPParameter(int sequence)
sequence - The sequence of the parameter to be retrieved.
@Nullable public HTTPParameter getHTTPParameter(java.lang.String key)
key - The key of the parameter to be retrieved.
@InternalOnly public void addExtractorPattern(@Nullable ExtractorPattern extractorPattern)
extractorPattern - An ExtractorPattern.
@InternalOnly public void removeExtractorPattern(@Nullable ExtractorPattern extractorPattern)
extractorPattern - The pattern to be removed.
@InternalOnly @Nullable public ExtractorPattern getExtractorPattern(int sequence)
Gets an extractor pattern from the ScrapeableFile.
sequence - The sequence of the pattern to be retrieved.
@Nullable @InternalOnly public ExtractorPattern getExtractorPattern(@Nullable java.lang.String name)
Gets an extractor pattern from the ScrapeableFile.
name - The name of the pattern to be retrieved. If more than one is found, the first one will be returned, ordering by sequence.
@Nullable @InternalOnly public ExtractorPattern getExtractorPatternByID(int extractorPatternID)
extractorPatternID - The ID of the pattern to be retrieved.
@InternalOnly public void resetContainsNullableTokens()
@InternalOnly public java.util.HashSet getExtractorPatterns()
HashSet containing the patterns.
@NotNull @InternalOnly public java.util.Iterator getExtractorPatternsIterator()
Gets an Iterator that can be used to cycle through the ExtractorPattern objects held by this object.
Iterator.
@InternalOnly public int getNumExtractorPatterns()
public int getNumHTTPParameters()
@InternalOnly public ScriptInstances getScriptInstances()
Returns a ScriptInstances object containing all script instances associated with this scrapeable file.
getScriptInstances in interface ScriptInstancesContainer
@InternalOnly public void setScriptInstances(ScriptInstances scriptInstances)
setScriptInstances in interface ScriptInstancesContainer
scriptInstances - The script instances.
@Nullable @InternalOnly public java.lang.String toString()
toString in class java.lang.Object
@InternalOnly public int compareTo(@NotNull java.lang.Object scrapeableFile)
Required by the Comparable interface.
compareTo in interface java.lang.Comparable
scrapeableFile - The file to compare to.
@InternalOnly public boolean equals(@Nullable java.lang.Object object)
equals in class java.lang.Object
object - The object we're comparing.
@InternalOnly public void save()
@InternalOnly public void delete()
delete in interface Deleteable
@InternalOnly public boolean containsFileUpload()
boolean.
@Nullable public java.lang.String getCurrentPOSTData()
RunningScrapeableFile so that the resolved POST data can be made accessible in a script. If this method is called before this scrapeable file is scraped it will return null.
getCurrentPOSTData in interface RunningScrapeableFile
protected void scrape(@Nullable ScrapingHttpClient httpClient)
httpClient - The HttpClient object to be used in scraping the data.
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void setForcedRequestType(ScrapeableFile.RequestType type)
type - The type of request to issue, or null to let screen-scraper decide.
protected void addRedirectURL(java.lang.String url)
url - The URL as a string.
@NotNull @RequiredVersion(value=1) public java.lang.String[] getRedirectURLs()
void scrapeFromString(@Nullable java.lang.String content)
@NotNull @InternalOnly public java.util.Set<ScrapingHttpHeader> getCustomHeaders()
public void removeAllHTTPHeaders()
@RequiredVersion(value=1) public void addHTTPHeader(@NotNull java.lang.String key, @Nullable java.lang.String value)
key - The key.
value - The value.
@RequiredVersion(value=1) public void removeHTTPHeader(@Nullable java.lang.String key)
key -
java.lang.Exception
@RequiredVersion(value=2) public void removeHTTPHeader(@Nullable java.lang.String key, @Nullable java.lang.String value)
key - The name of the header.
value - The value of the header.
java.lang.Exception - In an invalid version of screen-scraper.
protected void resetCustomHeaders()
@InternalOnly public void addBasicHeaders()
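The custom header methods above keep headers in a set and offer two removal forms: by key alone, and by key and value together. A minimal stand-alone model of that difference follows. It is an assumption-laden sketch rather than the library's code: the class CustomHeaderSketch is hypothetical, a plain map entry stands in for ScrapingHttpHeader, and keys are assumed to match case-sensitively.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model only; a Map.Entry stands in for ScrapingHttpHeader.
final class CustomHeaderSketch {

    private final Set<Map.Entry<String, String>> headers = new HashSet<>();

    /** Adds a header; the same key may appear more than once with different values. */
    void add(String key, String value) {
        headers.add(new SimpleImmutableEntry<>(key, value));
    }

    /** Removes every header with the given key (case-sensitive match assumed). */
    void remove(String key) {
        headers.removeIf(h -> h.getKey().equals(key));
    }

    /** Removes only the header whose key and value both match. */
    void remove(String key, String value) {
        headers.remove(new SimpleImmutableEntry<>(key, value));
    }

    /** Returns the number of headers currently held. */
    int size() {
        return headers.size();
    }

    public static void main(String[] args) {
        CustomHeaderSketch sketch = new CustomHeaderSketch();
        sketch.add("Accept", "text/html");
        sketch.add("Accept", "application/json");
        sketch.add("X-Example", "1");
        sketch.remove("Accept", "text/html");
        System.out.println(sketch.size());
        sketch.remove("Accept");
        System.out.println(sketch.size());
    }
}
```

The key-and-value form is useful when a header is repeated and only one of its values should be dropped.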
protected void setScrapingSessionNotifiable(ScrapingSessionNotifiable scrapingSessionNotifiable)
scrapingSessionNotifiable - The object.
@InternalOnly public boolean getWillBeInvokedManually()
@InternalOnly public void setWillBeInvokedManually(boolean willBeInvokedManually)
willBeInvokedManually - A boolean.
@Nullable @InternalOnly public ScrapeableFileNotifiable getScrapeableFileNotifiable()
Gets the ScrapeableFileNotifiable object associated with this scrapeable file.
ScrapeableFileNotifiable.
@InternalOnly public void setScrapeableFileNotifiable(ScrapeableFileNotifiable scrapeableFileNotifiable)
scrapeableFileNotifiable - The notifiable object.
@Nullable public java.net.URL getReferer()
public void setReferer(java.net.URL referer)
referer - The referer URL.
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void setReferer(@Nullable java.lang.String strReferer)
strReferer - The referer URL.
public void resetReferer()
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void setContentType(java.lang.String contentType)
contentType - The "Content-Type" value to be sent.
@Nullable public java.lang.String getContentType()
@Nullable @InternalOnly public DataSet extractData(@NotNull ScriptContext scriptContext, @NotNull ExtractorPattern extractorPattern, DataSet dataSet, boolean scripts)
ExtractorPattern to the supplied text. Calls extractData( scriptContext, extractorPattern, dataSet, scripts, true ).
scriptContext - The ScriptContext to use when extracting data
extractorPattern - The ExtractorPattern to extract from
dataSet - The DataSet with the extracted data
scripts - Whether to execute the scripts associated with this extractor pattern
DataSet containing the extracted data.
@Nullable @InternalOnly public DataSet extractData(@NotNull ScriptContext scriptContext, @NotNull ExtractorPattern extractorPattern, DataSet dataSet, boolean scripts, boolean isApplyPattern)
scriptContext - The ScriptContext to use when extracting data
extractorPattern - The ExtractorPattern to extract from
dataSet - The DataSet with the extracted data
scripts - Whether to execute the scripts associated with this extractor pattern
isApplyPattern - Whether to execute extra commands when 'Apply Pattern to Last Scraped Data' is pushed.
DataSet with the extracted data
@Nullable @RequiredVersion(value=1) public XmlNode applyXPathExpression(java.lang.String expression)
expression - An XPath expression.
XmlNode.
@Nullable @RequiredVersion(value=1, behavior=LOG_ERROR_ONLY) @Contract(value="!null -> !null; null -> null") public java.lang.String resolveRelativeURL(@Nullable java.lang.String urlToResolve)
resolveRelativeURL in interface RunningScrapeableFile
urlToResolve - The relative URL to resolve.
@Nullable @RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public DataSet extractData(@Nullable java.lang.String text, @Nullable java.lang.String name)
ExtractorPattern.
extractData in interface RunningScrapeableFile
text - The text to extract data from.
name - The name of the ExtractorPattern to be used. The ExtractorPattern must be associated with this ScrapeableFile. If more than one ExtractorPattern is found with the given name the first (by sequence) will be used.
DataSet containing the extracted data.
@Nullable @RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public java.lang.String extractOneValue(java.lang.String text, java.lang.String name)
ExtractorPattern.
extractOneValue in interface RunningScrapeableFile
text - The text to extract data from.
name - The name of the ExtractorPattern to be used. The ExtractorPattern must be associated with this ScrapeableFile. If more than one ExtractorPattern is found with the given name the first (by sequence) will be used.
String containing the extracted data. If nothing matches, null is returned.
@Nullable @RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public java.lang.String extractOneValue(@Nullable java.lang.String text, @Nullable java.lang.String name, @Nullable java.lang.String token)
ExtractorPattern.
extractOneValue in interface RunningScrapeableFile
text - The text to extract data from.
name - The name of the ExtractorPattern to be used. The ExtractorPattern must be associated with this ScrapeableFile. If more than one ExtractorPattern is found with the given name the first (by sequence) will be used.
token - The name of the Token to return
String containing the extracted data. If nothing matches, null is returned.
protected void load()
@InternalOnly public void flagAsLoaded()
@InternalOnly public int getTidyHTMLAsInt()
@NotNull @InternalOnly public java.lang.String getTidyHTML()
@InternalOnly public void setTidyHTML(java.lang.String tidyHTML)
@InternalOnly public void setTidyHTMLAsInt(int tidyHTML)
tidyHTML - An int corresponding to one of the TIDY_HTML constants.
@Nullable public java.lang.String getCharacterSet()
public void setCharacterSet(java.lang.String characterSet)
characterSet - The character encoding.
@Nullable public java.lang.String getUserAgent()
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void setUserAgent(@Nullable java.lang.String userAgent)
userAgent - The user-agent.
public boolean getRetainNonTidiedHTML()
@RequiredVersion(value=2, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void setRetainNonTidiedHTML(boolean retainNonTidiedHTML)
retainNonTidiedHTML - A boolean.
@Nullable @RequiredVersion(value=2, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public java.lang.String getNonTidiedHTML()
@Nullable public java.lang.String getRequestEntity()
public void setRequestEntity(java.lang.String requestEntity)
requestEntity - The entity, as a string.
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public int getStatusCode()
public void restoreOriginalHTTPParameters()
public void restoreHTTPParametersAfterScraping()
@RequiredVersion(value=2, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void saveFileOnRequest(java.lang.String fileToSaveToOnRequest)
saveFileOnRequest in interface RunningScrapeableFile
fileToSaveToOnRequest - The path of the file to save to.
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void saveFileBeforeTidying(java.lang.String fileToSaveToBeforeTidying)
fileToSaveToBeforeTidying - The path of the file to save to.
public boolean getForceMultiPart()
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void setForceMultiPart(boolean forceMultiPart)
forceMultiPart - A boolean.
public boolean getForcePOST()
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void setForcePOST(boolean forcePOST)
forcePOST - Whether a POST should be forced or not
@NotNull @InternalOnly public ScrapeableFile.RequestType getRequestType()
@InternalOnly public void invokeAction(@Nullable java.lang.String name)
name - The name of the NavigationAction.
@InternalOnly public void resetNonPersistentValues()
@Nullable @InternalOnly public com.gargoylesoftware.htmlunit.html.HtmlPage getHtmlPage()
HtmlPage first by attempting to see if one was already given from the last request. Failing that, it generates one based on the last scraped data and current URL.
HtmlPage.
protected void loadNavigationActions()
@Nullable @InternalOnly public java.util.TreeSet getNavigationActions()
TreeSet containing the NavigationAction objects. Note that this is public only for Castor.
TreeSet.
protected void ensureNavigationActionsHaveScrapeableFileReference()
@InternalOnly public void addNavigationAction(@NotNull NavigationAction navigationAction)
NavigationCriteria to the set.
navigationAction - The NavigationCriteria.
@NotNull @InternalOnly public java.util.Iterator getNavigationActionsIterator()
Iterator that can be used to cycle through the NavigationAction objects held by this object.
Iterator.
@InternalOnly public int getNumNavigationActions()
NavigationAction objects in the set.
@InternalOnly public void removeNavigationAction(@Nullable NavigationAction navigationAction)
navigationAction - The navigation action to be removed.
@Nullable @InternalOnly public NavigationAction getNavigationAction(int sequence)
ScrapeableFile.
sequence - The sequence of the action to be retrieved.
@InternalOnly public void reset()
@Nullable protected NavigationActionDoer getNavigationActionDoer()
NavigationActionDoer to be processed.
protected void setNavigationActionDoer(NavigationActionDoer navigationActionDoer)
navigationActionDoer - The NavigationActionDoer to be processed.
@InternalOnly public boolean getEnableJavaScript()
@InternalOnly public void setEnableJavaScript(boolean enableJavaScript)
enableJavaScript - A boolean.
@InternalOnly public boolean getAuthenticationPreemptive()
@InternalOnly public void setAuthenticationPreemptive(boolean authenticationPreemptive)
authenticationPreemptive - A boolean
@InternalOnly public int getHTTPTransactionID()
@InternalOnly public void setHTTPTransactionID(int httpTransactionID)
httpTransactionID - The ID.
public int getMaxResponseLength()
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void setMaxResponseLength(int maxKBytes)
maxKBytes - The maximum length.
public boolean getLastTidyAttemptFailed()
@InternalOnly public void setLastTidyAttemptFailed(boolean lastTidyAttemptFailed)
lastTidyAttemptFailed - A boolean.
@RequiredVersion(value=1) public boolean getMaxRequestAttemptsReached()
@RequiredVersion(value=1) public boolean getExtractorPatternTimedOut()
public boolean getUsePlusForEncodedSpaces()
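The usePlusForEncodedSpaces flag concerns how a space in a parameter value is written on the wire: as "+" or as "%20". A minimal sketch of the two encodings follows, using only java.net.URLEncoder. The class and method names are hypothetical, and how screen-scraper applies the flag internally is not documented here, so this only illustrates the difference in output.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper showing the two common encodings of a space.
class SpaceEncodingSketch {

    // Encodes a parameter value, writing spaces as "+" or as "%20".
    static String encode(String value, boolean usePlusForEncodedSpaces) {
        try {
            // URLEncoder writes a space as "+" and a literal plus sign as "%2B",
            // so every "+" in its output stands for a space.
            String encoded = URLEncoder.encode(value, "UTF-8");
            return usePlusForEncodedSpaces ? encoded : encoded.replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

Some servers treat "+" literally in certain parts of a URL, which is the usual reason for wanting the "%20" form.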
public void setUsePlusForEncodedSpaces(boolean usePlusForEncodedSpaces)
usePlusForEncodedSpaces - A boolean.
public boolean getForceNonBinary()
public void setForceNonBinary(boolean forceNonBinary)
forceNonBinary - Whether or not it should be forced as binary.
@InternalOnly public void dumpToLog()
@Nullable @RequiredVersion(value=1) public DataRecord getASPXValues(boolean onlyStandard)
onlyStandard - Sets whether or not to only get the four standard tags, or look for any tags that begin with __
@NotNull @RequiredVersion(value=1) public java.util.Map<java.lang.String,java.lang.String> getHTTPResponseHeaders()
java.lang.Exception - When the version of Screen-scraper doesn't allow use of this feature
@NotNull @RequiredVersion(value=1) public java.lang.String getHTTPResponseHeaderSection()
java.lang.Exception - When the version of Screen-scraper doesn't allow use of this feature
@Nullable @RequiredVersion(value=1) public java.lang.String getHTTPResponseHeader(java.lang.String header)
header - The header name (case-insensitive) to get
java.lang.Exception - When the version of Screen-scraper doesn't allow use of this feature
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void setRetryPolicy(RetryPolicy policy)
policy - The policy that should be run. See the RetryPolicyFactory for standard policies, or one can be created by implementing the RetryPolicy interface
@Nullable @RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public RetryPolicy getRetryPolicy()
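getHTTPResponseHeader, documented above, takes its header name case-insensitively and is marked @Nullable. One way to model that lookup is sketched below with a TreeMap ordered by String.CASE_INSENSITIVE_ORDER. The class is hypothetical and says nothing about how screen-scraper stores response headers; returning null for a missing header is an assumption based on the @Nullable annotation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a case-insensitive response-header lookup.
class ResponseHeaderLookupSketch {
    // The comparator makes "Content-Type" and "content-type" the same key.
    private final Map<String, String> headers =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void put(String name, String value) {
        headers.put(name, value);
    }

    // Returns the header value, or null when the header is absent
    // (assumed to match the @Nullable return of getHTTPResponseHeader).
    String get(String name) {
        // A null key would throw inside the comparator, so guard it here.
        return name == null ? null : headers.get(name);
    }
}
```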
@Nullable @InternalOnly public java.lang.String getIdentifier()
Uniqueable
getIdentifier in interface Uniqueable
@InternalOnly public void setIdentifier(java.lang.String identifier)
Uniqueable
setIdentifier in interface Uniqueable
identifier - The identifier.
@Nullable @RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public Form buildForm(@NotNull java.lang.String formText)
formText - The HTML containing the form tag and all inputs in it
@RequiredVersion(value=1, behavior=LOG_ERROR_AND_RETURN_NULL_NEG_ONE_OR_FALSE) public void setParametersFromForm(@Nullable Form form)
form - The form with all the values to use for parameters as well as the URL
@InternalOnly public boolean getShouldHighlightSyntax()
@InternalOnly public void setShouldHighlightSyntax(boolean b)
b - the boolean that will determine if HTML on the Last Response pane is syntax highlighted.