public class StandardRetryPolicy extends AbstractRetryPolicy
Fields inherited from class AbstractRetryPolicy: errorOutputMap, maxNumberOfRetries, resetReferrerBeforeRescrape, resetSessionVariablesBeforeRescrape, scrapingSession, scriptContext, theScrapeableFile
Constructor | Description |
---|---|
StandardRetryPolicy() | Builds a StandardRetryPolicy that doesn't do anything except retry the page. |
Modifier and Type | Method | Description |
---|---|---|
AbstractRetryPolicy | duplicate() | Duplicates this retry policy, copying any needed values to the new AbstractRetryPolicy. |
boolean | isError() | Checks to see if the page loaded incorrectly. |
void | runOnAllAttemptsFailed() | This will be called if all the retry attempts for the scrapeable file failed. |
void | runOnError() | Runs this code when the page had an error. |
void | setCheckWasErrorOnRequest(boolean checkWasErrorOnRequest) | Sets whether or not we should check if there was an error on the request by status code, or content length mismatch. |
void | setEndScrapeOnFailure(boolean endScrapeOnFailure) | Sets whether or not the policy should stop the scrape if it fails to get a good page after the maximum number of retries is reached. |
void | setEndScrapeOnFailureCount(int numFailuresBeforeEnd) | Sets the number of times the policy can fail to fix the issue before the scrape is stopped. |
void | setEndScrapeOnScrapeableFileFailureCount(java.lang.String scrapeableFileName, int numFailuresBeforeEnd) | Sets the number of times the named scrapeable file can error before the scrape is stopped (a value of 0 or less does nothing). |
void | setExtractorsMustMatch(boolean extractorsMustMatch) | Sets whether or not extractor patterns must match for the page to be considered successfully loaded. |
void | setRunOnFail(java.lang.Runnable runnable) | Sets a runnable that will run if the file has an error. |
void | setScriptOnFail(java.lang.String script) | Sets the script that will run if there is an error on the page. |
void | setShouldFailPattern(java.lang.String pattern) | Sets a regex that shouldn't match. |
void | setShouldMatchPattern(java.lang.String pattern) | Sets a regex that should match. |
java.lang.String | toString() | |
Methods inherited from class AbstractRetryPolicy: getErrorChecksMap, getMaxRetryAttempts, getScriptContext, resetReferrerBeforeRescrape, resetSessionVariablesBeforeRescrape, setMaxNumberOfRetries, setResetReferrerBeforeRescrape, setResetSessionVariablesBeforeRescrape, setScrapeableFile, setScrapingSession, setScriptContext, shouldLogErrors
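The should-match and should-fail setters summarized above describe two regex checks on the loaded page. The sketch below is one plausible reading of how such checks could combine, not the library's implementation: `PatternCheckSketch` is a hypothetical stand-in class, and it assumes the patterns are searched for anywhere in the page content rather than matched against the whole page.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; this is NOT StandardRetryPolicy.
final class PatternCheckSketch {
    private Pattern shouldMatch; // if set, the page is an error when this is NOT found
    private Pattern shouldFail;  // if set, the page is an error when this IS found

    void setShouldMatchPattern(String pattern) {
        shouldMatch = Pattern.compile(pattern);
    }

    void setShouldFailPattern(String pattern) {
        shouldFail = Pattern.compile(pattern);
    }

    // Assumption: patterns are located with find(), i.e. anywhere in the page.
    boolean isError(String pageContent) {
        if (shouldMatch != null && !shouldMatch.matcher(pageContent).find()) {
            return true; // required text is missing
        }
        if (shouldFail != null && shouldFail.matcher(pageContent).find()) {
            return true; // forbidden text is present
        }
        return false;
    }

    public static void main(String[] args) {
        PatternCheckSketch check = new PatternCheckSketch();
        check.setShouldMatchPattern("<title>Results");
        check.setShouldFailPattern("(?i)access denied");

        System.out.println(check.isError("<title>Results for widgets</title>"));   // false
        System.out.println(check.isError("<title>Results</title> Access Denied")); // true
        System.out.println(check.isError("<html>blank</html>"));                   // true
    }
}
```

With the real class, the equivalent configuration would presumably be two setter calls on a StandardRetryPolicy instance before it is attached to a scrapeable file.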
public StandardRetryPolicy()
public void setScriptOnFail(java.lang.String script)
script - The name of the script to run
public void setRunOnFail(java.lang.Runnable runnable)
runnable - The runnable to run on error
public void setShouldMatchPattern(@NotNull java.lang.String pattern)
pattern - The regex pattern
public void setShouldFailPattern(@NotNull java.lang.String pattern)
pattern - The regex pattern
public void setExtractorsMustMatch(boolean extractorsMustMatch)
extractorsMustMatch - True if extractors must match, false otherwise
public void setCheckWasErrorOnRequest(boolean checkWasErrorOnRequest)
checkWasErrorOnRequest - True if we should check for an error on the request, false otherwise
public void setEndScrapeOnFailure(boolean endScrapeOnFailure)
endScrapeOnFailure - True if the scrape should be stopped when the policy fails to fix the problem, false otherwise
public void setEndScrapeOnFailureCount(int numFailuresBeforeEnd)
numFailuresBeforeEnd - The number of times the policy can fail to fix the issue before stopping the scrape
public void setEndScrapeOnScrapeableFileFailureCount(java.lang.String scrapeableFileName, int numFailuresBeforeEnd)
setEndScrapeOnFailureCount(int)
scrapeableFileName - The name of the scrapeable file
numFailuresBeforeEnd - The number of times this scrapeable file can error before the scrape is stopped (note that a value of 0 or less does nothing)
public boolean isError() throws java.lang.Exception
RetryPolicy
isError in interface RetryPolicy
isError in class AbstractRetryPolicy
Throws: java.lang.Exception - If something goes wrong while executing this method
public void runOnError()
RetryPolicy
public AbstractRetryPolicy duplicate()
AbstractRetryPolicy
Note that the copy can share internal references if necessary for the functionality of the policy. For example, if the policy is tracking the total number of failures, it may have a shared AtomicInteger reference for counting, which is passed to the duplicate policy by reference. Therefore, duplicate isn't necessarily an independent duplicate. It should be noted, though, that a "duplicate" copy of the policy is used for each scrapeable file when called
duplicate in class AbstractRetryPolicy
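The shared-reference behavior described for duplicate() can be sketched as follows. `CountingPolicySketch` and its members are hypothetical names invented for this illustration; the sketch only assumes what the note states, namely that a counter such as an AtomicInteger may be handed to each copy by reference while other state stays per copy.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for illustration only; not part of the documented API.
final class CountingPolicySketch {
    private final AtomicInteger totalFailures; // shared by every duplicate
    private int attemptsForThisFile = 0;       // independent per copy

    CountingPolicySketch() {
        this(new AtomicInteger());
    }

    private CountingPolicySketch(AtomicInteger sharedCounter) {
        this.totalFailures = sharedCounter;
    }

    // The copy receives the same AtomicInteger instance, so it is not fully independent.
    CountingPolicySketch duplicate() {
        return new CountingPolicySketch(totalFailures);
    }

    void recordFailure() {
        attemptsForThisFile++;
        totalFailures.incrementAndGet();
    }

    int totalFailures() {
        return totalFailures.get();
    }

    int attemptsForThisFile() {
        return attemptsForThisFile;
    }

    public static void main(String[] args) {
        CountingPolicySketch original = new CountingPolicySketch();
        CountingPolicySketch copyA = original.duplicate();
        CountingPolicySketch copyB = original.duplicate();

        copyA.recordFailure();
        copyB.recordFailure();
        copyB.recordFailure();

        System.out.println(original.totalFailures());    // 3: the counter is shared
        System.out.println(copyA.attemptsForThisFile()); // 1: per-copy state is not
    }
}
```

An AtomicInteger is a natural fit here because copies used for different scrapeable files may update the total from different threads.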
@NotNull public java.lang.String toString()
toString in class java.lang.Object
public void runOnAllAttemptsFailed()
RetryPolicy
RetryPolicy.runOnError() will be called just before this, as it is called after each time the scrapeable file fails to load correctly, including the last time it fails to load. This should only contain code that handles the final error. Any proxy rotating, cookie clearing, etc. should generally be done in the RetryPolicy.runOnError() method, especially since it will still be called after the final error.
runOnAllAttemptsFailed in interface RetryPolicy
runOnAllAttemptsFailed in class AbstractRetryPolicy
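The call order described for runOnError() and runOnAllAttemptsFailed() can be illustrated with a small driver loop. This is a sketch under stated assumptions, not the library's retry loop: `RetryOrderSketch`, its `Hooks` interface, `fetchWithRetries`, and the `maxAttempts` parameter are hypothetical, and the sketch assumes a fixed number of total attempts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical driver for illustration only; the real retry loop lives in the library.
final class RetryOrderSketch {
    // Minimal stand-in for the two callbacks discussed above.
    interface Hooks {
        void runOnError();
        void runOnAllAttemptsFailed();
    }

    // Returns true as soon as one attempt loads without error.
    static boolean fetchWithRetries(BooleanSupplier attemptHadError, int maxAttempts, Hooks hooks) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!attemptHadError.getAsBoolean()) {
                return true;
            }
            hooks.runOnError(); // runs after every failed attempt, including the last
        }
        hooks.runOnAllAttemptsFailed(); // runs once, after the final runOnError()
        return false;
    }

    // Records the order of hook calls when every attempt fails.
    static List<String> traceAlwaysFailing(int maxAttempts) {
        List<String> calls = new ArrayList<>();
        fetchWithRetries(() -> true, maxAttempts, new Hooks() {
            @Override
            public void runOnError() {
                calls.add("runOnError");
            }

            @Override
            public void runOnAllAttemptsFailed() {
                calls.add("runOnAllAttemptsFailed");
            }
        });
        return calls;
    }

    public static void main(String[] args) {
        // prints [runOnError, runOnError, runOnError, runOnAllAttemptsFailed]
        System.out.println(traceAlwaysFailing(3));
    }
}
```

Because the per-attempt hook always runs before the final hook, recovery work such as proxy rotation belongs in the former, and the latter only needs to handle the terminal failure.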