MyBatis 本是apache的一个开源项目iBatis, 2010年这个项目由apache software foundation 迁移到了google code,并且改名为MyBatis 。2013年11月迁移到Github。 iBATIS一词来源于“internet”和“abatis”的组合,是一个基于Java的持久层框架。iBATIS提供的持久层框架包括SQL Maps和Data Access Objects(DAOs)。 MyBatis 是一款优秀的持久层框架,它支持定制化 SQL、存储过程以及高级映射。MyBatis 避免了几乎所有的 JDBC 代码和手动设置参数以及获取结果集。MyBatis 可以使用简单的 XML 或注解来配置和映射原生信息,将接口和 Java 的 POJOs(Plain Old Java Objects,普通的 Java对象)映射成数据库中的记录。
View Details如果用了nodejs作为中间件,您就不需要往下看了,不存在SEO的问题。此时前端不再只是前端,变成了全栈。 此文章指的是用Ajax调用API后渲染页面的情况,这时查看html源码,也是没有数据的,所以搜索引擎也收录不到有用的数据,更不用说更新了。 我的思路大概如下: 不修改原项目的源码。 只针对搜索引擎做优化。 用Java的开源项目HtmlUnit做中转,HtmlUnit模拟了流行的浏览器内核,却没有界面。经过转换的页面会输出ajax填充数据后的html源码。 由此得步骤如下: 先架设HtmlUnit转换项目。虽然HtmlUnit也有.Net版本,但测试后效率不高。还是Java的原项目效率高。所以就直接用Java的,不懂Java也没关系,几十行代码搞定。可以参考我前几篇文章。 根据你项目的语言做相应的拦截器,Java的语言就不说了,可以省略步骤1,直接导包开用就行了。.Net的话最好做HttpModule,一是不污染原项目;二是性能也高。php的话,如果用laravel/yii/tp等框架,本来就是拦截器机制。 准备拦截器要用到的各搜索引擎UserAgent里的关键标识,发一下我的:Baiduspider,Googlebot,bingbot,360Spider,Sogou web spider,Yahoo! Slurp,YoudaoBot,Sosospider。看名字也能猜到各自是哪个搜索引擎吧。 然后检测到是搜索引擎来了,就调用HtmlUnit转换出完整的html源码;然后有数据就会被收录了。 希望对各位同仁有些帮助。有不明白的、有意见的,欢迎通过本博客底部的邮箱和我联系~~
View Details
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Web; using System.Text.RegularExpressions; namespace MyMook { public class MyHttpModule : IHttpModule { public void Dispose() { } public void Init(HttpApplication application) { application.AcquireRequestState += new EventHandler(context_AcquireRequestState); // application.BeginRequest += new EventHandler(context_AcquireRequestState); //这里面要注意千万不要写成BeginRequest,那样就会无法获得session } void context_AcquireRequestState(Object source, EventArgs e) { HttpApplication application = (HttpApplication)source; HttpContext context = application.Context; string path=context.Request.Path; if (!context.Request.CurrentExecutionFilePathExtension.Equals(".aspx") && !context.Request.CurrentExecutionFilePathExtension.Equals(".ashx") ) { return; }//此处保证只过滤aspx/ashx/htm的请求 Match m = Regex.Match(path,@"/WebLogin/+"); if (m.Success) { return; }//不过滤文件夹WebLogin中的内容 try { object user = context.Session["user"]; if (user == null) { context.Response.Redirect("~/WebLogin/Login.aspx"); } else { return; } } catch { context.Response.Redirect("~/WebLogin/Login.aspx"); } } } } |
在web.config中:
1 2 3 4 5 |
<system.webServer> <modules> <add name="<span class="attribute-value">MyHttpModule"</span> <span class="attribute">type</span>=<span class="attribute-value">"MyMook.MyHttpModule,MyMook</span>"/> </modules> </system.webServer> |
要点: 1.注册事件时,不要写application.BeginRequest,这样会导致无法获得Session.
1 2 |
application.AcquireRequestState += new EventHandler(context_AcquireRequestState); // application.BeginRequest += new EventHandler(context_AcquireRequestState); |
from:https://blog.csdn.net/touch_the_world/article/details/37936297
View DetailsHtmlUnit官网的介绍: HtmlUnit是一款基于Java的没有图形界面的浏览器程序。它模仿HTML document并且提供API让开发人员像是在一个正常的浏览器上操作一样,获取网页内容,填充表单,点击超链接等等。 它非常好的支持JavaScript并且仍在不断改进,同时能够解析非常复杂的AJAX库,通过不同的配置来模拟Chrome、Firefox和IE浏览器。 本文针对一个足彩网站抓取的例子,来熟悉HtmlUnit WebClient wc = new WebClient(BrowserVersion.FIREFOX_38); wc.getOptions().setJavaScriptEnabled(true); //启用JS解释器,默认为true wc.setJavaScriptTimeout(100000);//设置JS执行的超时时间 wc.getOptions().setCssEnabled(false); //禁用css支持 wc.getOptions().setThrowExceptionOnScriptError(false); //js运行错误时,是否抛出异常 wc.getOptions().setTimeout(10000); //设置连接超时时间 ,这里是10S。如果为0,则无限期等待 wc.setAjaxController(new NicelyResynchronizingAjaxController());//设置支持AJAX wc.setWebConnection( new WebConnectionWrapper(wc) { public WebResponse getResponse(WebRequest request) throws IOException { …… } } ); HtmlPage page = wc.getPage("http://XXXX.com/"); FileWriter fileWriter = new FileWriter("D:\\text.html"); String str = ""; //获取页面的XML代码 str = page.asXml(); fileWriter.write( str ); //关闭webclient wc.close(); fileWriter.close(); 解决数据乱码问题 该网站数据是由js动态载入,并且js有2种编码: <script language="javascript" src="XXX.js" charset="gb2312"></script> <script language="javascript" src="XXX.js" charset="utf-8"></script> 可以通过重写WebConnectionWrapper类的getResponse方法来修改返回值 例如,对bfdata.js的返回结果做修改 wc.setWebConnection( new WebConnectionWrapper(wc) { public WebResponse getResponse(WebRequest request) throws IOException { WebResponse response = super.getResponse(request); if […]
View Details
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
/** * 输出文字 * @param response * @param s */ public static void responseOut(HttpServletResponse response,String s){ response.setContentType("text/html;charset=UTF-8"); response.setCharacterEncoding("UTF-8"); try ( PrintWriter pw = response.getWriter() ){ pw.write(s); } catch (IOException e) { e.printStackTrace(); } } |
from:https://www.cnblogs.com/yanqin/p/7463294.html
View DetailsPS:我只用到了这一句 webClient.getOptions().setThrowExceptionOnScriptError(false); htmlunit jar项目路径http://sourceforge.net/projects/htmlunit/files/htmlunit/ demo代码如下
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
public class AutoLogin { /** 登录页面 */ private static final String LOGIN_URL = "http://website/login.aspx"; /** 任务列表页面 */ private static final String TASK_LIST_URL = "http://website/Banli.aspx"; /** * @param args * @throws Exception */ public static void main(String[] args) throws Exception { testHomePage(); } public static void testHomePage() throws Exception { final WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_8); webClient.getOptions().setThrowExceptionOnScriptError(false); //此行必须要加 webClient.getOptions().setCssEnabled(false); // webClient.getOptions().setJavaScriptEnabled(true); // webClient.getOptions().setThrowExceptionOnFailingStatusCode(false); webClient.getOptions().setTimeout(300000); // 获取首页 HtmlPage page = (HtmlPage) webClient.getPage(LOGIN_URL); // 根据form的名字获取页面表单,也可以通过索引来获取:page.getForms().get(0) final HtmlForm form = page.getFormByName("form1"); // 用户名/密码 HtmlTextInput textUserName = form.getInputByName("txtUserName"); textUserName.setText("username"); HtmlPasswordInput txtPwd = form.getInputByName("txtPwd"); txtPwd.setText("pass"); //调用JS触发登录按钮 Page page1 = page.executeJavaScript("$('#btn').click()").getNewPage(); page1 = webClient.getPage(TASK_LIST_URL); System.out.println("*************************************************************************************"); System.out.println(page1.getWebResponse().getContentAsString()); System.out.println("*************************************************************************************"); System.out.println(""); System.out.println("Cookies : " + webClient.getCookieManager().getCookies().toString()); } } |
搞不清ASP.NET内部什么逻辑,试了很多方法都不行,查看了无所网站,无意中看到一个这个配置http://stackoverflow.com/questions/20352284/scraping-aspx-page-using-htmlunit
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
import java.net.MalformedURLException; import com.gargoylesoftware.htmlunit.BrowserVersion; import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException; import com.gargoylesoftware.htmlunit.WebClient; import com.gargoylesoftware.htmlunit.html.HtmlElement; import com.gargoylesoftware.htmlunit.html.HtmlPage; public class teste { public static void main(String args[]) throws FailingHttpStatusCodeException, MalformedURLException, IOException { HtmlPage page = null; String url = "http://www.bmfbovespa.com.br/cias-listadas/empresas-listadas/BuscaEmpresaListada.aspx?Idioma=pt-br"; WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17); webClient.getOptions().setThrowExceptionOnScriptError(false); webClient.getOptions().setCssEnabled(false); webClient.getOptions().setJavaScriptEnabled(false); webClient.getOptions().setThrowExceptionOnFailingStatusCode(false); webClient.getOptions().setTimeout(30000); page = webClient.getPage( url ); System.out.println("Current page: Empresas Listadas | BM&FBOVESPA"); HtmlElement theElement1 = (HtmlElement) page.getElementById("ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_btnTodas"); page = theElement1.click(); System.out.println(page.asText()); System.out.println("Test has completed successfully"); } } |
最后测试下来,如果不加 webClient.getOptions().setThrowExceptionOnScriptError(false);就一直报这个错误
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 |
Exception in thread "main" ======= EXCEPTION START ======== Exception class=[java.lang.RuntimeException] com.gargoylesoftware.htmlunit.ScriptException: Exception invoking click at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:954) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:836) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:812) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:800) at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:910) at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScript(HtmlPage.java:878) at com.suypower.AutoLogin12345.testHomePage(AutoLogin12345.java:48) at com.suypower.AutoLogin12345.main(AutoLogin12345.java:23) Caused by: java.lang.RuntimeException: Exception invoking click at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:181) at net.sourceforge.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:449) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1536) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411) at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3286) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:827) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:939) ... 9 more Caused by: com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property "nodeName" from null (http://xxxx/305000772#7) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:954) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:836) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:812) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:800) at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:910) at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:354) at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:415) at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:271) at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:293) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:799) at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:756) at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170) at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072) at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206) at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330) at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126) at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093) at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920) at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1039) at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:252) at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:198) at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:271) at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:159) at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:478) at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:352) at com.gargoylesoftware.htmlunit.html.BaseFrameElement.loadInnerPageIfPossible(BaseFrameElement.java:183) at com.gargoylesoftware.htmlunit.html.BaseFrameElement.loadInnerPage(BaseFrameElement.java:121) at com.gargoylesoftware.htmlunit.html.HtmlPage.loadFrames(HtmlPage.java:1893) at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:227) at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:485) at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2135) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:982) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:1072) at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:789) at com.gargoylesoftware.htmlunit.html.HtmlImageInput.click(HtmlImageInput.java:152) at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLInputElement.click(HTMLInputElement.java:477) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:153) ... 19 more Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot read property "nodeName" from null (http://xxxx/305000772#7) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3935) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3919) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3944) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2(ScriptRuntime.java:3960) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.undefReadError(ScriptRuntime.java:3971) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp(ScriptRuntime.java:1519) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1243) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:118) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:827) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:939) ... 65 more Enclosed exception: java.lang.RuntimeException: Exception invoking click at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:181) at net.sourceforge.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:449) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1536) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411) at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3286) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:827) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:939) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:836) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:812) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:800) at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:910) at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScript(HtmlPage.java:878) at com.suypower.AutoLogin12345.testHomePage(AutoLogin12345.java:48) at com.suypower.AutoLogin12345.main(AutoLogin12345.java:23) Caused by: com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property "nodeName" from null (http://xxx/305000772#7) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:954) at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628) at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:836) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:812) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:800) at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:910) at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:354) at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:415) at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:271) at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:293) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:799) at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:756) at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170) at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072) at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206) at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330) at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126) at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093) at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920) at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1039) at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:252) at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:198) at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:271) at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:159) at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:478) at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:352) at com.gargoylesoftware.htmlunit.html.BaseFrameElement.loadInnerPageIfPossible(BaseFrameElement.java:183) at com.gargoylesoftware.htmlunit.html.BaseFrameElement.loadInnerPage(BaseFrameElement.java:121) at com.gargoylesoftware.htmlunit.html.HtmlPage.loadFrames(HtmlPage.java:1893) at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:227) at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:485) at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2135) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:982) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:1072) at com.gargoylesoftware.htmlunit.html.DomElement.click(DomElement.java:789) at com.gargoylesoftware.htmlunit.html.HtmlImageInput.click(HtmlImageInput.java:152) at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLInputElement.click(HTMLInputElement.java:477) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:153) ... 19 more Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot read property "nodeName" from null (http://xxx/305000772#7) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3935) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3919) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3944) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2(ScriptRuntime.java:3960) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.undefReadError(ScriptRuntime.java:3971) at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp(ScriptRuntime.java:1519) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1243) at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798) at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:118) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:827) at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:939) ... 65 more ======= EXCEPTION END ======== |
希望能帮助到你,晚安! from:https://www.cnblogs.com/yimu/p/LOVE_HCJ.html
View DetailsPS:下面这个低本息我测试成功了,高版本怎么试都有问题。 随着Web的发展,RIA越来越多,JavaScript和Complex AJAX Libraries给网络爬虫带来了极大的挑战,解析页面的时候需要模拟浏览器执行JavaScript才能获得需要的文本内容。 好在有一个Java开源项目HtmlUnit,它能模拟Firefox、IE、Chrome等浏览器,不但可以用来测试Web应用,还可以用来解析包含JS的页面以提取信息。 下面看看HtmlUnit的效果如何: 首先,建立一个maven工程,引入junit依赖和HtmlUnit依赖:
1 2 3 4 5 6 7 8 9 10 11 |
<dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>4.8.2</version> <scope>test</scope> </dependency> <dependency> <groupId>net.sourceforge.htmlunit</groupId> <artifactId>htmlunit</artifactId> <version>2.14</version> </dependency> |
其次,写一个junit单元测试来使用HtmlUnit提取页面信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 |
/** * 使用HtmlUnit模拟浏览器执行JS来获取网页内容 * @author 杨尚川 */ public class HtmlUnitTest { @Test public void homePage() throws Exception { final WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_11); final HtmlPage page = webClient.getPage("http://yangshangchuan.iteye.com"); Assert.assertEquals("杨尚川的博客 - ITeye技术网站", page.getTitleText()); final String pageAsXml = page.asXml(); Assert.assertTrue(pageAsXml.contains("杨尚川,系统架构设计师,系统分析师,2013年度优秀开源项目APDPlat发起人,资深Nutch搜索引擎专家。多年专业的软件研发经验,从事过管理信息系统(MIS)开发、移动智能终端(Win CE、Android、Java ME)开发、搜索引擎(nutch、lucene、solr、elasticsearch)开发、大数据分析处理(Hadoop、Hbase、Pig、Hive)等工作。目前为独立咨询顾问,专注于大数据、搜索引擎等相关技术,为客户提供Nutch、Lucene、Hadoop、Solr、ElasticSearch、HBase、Pig、Hive、Gora等框架的解决方案、技术支持、技术咨询以及培训等服务。")); final String pageAsText = page.asText(); Assert.assertTrue(pageAsText.contains("[置顶] 国内首套免费的《Nutch相关框架视频教程》(1-20)")); webClient.closeAllWindows(); } @Test public void homePage_Firefox() throws Exception { final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24); final HtmlPage page = webClient.getPage("http://yangshangchuan.iteye.com"); Assert.assertEquals("杨尚川的博客 - ITeye技术网站", page.getTitleText()); webClient.closeAllWindows(); } @Test public void getElements() throws Exception { final WebClient webClient = new WebClient(BrowserVersion.CHROME); final HtmlPage page = webClient.getPage("http://yangshangchuan.iteye.com"); final HtmlDivision div = page.getHtmlElementById("blog_actions"); //获取子元素 Iterator<DomElement> iter = div.getChildElements().iterator(); while(iter.hasNext()){ System.out.println(iter.next().getTextContent()); } //获取所有输出链接 for(HtmlAnchor anchor : page.getAnchors()){ System.out.println(anchor.getTextContent()+" : "+anchor.getAttribute("href")); } webClient.closeAllWindows(); } @Test public void xpath() throws Exception { final WebClient webClient = new WebClient(); final HtmlPage page = webClient.getPage("http://yangshangchuan.iteye.com"); //获取所有博文标题 final List<HtmlAnchor> titles = (List<HtmlAnchor>)page.getByXPath("/html/body/div[2]/div[2]/div/div[16]/div/h3/a"); for(HtmlAnchor title : titles){ System.out.println(title.getTextContent()+" : "+title.getAttribute("href")); } //获取博主信息 final HtmlDivision div = (HtmlDivision) page.getByXPath("//div[@id='blog_owner_name']").get(0); System.out.println(div.getTextContent()); webClient.closeAllWindows(); } @Test public void submittingForm() throws Exception { final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24); final HtmlPage page = webClient.getPage("http://www.oschina.net"); // Form没有name和id属性 final HtmlForm form = page.getForms().get(0); final HtmlTextInput textField = form.getInputByName("q"); final HtmlButton button = form.getButtonByName(""); textField.setValueAttribute("APDPlat"); final HtmlPage resultPage = button.click(); final String pageAsText = resultPage.asText(); Assert.assertTrue(pageAsText.contains("找到约")); Assert.assertTrue(pageAsText.contains("条结果")); webClient.closeAllWindows(); } } |
最后,我们运行单元测试, 全部通过测试! from:http://yangshangchuan.iteye.com/blog/2036809
View Details前言 之前的前5篇作为EF方面的基础篇,后面我们将使用MVC+EF 并且使用IOC ,Repository,UnitOfWork,DbContext来整体来学习。因为后面要用到IOC,所以本篇先单独先学习一下IOC,我们本本文单独主要学习Autofac,其实对于Autofac我也是边学边记录。不对的地方,也希望大家多多指导。 个人在学习过程中参考博客: AutoFac文档:http://www.cnblogs.com/wolegequ/archive/2012/06/09/2543487.html AutoFac使用方法总结:Part I:http://niuyi.github.io/blog/2012/04/06/autofac-by-unit-test/ 为什么使用AutoFac? Autofac是.NET领域最为流行的IOC框架之一,传说是速度最快的一个: 优点: 它是C#语言联系很紧密,也就是说C#里的很多编程方式都可以为Autofac使用,例如可以用Lambda表达式注册组件 较低的学习曲线,学习它非常的简单,只要你理解了IoC和DI的概念以及在何时需要使用它们 XML配置支持 自动装配 与Asp.Net MVC 集成 微软的Orchad开源程序使用的就是Autofac,从该源码可以看出它的方便和强大 上面的优点我也是拷的别人文章里面的,上面的这个几乎所有讲Autofac博文都会出现的。这个也是首次学习,所以我们还是记录的细一点。 怎么使用Autofac 通过VS中的NuGet来加载AutoFac,引入成功后引用就会出现Autofac。 1、我们做一个简单的例子先用一下 就拿数据访问来做案例把,一个数据请求有两个类,一个是Oracle 一个是SQLSERVER。我们在使用的时候可以选择调用那个数据库。 1.1 我们先定义一个数据访问的接口和访问类。
1 2 3 4 5 6 7 8 9 10 11 |
/// <summary> /// 数据源操作接口 /// </summary> public interface IDataSource { /// <summary> /// 获取数据 /// </summary> /// <returns></returns> string GetData(); } |
1 2 3 4 5 6 7 8 9 10 |
/// <summary> /// SQLSERVER数据库 /// </summary> class Sqlserver : IDataSource { public string GetData() { return "通过SQLSERVER获取数据"; } } |
1 2 3 4 5 6 7 8 9 10 |
/// <summary> /// ORACLE数据库 /// </summary> public class Oracle : IDataSource { public string GetData() { return "通过Oracle获取数据"; } } |
最普通的方式大家都会的吧! 如果最普通的方式调用SQLSERVER怎么写?
1 2 3 4 5 6 7 8 |
static void Main(string[] args) { IDataSource ds = new Sqlserver(); Console.WriteLine(ds.GetData()); Console.ReadLine(); } |
1 |
调用Oracle的话new Oracle()就可以了。如果这个都不能理解的话,那学习这个你就很费劲了。 |
改进一下代码。我们在加入一个DataSourceManager类来看一下
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
/// <summary> /// 数据源管理类 /// </summary public class DataSourceManager { IDataSource _ds; /// <summary> /// 根据传入的类型动态创建对象 /// </summary> /// <param name="ds"></param> public DataSourceManager(IDataSource ds) { _ds = ds; } public string GetData() { return _ds.GetData(); } } |
这样写的好处是什么,这样加入加入新的数据源,只用调用的时候传入这个对象就可以,就会自动创建一个对应的对象。那接下如果要调用SQLSERVER怎么写。看代码
1 2 3 4 |
DataSourceManager dsm = new DataSourceManager(new Sqlserver()); Console.WriteLine(dsm.GetData()); Console.ReadLine(); |
1.2 注入实现构造函数注入 上面的DataSourceManager的动态创建的方式就是因为又有个带IDataSource的参数的构造函数,只要调用者传入实现该接口的对象,就实现了对象创建。 那我们看看怎么使用AutoFac注入实现构造函数注入
1 2 3 4 5 6 7 8 9 10 11 |
var builder = new ContainerBuilder(); builder.RegisterType<DataSourceManager>(); builder.RegisterType<Sqlserver>().As<IDataSource>(); using (var container = builder.Build()) { var manager = container.Resolve<DataSourceManager>(); Console.WriteLine(manager.GetData()); Console.ReadLine(); } |
上面的就是AutoFac构造函数注入,他给IDataSource注入的是Sqlserver所以我们调用的数据,返回的就是Sqlserver数据。那下面我们具体的了解一下AutoFac的一些方法 1.3 Autofac方法说明
1 |
(1)builder.RegisterType<Object>().As<Iobject>():注册类型及其实例。例如上面就是注册接口IDataSource的实例Sqlserver |
1 |
(2)IContainer.Resolve<IDAL>():解析某个接口的实例。例如一下代码,我可以解析接口返回的就是Sqlserver实例 |
var builder = new ContainerBuilder(); //builder.RegisterType<DataSourceManager>(); builder.RegisterType<Sqlserver>().As<IDataSource>(); using (var container = builder.Build()) { var manager = container.Resolve<IDataSource>(); Console.WriteLine(manager.GetData()); Console.ReadLine(); }
1 |
(3)builder.RegisterType<Object>().Named<Iobject>(string name):为一个接口注册不同的实例。有时候难免会碰到多个类映射同一个接口,比如Sqlerver和Oracle都实现了IDalSource接口,为了准确获取想要的类型,就必须在注册时起名字。 |
1 2 3 4 5 6 7 8 var builder = new ContainerBuilder(); builder.RegisterType<Sqlserver>().Named<IDataSource>("SqlServer"); builder.RegisterType<Oracle>().Named<IDataSource>("Oracel"); using (var container = […]
View Details在php中开启与关闭错误提示的方法有几种一种可以直接在程序中使用相关函数来开户,另一种我们可以使用php.ini中配置参数来控制,下面小编来给各位同学介绍一下。 windows系统开关php错误提示 如果不具备修改php.ini的权限,可以将如下代码加入php文件中:
1 2 |
ini_set("display_errors", "On"); error_reporting(E_ALL | E_STRICT); |
当然,如果能够修改php.ini的话,如下即可:
1 2 3 |
找到display_errors = On 修改为 display_errors = off 注意:如果你已经把PHP.ini文件复制到windows目录下,那么必须同时把c:windows/php.ini里的display_errors = On 修改为display_errors = off PHP .ini中display_errors = Off失效的解决 |
在linux系统中开启与关闭错误提示方法差不多,不过我还是具体给大家介绍一下 linux系统下 1. 打开php.ini文件。 以我的ubuntu为例,这个文件在: /etc/php5/apache2 目录下。 2. 搜索并修改上下行,把Off值改成On
1 |
display_errors = Off |
3. 搜索下行
1 2 3 4 5 |
error_reporting = E_ALL & ~E_NOTICE 或者搜索: error_reporting = E_ALL & ~E_DEPRECATED 修改为 error_reporting = E_ALL | E_STRICT |
4. 修改Apache的 httpd.conf, 以Ubuntu 为例, 这个文件在:/etc/apache2/ 目录下,这是一个空白文件。 添加以下两行:
1 2 |
php_flag display_errors on php_value error_reporting 2039 |
5. 重启Apache,就OK了。 重启命令: :
1 |
sudo /etc/init.d/apache2 restart |
在php中静态方法我们就直接在函数或变量前加一个static就可以了,使用的时候和静态变量差不多,不需要实例化,直接用::调用了,下面我来给大家举几个关于静态方法实例。 PHP也不例外!所谓静态方法(属性)就是以static关键词标注的属性或者方法(例如:静态属性public static username;) 静态方法和非静态方法最大的区别在于他们的生命周期不同,用一个实例来说明 静态方法定义 定义静态方法很简单,在声明关键词function之前加上static,例如:
1 2 3 4 5 6 7 |
class A { static function fun() { // do somathing } } |
静态方法使用 使用的时候和静态变量差不多,不需要实例化,直接用::调用,例如:
1 |
A::fun() |
对比普通方法 因为静态方法的调用不需要实例化,所以在静态方法中引用类自身的属性或者方法的时候会出错,也就是形如self和$this是错误的。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
class MyClass { public $num = 5; function __construct() { $this->num = 10; } function fun_1() { echo "I am a public method named fun_1.n"; echo "The num of object is {$this->num}.n"; } static function fun_2() { echo "I am a static method named fun_2.n"; } function fun_3($n) { echo "The arg is {$n}n"; } } $m = new MyClass; $m->fun_1(); $m->fun_2(); $m->fun_3('test'); MyClass::fun_1(); MyClass::fun_2(); MyClass::fun_3('test'); 输出结果: lch@localhost:php $ php class_method.php I am a public method named fun_1. The num of object is 10. I am a static method named fun_2. The arg is test I am a public method named fun_1. PHP Fatal error: Using $this when not in object context in /Users/lch/program/php/class_method.php on line 14 |
再看一实例 用一个实例来说明。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
class user{ public static $username; //声明一个静态属性 public $password; //声明一个非静态属性 function __construct($pwd) { echo ‘Username:’,self::$username; //输出静态属性 self::$username = ‘admin’; //为静态属性赋值 $this->password = $pwd; //为非静态属性赋值 } public function show(){ //输出类属性 echo ‘ Username:’,self::$username; echo ‘ Password:’,$this->password; } public static function sshow(){ echo ‘ Username:’,self::$username; echo ‘ Password:’,$this->password; } } user::$username = ‘root’; //为赋值user类的静态属性赋值 $objUser = new user(’123456′); //实例化user类 $objUser->sshow(); unset($objUser); echo ‘ Username:’,user::$username; /* * 输出结果为: * Username:root * Username:admin * Password:123456 * Usern ame:admin * */ |
从这里实例中可以看出,静态属性在类实例化以前就起作用了,并且在对象被销毁时静态属性依然可以发挥作用! 也因为静态方法的这种属性,所以不能在静态方法中调用非静态属性或者方法 接着看 1、php类中,假设所有的属性与方法的可见性为public,那么在外部访问类的方法或属性时,都必须通过对象【类的实例化过程】来调用。 eg:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
class Log { public $root = DIRECTORY_SEPARATOR; public $logPath = '/data/app/www/test-realtime.monitoring.c.kunlun.com/log'; public $defaultDir = 'default'; public function writeLog($logName, $logType, $data, $newDir = FALSE) { $fileName = ''; if (!file_exists($this->logPath)) { mkdir($this->logPath, 0777); } if ($newDir !== FALSE) { @mkdir($this->logPath.$this->root.$newDir, 0777); $fileName = $this->logPath.$this->root.$newDir.$this->root.date('Y-m-d', time()).'_'.$logName.'_'.$logType.'.log'; } else { @mkdir($this->logPath.$this->root.$this->defaultDir, 0777); $fileName = $this->logPath.$this->root.$this->defaultDir.$this->root.date('Y-m-d', time()).'_'.$logName.'_'.$logType.'.log'; } file_put_contents($fileName, date('Y-m-d H:i:s').' '.$data."n", FILE_APPEND); } } |
类的实例化对象的过程:$logObj = new Log(); 访问类中的方法:$logObj->writeLog($param1, $param2, $param3, $param4); 访问类中的属性:echo $logObj->root; 2、如果类中的属性前被static关键字修饰时,就不能通过对象来访问被static修饰的属性,但如果是类中的方法被static修饰时则即可以通过对象也可以通过类名::方法名的方式来进行访问。 3、如果类中的方法被static修饰则,方法中不能用$this,$this指的是类的实例化对象,由于静态方法不用通过对象就可以调用,所以伪变量$this不可用。 魔术方法是在php5中以__开头的,它们有着魔术般的功能,可以给我开发带来很多好处,下面我来给大家介绍魔术方法一些用法与在php中有那些魔术方法吧。 魔术方法是以两个下划线"__"开头、具有特殊作用的一些方法,可以看做php的"语法糖"。 语法糖指那些没有给计算机语言添加新功能,而只是对人类来说更"甜蜜"的语法。语法糖往往给程序员提供了更实用的编程方式或者一些技巧的用法,有益于更好的编码风格,是代码更易读。不过其并没有给语言添加什么新东西。php里的引用、SPL等都属于语法糖。
1 2 |
$tom = new family($student,'peking'); $tom = people->say(); |
上面family类中的construct方法就是一个标准魔术方法。这个魔术方法又称构造方法。有构造方法就有对应的西沟方法,即destruct方法,西沟方法会在某个对象的所有引用都被删除,或者当对象被显式销毁时执行。这两个方法是常见也是最游泳的魔术方法。 1、__get、__set 这两个方法是为在类和他们的父类中没有声明的属性而设计的。 ◆__get( […]
View Details