Fetching Web Page Source with HttpClient: Getting Page Source in Java and Effectively Preventing Garbled Text

『1』 Java: getting web page source and effectively preventing garbled text

I built this kind of feature a while ago. To reliably prevent garbled text, we first have to know how the page is encoded: is it UTF-8 or GBK?

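1. HttpURLConnection.getContentType(): reads the header directly and is efficient, but it often fails to yield the encoding. The value is frequently just text/html, with no charset.

2. Use the third-party HttpClient, which also runs fairly efficiently. However, reading the response header only works for some sites; many servers never set the charset, so the result comes back as null.

3. The least efficient way to detect the encoding is to read the whole page's HTML source with InputStreamReader first, then extract the encoding that follows "charset". Once you have the encoding, you read the page a second time, which makes this approach slow.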
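Options 1 and 3 both come down to finding the text after "charset=" in a string, either in a Content-Type header value or in the page's meta tag. A minimal sketch of that extraction step follows; the class name `CharsetSniffer` and the method `find` are hypothetical names chosen for illustration, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls a charset name out of a header value or an HTML snippet. */
class CharsetSniffer {
    // Matches charset=utf-8, charset="gbk", charset = 'GB2312' (case-insensitive)
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the text, or null if there is none. */
    static String find(String text) {
        if (text == null) {
            return null; // getContentType() may return null
        }
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(find("text/html; charset=GBK")); // GBK
        System.out.println(find("text/html"));              // null
        System.out.println(find("<meta charset=\"gb2312\">")); // gb2312
    }
}
```

The same method handles a null header, a header without a charset, and both common meta tag forms, so the caller only has to check for a null result before falling back to the next option.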

To summarize, the method below tries the three approaches in order:

// Requires: java.net.HttpURLConnection, java.net.URL
/**
 * Determine the character encoding of a page.
 *
 * @param url address of the page to inspect
 * @return the charset name, or null if none could be determined
 */
public String getCharset(String url) throws Exception {
    log.info("url:" + url);

    URL httpurl = new URL(url);
    HttpURLConnection httpurlcon = (HttpURLConnection) httpurl.openConnection();
    // Google requires the request to identify itself
    httpurlcon.setRequestProperty("User-Agent", "Mozilla/4.0");

    // Step 1: try the Content-Type value; it may be null or lack a charset
    String charset = httpurlcon.getContentType();
    log.info("charset1:" + charset);
    if (charset != null && charset.indexOf("charset=") != -1) {
        // Found: keep only the part after "charset="
        charset = charset.substring(charset.indexOf("charset=") + "charset=".length());
    } else {
        // Step 2: otherwise read the response header
        // (getContentCharset() is a helper not shown in the original)
        charset = this.getContentCharset();
        log.info("charset2:" + charset);
    }
    // Step 3: if the charset is still null, read the page itself and extract it
    // (readPageCharset() is a helper not shown in the original)
    if (charset == null) {
        charset = this.readPageCharset(url);
        log.info("charset31:" + charset);
    }

    return charset;
}
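The original does not show how the detected charset is finally used, or how the page-reading fallback works. One way to avoid fetching the page twice is to download the body as bytes once and decide on the charset afterwards. The sketch below assumes that approach; `PageDecoder`, `detect`, `decode`, the 4096-byte scan window, and the UTF-8 default are all illustrative assumptions rather than the author's method.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes an already-downloaded page body with the best charset it can find. */
class PageDecoder {
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Picks a charset: header value first, then a meta declaration, then UTF-8 (assumed default). */
    static Charset detect(byte[] body, String contentType) {
        Charset fromHeader = lookup(contentType);
        if (fromHeader != null) {
            return fromHeader;
        }
        // ISO-8859-1 maps every byte to one char, so the ASCII meta tag survives intact
        int headLength = Math.min(body.length, 4096);
        String head = new String(body, 0, headLength, StandardCharsets.ISO_8859_1);
        Charset fromMeta = lookup(head);
        return fromMeta != null ? fromMeta : StandardCharsets.UTF_8;
    }

    /** Decodes the body bytes once, using the detected charset. */
    static String decode(byte[] body, String contentType) {
        return new String(body, detect(body, contentType));
    }

    /** Returns the charset named in the text, or null if absent or unsupported by this JVM. */
    private static Charset lookup(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) { // illegal or unsupported charset name
            return null;
        }
    }
}
```

In practice the `body` array could come from reading the connection's input stream fully (for example `InputStream.readAllBytes()` on Java 9 or later), with `contentType` taken from `getContentType()`. The trade-off is that the whole page is held in memory before decoding.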
